Skip to content

PERF: Fix performance regression #33365 #33540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 15, 2020

Conversation

rtlee9
Copy link
Contributor

@rtlee9 rtlee9 commented Apr 14, 2020

Fix performance regression in Series.is_monotonic_increasing for categorical
by avoiding Categorical construction for categorical series

@rtlee9 rtlee9 force-pushed the fix_perf_regression_33365 branch from 2785007 to bef979b Compare April 14, 2020 05:49
Fix performance regression in Series.is_monotonic_increasing for categorical
by avoiding Categorical construction for categorical series
@rtlee9 rtlee9 force-pushed the fix_perf_regression_33365 branch from bef979b to 3b6a672 Compare April 14, 2020 15:58
@jbrockmendel
Copy link
Member

can you show %timeit results for the perf fix

@alimcmaster1 alimcmaster1 added Categorical Categorical Data Type Performance Memory or execution speed performance labels Apr 14, 2020
@rtlee9
Copy link
Contributor Author

rtlee9 commented Apr 15, 2020

This asv benchmark is identical to the %timeit version shared in #33365. Let me know if timeit is preferred over this and I can share that as well.

before           after         ratio
     [29d6b023]       [b8605d24]
     <v1.0.1^0>       <fix_perf_regression_33365>
-        64.6±3μs         57.5±1μs     0.89  categoricals.IsMonotonic.time_categorical_series_is_mono
tonic_increasing

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@jreback
Copy link
Contributor

jreback commented Apr 15, 2020

an you run all categorical and index construction asv's

@jreback jreback added this to the 1.1 milestone Apr 15, 2020
@rtlee9
Copy link
Contributor Author

rtlee9 commented Apr 15, 2020

Here is the asv for everything. I re-ran the two that showed regressions afterwards and they did not reproduce.

$ asv continuous HEAD -f 1.1

before           after         ratio
     [455de19b]       [3b6a6725]
     <master~1>       <fix_perf_regression_33365>
+      40.7±0.5ms         52.2±3ms     1.28  io.parsers.DoesStringLookLikeDatetime.time_check_datetimes('0.0')
+     2.00±0.04ms       2.42±0.4ms     1.21  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'nearest')
-        880±40ns         800±20ns     0.91  index_cached_properties.IndexCache.time_is_all_dates('RangeIndex')
-        431±20μs          391±9μs     0.91  index_cached_properties.IndexCache.time_is_monotonic('MultiIndex')
-      1.88±0.2ms      1.71±0.05ms     0.91  index_cached_properties.IndexCache.time_is_unique('Float64Index')
-        18.2±1ms       16.5±0.1ms     0.91  inference.ToNumericDowncast.time_downcast('datetime64', 'integer')
-      5.48±0.2μs       4.96±0.1μs     0.90  index_cached_properties.IndexCache.time_shape('MultiIndex')
-        630±80μs          570±3μs     0.90  groupby.GroupByMethods.time_dtype_as_group('int', 'std', 'direct')
-     2.21±0.06ms      2.00±0.01ms     0.90  groupby.RankWithTies.time_rank_ties('float32', 'min')
-      10.9±0.7ms      9.84±0.04ms     0.90  groupby.Categories.time_groupby_nosort
-     2.87±0.09ms      2.59±0.02ms     0.90  groupby.TransformNaN.time_first
-      2.08±0.2ms       1.87±0.2ms     0.90  index_cached_properties.IndexCache.time_is_unique('MultiIndex')
-     2.22±0.05ms      1.99±0.02ms     0.90  groupby.RankWithTies.time_rank_ties('datetime64', 'average')
-        870±30ns         780±20ns     0.90  index_cached_properties.IndexCache.time_is_unique('Int64Index')
-     2.22±0.06ms      1.99±0.01ms     0.89  groupby.RankWithTies.time_rank_ties('float32', 'average')
-     2.26±0.06ms      2.02±0.01ms     0.89  groupby.RankWithTies.time_rank_ties('int64', 'dense')
-      4.37±0.5μs       3.90±0.4μs     0.89  index_cached_properties.IndexCache.time_is_all_dates('TimedeltaIndex')
-      1.68±0.1ms      1.49±0.01ms     0.89  algorithms.Quantile.time_quantile(0, 'midpoint', 'int')
-      28.1±0.8ms       24.8±0.3ms     0.88  groupby.Nth.time_series_nth_all('float64')
-      2.44±0.2μs      2.14±0.05μs     0.88  index_cached_properties.IndexCache.time_shape('PeriodIndex')
-       639±100μs          551±3μs     0.86  groupby.GroupByMethods.time_dtype_as_group('int', 'sum', 'transformation')
-        34.2±1ms       29.4±0.3ms     0.86  groupby.Nth.time_series_nth_all('object')
-      3.58±0.4μs       3.06±0.1μs     0.85  index_cached_properties.IndexCache.time_inferred_type('MultiIndex')
-        623±20μs        530±0.8μs     0.85  groupby.GroupByMethods.time_dtype_as_group('int', 'mean', 'direct')
-      4.12±0.4μs       3.48±0.2μs     0.84  index_cached_properties.IndexCache.time_inferred_type('TimedeltaIndex')
-      3.68±0.4μs       3.11±0.4μs     0.84  index_cached_properties.IndexCache.time_is_all_dates('UInt64Index')
-      1.57±0.1ms      1.31±0.01ms     0.83  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-      5.27±0.8μs       4.37±0.2μs     0.83  index_cached_properties.IndexCache.time_shape('Float64Index')
-        12.2±1μs       10.1±0.5μs     0.83  index_cached_properties.IndexCache.time_engine('TimedeltaIndex')
-        11.2±1μs       9.07±0.6μs     0.81  index_cached_properties.IndexCache.time_engine('UInt64Index')
-      4.38±0.8μs       3.53±0.3μs     0.81  index_cached_properties.IndexCache.time_values('TimedeltaIndex')
-      3.64±0.6μs       2.89±0.4μs     0.79  index_cached_properties.IndexCache.time_is_all_dates('Float64Index')
-      4.03±0.4μs       3.09±0.3μs     0.77  index_cached_properties.IndexCache.time_inferred_type('Float64Index')
-      4.00±0.4μs       3.03±0.2μs     0.76  index_cached_properties.IndexCache.time_inferred_type('UInt64Index')
-      7.35±0.7μs       5.53±0.3μs     0.75  index_cached_properties.IndexCache.time_shape('TimedeltaIndex')
-        11.4±1μs       8.45±0.5μs     0.74  index_cached_properties.IndexCache.time_engine('Float64Index')
-      4.50±0.5μs       2.81±0.3μs     0.62  index_cached_properties.IndexCache.time_values('Float64Index')
-        90.9±1μs       54.0±0.3μs     0.59  categoricals.IsMonotonic.time_categorical_series_is_monotonic_increasing
-      90.2±0.6μs       53.5±0.4μs     0.59  categoricals.IsMonotonic.time_categorical_series_is_monotonic_decreasing
-        151±70ms       41.5±0.2ms     0.27  frame_methods.Reindex.time_reindex_axis1

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

@jreback jreback merged commit 114e33a into pandas-dev:master Apr 15, 2020
@jreback
Copy link
Contributor

jreback commented Apr 15, 2020

thanks @rtlee9

@jreback
Copy link
Contributor

jreback commented Apr 15, 2020

I don't believe we need a note as this was a recent master, but @TomAugspurger lmk if not the case

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Apr 15, 2020 via email

@rtlee9 rtlee9 deleted the fix_perf_regression_33365 branch April 16, 2020 05:28
CloseChoice pushed a commit to CloseChoice/pandas that referenced this pull request Apr 20, 2020
Fix performance regression in Series.is_monotonic_increasing for categorical
by avoiding Categorical construction for categorical series
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PERF: Regression in Series.is_monotonic_increasing for categorical
5 participants