PERF: Index.sort_values for already sorted index #56128

Merged: 2 commits, merged Nov 23, 2023

lukemanley (Member):

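Take advantage of the cached `is_monotonic_increasing` / `is_monotonic_decreasing` attributes to speed up `Index.sort_values` when the index is already sorted.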
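
The idea can be sketched with a toy class. `SortedAwareIndex` is hypothetical and illustrative only: it is not pandas' `Index`, and the real implementation may differ. The monotonicity checks are O(n) and cached per object via `functools.cached_property`. An index that is already sorted then needs only a copy, or a reversal when the requested direction is the opposite one, instead of an O(n log n) sort.

```python
from functools import cached_property

import numpy as np


class SortedAwareIndex:
    """Hypothetical toy index (not pandas' Index) that caches monotonicity."""

    def __init__(self, values):
        self._values = np.asarray(values)

    @cached_property
    def is_monotonic_increasing(self) -> bool:
        # O(n) pairwise check, computed once per object and then cached
        v = self._values
        return bool(np.all(v[:-1] <= v[1:]))

    @cached_property
    def is_monotonic_decreasing(self) -> bool:
        v = self._values
        return bool(np.all(v[:-1] >= v[1:]))

    def sort_values(self, ascending: bool = True) -> "SortedAwareIndex":
        in_order = (
            self.is_monotonic_increasing if ascending else self.is_monotonic_decreasing
        )
        if in_order:
            # Already in the requested order: copy, no sort needed
            return SortedAwareIndex(self._values.copy())
        reverse_order = (
            self.is_monotonic_decreasing if ascending else self.is_monotonic_increasing
        )
        if reverse_order:
            # Sorted in the opposite direction: reversing is enough
            return SortedAwareIndex(self._values[::-1].copy())
        # General case: full O(n log n) sort
        order = np.argsort(self._values, kind="stable")
        if not ascending:
            order = order[::-1]
        return SortedAwareIndex(self._values[order])

    def tolist(self) -> list:
        return self._values.tolist()
```

In this toy, the first `sort_values` call on a given object still pays for the O(n) monotonicity check. Later calls on the same object reuse the cached flags, while a freshly constructed object has to compute them again.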

```python
# IPython/Jupyter session: %timeit is an IPython magic
import pandas as pd

N = 1_000_000

# pd._testing.makeStringIndex is a private pandas test helper, not public API
idx = pd._testing.makeStringIndex(N).sort_values()
%timeit idx.sort_values()

# 2.35 s ± 75.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   <- main
# 14.6 µs ± 3.48 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)  <- PR

# Increasing DatetimeIndex, sorted in the opposite (descending) direction
idx = pd.date_range("2000-01-01", freq="s", periods=N)
%timeit idx.sort_values(ascending=False)

# 90.3 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)      <- main
# 11.6 µs ± 384 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)  <- PR
```
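
However the fast path is implemented internally, its output has to match a full sort. The quick check below uses only the public `Index.sort_values` API and a small index. It assumes any recent pandas version and covers both directions together with `return_indexer=True`.

```python
import numpy as np
import pandas as pd

# A small, already-increasing DatetimeIndex like the one benchmarked above
idx = pd.date_range("2000-01-01", freq="s", periods=5)

# Ascending sort of an increasing index: same values, identity indexer
asc, asc_indexer = idx.sort_values(return_indexer=True)
assert asc.equals(idx)
assert np.array_equal(asc_indexer, np.arange(len(idx)))

# Descending sort of an increasing index: reversed values, reversed indexer
desc, desc_indexer = idx.sort_values(ascending=False, return_indexer=True)
assert desc.equals(idx[::-1])
assert np.array_equal(desc_indexer, np.arange(len(idx))[::-1])
```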

Existing ASV benchmark (`Indexing.time_sort_values` in `asv_bench/benchmarks/categoricals.py`):

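```python
# IPython session run from the root of a pandas checkout
# (asv_bench is the pandas benchmark directory, not an installed package)
from asv_bench.benchmarks.categoricals import Indexing

b = Indexing()
b.setup()  # ASV normally calls setup() before timing; call it manually here
%timeit b.time_sort_values()

# 4.85 ms ± 304 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)     <- main
# 21.4 µs ± 319 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  <- PR
```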
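
The same kind of measurement could be packaged as a parameterized ASV benchmark. The sketch below is hypothetical: the class name `SortValuesMonotonic` and its parameters are illustrative, not an existing pandas benchmark. ASV calls `setup` with each value from `params` before timing the `time_*` methods. Its timing loops then call the method repeatedly on the object built in `setup`, so after the first call such a benchmark mostly measures the cached path.

```python
import pandas as pd


class SortValuesMonotonic:
    """Hypothetical ASV-style benchmark; name and parameters are illustrative."""

    # ASV runs every time_* method once per value in params
    params = [True, False]
    param_names = ["ascending"]

    def setup(self, ascending):
        # An already-increasing index, as in the DatetimeIndex example above
        self.idx = pd.date_range("2000-01-01", freq="s", periods=1_000_000)

    def time_sort_values(self, ascending):
        self.idx.sort_values(ascending=ascending)
```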

lukemanley added the Performance (Memory or execution speed performance) and Index (Related to the Index class or subclasses) labels on Nov 23, 2023.
lukemanley added this to the 2.2 milestone on Nov 23, 2023.

phofl (Member) commented Nov 23, 2023:

Can you try this when this isn't cached? E.g. recreating the series for every pass

lukemanley (Member, Author):

> Can you try this when this isn't cached? E.g. recreating the series for every pass

Without the monotonicity attributes pre-cached (a new `Index` is built from the same values on every pass):

```python
# IPython/Jupyter session: %timeit is an IPython magic
import pandas as pd

N = 1_000_000

# Values are extracted once; pd.Index(values) inside %timeit builds a
# fresh Index on every pass, so nothing is cached beforehand
values = pd._testing.makeStringIndex(N).sort_values().values
%timeit pd.Index(values).sort_values()

# 2.45 s ± 48.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <- main
# 259 ms ± 9.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   <- PR

values = pd.date_range("2000-01-01", freq="s", periods=N).values
%timeit pd.Index(values).sort_values(ascending=False)

# 91.7 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- main
# 2.89 ms ± 290 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  <- PR
```
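
The `%timeit` snippets require IPython. The plain-Python sketch below uses the standard `timeit` module to compare the cached and fresh-index paths side by side. It rests on two assumptions. N is reduced to 100,000. The strings are zero-padded integers, a hypothetical stand-in for the private `makeStringIndex` helper. Absolute numbers will therefore not match those reported above.

```python
import timeit

import pandas as pd

N = 100_000
# Hypothetical stand-in for pd._testing.makeStringIndex: zero-padded
# integers, so the strings are already in sorted (lexicographic) order
values = pd.Index([f"{i:010d}" for i in range(N)]).values

cached = pd.Index(values)
cached.sort_values()  # warm-up call on the object that is reused below

# Same Index object on every pass: cached attributes are reused
t_cached = min(timeit.repeat(cached.sort_values, number=5, repeat=3)) / 5
# Fresh Index on every pass: nothing is cached yet
t_fresh = min(
    timeit.repeat(lambda: pd.Index(values).sort_values(), number=1, repeat=3)
)

print(f"cached: {t_cached * 1e6:.1f} µs per call")
print(f"fresh:  {t_fresh * 1e3:.1f} ms per call")
```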

phofl merged commit b6834f8 into pandas-dev:main on Nov 23, 2023.

phofl (Member) commented Nov 23, 2023:

thx @lukemanley

lukemanley deleted the index-sort-values branch on November 25, 2023 13:44.