REF: Use numpy set methods in interpolate #57997

Merged (5 commits) on Mar 26, 2024

mroeschke
Member

_interpolate_1d goes back and forth between Python objects and NumPy objects in order to use Python set methods. This PR refactors it to use NumPy set routines instead.
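To illustrate the kind of change described, here is a minimal sketch and not the actual pandas code. The array `values` and the names `start_nans`, `end_nans`, and `mid_nans` are hypothetical; the sketch only assumes that the function needs to combine groups of missing-value positions. It computes the same result once with Python sets and once with NumPy set routines.

```python
import numpy as np

# Hypothetical input: NaN marks a missing value.
values = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
invalid = np.isnan(values)
valid_positions = np.flatnonzero(~invalid)
first_valid = int(valid_positions[0])
last_valid = int(valid_positions[-1])

# Set-based style: index arrays are converted to Python sets,
# combined, then sorted back into a list.
all_nans_set = set(np.flatnonzero(invalid))
start_nans_set = set(range(first_valid))
end_nans_set = set(range(last_valid + 1, len(values)))
mid_nans_via_sets = sorted(all_nans_set - start_nans_set - end_nans_set)

# Array-based style: everything stays a NumPy integer array.
all_nans = np.flatnonzero(invalid)
start_nans = np.arange(first_valid)
end_nans = np.arange(last_valid + 1, len(values))
mid_nans = np.setdiff1d(
    all_nans,
    np.union1d(start_nans, end_nans),
    assume_unique=True,  # both inputs contain no duplicates
)

print(mid_nans)  # positions of NaNs surrounded by valid values
```

Both styles give the same positions here; the array-based one avoids creating a Python object per index.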

@mroeschke mroeschke added Refactor Internal refactoring of code Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Mar 25, 2024
@jbrockmendel
Member

Neat! Is there a perf bump or just the principle of the thing?

@mroeschke
Member Author

mroeschke commented Mar 25, 2024

> Neat! Is there a perf bump or just the principle of the thing?

Looks like there's a perf bump when the refactored limit_direction and limit_area code paths are hit:

In [1]: import pandas as pd
+ /opt/miniconda3/envs/pandas-dev/bin/ninja
[4/4] Linking target pandas/_libs/tslibs/parsing.cpython-310-darwin.so

In [2]: ser = pd.Series([None, 1, None] * 250_000, dtype="Int64")

In [3]: %timeit ser.interpolate(limit_direction="forward", limit_area="outside")
24.7 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each). # PR

In [3]: %timeit ser.interpolate(limit_direction="forward", limit_area="outside")
63.3 ms ± 563 µs per loop (mean ± std. dev. of 7 runs, 10 loops each). # main

@mroeschke mroeschke added this to the 3.0 milestone Mar 25, 2024
@mroeschke
Member Author

Going to merge as all green here. Can follow up if needed

@mroeschke mroeschke merged commit fc4af6a into pandas-dev:main Mar 26, 2024
@mroeschke mroeschke deleted the ref/interp/arrays branch March 26, 2024 17:11
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
* Use numpy arrays instead of sets in interp

* Enable assume_unique in intersect1d

* Typing
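
As a rough illustration of the `assume_unique` option named in the commit list above, the sketch below calls `np.intersect1d` with and without it. The arrays `a` and `b` are made-up index arrays, not data from the PR. The flag is only safe when both inputs really contain no duplicates, which holds for position arrays such as those returned by `np.flatnonzero`.

```python
import numpy as np

# Hypothetical index arrays, each free of duplicates.
a = np.array([0, 2, 3, 5])
b = np.array([2, 3, 4])

# Default: NumPy deduplicates both inputs before intersecting.
default_result = np.intersect1d(a, b)

# assume_unique=True skips that deduplication step.
fast_result = np.intersect1d(a, b, assume_unique=True)

print(default_result, fast_result)
```

With duplicate-free inputs the two calls return the same sorted array; the NumPy documentation warns that results can be incorrect if the flag is set and the inputs are not actually unique.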