fix series.isin slow issue with Dtype IntegerArray #38379
Merged
Commits
39 commits
109c0e7 fix series.isin slow issue with Dtype IntegerArray (tushushu)
e9f96ea Move isinstance(comps, IntegerArray) to algo.isin (tushushu)
a6be9c8 cannot import IntegerArray due to circular import (tushushu)
415b590 fix bug in pandas (Linux py38_np_dev) (tushushu)
f3e5afb fix pre commit issue. (tushushu)
14579fc fix the code style issue. (tushushu)
562c918 move the logic to elif block. (tushushu)
1449d3c remove blank line. (tushushu)
3ccc917 copy codes from #38422 (tushushu)
98a0683 make `isin` correct for pd.NA (tushushu)
6e2917e sort imports (tushushu)
a4b6503 Avoiding import pandas as pd. (tushushu)
f95fde9 fix cannot import NA issue. (tushushu)
2348a60 Merge pull request #1 from pandas-dev/master (tushushu)
c963183 Merge remote-tracking branch 'upstream/master' into ENH-implement-fas… (tushushu)
c151102 Merge pull request #2 from pandas-dev/master (tushushu)
ef99e86 Merge branch 'ENH-implement-fast-isin' of github.com:tushushu/pandas … (tushushu)
f23ba94 Merge remote-tracking branch 'origin/master' into ENH-implement-fast-… (tushushu)
13dc64f Adding Int64 and Float64 for benchmarks. (tushushu)
cc38088 Adding isin benchmarks for Boolean array (tushushu)
94846cc Adding what's new note. (tushushu)
f4cb5ce fix IsInFloat64 benchmarks (tushushu)
2ee8b05 always return false for null values. (tushushu)
7763b18 fix flake8 error. (tushushu)
87dfff9 remove unused lines. (tushushu)
d2d32d1 refactors for series benchmarks. (tushushu)
90ef57a fix flake8 errors. (tushushu)
a48e00b Change back to see if can pass the tests. (tushushu)
2279519 Merge remote-tracking branch 'upstream/master' into ENH-implement-fas… (tushushu)
c726d4a makes NA isin [NA] return True. (tushushu)
1134ad6 remove redundant codes. (tushushu)
bce0e3e makes performance better. (tushushu)
bf788e5 fix flake8 errors. (tushushu)
40950be polish codes (tushushu)
570d640 not import NA (tushushu)
0f89578 fix code style (tushushu)
199c11c fix black error. (tushushu)
9f35b5b fix CI (tushushu)
2238dc5 Merge remote-tracking branch 'upstream/master' into ENH-implement-fas… (tushushu)
Will this work if you handle it in the extension array `elif` instead of separately up here?
pandas/pandas/core/algorithms.py
Lines 467 to 471 in 3aa8447
so the issue here is that we actually need to implement .isin on the EAs (numeric dtypes); we already do this on the internal EAs (datetime, interval etc).
cc @jbrockmendel
Hi Andrew,
Are you suggesting something like this:
I've run a test and the performance is good.
yeah that was what i meant
Does anything break, and do you get an `isin` speedup, if you relax the `IntegerArray` check and use `comps._data` for other EAs (such as `FloatingArray`)?
Yes, the runtime drops from 20ms to 2ms after the modification. Please refer to #38340 if you are interested in the performance tests.
Sure I could add other EAs here, will do that later. Thanks Andrew~
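
For readers following the thread, the idea of running `isin` on the underlying values rather than on the extension array itself can be sketched with plain NumPy. This is only an illustration under stated assumptions: `masked_isin` is a hypothetical helper, not the code merged in this PR, and it takes the simple route of returning False for every masked (null) position. The commit list shows that null handling was revisited several times during the PR.

```python
import numpy as np


def masked_isin(data: np.ndarray, mask: np.ndarray, values) -> np.ndarray:
    """Hypothetical helper: membership test on a masked array's raw values.

    `data` holds the raw values and `mask` is True where a value is missing,
    mirroring the data/mask pair that masked arrays keep internally.
    """
    # Vectorized membership test on the plain NumPy values.
    result = np.isin(data, np.asarray(values))
    # Assumption for this sketch: a missing value is never "in" the values.
    result[mask] = False
    return result


data = np.array([1, 2, 3, 0], dtype="int64")
mask = np.array([False, False, False, True])  # last element is missing
result = masked_isin(data, mask, [0, 2])
```

The placeholder stored under a masked position (here `0`) would otherwise match the lookup values, which is why the mask is applied after the membership test.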
I found that `ExtensionArray` does not have a `_data` member; only subclasses of `BaseMaskedArray` have it, so I guess the classes below should be considered:
- `FloatingArray`
- `IntegerArray`
- `BooleanArray`
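
A quick way to check the observation above is to build one array of each nullable dtype and inspect it. This sketch assumes a pandas version that ships the `Float64` dtype (1.2 or later) and relies on the private attributes `_data` and `_mask`, which are implementation details and may change between releases.

```python
import pandas as pd
from pandas.core.arrays.masked import BaseMaskedArray

# One small array per nullable dtype named in the comment above.
arrays = {
    "Int64": pd.array([1, 2, None], dtype="Int64"),
    "Float64": pd.array([1.5, None, 2.5], dtype="Float64"),
    "boolean": pd.array([True, None, False], dtype="boolean"),
}

# Each should be a BaseMaskedArray subclass exposing the private `_data` member.
has_data = {
    name: isinstance(arr, BaseMaskedArray) and hasattr(arr, "_data")
    for name, arr in arrays.items()
}

# The companion `_mask` marks which positions are missing.
int_mask = arrays["Int64"]._mask.tolist()
```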