Skip to content

fixes #26622: pd.MultiIndex.isin should fail when input args are too short #26623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

telamonian
Copy link

@telamonian telamonian commented Jun 2, 2019

This fixes #26622 by making pd.MultiIndex.isin fail when input args are too short, as well as when they are too long (the current behavior). It also replaces existing error message with something a little bit clearly and more directly related to isin and its parameters.

Currently, isin does some validation by constructing a new MultiIndex and then taking its values:

values = MultiIndex.from_tuples(values, names=self.names).values

Though the code I wrote should handle all validation relating to the length of the elements of values, I left this line in since I was unsure if it was validating/fixing anything else. Does anyone know if that line is safe to remove now?

@jreback
Copy link
Contributor

jreback commented Jun 2, 2019

pls always add tests, how would u know if it works otherwise?

@telamonian
Copy link
Author

@jreback Already on it! I'll push em in a sec. What does the whatsnew entry on the checklist refer to?

@simonjayhawkins simonjayhawkins added Error Reporting Incorrect or improved errors from pandas MultiIndex labels Jun 2, 2019
@simonjayhawkins
Copy link
Member

@jreback Already on it! I'll push em in a sec. What does the whatsnew entry on the checklist refer to?

doc/source/whatsnew/v0.25.0.rst

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@telamonian Thanks for the PR.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent. i think we're almost done.

@codecov
Copy link

codecov bot commented Jun 2, 2019

Codecov Report

Merging #26623 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26623      +/-   ##
==========================================
+ Coverage   91.86%   91.87%   +<.01%     
==========================================
  Files         174      174              
  Lines       50722    50696      -26     
==========================================
- Hits        46594    46575      -19     
+ Misses       4128     4121       -7
Flag Coverage Δ
#multiple 90.41% <100%> (+0.01%) ⬆️
#single 41.78% <0%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 95.68% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/io/clipboard/__init__.py 54.34% <0%> (-2.52%) ⬇️
pandas/core/dtypes/cast.py 90.54% <0%> (-1%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.81% <0%> (-0.11%) ⬇️
pandas/core/sparse/frame.py 95.63% <0%> (-0.02%) ⬇️
pandas/core/generic.py 93.61% <0%> (-0.02%) ⬇️
pandas/plotting/_core.py 83.77% <0%> (ø) ⬆️
pandas/core/sparse/scipy_sparse.py 100% <0%> (ø) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update addc5fc...47fbed2. Read the comment docs.

@simonjayhawkins simonjayhawkins added this to the 0.25.0 milestone Jun 2, 2019
@jreback jreback changed the title fixes #26622: make pd.MultiIndex.isin fail when input args are too short fixes #26622: pd.MultiIndex.isin should fail when input args are too short Jun 2, 2019
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @jreback

@telamonian
Copy link
Author

Great. Thanks @simonjayhawkins for taking the time to review this.

@@ -3175,6 +3175,14 @@ def _wrap_joined_index(self, joined, other):
@Appender(Index.isin.__doc__)
def isin(self, values, level=None):
if level is None:
# validate value length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, we already do this sort of validation in _verify_integrity in the MI construction, so I think the MI is actually getting constructed correctly, but you have nan's in the levels.

I am not opposed to the check, but it seems that maybe the semantics of handling this are not exactly right, meaning what you passed in your tests is actually valid.

@jreback
Copy link
Contributor

jreback commented Jun 12, 2019

cc @toobaz for thoughts on this

@jreback jreback removed this from the 0.25.0 milestone Jun 28, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge master and fix up conflict?

@@ -608,6 +608,7 @@ MultiIndex
^^^^^^^^^^

- Bug in which incorrect exception raised by :class:`Timedelta` when testing the membership of :class:`MultiIndex` (:issue:`24570`)
- :meth:`MultiIndex.isin` now raises ``ValueError`` when items in values are not the same length as the number of levels in the MultiIndex (:issue:`26622`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to 1.0.0?

@jreback
Copy link
Contributor

jreback commented Sep 8, 2019

can you merge master and update to comments

@jreback
Copy link
Contributor

jreback commented Oct 6, 2019

can you merge master and move the note to 1.0.0

@WillAyd
Copy link
Member

WillAyd commented Nov 7, 2019

Closing as this looks stale, but @telamonian ping if you'd like to pick back up

@WillAyd WillAyd closed this Nov 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.MultiIndex.isin doesn't validate input correctly
4 participants