-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix duplicates in intersection of multiindexes #36927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 11 commits
cdefaae
fbd63f2
53a37d1
5675a4e
134936c
582c0b9
7805de5
67691df
8fb0055
66b519f
cb1477b
0fb2561
3c19d57
10524fd
3dde0ee
a0a1a33
45dfb84
d71a499
d873d5a
e90239a
c2b448a
742716e
321797a
a980ec0
972fd48
fe1ded4
8e4d47b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -457,7 +457,9 @@ def _should_reindex_frame_op( | |||
# TODO: any other cases we should handle here? | ||||
cols = left.columns.intersection(right.columns) | ||||
|
||||
if len(cols) and not (cols.equals(left.columns) and cols.equals(right.columns)): | ||||
if len(cols) and not ( | ||||
cols.equals(left.columns.unique()) and cols.equals(right.columns.unique()) | ||||
): | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we have a test that fails without this change? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but the builds are gone. I will run the tests in the evening to find the relevant tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This one is crashing in line 255 because of memory issues. The condition is never fullfilled, so it blows up There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the condition in master is never fulfilled or the condition in the PR? btw, usually let the person who asked the question hit the "resolve conversation" button There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, did not know that. Will do that in the future. The input There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this still needed? we now guaranteee that intersection returns unique columns, right so this should no longer be the case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, this is the problem. Is intersection is unique, cols.equals(left.columns) won't be True. This leads to a recursion in the test mentioned which blows up the memory, because we can not exit this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok pls add some comments to this effect then. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i would prob calculate left_uniques and right_uniques and make them variables as a bit simpler to read There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx, changed it and added a comment |
||||
# TODO: is there a shortcut available when len(cols) == 0? | ||||
return True | ||||
|
||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref #36915 (comment) This is fixing a regression from 1.0.5? Can we target 1.1.5 with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the whatsnew, but this may be tricky to backport.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not yet reviewed this PR. are there any changes other than those required to fix the regression here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge and setops relied partially on the wrong behavior, had to fix this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, not investigated fully, but this PR may sit on top of #37171 (which incidentally caused a perf regression #37171 (comment) so maybe that we want to revert that first before committing the changes here anyhow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened #38133 to test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not checked all the envs, but a quick check on a couple shows the same tests failing on the backport as here. so that's encouraging.
I think we should target as a backport. and if issues down the road will just have to move the release note back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, thats a good sign. Lets try this then, when everything is green here