-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Merge on CategoricalIndex fails if left_index=True & right_index=True, but not if on={index} #28189 #28296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
11cb1f8
95c5cf8
fd1d3f1
8321075
d938b1b
2bbc491
4e8bf1f
be85da6
9915105
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -348,3 +348,21 @@ def test_merge_join_categorical_multiindex(): | |
result = a.join(b, on=["Cat1", "Int1"]) | ||
expected = expected.drop(["Cat", "Int"], axis=1) | ||
assert_frame_equal(expected, result) | ||
|
||
|
||
def test_left_index_and_right_index_true(): | ||
# From issue 28189 | ||
df = DataFrame( | ||
range(4), columns=["value"], index=Index(Categorical(["1"] * 4), name="idx") | ||
) | ||
df2 = DataFrame( | ||
[[6]], columns=["value"], index=Index(Categorical(["1"]), name="idx") | ||
) | ||
|
||
result = merge(df, df2, how="left", left_index=True, right_index=True) | ||
result = result.reset_index(drop=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the reset_index required or can the index be included as part of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I left it out of the test because of the issues @hugoecarl mentioned on the PR. If the index are included, the test will break once they're fixed, but if you want, it could be included and I would need to change the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do you mean by "the test will break once they're fixed"? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What @guipleite is trying to say is that when you use the left-join, there is a problem with the index values output. The bug is related with the issues @hugoecarl pointed here in the PR. With that been said, if we include index as a part of the expected df, we would have to force the wrong output to prove that the bug pointed on this PR is solved, then when those other issues are corrected, this test would be outdated. Does it make sense to you? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIUC you are saying there are multiple bugs and this solves one of them, but the other would ned to be solved separately right? If so is there an open issue for that? Might have missed it in this PR There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm but don't those issues have to do with missing data on merge? Maybe missing the point but don't see how applicable here. Isn't the expected index still just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think maybe I'm missing the point here, so I'll take a step back. We worked on issue #28189 and the problem was that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm OK - I think I understand now. So this change doesn't raise an error but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Exactly that! We just added the support to int8 and int16 type that uncovered this output bug. |
||
expected = DataFrame( | ||
np.array([[0, 6], [1, 6], [2, 6], [3, 6]]), columns=["value_x", "value_y"] | ||
) | ||
|
||
assert_frame_equal(expected, result) |
Uh oh!
There was an error while loading. Please reload this page.