Skip to content

TEST: join df with categorical multiIndex #51088

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

natmokval
Copy link
Contributor

Add test to check index dtype after joining DataFrames with categorical multiIndex.

).set_index(["idx1", "idx2"])

for how in ["inner", "outer", "left", "right"]:
df = df1.join(df2, how=how)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you name this result and compare to an expected DataFrame. This will compare the types as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mroeschke. I corrected the test as you suggested.

@mroeschke mroeschke added the Testing pandas testing functions or related to the test suite label Jan 31, 2023
}
).set_index(["idx1", "idx2"])

for how in ["inner", "outer", "left", "right"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use pytest.mark.parameterize for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course. Added parametrize to this commit

Comment on lines 964 to 970
df1 = DataFrame(
{
"idx1": Categorical(["a", "a", "a"]),
"idx2": Categorical(["a", "a", "b"]),
"data": [1, 2, 3],
}
).set_index(["idx1", "idx2"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this

why not use the example from #50906? for regression tests, if the original example is small and self-contained, it'd probably be safer to just use that directly
the rest looks good though 👍

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid, nice - looks good to me pending green (and "ready for review" 😄 )

@natmokval
Copy link
Contributor Author

Thank you, I am looking forward to seeing a cat :)

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Feb 3, 2023

this is currently marked as draft:

image

it looks ready though, anything else need doing? if you mark as ready for review I think we can ship it

@natmokval natmokval marked this pull request as ready for review February 5, 2023 11:54
@natmokval
Copy link
Contributor Author

I forgot to change the status of the PR. Now it is "ready for review" :)

@MarcoGorelli MarcoGorelli added this to the 2.0 milestone Feb 6, 2023
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @natmokval

@MarcoGorelli MarcoGorelli merged commit b2a26ec into pandas-dev:main Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: when inner/outer-joining dataframes with categorical MultiIndex, the output index dtype depends on row ordering
3 participants