Skip to content

BUG: GroupBy.quantile implicitly sorts index.levels #53049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
May 12, 2023

Conversation

Charlie-XIAO
Copy link
Contributor

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Lookng good - some small requests.

@Charlie-XIAO
Copy link
Contributor Author

Hi @rhshadrach, thanks for your review! I've updated the changelog and the tests as you suggested. However, I'm not sure if index.levels is being compared. I tried using

qind = pd.MultiIndex(
    [["A", "B"], [0.2, 0.8]],
    [[1, 1, 0, 0], [0, 1, 0, 1]],
    names=["cat1", None],
)

and it still passed the test. Is there any recommended way of fixing this?

@rhshadrach
Copy link
Member

Thanks for identifying this @Charlie-XIAO - I opened #53095. For this PR, I think it'd be good to keep the assert_series_equal, and assert the index levels as you had it.

@Charlie-XIAO
Copy link
Contributor Author

Thanks @rhshadrach! I've retrieved the index levels check.

@Charlie-XIAO
Copy link
Contributor Author

Thanks for your review @rhshadrach! I've pushed the changes that avoid performance regression when idx is not MultiIndex as you suggested. Please let me know if similar changes should apply to the case when idx is MultiIndex.

@rhshadrach
Copy link
Member

Please let me know if similar changes should apply to the case when idx is MultiIndex.

Yes! It looks to me like these two blocks can now be largely combined.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach rhshadrach added Bug Reduction Operations sum, mean, min, max, etc. labels May 12, 2023
@rhshadrach rhshadrach added this to the 2.1 milestone May 12, 2023
@mroeschke mroeschke merged commit f9a0b87 into pandas-dev:main May 12, 2023
@mroeschke
Copy link
Member

Thanks @Charlie-XIAO

Rylie-W pushed a commit to Rylie-W/pandas that referenced this pull request May 19, 2023
…3049)

* fixed groupby quantile index.levels

* changelog added

* updated changelog

* modified test to test whole frame
- not sure but I think it is not checking index levels like this

* updated tests to check index levels separately

* avoid performance regression when not multiindex

* tried to combine code blocks
@Charlie-XIAO Charlie-XIAO deleted the grpby-quantile-lvls branch June 28, 2023 05:43
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…3049)

* fixed groupby quantile index.levels

* changelog added

* updated changelog

* modified test to test whole frame
- not sure but I think it is not checking index levels like this

* updated tests to check index levels separately

* avoid performance regression when not multiindex

* tried to combine code blocks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Reduction Operations sum, mean, min, max, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: SeriesGroupby with sort=False behaves inconsistently with .quantiles(...).unstack()
3 participants