Skip to content

Bug fix - GroupBy.describe produces inconsistent results for empty datasets #46162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
May 21, 2022

Conversation

weikhor
Copy link
Contributor

@weikhor weikhor commented Feb 26, 2022

@weikhor weikhor changed the title add test Bug Fix GroupBy.describe produces inconsistent results for empty datasets Feb 26, 2022
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls also add a whatsnew note, in bug fixes for 1.5 in groupby section

@jreback jreback added Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 27, 2022
@weikhor weikhor changed the title Bug Fix GroupBy.describe produces inconsistent results for empty datasets Bug fix - GroupBy.describe produces inconsistent results for empty datasets Feb 27, 2022
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

@weikhor
Copy link
Contributor Author

weikhor commented Mar 19, 2022

I close this since I stuck this for long time. I get simpler task. Thank for review.

@weikhor weikhor closed this Mar 19, 2022
@weikhor weikhor reopened this Mar 30, 2022
@weikhor weikhor requested a review from rhshadrach April 29, 2022 15:21

def test_groupby_empty_dataset():
# GH#41575
df = DataFrame(columns=["A", "B", "C"], dtype=int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df here should be non-empty; that way we are testing the non-empty case df against the empty case df.iloc[:0] below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would also be good to have different columns with different dtypes for testing include/exclude: int, float, object would be sufficient.

@weikhor weikhor requested a review from rhshadrach May 1, 2022 12:45
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach
Copy link
Member

@jreback - good here?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm this seems a lot special cases here - this already dispatches to describe which should be able to handle empties

@weikhor
Copy link
Contributor Author

weikhor commented May 3, 2022

ok. Currently, I do not have ideas. I will try it.

@rhshadrach
Copy link
Member

rhshadrach commented May 11, 2022

@jreback

umm this seems a lot special cases here - this already dispatches to describe which should be able to handle empties

This dispatches to DataFrame.describe via _python_apply_general which doesn't call the function when the frame is empty.

@weikhor weikhor force-pushed the groupby_describe_empty_dataset branch from 785f4a4 to cf99c9e Compare May 11, 2022 11:59
@rhshadrach
Copy link
Member

@jreback - gentle ping. See my comment immediately above.

@jreback
Copy link
Contributor

jreback commented May 19, 2022

will look

@weikhor can u merge master

@weikhor weikhor requested a review from jreback May 19, 2022 17:03
@jreback jreback added this to the 1.5 milestone May 21, 2022
@jreback jreback merged commit c12683d into pandas-dev:main May 21, 2022
@jreback
Copy link
Contributor

jreback commented May 21, 2022

thanks @weikhor

@weikhor weikhor deleted the groupby_describe_empty_dataset branch May 31, 2022 16:12
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: GroupBy.describe produces inconsistent results for empty datasets
3 participants