Bug fix - GroupBy.describe produces inconsistent results for empty datasets #46162
…_describe_empty_dataset
Please also add a whatsnew note under bug fixes for 1.5, in the groupby section.
Thanks for the PR!
I'm closing this since I've been stuck on it for a long time. I'll pick up a simpler task. Thanks for the review.
def test_groupby_empty_dataset():
    # GH#41575
    df = DataFrame(columns=["A", "B", "C"], dtype=int)
df here should be non-empty; that way we are testing the non-empty case df against the empty case df.iloc[:0] below.
I think it would also be good to have different columns with different dtypes for testing include/exclude: int, float, object would be sufficient.
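The two suggestions above could be combined roughly as follows. This is only a sketch of the idea, not the test that was merged: the column values and variable names are hypothetical, and whether the two results share the same columns depends on the pandas version having the fix.

```python
import pandas as pd

# Hypothetical fixture: one column per dtype the reviewers mention.
df = pd.DataFrame(
    {
        "A": [1, 1, 2],        # int, used as the grouping key
        "B": [0.5, 1.5, 2.5],  # float
        "C": ["x", "y", "z"],  # object (string) column
    }
)

# Slicing zero rows keeps the columns and dtypes of the non-empty frame.
empty = df.iloc[:0]

non_empty_result = df.groupby("A").describe()
empty_result = empty.groupby("A").describe()

# With the fix in place, both results are expected to share the same columns;
# on older versions this may be False, which is the inconsistency in GH#41575.
columns_match = bool(non_empty_result.columns.equals(empty_result.columns))
print(columns_match)
```

Building the empty case from df.iloc[:0] rather than from DataFrame(columns=...) means both cases start from identical dtypes, so any difference in the output comes from describe itself.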
lgtm
@jreback - good here?
Hmm, this seems like a lot of special cases here; this already dispatches to describe, which should be able to handle empties.
OK. I don't have any ideas at the moment, but I will try.
This dispatches to DataFrame.describe via _python_apply_general, which doesn't call the function when the frame is empty.
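The behaviour described in this comment can be observed through the public groupby apply API, which takes a similar per-group path. The sketch below is illustrative only: record is a hypothetical helper, and it does not touch the private _python_apply_general directly.

```python
import pandas as pd

calls = []  # sizes of the groups the function was actually called with


def record(group):
    # Hypothetical helper: note the call, then describe the group.
    calls.append(len(group))
    return group.describe()


df = pd.DataFrame({"A": [1, 1, 2], "B": [0.5, 1.5, 2.5]})

# Non-empty frame: the function runs for each group.
df.groupby("A")[["B"]].apply(record)
calls_non_empty = len(calls)

# Empty frame: there are no groups to iterate over, so the function
# is expected never to run and describe is never reached.
calls.clear()
df.iloc[:0].groupby("A")[["B"]].apply(record)
calls_empty = len(calls)

print(calls_non_empty, calls_empty)
```

Because the function is never invoked for an empty frame, DataFrame.describe has no chance to produce its usual columns there, which is why the empty case needs separate handling.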
785f4a4 to cf99c9e
@jreback - gentle ping. See my comment immediately above.
Will look. @weikhor, can you merge master?
Thanks @weikhor
GroupBy.describe produces inconsistent results for empty datasets #41575
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.