Bug fix - GroupBy.describe produces inconsistent results for empty datasets #46162

Merged
merged 40 commits into from
May 21, 2022
Commits
2987746
add test
Feb 26, 2022
43d3aeb
Add test
Feb 26, 2022
e2c1ded
Update groupby.py
Feb 26, 2022
d4163ba
Merge branch 'main' of https://github.com/weikhor/pandas into groupby…
Feb 26, 2022
5dd036d
pre_commit
Feb 27, 2022
54ca1f4
Update v1.5.0.rst
Feb 27, 2022
8377dbc
Resolve conflict
Feb 27, 2022
9c23487
Merge branch 'main' into groupby_describe_empty_dataset
Feb 27, 2022
1f777da
pre-commit
Feb 27, 2022
7a775e8
Merge branch 'main' of https://github.com/weikhor/pandas into groupby…
Mar 1, 2022
e1ccb50
Merge branch 'main' into groupby_describe_empty_dataset
Mar 30, 2022
f3d4bfb
Merge branch 'main' of https://github.com/weikhor/pandas into groupby…
Apr 2, 2022
a85fc87
add
Apr 4, 2022
c293e2d
test
Apr 4, 2022
595e4ef
test
Apr 9, 2022
678fd2d
Merge branch 'main' into groupby_describe_empty_dataset
Apr 9, 2022
8d81d6e
test
Apr 9, 2022
c15731b
add
Apr 9, 2022
94e9779
Merge branch 'main' into groupby_describe_empty_dataset
Apr 9, 2022
3e2e6d8
add
Apr 17, 2022
7d6c25f
arrange
Apr 17, 2022
1c343e0
Merge branch 'main' into groupby_describe_empty_dataset
Apr 17, 2022
62f5388
mypy
Apr 17, 2022
da2aca8
Merge branch 'main' into groupby_describe_empty_dataset
Apr 18, 2022
5c16992
Merge branch 'main' into groupby_describe_empty_dataset
Apr 19, 2022
8680c3b
add
Apr 20, 2022
933c008
Merge branch 'main' into groupby_describe_empty_dataset
Apr 20, 2022
dbe1033
add test
Apr 29, 2022
6efc1b6
add
Apr 29, 2022
36c211e
test
Apr 29, 2022
914327c
test
Apr 29, 2022
a1fde01
test
Apr 29, 2022
bdc18a7
Merge branch 'main' into groupby_describe_empty_dataset
Apr 29, 2022
5ecb59f
test
May 1, 2022
8f421ae
test
May 1, 2022
cf99c9e
Merge branch 'main' into groupby_describe_empty_dataset
May 1, 2022
be87c8f
Merge branch 'main' into groupby_describe_empty_dataset
May 11, 2022
d790d10
update expression
May 19, 2022
3446b6b
Merge branch 'main' into groupby_describe_empty_dataset
May 19, 2022
fe6d76c
Merge branch 'main' into groupby_describe_empty_dataset
May 20, 2022
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -786,6 +786,8 @@ Groupby/resample/rolling
- Bug in :meth:`.Rolling.var` would segfault calculating weighted variance when window size was larger than data size (:issue:`46760`)
- Bug in :meth:`Grouper.__repr__` where ``dropna`` was not included. Now it is (:issue:`46754`)
- Bug in :meth:`DataFrame.rolling` gives ValueError when center=True, axis=1 and win_type is specified (:issue:`46135`)
- Bug in :meth:`.DataFrameGroupBy.describe` and :meth:`.SeriesGroupBy.describe` produces inconsistent results for empty datasets (:issue:`41575`)
-

Reshaping
^^^^^^^^^
8 changes: 8 additions & 0 deletions pandas/core/groupby/groupby.py
@@ -2542,6 +2542,14 @@ def ohlc(self) -> DataFrame:
    @doc(DataFrame.describe)
    def describe(self, **kwargs):
        with self._group_selection_context():
            if len(self._selected_obj) == 0:
                described = self._selected_obj.describe(**kwargs)
                if self._selected_obj.ndim == 1:
                    result = described
                else:
                    result = described.unstack()
                return result.to_frame().T.iloc[:0]

            result = self._python_apply_general(
                lambda x: x.describe(**kwargs),
                self._selected_obj,
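The new early-return branch can be tried outside of the GroupBy class. The following is only a sketch using public pandas calls: the helper name `describe_empty_selection` and the sample frame are hypothetical, and the helper stands in for the private `self._selected_obj` logic rather than reproducing the actual method.

```python
import pandas as pd


def describe_empty_selection(selected, **kwargs):
    """Hypothetical stand-in for the empty-input branch shown in the diff above."""
    # Describe the (empty) selection to obtain the statistic labels.
    described = selected.describe(**kwargs)
    if selected.ndim == 1:
        # Series.describe returns a Series indexed by statistic name.
        result = described
    else:
        # DataFrame.describe returns a frame; unstack() flattens it into a
        # Series keyed by (column, statistic).
        result = described.unstack()
    # Turn the labels into columns of a one-row frame, then keep zero rows.
    return result.to_frame().T.iloc[:0]


# Assumed sample data: two empty numeric columns.
empty = pd.DataFrame(
    {"B": pd.Series([], dtype="int64"), "C": pd.Series([], dtype="float64")}
)

frame_result = describe_empty_selection(empty)
series_result = describe_empty_selection(empty["B"])

print(frame_result.columns.tolist())
print(series_result.columns.tolist())
```

The point of the design is that the column layout comes from `describe` itself, so the empty result carries the same statistic labels a non-empty result would, with no rows.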
25 changes: 25 additions & 0 deletions pandas/tests/groupby/test_function.py
@@ -1354,3 +1354,28 @@ def test_deprecate_numeric_only(
        # Doesn't have numeric_only argument and fails on nuisance columns
        with pytest.raises(TypeError, match=r"unsupported operand type"):
            method(*args, **kwargs)


@pytest.mark.parametrize("dtype", [int, float, object])
@pytest.mark.parametrize(
    "kwargs",
    [
        {"percentiles": [0.10, 0.20, 0.30], "include": "all", "exclude": None},
        {"percentiles": [0.10, 0.20, 0.30], "include": None, "exclude": ["int"]},
        {"percentiles": [0.10, 0.20, 0.30], "include": ["int"], "exclude": None},
    ],
)
def test_groupby_empty_dataset(dtype, kwargs):
    # GH#41575
    df = DataFrame([[1, 2, 3]], columns=["A", "B", "C"], dtype=dtype)
    df["B"] = df["B"].astype(int)
    df["C"] = df["C"].astype(float)

    result = df.iloc[:0].groupby("A").describe(**kwargs)
    expected = df.groupby("A").describe(**kwargs).reset_index(drop=True).iloc[:0]
    tm.assert_frame_equal(result, expected)

    result = df.iloc[:0].groupby("A").B.describe(**kwargs)
    expected = df.groupby("A").B.describe(**kwargs).reset_index(drop=True).iloc[:0]
    expected.index = Index([])
    tm.assert_frame_equal(result, expected)
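The behavior this test pins down can also be checked interactively. The snippet below is a minimal sketch that assumes a pandas version containing this change (the whatsnew entry places it in 1.5.0); the sample values are illustrative and it compares column labels only, not dtypes or the index.

```python
import pandas as pd

# Assumed sample data: one row, grouped by column "A".
df = pd.DataFrame({"A": [1], "B": [2], "C": [3.0]})

# DataFrameGroupBy.describe on non-empty vs. empty input.
non_empty = df.groupby("A").describe()
empty = df.iloc[:0].groupby("A").describe()

# SeriesGroupBy.describe on non-empty vs. empty input.
non_empty_b = df.groupby("A")["B"].describe()
empty_b = df.iloc[:0].groupby("A")["B"].describe()

# With the fix, the empty results should have zero rows and the same
# column labels as their non-empty counterparts.
print(len(empty), empty.columns.equals(non_empty.columns))
print(len(empty_b), empty_b.columns.equals(non_empty_b.columns))
```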