API: User-control of result keys in GroupBy.apply #34998

Merged
merged 110 commits into from
Mar 30, 2022
Changes from 40 commits
Commits
0fa2104
API: User-control of result keys
TomAugspurger Jun 25, 2020
13a38a2
wip
TomAugspurger Jun 26, 2020
ebb2a2d
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jun 26, 2020
f8d646f
mmm
TomAugspurger Jun 26, 2020
623526c
updates
TomAugspurger Jun 26, 2020
6871ed0
fixups
TomAugspurger Jun 26, 2020
00ce5dc
test fixups
TomAugspurger Jun 29, 2020
c28f176
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jun 29, 2020
c0b1140
update doctests
TomAugspurger Jun 29, 2020
cfd4d73
resample
TomAugspurger Jun 29, 2020
c05b1ea
warning
TomAugspurger Jun 29, 2020
919a4c7
remove debug
TomAugspurger Jun 29, 2020
4a45ea0
warning
TomAugspurger Jun 29, 2020
7f1478f
warning
TomAugspurger Jun 29, 2020
a1d4da8
warning
TomAugspurger Jun 29, 2020
9c229c6
wip
TomAugspurger Jun 29, 2020
f1a570b
wip
TomAugspurger Jun 30, 2020
7520dd3
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jun 30, 2020
914c7cf
more
TomAugspurger Jun 30, 2020
7fd1a07
Add resample tests
TomAugspurger Jul 1, 2020
29f79ce
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 1, 2020
76e6873
extension
TomAugspurger Jul 1, 2020
7f5cd0d
fixups
TomAugspurger Jul 1, 2020
d21dfb8
lint
TomAugspurger Jul 1, 2020
6e253c3
fix doc warning
TomAugspurger Jul 1, 2020
8efe632
lint
TomAugspurger Jul 1, 2020
bda914e
whatsnew
TomAugspurger Jul 1, 2020
3789fd7
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 2, 2020
80c789e
ignore for now
TomAugspurger Jul 2, 2020
c4f6e2d
avoid mutating
TomAugspurger Jul 2, 2020
d123a80
comment on override_group_keys
TomAugspurger Jul 2, 2020
9cb58a3
fixups
TomAugspurger Jul 3, 2020
3421402
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 6, 2020
2e59629
typing
TomAugspurger Jul 6, 2020
8fd0de5
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 6, 2020
13965b2
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 6, 2020
bfb854e
fixups
TomAugspurger Jul 6, 2020
244fec6
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 6, 2020
7469b29
mypy
TomAugspurger Jul 6, 2020
16e0f5f
mypy
TomAugspurger Jul 6, 2020
4173a32
lint
TomAugspurger Jul 6, 2020
cd112cd
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 7, 2020
d984e9e
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 12, 2020
b6af0da
fixup
TomAugspurger Jul 12, 2020
8e46a6e
change ref
TomAugspurger Jul 12, 2020
d745c4a
fixup whatsnew
TomAugspurger Jul 12, 2020
23caf8e
fixup whatsnew
TomAugspurger Jul 12, 2020
b9b2e53
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 13, 2020
3d4a744
doc
TomAugspurger Jul 13, 2020
4d984d9
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Jul 14, 2020
5d12ba4
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Aug 6, 2020
20d8520
remove xpass
TomAugspurger Aug 6, 2020
cb9217a
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Aug 7, 2020
7cf051a
Fixup
TomAugspurger Aug 7, 2020
e0972fc
Merge remote-tracking branch 'upstream/master' into 34809-result-type
TomAugspurger Sep 4, 2020
ad0f292
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
rhshadrach Jan 2, 2021
b3c8d53
merge cleanup and pass tests
rhshadrach Jan 2, 2021
08ae03f
Move docs to 1.3, avoid warnings in tests
rhshadrach Jan 3, 2021
f055193
Revert group_keys=False when apply is not used
rhshadrach Jan 3, 2021
d4f24a6
Merge branch 'master' of https://github.com/pandas-dev/pandas into 34…
rhshadrach Jan 3, 2021
c1953e6
Avoid warnings when using apply internally, bare xfail/FutureWarning …
rhshadrach Jan 3, 2021
47dec36
Merge branch 'master' of https://github.com/pandas-dev/pandas into 34…
rhshadrach Jan 3, 2021
74b98e1
Remove group_keys from DataFrame.resample, parametrize one test
rhshadrach Jan 3, 2021
a547d63
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
rhshadrach Jan 4, 2021
20d7663
Doc changes
rhshadrach Jan 4, 2021
c3bccb3
Remove group_keys from NDFrame.resample
rhshadrach Jan 4, 2021
7f7ee52
Remove group_keys from Series.resample
rhshadrach Jan 4, 2021
4f652a9
Fixed _is_indexed_like bug
rhshadrach Jan 4, 2021
499ecd4
Minor fixups
rhshadrach Jan 4, 2021
d1f2d29
whatsnew
rhshadrach Jan 4, 2021
45cd980
Restored group_keys in resample, defaults to no_default
rhshadrach Jan 9, 2021
c3f8258
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
rhshadrach Jan 9, 2021
804312a
Fixed stacklevel for resample warning, simplified test
rhshadrach Jan 9, 2021
a7d5a0e
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
rhshadrach Jan 12, 2021
a5ce219
whatsnew and docstring for DataFrame.resample
rhshadrach Jan 12, 2021
15b1556
Revert accidental changes
rhshadrach Jan 12, 2021
d3de668
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
rhshadrach Jan 26, 2021
f80c4c0
Removed unnecessary group_keys in tests
rhshadrach Jan 26, 2021
62c42e8
Removed unnecessary xfail, testing equals instead of is
rhshadrach Jan 26, 2021
bf4b126
Revert is -> eqauls change
rhshadrach Jan 27, 2021
565791a
Merge branch 'master' of https://github.com/pandas-dev/pandas into 34…
rhshadrach Nov 13, 2021
2b02b6f
fixups
rhshadrach Nov 14, 2021
3e175f9
Test fixup
rhshadrach Nov 14, 2021
7f3cc48
Update verison to 1.4.0, use find_stack_level
rhshadrach Nov 14, 2021
0c0b2c5
Cleanups
rhshadrach Nov 14, 2021
f46c59a
Added test
rhshadrach Nov 14, 2021
27ed908
type-hint fixups
rhshadrach Nov 14, 2021
1200c55
Merge branch 'master' of https://github.com/pandas-dev/pandas into 34…
rhshadrach Nov 25, 2021
d4a36b7
Merge branch '34809-result-type' of https://github.com/TomAugspurger/…
rhshadrach Nov 25, 2021
c84fa45
Doc fixups
rhshadrach Dec 5, 2021
2912d88
Merge branch '34809-result-type' of https://github.com/TomAugspurger/…
rhshadrach Dec 5, 2021
f3d94a8
Merge branch 'master' of https://github.com/pandas-dev/pandas into 34…
rhshadrach Dec 5, 2021
1bd1e0e
Merge branch 'master' of https://github.com/pandas-dev/pandas into 34…
rhshadrach Jan 1, 2022
39d54fc
Merge main
rhshadrach Jan 22, 2022
808efc4
Move notes from 1.4 to 1.5, added deprecation note for .groupby(...).…
rhshadrach Jan 22, 2022
215d9a8
Avoid warnings, is_empty_agg -> is_agg
rhshadrach Jan 22, 2022
0c8274c
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Jan 26, 2022
c312e90
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Jan 31, 2022
30d0f23
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Feb 5, 2022
1c01624
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Feb 8, 2022
14b9b0f
Suppress warning in tests
rhshadrach Feb 8, 2022
09da38e
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Feb 13, 2022
d771ae1
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Feb 20, 2022
cb4bed6
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Feb 27, 2022
4ea1550
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Mar 15, 2022
eace964
Merge branch '34809-result-type' of https://github.com/TomAugspurger/…
rhshadrach Mar 15, 2022
6407abd
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Mar 22, 2022
deb5479
Merge branch '34809-result-type' of https://github.com/TomAugspurger/…
rhshadrach Mar 22, 2022
a0fd04c
Merge cleanup
rhshadrach Mar 23, 2022
f9aa547
Merge branch 'main' of https://github.com/pandas-dev/pandas into 3480…
rhshadrach Mar 28, 2022
2 changes: 1 addition & 1 deletion doc/source/user_guide/groupby.rst
@@ -1019,7 +1019,7 @@ The dimension of the returned result can also change:

.. ipython::

- In [8]: grouped = df.groupby('A')['C']
+ In [8]: grouped = df.groupby('A', group_keys=False)['C']
Review comment (Contributor):

Is this not executed code? Why do we have the IPython prompts here?


In [10]: def f(group):
....: return pd.DataFrame({'original': group,
11 changes: 8 additions & 3 deletions doc/source/whatsnew/v0.25.0.rst
@@ -342,10 +342,15 @@ Now every group is evaluated only a single time.

*New behavior*:

- .. ipython:: python

- df.groupby("a").apply(func)
+ .. code-block:: python

+ In [3]: df.groupby('a').apply(func)
+ x
+ y
+ Out[3]:
+ a b
+ 0 x 1
+ 1 y 2

Concatenating sparse values
^^^^^^^^^^^^^^^^^^^^^^^^^^^
39 changes: 39 additions & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -780,6 +780,45 @@ Development Changes
Deprecations
~~~~~~~~~~~~

:meth:`~DataFrame.groupby` no longer ignores ``group_keys`` for transform-like ``apply``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`~DataFrame.groupby` will no longer ignore the ``group_keys`` argument for functions passed to ``apply`` that return like-indexed outputs (:issue:`34809`).
Previous versions of pandas would add the group keys only when the result from the applied function had a different index to the input.

.. code-block:: python

>>> # pandas 1.0.4
>>> df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3]})
>>> df
A B
0 1 1
1 2 2
2 2 3
>>> df.groupby("A").apply(lambda x: x.rename(np.exp)) # Different index
A B
A
1 1.000000 1 1
2 2.718282 2 2
7.389056 2 3

>>> df.groupby("A").apply(lambda x: x) # Same index
A B
0 1 1
1 2 2
2 2 3

In the future, this behavior will change to always respect ``group_keys``, which defaults to ``True``.

.. ipython:: python
:okwarning:

df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3]})
df.groupby("A").apply(lambda x: x)
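
The difference between the two settings can be checked with a short script. This is a sketch, assuming pandas 2.0 or later, where an explicitly passed `group_keys` is respected for like-indexed results. Selecting column `B` before calling `apply` is a choice made here (not taken from the PR) so that the example does not depend on how grouping columns are passed to the applied function.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3]})

# `lambda s: s` returns every group unchanged, so the result is like-indexed
# (a "transform-like" apply).
with_keys = df.groupby("A", group_keys=True)["B"].apply(lambda s: s)
without_keys = df.groupby("A", group_keys=False)["B"].apply(lambda s: s)

# With group_keys=True the group key "A" becomes an extra index level.
print(with_keys.index.nlevels)
# With group_keys=False only the original row labels remain.
print(without_keys.index.tolist())
```

Passing `group_keys` explicitly in both calls avoids relying on the default, which is the part of the behavior this deprecation changes.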

Other Deprecations
^^^^^^^^^^^^^^^^^^

- Lookups on a :class:`Series` with a single-item list containing a slice (e.g. ``ser[[slice(0, 4)]]``) are deprecated, will raise in a future version. Either convert the list to tuple, or pass the slice directly instead (:issue:`31333`)

- :meth:`DataFrame.mean` and :meth:`DataFrame.median` with ``numeric_only=None`` will include datetime64 and datetime64tz columns in a future version (:issue:`29941`)
11 changes: 10 additions & 1 deletion pandas/_libs/reduction.pyx
@@ -370,7 +370,16 @@ def apply_frame_axis0(object frame, object f, object names,
mutated = True
except AttributeError:
# `piece` might not have an index, could be e.g. an int
- pass
+ # By definition, we are not a transform, so set mutated
+ # to True
+ mutated = True
+ if not mutated:
+ # Also check if the columns are mutated
+ try:
+ if not piece.columns.equals(chunk.columns):
+ mutated = True
+ except AttributeError:
+ mutated = True

if not is_scalar(piece):
# Need to copy data to avoid appending references
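
The check in this hunk can be restated in plain Python. The sketch below is illustrative only: `looks_mutated` is a hypothetical helper, not a pandas function, and comparing the row labels with `Index.equals` is an assumption, since the hunk shows only the column comparison.

```python
import pandas as pd


def looks_mutated(piece, chunk: pd.DataFrame) -> bool:
    """Hypothetical helper: True if `piece` is not like-indexed to `chunk`."""
    try:
        # Assumption: row labels are compared with `equals`.
        if not piece.index.equals(chunk.index):
            return True
    except AttributeError:
        # `piece` has no usable index (e.g. an int), so it is not a transform.
        return True
    try:
        # Row labels match; the column labels must match as well.
        return not piece.columns.equals(chunk.columns)
    except AttributeError:
        # e.g. a Series, which has an index but no columns.
        return True


chunk = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
print(looks_mutated(chunk.copy(), chunk))   # same rows and columns
print(looks_mutated(chunk[["A"]], chunk))   # a column was dropped
print(looks_mutated(3, chunk))              # scalar result
```

Treating any result without an index or columns as mutated matches the comment added in the hunk: such a result cannot be a transform.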
22 changes: 21 additions & 1 deletion pandas/core/frame.py
@@ -6420,6 +6420,26 @@ def update(
a 13.0 13.0
b 12.3 123.0
NaN 12.3 33.0

To exclude or include the group keys in the index, specify ``group_keys``
Review comment (Member):

This is only for apply, while this is the general groupby docstring?


>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
... 'Parrot', 'Parrot'],
... 'Max Speed': [380., 370., 24., 26.]})
>>> df.groupby("Animal", group_keys=False).apply(lambda x: x)
Animal Max Speed
0 Falcon 380.0
1 Falcon 370.0
2 Parrot 24.0
3 Parrot 26.0

>>> df.groupby("Animal", group_keys=True).apply(lambda x: x)
Animal Max Speed
Animal
Falcon 0 Falcon 380.0
1 Falcon 370.0
Parrot 2 Parrot 24.0
3 Parrot 26.0
"""
)
@Appender(_shared_docs["groupby"] % _shared_doc_kwargs)
@@ -6430,7 +6450,7 @@ def groupby(
level=None,
as_index: bool = True,
sort: bool = True,
- group_keys: bool = True,
+ group_keys: bool = no_default,
squeeze: bool = no_default,
observed: bool = False,
dropna: bool = True,
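
Changing the default from `True` to `no_default` lets the library tell "the caller passed `True`" apart from "the caller passed nothing". The following is a minimal standard-library sketch of that sentinel pattern. The names `_NoDefault` and `add_keys_for_transform`, the warning text, and the choice to keep the old result while warning are all assumptions made for illustration, not code from this PR.

```python
import warnings
from enum import Enum


class _NoDefault(Enum):
    # A one-member enum gives a unique sentinel with a readable repr.
    no_default = "NO_DEFAULT"


no_default = _NoDefault.no_default


def add_keys_for_transform(group_keys=no_default) -> bool:
    """Hypothetical helper: should a like-indexed result get the group keys?"""
    if group_keys is no_default:
        # The caller made no choice: keep the old result but warn about the change.
        # The message wording is invented for this sketch.
        warnings.warn(
            "Not specifying group_keys is deprecated for like-indexed results; "
            "pass group_keys=True or group_keys=False explicitly.",
            FutureWarning,
            stacklevel=2,
        )
        return False
    return bool(group_keys)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = add_keys_for_transform()      # warns, old behavior
    explicit = add_keys_for_transform(True)  # no warning

print(implicit, explicit, len(caught))  # False True 1
```

A plain `None` default would not work as well here if `None` could ever be a meaningful argument, which is why a dedicated sentinel object is used.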
8 changes: 8 additions & 0 deletions pandas/core/generic.py
@@ -7692,6 +7692,7 @@ def resample(
level=None,
origin: Union[str, TimestampConvertibleTypes] = "start_day",
offset: Optional[TimedeltaConvertibleTypes] = None,
group_keys: bool_t = lib.no_default,
) -> "Resampler":
"""
Resample time-series data.
@@ -7761,6 +7762,12 @@ def resample(

.. versionadded:: 1.1.0

group_keys : bool, default True
Whether to include the group keys in the result index when performing
a ``.groupby().apply()`` to the resampled object.

.. versionadded:: 1.1.0

Returns
-------
Resampler object
@@ -8077,6 +8084,7 @@ def resample(
level=level,
origin=origin,
offset=offset,
group_keys=group_keys,
)

def first(self: FrameOrSeries, offset) -> FrameOrSeries:
65 changes: 54 additions & 11 deletions pandas/core/groupby/generic.py
@@ -30,7 +30,7 @@
import numpy as np

from pandas._libs import lib
- from pandas._typing import FrameOrSeries
+ from pandas._typing import FrameOrSeries, FrameOrSeriesUnion
from pandas.util._decorators import Appender, Substitution, doc

from pandas.core.dtypes.cast import (
@@ -413,7 +413,14 @@ def _wrap_transformed_output(
assert isinstance(result, Series)
return result

- def _wrap_applied_output(self, keys, values, not_indexed_same=False):
+ def _wrap_applied_output(
+ self,
+ keys,
+ values,
+ not_indexed_same: bool = False,
+ override_group_keys: bool = False,
+ ) -> FrameOrSeriesUnion:
+ result: FrameOrSeriesUnion
if len(keys) == 0:
# GH #6265
return self.obj._constructor(
@@ -440,10 +447,20 @@ def _get_index() -> Index:
return result

if isinstance(values[0], Series):
- return self._concat_objects(keys, values, not_indexed_same=not_indexed_same)
+ return self._concat_objects(
+ keys,
+ values,
+ not_indexed_same=not_indexed_same,
+ override_group_keys=override_group_keys,
+ )
elif isinstance(values[0], DataFrame):
# possible that Series -> DataFrame by applied function
- return self._concat_objects(keys, values, not_indexed_same=not_indexed_same)
+ return self._concat_objects(
+ keys,
+ values,
+ not_indexed_same=not_indexed_same,
+ override_group_keys=override_group_keys,
+ )
else:
# GH #6265 #24880
result = self.obj._constructor(
@@ -1203,7 +1220,13 @@ def _aggregate_item_by_item(self, func, *args, **kwargs) -> DataFrame:

return self.obj._constructor(result, columns=result_columns)

- def _wrap_applied_output(self, keys, values, not_indexed_same=False):
+ def _wrap_applied_output(
+ self,
+ keys,
+ values,
+ not_indexed_same: bool = False,
+ override_group_keys: bool = False,
+ ) -> FrameOrSeriesUnion:
if len(keys) == 0:
return self.obj._constructor(index=keys)

@@ -1217,7 +1240,12 @@ def _wrap_applied_output(self, keys, values, not_indexed_same=False):
# We'd prefer it return an empty dataframe.
return self.obj._constructor()
elif isinstance(first_not_none, DataFrame):
- return self._concat_objects(keys, values, not_indexed_same=not_indexed_same)
+ return self._concat_objects(
+ keys,
+ values,
+ not_indexed_same=not_indexed_same,
+ override_group_keys=override_group_keys,
+ )
else:
if len(self.grouper.groupings) > 1:
key_index = self.grouper.result_index
@@ -1247,14 +1275,16 @@ def _wrap_applied_output(self, keys, values, not_indexed_same=False):
# make Nones an empty object
if first_not_none is None:
return self.obj._constructor()
- elif isinstance(first_not_none, NDFrame):
+ elif isinstance(first_not_none, (Series, DataFrame)):

# this is to silence a DeprecationWarning
# TODO: Remove when default dtype of empty Series is object
kwargs = first_not_none._construct_axes_dict()
+ backup: FrameOrSeriesUnion
if isinstance(first_not_none, Series):
+ kwargs["dtype_if_empty"] = object
backup = create_series_with_explicit_dtype(
- **kwargs, dtype_if_empty=object
+ **kwargs,
)
else:
backup = first_not_none._constructor(**kwargs)
@@ -1284,7 +1314,10 @@ def _wrap_applied_output(self, keys, values, not_indexed_same=False):
# OR we don't have a multi-index and we have only a
# single values
return self._concat_objects(
- keys, values, not_indexed_same=not_indexed_same
+ keys,
+ values,
+ not_indexed_same=not_indexed_same,
+ override_group_keys=override_group_keys,
)

# still a series
@@ -1296,7 +1329,12 @@ def _wrap_applied_output(self, keys, values, not_indexed_same=False):

if not all_indexed_same:
# GH 8467
- return self._concat_objects(keys, values, not_indexed_same=True)
+ return self._concat_objects(
+ keys,
+ values,
+ not_indexed_same=True,
+ override_group_keys=override_group_keys,
+ )

if self.axis == 0 and isinstance(v, ABCSeries):
# GH6124 if the list of Series have a consistent name,
@@ -1668,12 +1706,17 @@ def _gotitem(self, key, ndim: int, subset=None):
exclusions=self.exclusions,
as_index=self.as_index,
observed=self.observed,
+ group_keys=self.group_keys,
)
elif ndim == 1:
if subset is None:
subset = self.obj[key]
return SeriesGroupBy(
- subset, selection=key, grouper=self.grouper, observed=self.observed
+ subset,
+ selection=key,
+ grouper=self.grouper,
+ observed=self.observed,
+ group_keys=self.group_keys,
)

raise AssertionError("invalid ndim for _gotitem")
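
The `_gotitem` change forwards `group_keys` to the object created by column selection, so a setting chosen in `groupby(...)` survives `[...]`. The sketch below shows the same idea with a hypothetical `MiniGroupBy` dataclass. It is a stand-in written for this illustration, not the pandas class, and it models only the option forwarding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniGroupBy:
    """Hypothetical stand-in for a groupby object; models option forwarding only."""

    columns: tuple
    observed: bool = False
    group_keys: bool = True

    def __getitem__(self, key: str) -> "MiniGroupBy":
        if key not in self.columns:
            raise KeyError(key)
        # Forward every option explicitly. Any option left out here would
        # silently fall back to its default on the selected object.
        return MiniGroupBy(
            columns=(key,),
            observed=self.observed,
            group_keys=self.group_keys,
        )


gb = MiniGroupBy(columns=("A", "C"), group_keys=False)
selected = gb["C"]
print(selected.group_keys)  # False: the caller's choice is preserved
```

If `group_keys=self.group_keys` were omitted from `__getitem__`, `selected.group_keys` would be `True` again, which is the kind of silent reset the hunk guards against.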