Fix 'observed' kwarg not doing anything on SeriesGroupBy #26463
Merged
24 commits
a5d6d1a Fix 'observed' kwarg not doing anything on SeriesGroupBy (krsnik93)
41f49f4 Merge branch 'GH24880' (krsnik93)
2575c41 Wrap long lines (krsnik93)
1c02d9f Move tests to test_categorical.py (krsnik93)
7350472 Merge remote-tracking branch 'upstream/master' (krsnik93)
0a949d5 Merge branch 'master' into GH24880 (krsnik93)
0e9f473 Parameterized tests for 'observed' kwarg on SeriesGroupBy (krsnik93)
1ef54f4 Merge remote-tracking branch 'upstream/master' into GH24880 (krsnik93)
cd481ad Split test_groupby_series_observed to utilize fixtures better;Sort im… (krsnik93)
a515caf Sort imports in core/groupby/groupby.py (krsnik93)
ff42dd7 Remove too specific fixtures and adjust tests (krsnik93)
c22875c Merge remote-tracking branch 'upstream/master' into GH24880 (krsnik93)
cc0b725 Use literal values for indices in tests (krsnik93)
629a144 Merge remote-tracking branch 'upstream/master' into GH24880 (krsnik93)
e4fda22 Use MultiIndex.from_* to construct indices in tests (krsnik93)
8cfa4a1 Wrap long lines (krsnik93)
db176de Merge remote-tracking branch 'upstream/master' into GH24880 (krsnik93)
d520952 Enhance docstring for _reindex_output (krsnik93)
3591dbc Modify tests to reuse existing fixture (krsnik93)
f97c8a1 Merge remote-tracking branch 'upstream/master' into GH24880 (krsnik93)
d5c9c40 Refactor tests from a class to stand-alone functions (krsnik93)
ad16db8 Simplify a test, add a docstring for the fixture and drop pd.* prefix… (krsnik93)
7c525a1 Merge remote-tracking branch 'upstream/master' into GH24880 (krsnik93)
e6bca5e Merge remote-tracking branch 'upstream/master' into GH24880 (krsnik93)
@@ -17,6 +17,7 @@ class providing the base-class of operations.
 
 import numpy as np
 
+from pandas.core.arrays import Categorical
 from pandas._config.config import option_context
 
 from pandas._libs import Timestamp
@@ -42,7 +43,7 @@ class providing the base-class of operations.
 from pandas.core.frame import DataFrame
 from pandas.core.generic import NDFrame
 from pandas.core.groupby import base
-from pandas.core.index import Index, MultiIndex
+from pandas.core.index import Index, CategoricalIndex, MultiIndex
 from pandas.core.series import Series
 from pandas.core.sorting import get_group_index_sorter
 
@@ -2301,6 +2302,69 @@ def tail(self, n=5):
         mask = self._cumcount_array(ascending=False) < n
         return self._selected_obj[mask]
 
+    def _reindex_output(self, result):
+        """
+        If we have categorical groupers, then we want to make sure that
+        we have a fully reindex-output to the levels. These may have not
+        participated in the groupings (e.g. may have all been
+        nan groups);
+
+        This can re-expand the output space
+        """
+
+        # we need to re-expand the output space to accomodate all values
+        # whether observed or not in the cartesian product of our groupes
+        groupings = self.grouper.groupings
+        if groupings is None:
+            return result
+        elif len(groupings) == 1:
+            return result
+
+        # if we only care about the observed values
+        # we are done
+        elif self.observed:
+            return result
+
+        # reindexing only applies to a Categorical grouper
+        elif not any(isinstance(ping.grouper, (Categorical, CategoricalIndex))
+                     for ping in groupings):
+            return result
+
+        levels_list = [ping.group_index for ping in groupings]
+        index, _ = MultiIndex.from_product(
+            levels_list, names=self.grouper.names).sortlevel()
+
+        if self.as_index:
+            d = {self.obj._get_axis_name(self.axis): index, 'copy': False}
+            return result.reindex(**d)
+
+        # GH 13204
+        # Here, the categorical in-axis groupers, which need to be fully
+        # expanded, are columns in `result`. An idea is to do:
+        # result = result.set_index(self.grouper.names)
+        #                .reindex(index).reset_index()
+        # but special care has to be taken because of possible not-in-axis
+        # groupers.
+        # So, we manually select and drop the in-axis grouper columns,
+        # reindex `result`, and then reset the in-axis grouper columns.
+
+        # Select in-axis groupers
+        in_axis_grps = ((i, ping.name) for (i, ping)
+                        in enumerate(groupings) if ping.in_axis)
+        g_nums, g_names = zip(*in_axis_grps)
+
+        result = result.drop(labels=list(g_names), axis=1)
+
+        # Set a temp index and reindex (possibly expanding)
+        result = result.set_index(self.grouper.result_index
+                                  ).reindex(index, copy=False)
+
+        # Reset in-axis grouper columns
+        # (using level numbers `g_nums` because level names may not be unique)
+        result = result.reset_index(level=g_nums)
+
+        return result.reset_index(drop=True)
+
 
 GroupBy._add_numeric_operations()
 

Review comment on the first line of the `_reindex_output` docstring ("If we have categorical groupers, then we want to make sure that"):
can you update the doc-string with Parameters / Results; type things if you can
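To illustrate what `_reindex_output` is meant to achieve, here is a minimal sketch rather than the code under review. It assumes a pandas version that already includes this fix, and the frame `df`, its column names, and the hand-built `partial` Series are all made up for the example. The first part shows the `observed` keyword on a SeriesGroupBy; the second part repeats, on public API only, the `MultiIndex.from_product(...).sortlevel()` plus `reindex` step that the diff uses to re-expand a result to every combination of group levels.

```python
import pandas as pd

# Hypothetical data: two categorical groupers, only two of the four
# possible (A, B) combinations actually occur.
df = pd.DataFrame({
    "A": pd.Categorical(["a", "b"], categories=["a", "b"]),
    "B": pd.Categorical(["x", "y"], categories=["x", "y"]),
    "C": [1, 2],
})

# Selecting a single column yields a SeriesGroupBy, the case this PR targets.
observed_only = df.groupby(["A", "B"], observed=True)["C"].sum()
all_combos = df.groupby(["A", "B"], observed=False)["C"].sum()

print(len(observed_only))  # only the combinations present in the data
print(len(all_combos))     # the full Cartesian product of the categories

# The re-expansion step in isolation: start from a result indexed by the
# observed combinations only (built by hand here, not taken from groupby).
partial = pd.Series(
    [1.0, 2.0],
    index=pd.MultiIndex.from_tuples([("a", "x"), ("b", "y")], names=["A", "B"]),
)

# Build the sorted Cartesian product of the levels, as the diff does with
# `levels_list`, then reindex onto it; unobserved combinations become NaN.
levels_list = [pd.Index(["a", "b"]), pd.Index(["x", "y"])]
full_index, _ = pd.MultiIndex.from_product(
    levels_list, names=["A", "B"]
).sortlevel()
expanded = partial.reindex(full_index)
print(expanded)
```

The plain `reindex` leaves unobserved combinations as missing values; how a particular aggregation fills them in the real groupby result is decided elsewhere in pandas and is not shown here.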