BUG: groupby.describe on a frame with duplicate column names #50846
…es_as_index_false
…es_as_index_false
…adrach/pandas into groupby_select_obj_dup_cols # Conflicts: # pandas/core/groupby/groupby.py
…adrach/pandas into groupby_select_obj_dup_cols # Conflicts: # pandas/core/groupby/groupby.py
…pby_select_obj_dup_cols
@mroeschke - what do you think about the conditional logic based on the number of columns here?
…pby_select_obj_dup_cols
pandas/core/groupby/groupby.py
Outdated
@@ -726,7 +726,9 @@ def _selected_obj(self):

         if self._selection is None or isinstance(self.obj, Series):
             if self._group_selection is not None:
-                return self.obj[self._group_selection]
+                return self.obj._take(
I think this is the same as _obj_with_exclusions, which should already be cached, so we could avoid making a copy
This is great - not only that, but we can also avoid all the code that determines _group_selection. I've turned _group_selection into a Boolean flag.
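The idea in this exchange can be sketched with a toy class. This is not pandas' implementation: `ToyGroupBy`, `use_exclusions`, and `builds` are hypothetical names, and `functools.cached_property` stands in for whatever caching pandas uses internally. The sketch only shows that a cached "object with exclusions" is built once and can then be returned from a selection property guarded by a Boolean flag, without copying again.

```python
from functools import cached_property

import pandas as pd


class ToyGroupBy:
    """Hypothetical stand-in for illustration; not pandas' GroupBy."""

    def __init__(self, obj: pd.DataFrame, key: str) -> None:
        self.obj = obj
        self.key = key
        # Boolean flag instead of a stored list of selected column labels.
        self.use_exclusions = False
        # Counter used only to show how often the excluded frame is built.
        self.builds = 0

    @cached_property
    def obj_with_exclusions(self) -> pd.DataFrame:
        # Computed on first access, then reused from the cache.
        self.builds += 1
        # Select by position so duplicate column labels stay unambiguous.
        keep = [i for i, label in enumerate(self.obj.columns) if label != self.key]
        return self.obj.iloc[:, keep]

    @property
    def selected_obj(self) -> pd.DataFrame:
        # With the flag set, hand back the cached object; no new copy is made.
        if self.use_exclusions:
            return self.obj_with_exclusions
        return self.obj


df = pd.DataFrame([[1, 2, 3], [1, 4, 5], [2, 6, 7]], columns=["a", "b", "b"])
gb = ToyGroupBy(df, "a")
gb.use_exclusions = True
first = gb.selected_obj
second = gb.selected_obj
```

Both accesses return the same object, and the excluded frame is built only once.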
should we expect this change to affect the timings in the OP?
ASVs updated; essentially the same results.
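As background for this thread, here is a small sketch of why selecting columns by label is ambiguous on a frame with duplicate column names, while selecting by position is not. The frame is made up for illustration, and the public `DataFrame.take` is used here rather than the private `_take` that appears in the diff.

```python
import pandas as pd

# A frame with a duplicated column name, as in the issue title.
df = pd.DataFrame([[1, 2, 3], [1, 4, 5], [2, 6, 7]], columns=["a", "b", "b"])

# Label-based selection: asking for "b" once returns every column named "b".
by_label = df[["b"]]

# Positional selection: one position yields exactly one column.
by_position = df.take([1], axis=1)

print(by_label.shape, by_position.shape)
```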
…adrach/pandas into groupby_select_obj_dup_cols # Conflicts: # pandas/core/groupby/groupby.py
    if groupby_func in ("size", "ngroup", "cumcount"):
        expected = getattr(
            df.take([0, 1], axis=1).groupby("a", as_index=as_index), groupby_func
nitpick: can you avoid chaining take/groupby/getattr here (and in L1639)? It's easier to grok if something goes wrong.
Done.
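One way the unchained form requested above might look, as a rough sketch rather than the code that was actually committed. The frame `df` and the values of `groupby_func` and `as_index` are assumptions made up for illustration; in the test they would be supplied by the test's own setup.

```python
import pandas as pd

# Illustrative inputs (assumed, not taken from the test suite).
df = pd.DataFrame([[1, 2, 3], [1, 4, 5], [2, 6, 7]], columns=["a", "b", "b"])
groupby_func = "size"
as_index = True

# Step 1: keep the first two columns by position, dropping the duplicate "b".
subset = df.take([0, 1], axis=1)

# Step 2: build the groupby object on its own line.
gb = subset.groupby("a", as_index=as_index)

# Step 3: look up the method by name, then call it.
method = getattr(gb, groupby_func)
expected = method()
```

With each step on its own line, a failure's traceback points at the specific operation (take, groupby, or the method call) instead of one long chained expression.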
…pby_select_obj_dup_cols
…pby_select_obj_dup_cols # Conflicts: # pandas/core/groupby/groupby.py
@mroeschke @jbrockmendel - friendly ping.
…pby_select_obj_dup_cols # Conflicts: # pandas/core/groupby/groupby.py
…adrach/pandas into groupby_select_obj_dup_cols # Conflicts: # pandas/core/groupby/groupby.py
Thanks @rhshadrach
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.
ASVs are below.