-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
DOC: add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew #50958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Some requests below. Most of the requests for the SeriesGroupBy version also apply to the DataFrameGroupBy version as well.
numeric_only : bool, default False | ||
Include only float, int, boolean columns. Not implemented for Series. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numeric_only
is a parameter for the skew
method but it seems like it is not used. Should the documentation be updated? Or can numeric_only
maybe be removed and this section with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing numeric_only functioning here:
df = pd.DataFrame({'a': [1, 1, 2], 'b': list('xyz'), 'c': [3, 4, 5]})
gb = df.groupby('a')
result = gb.skew(numeric_only=True)
print(result)
# c
# a
# 1 NaN
# 2 NaN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can reproduce this. However, when I run this
ser = pd.Series([1, 2, 1, 'x', 3, 4, 3], index=['a', 'a', 'a', 'b', 'b', 'b', 'b'])
result = ser.groupby(level=0).skew(numeric_only=True)
I get the following error:
TypeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 result = ser.groupby(level=0).skew(numeric_only=True)
File ~/open-source/pandas-maren/pandas/core/groupby/generic.py:1063, in SeriesGroupBy.skew(self, axis, skipna, numeric_only, **kwargs)
1004 def skew(
1005 self,
1006 axis: Axis | lib.NoDefault = lib.no_default,
(...)
1009 **kwargs,
1010 ) -> Series:
1011 """
1012 Return unbiased skew within groups.
1013
(...)
1061 Name: Max Speed, dtype: float64
1062 """
-> 1063 result = self._op_via_apply(
1064 "skew",
1065 axis=axis,
1066 skipna=skipna,
1067 numeric_only=numeric_only,
1068 **kwargs,
1069 )
1070 return result
File ~/open-source/pandas-maren/pandas/core/groupby/groupby.py:987, in GroupBy._op_via_apply(self, name, *args, **kwargs)
985 if is_transform and self._obj_with_exclusions.empty:
986 return self._obj_with_exclusions
--> 987 result = self._python_apply_general(
988 curried,
989 self._obj_with_exclusions,
990 is_transform=is_transform,
991 not_indexed_same=not is_transform,
992 )
994 if self.grouper.has_dropped_na and is_transform:
995 # result will have dropped rows due to nans, fill with null
996 # and ensure index is ordered same as the input
997 result = self._set_result_index_ordered(result)
File ~/open-source/pandas-maren/pandas/core/groupby/groupby.py:1488, in GroupBy._python_apply_general(self, f, data, not_indexed_same, is_transform, is_agg)
1451 @final
1452 def _python_apply_general(
1453 self,
(...)
1458 is_agg: bool = False,
1459 ) -> NDFrameT:
1460 """
1461 Apply function f in python space
1462
(...)
1486 data after applying f
1487 """
-> 1488 values, mutated = self.grouper.apply(f, data, self.axis)
1489 if not_indexed_same is None:
1490 not_indexed_same = mutated or self.mutated
File ~/open-source/pandas-maren/pandas/core/groupby/ops.py:786, in BaseGrouper.apply(self, f, data, axis)
784 # group might be modified
785 group_axes = group.axes
--> 786 res = f(group)
787 if not mutated and not _is_indexed_like(res, group_axes, axis):
788 mutated = True
File ~/open-source/pandas-maren/pandas/core/groupby/groupby.py:972, in GroupBy._op_via_apply.<locals>.curried(x)
971 def curried(x):
--> 972 return f(x, *args, **kwargs)
File ~/open-source/pandas-maren/pandas/core/generic.py:11511, in NDFrame._add_numeric_operations.<locals>.skew(self, axis, skipna, numeric_only, **kwargs)
11494 @doc(
11495 _num_doc,
11496 desc="Return unbiased skew over requested axis.\n\nNormalized by N-1.",
(...)
11509 **kwargs,
11510 ):
> 11511 return NDFrame.skew(self, axis, skipna, numeric_only, **kwargs)
File ~/open-source/pandas-maren/pandas/core/generic.py:11157, in NDFrame.skew(self, axis, skipna, numeric_only, **kwargs)
11150 def skew(
11151 self,
11152 axis: Axis | None = 0,
(...)
11155 **kwargs,
11156 ) -> Series | float:
> 11157 return self._stat_function(
11158 "skew", nanops.nanskew, axis, skipna, numeric_only, **kwargs
11159 )
File ~/open-source/pandas-maren/pandas/core/generic.py:11092, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs)
11088 nv.validate_stat_func((), kwargs, fname=name)
11090 validate_bool_kwarg(skipna, "skipna", none_allowed=False)
> 11092 return self._reduce(
11093 func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
11094 )
File ~/open-source/pandas-maren/pandas/core/series.py:4628, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
4626 kwd_name = "bool_only"
4627 # GH#47500 - change to TypeError to match other methods
-> 4628 raise TypeError(
4629 f"Series.{name} does not allow {kwd_name}={numeric_only} "
4630 "with non-numeric dtypes."
4631 )
4632 with np.errstate(all="ignore"):
4633 return op(delegate, skipna=skipna, **kwds)
TypeError: Series.skew does not allow numeric_only=True with non-numeric dtypes.
When I set numeric_only=False
, I get TypeError: could not convert string to float: 'x'
which makes sense. So numeric_only
can't really be used for Series beause it results in an error either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - I didn't realize you were referring to Series here. It is true that operating on a non-numeric Series will always fail with skew, and as far as I know this is consistent across Series and SeriesGroupBy ops. Perhaps there is a better behavior or the argument should be removed, but I think that's for a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good!
pandas/core/groupby/generic.py
Outdated
---------- | ||
axis : {0 or 'index', 1 or 'columns', None}, default 0 | ||
Axis for the function to be applied on. | ||
For `Series` this parameter is unused and defaults to 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reST, there should be two backticks, ``Series``
(markdown would only have one). But this phrasing is used when the docstring is shared between Series and DataFrame implementations. Since this is just for Series, what do you think about something like:
This parameter is only for compatibility with DataFrame and is unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it makes sense to write the same for numeric_only
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think numeric_only is slightly different - if you specify numeric_only=True
and the Series is non-numeric, then it will raise. This could be useful to identify errors - similar to adding an assert.
I made the requested changes. I will work on the CI failures tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @marenwestermann - great work! |
Thx @marenwestermann and thx @rhshadrach for reviewing |
Thank you and thanks for reviewing! |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Contributes to noatamir/pyladies-berlin-sprints#13
I copied the documentation from here and here. Let me know if I should delete the lines in SeriesGroupBy.skew that apply to DataFrames and vice versa.
ping @phofl @noatamir
#PyLadies Berlin sprint