DOC: add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew #50958

marenwestermann · 2023-01-24T15:37:41Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Contributes to noatamir/pyladies-berlin-sprints#13

I copied the documentation from here and here. Let me know if I should delete the lines in SeriesGroupBy.skew that apply to DataFrames and vice versa.

ping @phofl @noatamir

#PyLadies Berlin sprint

rhshadrach

Thanks for the PR! Some requests below. Most of the requests for the SeriesGroupBy version also apply to the DataFrameGroupBy version as well.

pandas/core/groupby/generic.py

marenwestermann · 2023-02-01T18:19:17Z

pandas/core/groupby/generic.py

+        numeric_only : bool, default False
+            Include only float, int, boolean columns. Not implemented for Series.


numeric_only is a parameter for the skew method but it seems like it is not used. Should the documentation be updated? Or can numeric_only maybe be removed and this section with it?

I'm seeing numeric_only functioning here:

df = pd.DataFrame({'a': [1, 1, 2], 'b': list('xyz'), 'c': [3, 4, 5]}) gb = df.groupby('a') result = gb.skew(numeric_only=True) print(result) # c # a # 1 NaN # 2 NaN

I can reproduce this. However, when I run this

ser = pd.Series([1, 2, 1, 'x', 3, 4, 3], index=['a', 'a', 'a', 'b', 'b', 'b', 'b']) result = ser.groupby(level=0).skew(numeric_only=True)

I get the following error:

TypeError Traceback (most recent call last) Cell In[5], line 1 ----> 1 result = ser.groupby(level=0).skew(numeric_only=True) File ~/open-source/pandas-maren/pandas/core/groupby/generic.py:1063, in SeriesGroupBy.skew(self, axis, skipna, numeric_only, **kwargs) 1004 def skew( 1005 self, 1006 axis: Axis | lib.NoDefault = lib.no_default, (...) 1009 **kwargs, 1010 ) -> Series: 1011 """ 1012 Return unbiased skew within groups. 1013 (...) 1061 Name: Max Speed, dtype: float64 1062 """ -> 1063 result = self._op_via_apply( 1064 "skew", 1065 axis=axis, 1066 skipna=skipna, 1067 numeric_only=numeric_only, 1068 **kwargs, 1069 ) 1070 return result File ~/open-source/pandas-maren/pandas/core/groupby/groupby.py:987, in GroupBy._op_via_apply(self, name, *args, **kwargs) 985 if is_transform and self._obj_with_exclusions.empty: 986 return self._obj_with_exclusions --> 987 result = self._python_apply_general( 988 curried, 989 self._obj_with_exclusions, 990 is_transform=is_transform, 991 not_indexed_same=not is_transform, 992 ) 994 if self.grouper.has_dropped_na and is_transform: 995 # result will have dropped rows due to nans, fill with null 996 # and ensure index is ordered same as the input 997 result = self._set_result_index_ordered(result) File ~/open-source/pandas-maren/pandas/core/groupby/groupby.py:1488, in GroupBy._python_apply_general(self, f, data, not_indexed_same, is_transform, is_agg) 1451 @final 1452 def _python_apply_general( 1453 self, (...) 1458 is_agg: bool = False, 1459 ) -> NDFrameT: 1460 """ 1461 Apply function f in python space 1462 (...) 1486 data after applying f 1487 """ -> 1488 values, mutated = self.grouper.apply(f, data, self.axis) 1489 if not_indexed_same is None: 1490 not_indexed_same = mutated or self.mutated File ~/open-source/pandas-maren/pandas/core/groupby/ops.py:786, in BaseGrouper.apply(self, f, data, axis) 784 # group might be modified 785 group_axes = group.axes --> 786 res = f(group) 787 if not mutated and not _is_indexed_like(res, group_axes, axis): 788 mutated = True File ~/open-source/pandas-maren/pandas/core/groupby/groupby.py:972, in GroupBy._op_via_apply.<locals>.curried(x) 971 def curried(x): --> 972 return f(x, *args, **kwargs) File ~/open-source/pandas-maren/pandas/core/generic.py:11511, in NDFrame._add_numeric_operations.<locals>.skew(self, axis, skipna, numeric_only, **kwargs) 11494 @doc( 11495 _num_doc, 11496 desc="Return unbiased skew over requested axis.\n\nNormalized by N-1.", (...) 11509 **kwargs, 11510 ): > 11511 return NDFrame.skew(self, axis, skipna, numeric_only, **kwargs) File ~/open-source/pandas-maren/pandas/core/generic.py:11157, in NDFrame.skew(self, axis, skipna, numeric_only, **kwargs) 11150 def skew( 11151 self, 11152 axis: Axis | None = 0, (...) 11155 **kwargs, 11156 ) -> Series | float: > 11157 return self._stat_function( 11158 "skew", nanops.nanskew, axis, skipna, numeric_only, **kwargs 11159 ) File ~/open-source/pandas-maren/pandas/core/generic.py:11092, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs) 11088 nv.validate_stat_func((), kwargs, fname=name) 11090 validate_bool_kwarg(skipna, "skipna", none_allowed=False) > 11092 return self._reduce( 11093 func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only 11094 ) File ~/open-source/pandas-maren/pandas/core/series.py:4628, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds) 4626 kwd_name = "bool_only" 4627 # GH#47500 - change to TypeError to match other methods -> 4628 raise TypeError( 4629 f"Series.{name} does not allow {kwd_name}={numeric_only} " 4630 "with non-numeric dtypes." 4631 ) 4632 with np.errstate(all="ignore"): 4633 return op(delegate, skipna=skipna, **kwds) TypeError: Series.skew does not allow numeric_only=True with non-numeric dtypes.

When I set numeric_only=False, I get TypeError: could not convert string to float: 'x' which makes sense. So numeric_only can't really be used for Series beause it results in an error either way.

Ah - I didn't realize you were referring to Series here. It is true that operating on a non-numeric Series will always fail with skew, and as far as I know this is consistent across Series and SeriesGroupBy ops. Perhaps there is a better behavior or the argument should be removed, but I think that's for a separate issue.

rhshadrach

This is looking really good!

rhshadrach · 2023-02-01T22:19:33Z

pandas/core/groupby/generic.py

+        ----------
+        axis : {0 or 'index', 1 or 'columns', None}, default 0
+            Axis for the function to be applied on.
+            For `Series` this parameter is unused and defaults to 0.


For reST, there should be two backticks, ``Series`` (markdown would only have one). But this phrasing is used when the docstring is shared between Series and DataFrame implementations. Since this is just for Series, what do you think about something like:

This parameter is only for compatibility with DataFrame and is unused.

Maybe it makes sense to write the same for numeric_only?

I think numeric_only is slightly different - if you specify numeric_only=True and the Series is non-numeric, then it will raise. This could be useful to identify errors - similar to adding an assert.

pandas/core/groupby/generic.py

marenwestermann · 2023-02-02T20:59:11Z

I made the requested changes. I will work on the CI failures tomorrow.

rhshadrach

lgtm

rhshadrach · 2023-02-05T13:22:03Z

Thanks @marenwestermann - great work!

phofl · 2023-02-05T15:17:25Z

Thx @marenwestermann and thx @rhshadrach for reviewing

marenwestermann · 2023-02-06T20:25:27Z

Thank you and thanks for reviewing!

add documentation

42cb9dd

marenwestermann changed the title ~~DOC add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew~~ DOC: add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew Jan 24, 2023

mroeschke added the Docs label Jan 24, 2023

mroeschke requested a review from phofl January 24, 2023 18:41

rhshadrach added the Groupby label Jan 24, 2023

rhshadrach requested changes Jan 24, 2023

View reviewed changes

apply requested changes

032fa5f

marenwestermann commented Feb 1, 2023

View reviewed changes

Maren Westermann added 2 commits February 1, 2023 19:27

correction

97823a6

fix line too long error

7ace24e

rhshadrach requested changes Feb 1, 2023

View reviewed changes

marenwestermann mentioned this pull request Feb 2, 2023

BUG: axis=None when given as a parameter of skew behaves differently when used in combination with groupy #51130

Open

3 tasks

apply requested changes vol2

3e1bc24

Maren Westermann added 2 commits February 3, 2023 21:29

address ci failures

2a413ac

Merge remote-tracking branch 'upstream/main' into groupby-skew

1701d1f

rhshadrach approved these changes Feb 5, 2023

View reviewed changes

rhshadrach added this to the 2.0 milestone Feb 5, 2023

rhshadrach mentioned this pull request Feb 5, 2023

DOC: Add groupby-specific docstrings to various methods noatamir/pyladies-berlin-sprints#13

Open

24 tasks

rhshadrach merged commit 1a48cd9 into pandas-dev:main Feb 5, 2023

marenwestermann deleted the groupby-skew branch February 6, 2023 20:25

		numeric_only : bool, default False
		Include only float, int, boolean columns. Not implemented for Series.

Uh oh!

DOC: add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew #50958

DOC: add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew #50958

Uh oh!

Conversation

marenwestermann commented Jan 24, 2023

Uh oh!

rhshadrach left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marenwestermann Feb 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rhshadrach Feb 1, 2023

Choose a reason for hiding this comment

Uh oh!

marenwestermann Feb 2, 2023

Choose a reason for hiding this comment

Uh oh!

rhshadrach Feb 3, 2023

Choose a reason for hiding this comment

Uh oh!

rhshadrach left a comment

Choose a reason for hiding this comment

Uh oh!

rhshadrach Feb 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marenwestermann Feb 2, 2023

Choose a reason for hiding this comment

Uh oh!

rhshadrach Feb 5, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marenwestermann commented Feb 2, 2023

Uh oh!

rhshadrach left a comment

Choose a reason for hiding this comment

Uh oh!

rhshadrach commented Feb 5, 2023

Uh oh!

phofl commented Feb 5, 2023

Uh oh!

marenwestermann commented Feb 6, 2023

Uh oh!

Uh oh!

marenwestermann Feb 1, 2023 •

edited

Loading

rhshadrach Feb 1, 2023 •

edited

Loading