Skip to content

DOC: add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew #50958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 5, 2023

Conversation

marenwestermann
Copy link
Contributor

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Contributes to noatamir/pyladies-berlin-sprints#13

I copied the documentation from here and here. Let me know if I should delete the lines in SeriesGroupBy.skew that apply to DataFrames and vice versa.

ping @phofl @noatamir

#PyLadies Berlin sprint

@marenwestermann marenwestermann changed the title DOC add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew DOC: add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew Jan 24, 2023
@mroeschke mroeschke added the Docs label Jan 24, 2023
@mroeschke mroeschke requested a review from phofl January 24, 2023 18:41
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Some requests below. Most of the requests for the SeriesGroupBy version also apply to the DataFrameGroupBy version as well.

Comment on lines +1028 to +1029
numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.
Copy link
Contributor Author

@marenwestermann marenwestermann Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numeric_only is a parameter for the skew method but it seems like it is not used. Should the documentation be updated? Or can numeric_only maybe be removed and this section with it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing numeric_only functioning here:

df = pd.DataFrame({'a': [1, 1, 2], 'b': list('xyz'), 'c': [3, 4, 5]})
gb = df.groupby('a')
result = gb.skew(numeric_only=True)
print(result)

#     c
# a    
# 1 NaN
# 2 NaN

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reproduce this. However, when I run this

ser = pd.Series([1, 2, 1, 'x', 3, 4, 3], index=['a', 'a', 'a', 'b', 'b', 'b', 'b'])
result = ser.groupby(level=0).skew(numeric_only=True)

I get the following error:

TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 result = ser.groupby(level=0).skew(numeric_only=True)

File ~/open-source/pandas-maren/pandas/core/groupby/generic.py:1063, in SeriesGroupBy.skew(self, axis, skipna, numeric_only, **kwargs)
   1004 def skew(
   1005     self,
   1006     axis: Axis | lib.NoDefault = lib.no_default,
   (...)
   1009     **kwargs,
   1010 ) -> Series:
   1011     """
   1012     Return unbiased skew within groups.
   1013
   (...)
   1061     Name: Max Speed, dtype: float64
   1062     """
-> 1063     result = self._op_via_apply(
   1064         "skew",
   1065         axis=axis,
   1066         skipna=skipna,
   1067         numeric_only=numeric_only,
   1068         **kwargs,
   1069     )
   1070     return result

File ~/open-source/pandas-maren/pandas/core/groupby/groupby.py:987, in GroupBy._op_via_apply(self, name, *args, **kwargs)
    985 if is_transform and self._obj_with_exclusions.empty:
    986     return self._obj_with_exclusions
--> 987 result = self._python_apply_general(
    988     curried,
    989     self._obj_with_exclusions,
    990     is_transform=is_transform,
    991     not_indexed_same=not is_transform,
    992 )
    994 if self.grouper.has_dropped_na and is_transform:
    995     # result will have dropped rows due to nans, fill with null
    996     # and ensure index is ordered same as the input
    997     result = self._set_result_index_ordered(result)

File ~/open-source/pandas-maren/pandas/core/groupby/groupby.py:1488, in GroupBy._python_apply_general(self, f, data, not_indexed_same, is_transform, is_agg)
   1451 @final
   1452 def _python_apply_general(
   1453     self,
   (...)
   1458     is_agg: bool = False,
   1459 ) -> NDFrameT:
   1460     """
   1461     Apply function f in python space
   1462
   (...)
   1486         data after applying f
   1487     """
-> 1488     values, mutated = self.grouper.apply(f, data, self.axis)
   1489     if not_indexed_same is None:
   1490         not_indexed_same = mutated or self.mutated

File ~/open-source/pandas-maren/pandas/core/groupby/ops.py:786, in BaseGrouper.apply(self, f, data, axis)
    784 # group might be modified
    785 group_axes = group.axes
--> 786 res = f(group)
    787 if not mutated and not _is_indexed_like(res, group_axes, axis):
    788     mutated = True

File ~/open-source/pandas-maren/pandas/core/groupby/groupby.py:972, in GroupBy._op_via_apply.<locals>.curried(x)
    971 def curried(x):
--> 972     return f(x, *args, **kwargs)

File ~/open-source/pandas-maren/pandas/core/generic.py:11511, in NDFrame._add_numeric_operations.<locals>.skew(self, axis, skipna, numeric_only, **kwargs)
  11494 @doc(
  11495     _num_doc,
  11496     desc="Return unbiased skew over requested axis.\n\nNormalized by N-1.",
   (...)
  11509     **kwargs,
  11510 ):
> 11511     return NDFrame.skew(self, axis, skipna, numeric_only, **kwargs)

File ~/open-source/pandas-maren/pandas/core/generic.py:11157, in NDFrame.skew(self, axis, skipna, numeric_only, **kwargs)
  11150 def skew(
  11151     self,
  11152     axis: Axis | None = 0,
   (...)
  11155     **kwargs,
  11156 ) -> Series | float:
> 11157     return self._stat_function(
  11158         "skew", nanops.nanskew, axis, skipna, numeric_only, **kwargs
  11159     )

File ~/open-source/pandas-maren/pandas/core/generic.py:11092, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs)
  11088     nv.validate_stat_func((), kwargs, fname=name)
  11090 validate_bool_kwarg(skipna, "skipna", none_allowed=False)
> 11092 return self._reduce(
  11093     func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
  11094 )

File ~/open-source/pandas-maren/pandas/core/series.py:4628, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4626         kwd_name = "bool_only"
   4627     # GH#47500 - change to TypeError to match other methods
-> 4628     raise TypeError(
   4629         f"Series.{name} does not allow {kwd_name}={numeric_only} "
   4630         "with non-numeric dtypes."
   4631     )
   4632 with np.errstate(all="ignore"):
   4633     return op(delegate, skipna=skipna, **kwds)

TypeError: Series.skew does not allow numeric_only=True with non-numeric dtypes.

When I set numeric_only=False, I get TypeError: could not convert string to float: 'x' which makes sense. So numeric_only can't really be used for Series beause it results in an error either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - I didn't realize you were referring to Series here. It is true that operating on a non-numeric Series will always fail with skew, and as far as I know this is consistent across Series and SeriesGroupBy ops. Perhaps there is a better behavior or the argument should be removed, but I think that's for a separate issue.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good!

----------
axis : {0 or 'index', 1 or 'columns', None}, default 0
Axis for the function to be applied on.
For `Series` this parameter is unused and defaults to 0.
Copy link
Member

@rhshadrach rhshadrach Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reST, there should be two backticks, ``Series`` (markdown would only have one). But this phrasing is used when the docstring is shared between Series and DataFrame implementations. Since this is just for Series, what do you think about something like:

This parameter is only for compatibility with DataFrame and is unused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes sense to write the same for numeric_only?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think numeric_only is slightly different - if you specify numeric_only=True and the Series is non-numeric, then it will raise. This could be useful to identify errors - similar to adding an assert.

@marenwestermann
Copy link
Contributor Author

I made the requested changes. I will work on the CI failures tomorrow.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach
Copy link
Member

Thanks @marenwestermann - great work!

@phofl
Copy link
Member

phofl commented Feb 5, 2023

Thx @marenwestermann and thx @rhshadrach for reviewing

@marenwestermann
Copy link
Contributor Author

Thank you and thanks for reviewing!

@marenwestermann marenwestermann deleted the groupby-skew branch February 6, 2023 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants