DOC: update the aggregate docstring #20276

albertvillanova · 2018-03-11T10:40:36Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
#################### Docstring (pandas.DataFrame.aggregate) ####################
################################################################################

Aggregate using one or multiple operations along the specified axis.

.. versionadded:: 0.20.0

Parameters
----------
func : function, string, dictionary, or list of string/functions
    Function to use for aggregating the data. If a function, must either
    work when passed a DataFrame or when passed to DataFrame.apply. For
    a DataFrame, can pass a dict, if the keys are DataFrame column names.

    Accepted combinations are:

    - string function name.
    - function.
    - list of functions.
    - dict of column names -> functions (or list of functions).
axis : {0 or 'index', 1 or 'columns'}, default 0
    - 0 or 'index': apply function to each column.
    - 1 or 'columns': apply function to each row.
args
    Optional positional arguments to pass to the function.
kwargs
    Optional keyword arguments to pass to the function.

Returns
-------
aggregated : DataFrame

Notes
-----
`agg` is an alias for `aggregate`. Use the alias.

Notes
-----
The default behavior of aggregating over the axis 0 is different from
`numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the
default is to compute the aggregation of the flattened array (e.g.,
`numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`).

`agg` is an alias for `aggregate`. Use the alias.

Examples
--------

>>> df = df = pd.DataFrame([[1,2,3],
...                         [4,5,6],
...                         [7,8,9],
...                         [np.nan, np.nan, np.nan]],
...                        columns=['A', 'B', 'C'])

Aggregate these functions across all columns

>>> df.aggregate(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column

>>> df.aggregate({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN

See also
--------
pandas.DataFrame.apply : Perform any type of operations.
pandas.DataFrame.transform : Perform transformation type operations.
pandas.DataFrame.groupby.aggregate : Perform aggregation type operations
    over groups.
pandas.DataFrame.resample.aggregate : Perform aggregation type operations
    over resampled bins.
pandas.DataFrame.rolling.aggregate : Perform aggregation type operations
    over rolling window.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "axis" description should start with capital letter
		Parameter "args" has no type
		Parameter "kwargs" has no type

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

args and kwargs have no type
axis : {0 or 'index', 1 or 'columns'}, default 0

This PR has been made in collaboration with @rochamatcomp

Question to reviewers: we have added to the docstring the parameter axis because this is appropriate for DataFrame. However this dosctring is generic (shared with Series, groupby, window and resample) and the parameter axis is wrong for them (for Series the parameter has only one possible value). What should be the right way to fix this?

TomAugspurger

Regarding axis, I would add a variable in the shared doc %(axis)s.

In core/frame you can sub axis : {0 or 'index', 1 or 'columns'}, default 0 and an explanation

In core/series you can sub axis: {0 or 'index} and note that it's for compatibility.

TomAugspurger · 2018-03-11T11:13:58Z

pandas/core/series.py

@@ -2353,6 +2353,13 @@ def _gotitem(self, key, ndim, subset=None):
        return self

    _agg_doc = dedent("""
+    Notes
+    -----


Can remove this section. Move the axis discussion to a substitution in parameters.

TomAugspurger · 2018-03-11T11:14:19Z

pandas/core/frame.py

-                A         B         C
-    sum -0.182253 -0.614014 -2.909534
-    min -1.916563 -1.460076 -1.568297
+    >>> df.aggregate(['sum', 'min'])


I think we recommend using .agg

TomAugspurger · 2018-03-11T11:14:44Z

pandas/core/frame.py

+    Notes
+    -----
+    The default behavior of aggregating over the axis 0 is different from
+    `numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the


Does sphinx complain about having a character immediately after the backticks? May need spaces.

I have added commas instead.

TomAugspurger · 2018-03-11T11:14:57Z

pandas/core/frame.py

+    The default behavior of aggregating over the axis 0 is different from
+    `numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the
+    default is to compute the aggregation of the flattened array (e.g.,
+    `numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`).


double-ticks around the code snippets.

TomAugspurger · 2018-03-11T11:16:03Z

pandas/core/frame.py

-    pandas.DataFrame.resample.aggregate
-    pandas.DataFrame.rolling.aggregate
-
+    pandas.DataFrame.apply : Perform any type of operations.


You can remove the pandas. from these, save a bit of space.

TomAugspurger · 2018-03-11T11:17:25Z

pandas/core/frame.py

-
+    pandas.DataFrame.apply : Perform any type of operations.
+    pandas.DataFrame.transform : Perform transformation type operations.
+    pandas.DataFrame.groupby.aggregate : Perform aggregation type operations


This should be pandas.core.groupby.GroupBy.transform.

Actually, best to just remove the tranform / aggregate bits and just link to

pandas.core.groupby.GroupBy
pandas.core.resample.Resampler
pandas.core.window.Rolling
pandas.core.window.Expanding

@TomAugspurger Also pandas.core.window.EWM?

TomAugspurger · 2018-03-11T11:19:33Z

pandas/core/generic.py

@@ -3929,36 +3929,38 @@ def pipe(self, func, *args, **kwargs):
        return com._pipe(self, func, *args, **kwargs)

    _shared_docs['aggregate'] = ("""
-    Aggregate using callable, string, dict, or list of string/callables
+    Aggregate using one or multiple operations along the specified axis.


multiple -> more

along -> over I think.

Does along mean apply to each element in an axis? And over means collapse an axis?

jreback · 2018-03-11T13:01:16Z

pandas/core/frame.py

+    default is to compute the aggregation of the flattened array (e.g.,
+    `numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`).
+
+    `agg` is an alias for `aggregate`. Use the alias.


I am not sure this is relevant, we always specify an axis, so by-definition you always have column-by-column or row-by-row aggregations, axis=None is simply not possible.

maybe adding some expl would help here.

Is it better now?

jreback · 2018-03-11T13:01:37Z

pandas/core/frame.py

-    >>> df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
-    ...                   index=pd.date_range('1/1/2000', periods=10))
-    >>> df.iloc[3:7] = np.nan
+    >>> df = df = pd.DataFrame([[1,2,3],


Still have this @albertvillanova

albertvillanova · 2018-03-11T15:53:43Z

I have read that there has been a change to the convention during the sprint, and args and kwargs should be in the same line and a common description in the next line. Should we implement this?

TomAugspurger

In this case we'll document them separately.

TomAugspurger · 2018-03-11T16:00:22Z

pandas/core/generic.py

+        - list of functions.
+        - dict of column names -> functions (or list of functions).
+    %(axis)s
+    args


In this case, we'll document them separately.

This should be *args.

TomAugspurger · 2018-03-11T16:00:41Z

pandas/core/generic.py

+        - dict of column names -> functions (or list of functions).
+    %(axis)s
+    args
+        Optional positional arguments to pass to the function.


Strike optional. The agg func could have additional required parameters.

TomAugspurger · 2018-03-11T16:00:56Z

pandas/core/generic.py

+    %(axis)s
+    args
+        Optional positional arguments to pass to the function.
+    kwargs


This should be **kwargs.

TomAugspurger · 2018-03-11T16:13:57Z

Sure

…

On Sun, Mar 11, 2018 at 11:12 AM, Albert Villanova del Moral < ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In pandas/core/frame.py <#20276 (comment)>: > See also -------- - pandas.DataFrame.apply - pandas.DataFrame.transform - pandas.DataFrame.groupby.aggregate - pandas.DataFrame.resample.aggregate - pandas.DataFrame.rolling.aggregate - + pandas.DataFrame.apply : Perform any type of operations. + pandas.DataFrame.transform : Perform transformation type operations. + pandas.DataFrame.groupby.aggregate : Perform aggregation type operations @TomAugspurger <https://github.com/tomaugspurger> Also pandas.core.window.EWM? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#20276 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIiBdNMGBt9kq94jiJQAED8qPomMrks5tdUzbgaJpZM4Slo-D> .

TomAugspurger · 2018-03-12T15:31:36Z

Looks like there's a merge conflict @albertvillanova. Can you update? LMK if you need help with the git stuff.

codecov · 2018-03-13T05:49:44Z

Codecov Report

❗ No coverage uploaded for pull request base (master@51765d0). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #20276   +/-   ##
=========================================
  Coverage          ?    91.7%           
=========================================
  Files             ?      150           
  Lines             ?    49165           
  Branches          ?        0           
=========================================
  Hits              ?    45087           
  Misses            ?     4078           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.09% <ø> (?)`
#single	`41.86% <ø> (?)`

Impacted Files	Coverage Δ
pandas/core/window.py	`96.31% <ø> (ø)`
pandas/core/groupby.py	`92.14% <ø> (ø)`
pandas/core/resample.py	`96.43% <ø> (ø)`
pandas/core/frame.py	`97.18% <ø> (ø)`
pandas/core/series.py	`93.84% <ø> (ø)`
pandas/core/generic.py	`95.85% <ø> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51765d0...e1a0c27. Read the comment docs.

…ocstring_aggregate

PEP8. Added axis='columns' example.

TomAugspurger

Added an example with axis='columns'.

albertvillanova · 2018-03-13T14:56:41Z

Please, note that this PR has been made in collaboration with @rochamatcomp

DOC: update the aggregate docstring

0cb2603

TomAugspurger added the Docs label Mar 11, 2018

jorisvandenbossche added Docs and removed Docs labels Mar 11, 2018

TomAugspurger requested changes Mar 11, 2018

View reviewed changes

jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Mar 11, 2018

jreback requested changes Mar 11, 2018

View reviewed changes

Fix axis issue

9379817

TomAugspurger reviewed Mar 11, 2018

View reviewed changes

Address requested changes

56cce01

albertvillanova added 2 commits March 12, 2018 23:32

Resolve PR merge conflicts

50aa393

Fix line too long

a8c992a

TomAugspurger added 2 commits March 13, 2018 09:05

Merge remote-tracking branch 'upstream/master' into albertvillanova-d…

2bb57d5

…ocstring_aggregate

Updates

e1a0c27

PEP8. Added axis='columns' example.

TomAugspurger approved these changes Mar 13, 2018

View reviewed changes

TomAugspurger added this to the 0.23.0 milestone Mar 13, 2018

TomAugspurger merged commit 45f9e57 into pandas-dev:master Mar 13, 2018

Uh oh!

DOC: update the aggregate docstring #20276

DOC: update the aggregate docstring #20276

Uh oh!

Conversation

albertvillanova commented Mar 11, 2018

Uh oh!

TomAugspurger left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

albertvillanova commented Mar 11, 2018

Uh oh!

TomAugspurger left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TomAugspurger commented Mar 11, 2018 via email

Uh oh!

TomAugspurger commented Mar 12, 2018

Uh oh!

codecov bot commented Mar 13, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

TomAugspurger left a comment

Choose a reason for hiding this comment

Uh oh!

albertvillanova commented Mar 13, 2018

Uh oh!

Uh oh!

codecov bot commented Mar 13, 2018 •

edited

Loading