Skip to content

DOC: update the aggregate docstring #20276

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 13, 2018

Conversation

albertvillanova
Copy link
Contributor

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
#################### Docstring (pandas.DataFrame.aggregate) ####################
################################################################################

Aggregate using one or multiple operations along the specified axis.

.. versionadded:: 0.20.0

Parameters
----------
func : function, string, dictionary, or list of string/functions
    Function to use for aggregating the data. If a function, must either
    work when passed a DataFrame or when passed to DataFrame.apply. For
    a DataFrame, can pass a dict, if the keys are DataFrame column names.

    Accepted combinations are:

    - string function name.
    - function.
    - list of functions.
    - dict of column names -> functions (or list of functions).
axis : {0 or 'index', 1 or 'columns'}, default 0
    - 0 or 'index': apply function to each column.
    - 1 or 'columns': apply function to each row.
args
    Optional positional arguments to pass to the function.
kwargs
    Optional keyword arguments to pass to the function.

Returns
-------
aggregated : DataFrame

Notes
-----
`agg` is an alias for `aggregate`. Use the alias.

Notes
-----
The default behavior of aggregating over the axis 0 is different from
`numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the
default is to compute the aggregation of the flattened array (e.g.,
`numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`).

`agg` is an alias for `aggregate`. Use the alias.

Examples
--------

>>> df = df = pd.DataFrame([[1,2,3],
...                         [4,5,6],
...                         [7,8,9],
...                         [np.nan, np.nan, np.nan]],
...                        columns=['A', 'B', 'C'])

Aggregate these functions across all columns

>>> df.aggregate(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column

>>> df.aggregate({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN

See also
--------
pandas.DataFrame.apply : Perform any type of operations.
pandas.DataFrame.transform : Perform transformation type operations.
pandas.DataFrame.groupby.aggregate : Perform aggregation type operations
    over groups.
pandas.DataFrame.resample.aggregate : Perform aggregation type operations
    over resampled bins.
pandas.DataFrame.rolling.aggregate : Perform aggregation type operations
    over rolling window.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "axis" description should start with capital letter
		Parameter "args" has no type
		Parameter "kwargs" has no type

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

  • args and kwargs have no type
  • axis : {0 or 'index', 1 or 'columns'}, default 0

This PR has been made in collaboration with @rochamatcomp

Question to reviewers: we have added to the docstring the parameter axis because this is appropriate for DataFrame. However this dosctring is generic (shared with Series, groupby, window and resample) and the parameter axis is wrong for them (for Series the parameter has only one possible value). What should be the right way to fix this?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding axis, I would add a variable in the shared doc %(axis)s.

In core/frame you can sub axis : {0 or 'index', 1 or 'columns'}, default 0 and an explanation

In core/series you can sub axis: {0 or 'index} and note that it's for compatibility.

@@ -2353,6 +2353,13 @@ def _gotitem(self, key, ndim, subset=None):
return self

_agg_doc = dedent("""
Notes
-----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove this section. Move the axis discussion to a substitution in parameters.

A B C
sum -0.182253 -0.614014 -2.909534
min -1.916563 -1.460076 -1.568297
>>> df.aggregate(['sum', 'min'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we recommend using .agg

Notes
-----
The default behavior of aggregating over the axis 0 is different from
`numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does sphinx complain about having a character immediately after the backticks? May need spaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added commas instead.

The default behavior of aggregating over the axis 0 is different from
`numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the
default is to compute the aggregation of the flattened array (e.g.,
`numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double-ticks around the code snippets.

pandas.DataFrame.resample.aggregate
pandas.DataFrame.rolling.aggregate

pandas.DataFrame.apply : Perform any type of operations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the pandas. from these, save a bit of space.


pandas.DataFrame.apply : Perform any type of operations.
pandas.DataFrame.transform : Perform transformation type operations.
pandas.DataFrame.groupby.aggregate : Perform aggregation type operations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be pandas.core.groupby.GroupBy.transform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, best to just remove the tranform / aggregate bits and just link to

pandas.core.groupby.GroupBy
pandas.core.resample.Resampler
pandas.core.window.Rolling
pandas.core.window.Expanding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger Also pandas.core.window.EWM?

@@ -3929,36 +3929,38 @@ def pipe(self, func, *args, **kwargs):
return com._pipe(self, func, *args, **kwargs)

_shared_docs['aggregate'] = ("""
Aggregate using callable, string, dict, or list of string/callables
Aggregate using one or multiple operations along the specified axis.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple -> more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

along -> over I think.

Does along mean apply to each element in an axis? And over means collapse an axis?

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Mar 11, 2018
default is to compute the aggregation of the flattened array (e.g.,
`numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`).

`agg` is an alias for `aggregate`. Use the alias.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is relevant, we always specify an axis, so by-definition you always have column-by-column or row-by-row aggregations, axis=None is simply not possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe adding some expl would help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better now?

>>> df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
... index=pd.date_range('1/1/2000', periods=10))
>>> df.iloc[3:7] = np.nan
>>> df = df = pd.DataFrame([[1,2,3],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra df =

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have this @albertvillanova

@albertvillanova
Copy link
Contributor Author

I have read that there has been a change to the convention during the sprint, and args and kwargs should be in the same line and a common description in the next line. Should we implement this?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we'll document them separately.

- list of functions.
- dict of column names -> functions (or list of functions).
%(axis)s
args
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we'll document them separately.

This should be *args.

- dict of column names -> functions (or list of functions).
%(axis)s
args
Optional positional arguments to pass to the function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strike optional. The agg func could have additional required parameters.

%(axis)s
args
Optional positional arguments to pass to the function.
kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be **kwargs.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 11, 2018 via email

@TomAugspurger
Copy link
Contributor

Looks like there's a merge conflict @albertvillanova. Can you update? LMK if you need help with the git stuff.

@codecov
Copy link

codecov bot commented Mar 13, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@51765d0). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #20276   +/-   ##
=========================================
  Coverage          ?    91.7%           
=========================================
  Files             ?      150           
  Lines             ?    49165           
  Branches          ?        0           
=========================================
  Hits              ?    45087           
  Misses            ?     4078           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.09% <ø> (?)
#single 41.86% <ø> (?)
Impacted Files Coverage Δ
pandas/core/window.py 96.31% <ø> (ø)
pandas/core/groupby.py 92.14% <ø> (ø)
pandas/core/resample.py 96.43% <ø> (ø)
pandas/core/frame.py 97.18% <ø> (ø)
pandas/core/series.py 93.84% <ø> (ø)
pandas/core/generic.py 95.85% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51765d0...e1a0c27. Read the comment docs.

PEP8.

Added axis='columns' example.
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an example with axis='columns'.

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 13, 2018
@TomAugspurger TomAugspurger merged commit 45f9e57 into pandas-dev:master Mar 13, 2018
@albertvillanova
Copy link
Contributor Author

Please, note that this PR has been made in collaboration with @rochamatcomp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants