DOC: update the aggregate docstring #20276

Merged
merged 7 commits into from
Mar 13, 2018
Changes from 1 commit
49 changes: 31 additions & 18 deletions pandas/core/frame.py
@@ -4764,36 +4764,49 @@ def _gotitem(self, key, ndim, subset=None):
return self[key]

_agg_doc = dedent("""
Notes
-----
The default behavior of aggregating over the axis 0 is different from
`numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the

Contributor: Does Sphinx complain about having a character immediately after the backticks? May need spaces.

Contributor Author: I have added commas instead.

default is to compute the aggregation of the flattened array (e.g.,
`numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`).

Contributor: Use double backticks around the code snippets.


`agg` is an alias for `aggregate`. Use the alias.

Contributor: I am not sure this is relevant. We always specify an axis, so by definition you always have column-by-column or row-by-row aggregations; axis=None is simply not possible.

Contributor: Maybe adding some explanation would help here.

Contributor Author: Is it better now?


Examples
--------

>>> df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
... index=pd.date_range('1/1/2000', periods=10))
>>> df.iloc[3:7] = np.nan
>>> df = df = pd.DataFrame([[1,2,3],

Contributor: Extra `df =`.

Contributor: Still have this, @albertvillanova.

... [4,5,6],
... [7,8,9],
... [np.nan, np.nan, np.nan]],
... columns=['A', 'B', 'C'])

Aggregate these functions across all columns

>>> df.agg(['sum', 'min'])
A B C
sum -0.182253 -0.614014 -2.909534
min -1.916563 -1.460076 -1.568297
>>> df.aggregate(['sum', 'min'])

Contributor: I think we recommend using `.agg`.

A B C
sum 12.0 15.0 18.0
min 1.0 2.0 3.0

Different aggregations per column

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
A B
max NaN 1.514318
min -1.916563 -1.460076
sum -0.182253 NaN
>>> df.aggregate({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
A B
max NaN 8.0
min 1.0 2.0
sum 12.0 NaN

See also
--------
pandas.DataFrame.apply
pandas.DataFrame.transform
pandas.DataFrame.groupby.aggregate
pandas.DataFrame.resample.aggregate
pandas.DataFrame.rolling.aggregate

pandas.DataFrame.apply : Perform any type of operations.

Contributor: You can remove the `pandas.` from these to save a bit of space.

pandas.DataFrame.transform : Perform transformation type operations.
pandas.DataFrame.groupby.aggregate : Perform aggregation type operations

Contributor: This should be pandas.core.groupby.GroupBy.transform.

Contributor: Actually, best to just remove the transform / aggregate bits and just link to

pandas.core.groupby.GroupBy
pandas.core.resample.Resampler
pandas.core.window.Rolling
pandas.core.window.Expanding

Contributor Author: @TomAugspurger Also pandas.core.window.EWM?

over groups.
pandas.DataFrame.resample.aggregate : Perform aggregation type operations
over resampled bins.
pandas.DataFrame.rolling.aggregate : Perform aggregation type operations
over rolling window.
""")

@Appender(_agg_doc)
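
The Notes section in the hunk above contrasts the default of `DataFrame.agg` with NumPy's default of reducing the flattened array. A minimal sketch of that difference follows; the 3x3 array and the variable names are illustrative assumptions, not part of the PR.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch, not taken from the PR).
arr_2d = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]])
df = pd.DataFrame(arr_2d, columns=['A', 'B', 'C'])

# NumPy default: the array is flattened and a single scalar is returned.
flat_mean = np.mean(arr_2d)

# NumPy with an explicit axis: one mean per column.
col_means = np.mean(arr_2d, axis=0)

# pandas default (axis=0): one mean per column, labelled by column name.
agg_means = df.agg('mean')

print(flat_mean)
print(col_means)
print(agg_means)
```

With this data the flattened mean is a single number, while both `np.mean(arr_2d, axis=0)` and `df.agg('mean')` produce one value per column.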
35 changes: 18 additions & 17 deletions pandas/core/generic.py
@@ -3929,36 +3929,38 @@ def pipe(self, func, *args, **kwargs):
return com._pipe(self, func, *args, **kwargs)

_shared_docs['aggregate'] = ("""
Aggregate using callable, string, dict, or list of string/callables
Aggregate using one or multiple operations along the specified axis.

Contributor: multiple -> more

Contributor: along -> over, I think.

Does "along" mean apply to each element in an axis? And "over" means collapse an axis?


%(versionadded)s

Parameters
----------
func : callable, string, dictionary, or list of string/callables
func : function, string, dictionary, or list of string/functions
Function to use for aggregating the data. If a function, must either
work when passed a %(klass)s or when passed to %(klass)s.apply. For
a DataFrame, can pass a dict, if the keys are DataFrame column names.

Accepted Combinations are:

- string function name
- function
- list of functions
- dict of column names -> functions (or list of functions)

Notes
-----
Numpy functions mean/median/prod/sum/std/var are special cased so the
default behavior is applying the function along axis=0
(e.g., np.mean(arr_2d, axis=0)) as opposed to
mimicking the default Numpy behavior (e.g., np.mean(arr_2d)).
Accepted combinations are:

`agg` is an alias for `aggregate`. Use the alias.
- string function name.
- function.
- list of functions.
- dict of column names -> functions (or list of functions).
axis : {0 or 'index', 1 or 'columns'}, default 0
- 0 or 'index': apply function to each column.
- 1 or 'columns': apply function to each row.
args

Contributor: In this case, we'll document them separately.

This should be `*args`.

Optional positional arguments to pass to the function.

Contributor: Strike "optional". The agg func could have additional required parameters.

kwargs

Contributor: This should be `**kwargs`.

Optional keyword arguments to pass to the function.

Returns
-------
aggregated : %(klass)s

Notes
-----
`agg` is an alias for `aggregate`. Use the alias.
""")

_shared_docs['transform'] = ("""
@@ -4006,7 +4008,6 @@ def pipe(self, func, *args, **kwargs):
--------
pandas.%(klass)s.aggregate
pandas.%(klass)s.apply

""")

# ----------------------------------------------------------------------
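
The shared `aggregate` docstring in this hunk lists the accepted forms of `func`, the `axis` parameter, and the extra arguments forwarded to the function. The sketch below exercises each of them on the frame used in the PR's example. The helper `scaled_range` and its parameters `factor` and `offset` are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the docstring example under review.
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [np.nan, np.nan, np.nan]],
                  columns=['A', 'B', 'C'])

# String function name: one value per column.
by_name = df.agg('sum')

# List of functions: one row per function.
by_list = df.agg(['sum', 'min'])

# Dict of column names -> list of functions: different aggregations per column.
by_dict = df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']})

# axis=1 (or 'columns'): the function is applied to each row instead.
by_row = df.agg('sum', axis=1)


def scaled_range(col, factor, offset=0.0):
    """Hypothetical aggregation with one required and one keyword argument."""
    return (col.max() - col.min()) * factor + offset


# Arguments after `axis` are forwarded to the function: here factor=2 is
# passed positionally and offset=1.0 by keyword.
with_args = df.agg(scaled_range, 0, 2, offset=1.0)

print(by_list)
print(by_dict)
print(with_args)
```

Because `factor` has no default, the call fails without it, which is the situation behind the review remark that the forwarded arguments are not necessarily optional.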
7 changes: 7 additions & 0 deletions pandas/core/series.py
@@ -2353,6 +2353,13 @@ def _gotitem(self, key, ndim, subset=None):
return self

_agg_doc = dedent("""
Notes
-----

Contributor: Can remove this section. Move the axis discussion to a substitution in parameters.

The only possible value for axis is 0 or 'index' because
:class:`~pandas.Series` has only one axis.

`agg` is an alias for `aggregate`. Use the alias.

Examples
--------

Expand Down
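
The Series note above says that the only meaningful axis is 0 (or 'index'). A minimal sketch of `Series.agg` under that constraint follows; the sample values are an assumption for illustration and do not come from the PR.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch).
s = pd.Series([1, 2, 3, 4])

# A single function name returns a scalar.
single = s.agg('min')

# A list of function names returns a Series indexed by function name.
several = s.agg(['min', 'max'])

# Passing the axis explicitly changes nothing: 'index' is the same as 0.
explicit = s.agg('min', axis='index')

print(single)
print(several)
print(explicit)
```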