DOC: update the aggregate docstring #20276
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4764,36 +4764,49 @@ def _gotitem(self, key, ndim, subset=None): | |
return self[key] | ||
|
||
_agg_doc = dedent(""" | ||
Notes | ||
----- | ||
The default behavior of aggregating over the axis 0 is different from | ||
`numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the | ||
default is to compute the aggregation of the flattened array (e.g., | ||
`numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`). | ||
Review comment: double-ticks around the code snippets.
||
|
||
`agg` is an alias for `aggregate`. Use the alias. | ||
Review comment: I am not sure this is relevant: we always specify an axis, so by definition you always have column-by-column or row-by-row aggregations.
Review comment: maybe adding some explanation would help here.
Review comment: Is it better now?
||
|
||
Examples | ||
-------- | ||
|
||
>>> df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'], | ||
... index=pd.date_range('1/1/2000', periods=10)) | ||
>>> df.iloc[3:7] = np.nan | ||
>>> df = df = pd.DataFrame([[1,2,3], | ||
Review comment: extra df =
Review comment: Still have this @albertvillanova
||
... [4,5,6], | ||
... [7,8,9], | ||
... [np.nan, np.nan, np.nan]], | ||
... columns=['A', 'B', 'C']) | ||
|
||
Aggregate these functions across all columns | ||
|
||
>>> df.agg(['sum', 'min']) | ||
A B C | ||
sum -0.182253 -0.614014 -2.909534 | ||
min -1.916563 -1.460076 -1.568297 | ||
>>> df.aggregate(['sum', 'min']) | ||
Review comment: I think we recommend using
||
A B C | ||
sum 12.0 15.0 18.0 | ||
min 1.0 2.0 3.0 | ||
|
||
Different aggregations per column | ||
|
||
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']}) | ||
A B | ||
max NaN 1.514318 | ||
min -1.916563 -1.460076 | ||
sum -0.182253 NaN | ||
>>> df.aggregate({'A' : ['sum', 'min'], 'B' : ['min', 'max']}) | ||
A B | ||
max NaN 8.0 | ||
min 1.0 2.0 | ||
sum 12.0 NaN | ||
|
||
See also | ||
-------- | ||
pandas.DataFrame.apply | ||
pandas.DataFrame.transform | ||
pandas.DataFrame.groupby.aggregate | ||
pandas.DataFrame.resample.aggregate | ||
pandas.DataFrame.rolling.aggregate | ||
|
||
pandas.DataFrame.apply : Perform any type of operations. | ||
Review comment: You can remove the
||
pandas.DataFrame.transform : Perform transformation type operations. | ||
pandas.DataFrame.groupby.aggregate : Perform aggregation type operations | ||
Review comment: This should be pandas.core.groupby.GroupBy.transform.
Review comment: Actually, best to just remove the transform / aggregate bits and just link to pandas.core.groupby.GroupBy
Review comment: @TomAugspurger Also pandas.core.window.EWM?
||
over groups. | ||
pandas.DataFrame.resample.aggregate : Perform aggregation type operations | ||
over resampled bins. | ||
pandas.DataFrame.rolling.aggregate : Perform aggregation type operations | ||
over rolling window. | ||
""") | ||
|
||
@Appender(_agg_doc) | ||
|
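The Notes paragraph in the docstring above contrasts the default of `agg` (reduce over axis 0) with NumPy's default (reduce the flattened array). A minimal sketch of that difference, reusing the frame from the new docstring example; the variable names (`arr_2d`, `flat_mean`, `col_mean`, `per_column`, `per_row`) are illustrative and not part of the PR:

```python
import numpy as np
import pandas as pd

# Same data as the new docstring example in this diff.
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [np.nan, np.nan, np.nan]],
                  columns=['A', 'B', 'C'])

# Drop the all-NaN row so the plain NumPy calls return finite numbers.
arr_2d = df.to_numpy()[:3]

# NumPy default: the array is flattened and reduced to one scalar.
flat_mean = np.mean(arr_2d)          # 5.0

# NumPy with axis=0: one value per column.
col_mean = np.mean(arr_2d, axis=0)   # array([4., 5., 6.])

# DataFrame.agg reduces over axis 0 by default (one value per column);
# the string 'mean' dispatches to DataFrame.mean, which skips NaN.
per_column = df.agg('mean')

# axis=1 reduces each row instead.
per_row = df.agg('mean', axis=1)
```

Here `per_column` matches `col_mean` rather than `flat_mean`, which is the point the Notes section is trying to make.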
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3929,36 +3929,38 @@ def pipe(self, func, *args, **kwargs): | |
return com._pipe(self, func, *args, **kwargs) | ||
|
||
_shared_docs['aggregate'] = (""" | ||
Aggregate using callable, string, dict, or list of string/callables | ||
Aggregate using one or multiple operations along the specified axis. | ||
Review comment: multiple -> more
Review comment: along -> over I think. Does along mean apply to each element in an axis? And over means collapse an axis?
||
|
||
%(versionadded)s | ||
|
||
Parameters | ||
---------- | ||
func : callable, string, dictionary, or list of string/callables | ||
func : function, string, dictionary, or list of string/functions | ||
Function to use for aggregating the data. If a function, must either | ||
work when passed a %(klass)s or when passed to %(klass)s.apply. For | ||
a DataFrame, can pass a dict, if the keys are DataFrame column names. | ||
|
||
Accepted Combinations are: | ||
|
||
- string function name | ||
- function | ||
- list of functions | ||
- dict of column names -> functions (or list of functions) | ||
|
||
Notes | ||
----- | ||
Numpy functions mean/median/prod/sum/std/var are special cased so the | ||
default behavior is applying the function along axis=0 | ||
(e.g., np.mean(arr_2d, axis=0)) as opposed to | ||
mimicking the default Numpy behavior (e.g., np.mean(arr_2d)). | ||
Accepted combinations are: | ||
|
||
`agg` is an alias for `aggregate`. Use the alias. | ||
- string function name. | ||
- function. | ||
- list of functions. | ||
- dict of column names -> functions (or list of functions). | ||
axis : {0 or 'index', 1 or 'columns'}, default 0 | ||
- 0 or 'index': apply function to each column. | ||
- 1 or 'columns': apply function to each row. | ||
args | ||
Review comment: In this case, we'll document them separately. This should be
||
Optional positional arguments to pass to the function. | ||
Review comment: Strike optional. The agg func could have additional required parameters.
||
kwargs | ||
Review comment: This should be
||
Optional keyword arguments to pass to the function. | ||
|
||
Returns | ||
------- | ||
aggregated : %(klass)s | ||
|
||
Notes | ||
----- | ||
`agg` is an alias for `aggregate`. Use the alias. | ||
""") | ||
|
||
_shared_docs['transform'] = (""" | ||
|
@@ -4006,7 +4008,6 @@ def pipe(self, func, *args, **kwargs): | |
-------- | ||
pandas.%(klass)s.aggregate | ||
pandas.%(klass)s.apply | ||
|
||
""") | ||
|
||
# ---------------------------------------------------------------------- | ||
|
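The `func` forms listed under "Accepted combinations" in the shared `aggregate` docstring above, together with `axis` and the extra arguments forwarded to the function, can be sketched as follows. This is an illustration only, not code from the PR; `value_range` and `scaled_sum` are hypothetical helpers introduced for the example:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  columns=['A', 'B', 'C'])


def value_range(col):
    """Hypothetical aggregation: spread between max and min."""
    return col.max() - col.min()


def scaled_sum(col, factor=1):
    """Hypothetical aggregation taking an extra parameter."""
    return col.sum() * factor


by_name = df.agg('sum')                 # string function name
by_func = df.agg(value_range)           # function
by_list = df.agg(['sum', 'min'])        # list of functions
by_dict = df.agg({'A': ['sum', 'min'],  # dict of column names -> functions
                  'B': ['min', 'max']})

by_row = df.agg('sum', axis=1)          # axis=1: apply function to each row

# Extra keyword and positional arguments are forwarded to the function.
# Positional extras come after `axis`, so `axis` is spelled out here.
with_kwargs = df.agg(scaled_sum, factor=10)
with_args = df.agg(scaled_sum, 0, 10)
```

In the dict form, a function that was not requested for a column shows up as `NaN` in the result, as in the docstring output in the first file of this diff.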
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2353,6 +2353,13 @@ def _gotitem(self, key, ndim, subset=None): | |
return self | ||
|
||
_agg_doc = dedent(""" | ||
Notes | ||
----- | ||
Review comment: Can remove this section. Move the axis discussion to a substitution in parameters.
||
The only possible value for axis is 0 or 'index' because | ||
:class:`~pandas.Series` has only one axis. | ||
|
||
`agg` is an alias for `aggregate`. Use the alias. | ||
|
||
Examples | ||
-------- | ||
|
||
|
Review comment: Does sphinx complain about having a character immediately after the backticks? May need spaces.
Review comment: I have added commas instead.
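For the `Series` docstring discussed above, where the only valid axis is 0 or 'index', a small sketch of how `agg` behaves may help; the variable names are illustrative and not taken from the PR:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# A single function name reduces the Series to a scalar.
total = s.agg('sum')

# A list of function names returns a Series indexed by those names.
summary = s.agg(['min', 'max'])

# Passing the axis explicitly is allowed, but 0 / 'index' is the only choice.
same_total = s.agg('sum', axis='index')
```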