-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
DOC: update the aggregate docstring #20276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update the aggregate docstring #20276
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding axis
, I would add a variable in the shared doc %(axis)s
.
In core/frame you can sub axis : {0 or 'index', 1 or 'columns'}, default 0
and an explanation
In core/series you can sub axis: {0 or 'index}
and note that it's for compatibility.
pandas/core/series.py
Outdated
@@ -2353,6 +2353,13 @@ def _gotitem(self, key, ndim, subset=None): | |||
return self | |||
|
|||
_agg_doc = dedent(""" | |||
Notes | |||
----- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove this section. Move the axis discussion to a substitution in parameters.
pandas/core/frame.py
Outdated
A B C | ||
sum -0.182253 -0.614014 -2.909534 | ||
min -1.916563 -1.460076 -1.568297 | ||
>>> df.aggregate(['sum', 'min']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we recommend using .agg
pandas/core/frame.py
Outdated
Notes | ||
----- | ||
The default behavior of aggregating over the axis 0 is different from | ||
`numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does sphinx complain about having a character immediately after the backticks? May need spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added commas instead.
pandas/core/frame.py
Outdated
The default behavior of aggregating over the axis 0 is different from | ||
`numpy` functions `mean`/`median`/`prod`/`sum`/`std`/`var`, where the | ||
default is to compute the aggregation of the flattened array (e.g., | ||
`numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double-ticks around the code snippets.
pandas/core/frame.py
Outdated
pandas.DataFrame.resample.aggregate | ||
pandas.DataFrame.rolling.aggregate | ||
|
||
pandas.DataFrame.apply : Perform any type of operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the pandas.
from these, save a bit of space.
pandas/core/frame.py
Outdated
|
||
pandas.DataFrame.apply : Perform any type of operations. | ||
pandas.DataFrame.transform : Perform transformation type operations. | ||
pandas.DataFrame.groupby.aggregate : Perform aggregation type operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be pandas.core.groupby.GroupBy.transform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, best to just remove the tranform / aggregate bits and just link to
pandas.core.groupby.GroupBy
pandas.core.resample.Resampler
pandas.core.window.Rolling
pandas.core.window.Expanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger Also pandas.core.window.EWM?
pandas/core/generic.py
Outdated
@@ -3929,36 +3929,38 @@ def pipe(self, func, *args, **kwargs): | |||
return com._pipe(self, func, *args, **kwargs) | |||
|
|||
_shared_docs['aggregate'] = (""" | |||
Aggregate using callable, string, dict, or list of string/callables | |||
Aggregate using one or multiple operations along the specified axis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiple -> more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
along -> over I think.
Does along mean apply to each element in an axis? And over means collapse an axis?
pandas/core/frame.py
Outdated
default is to compute the aggregation of the flattened array (e.g., | ||
`numpy.mean(arr_2d)` as opposed to `numpy.mean(arr_2d, axis=0)`). | ||
|
||
`agg` is an alias for `aggregate`. Use the alias. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is relevant, we always specify an axis, so by-definition you always have column-by-column or row-by-row aggregations, axis=None
is simply not possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe adding some expl would help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better now?
pandas/core/frame.py
Outdated
>>> df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'], | ||
... index=pd.date_range('1/1/2000', periods=10)) | ||
>>> df.iloc[3:7] = np.nan | ||
>>> df = df = pd.DataFrame([[1,2,3], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra df =
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still have this @albertvillanova
I have read that there has been a change to the convention during the sprint, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we'll document them separately.
pandas/core/generic.py
Outdated
- list of functions. | ||
- dict of column names -> functions (or list of functions). | ||
%(axis)s | ||
args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, we'll document them separately.
This should be *args
.
pandas/core/generic.py
Outdated
- dict of column names -> functions (or list of functions). | ||
%(axis)s | ||
args | ||
Optional positional arguments to pass to the function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strike optional. The agg func could have additional required parameters.
pandas/core/generic.py
Outdated
%(axis)s | ||
args | ||
Optional positional arguments to pass to the function. | ||
kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be **kwargs
.
Sure
…On Sun, Mar 11, 2018 at 11:12 AM, Albert Villanova del Moral < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/core/frame.py
<#20276 (comment)>:
>
See also
--------
- pandas.DataFrame.apply
- pandas.DataFrame.transform
- pandas.DataFrame.groupby.aggregate
- pandas.DataFrame.resample.aggregate
- pandas.DataFrame.rolling.aggregate
-
+ pandas.DataFrame.apply : Perform any type of operations.
+ pandas.DataFrame.transform : Perform transformation type operations.
+ pandas.DataFrame.groupby.aggregate : Perform aggregation type operations
@TomAugspurger <https://github.com/tomaugspurger> Also
pandas.core.window.EWM?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#20276 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIiBdNMGBt9kq94jiJQAED8qPomMrks5tdUzbgaJpZM4Slo-D>
.
|
Looks like there's a merge conflict @albertvillanova. Can you update? LMK if you need help with the git stuff. |
Codecov Report
@@ Coverage Diff @@
## master #20276 +/- ##
=========================================
Coverage ? 91.7%
=========================================
Files ? 150
Lines ? 49165
Branches ? 0
=========================================
Hits ? 45087
Misses ? 4078
Partials ? 0
Continue to review full report at Codecov.
|
…ocstring_aggregate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an example with axis='columns'
.
Please, note that this PR has been made in collaboration with @rochamatcomp |
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.
This PR has been made in collaboration with @rochamatcomp
Question to reviewers: we have added to the docstring the parameter
axis
because this is appropriate for DataFrame. However this dosctring is generic (shared with Series, groupby, window and resample) and the parameteraxis
is wrong for them (for Series the parameter has only one possible value). What should be the right way to fix this?