-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
DOC: update the pandas.DataFrame.apply docstring #20202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update the pandas.DataFrame.apply docstring #20202
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Added some first comments
pandas/core/frame.py
Outdated
@@ -4818,66 +4818,82 @@ def aggregate(self, func, axis=0, *args, **kwargs): | |||
|
|||
def apply(self, func, axis=0, broadcast=None, raw=False, reduce=None, | |||
result_type=None, args=(), **kwds): | |||
"""Applies function along an axis of the DataFrame. | |||
""" | |||
Apply a function along an axis of the `Series`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the dataframe docs, so Series -> DataFrame
(also it is not needed to put single backticks around Series and DataFrame, same for the rest of the docstring)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add PR review suggestions from @jorisvandenbossche into commit c593a70
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and commit 4d3e8cf (i.e. the same for DataFrame)
pandas/core/frame.py
Outdated
of the index or the number of columns (based on the `axis` | ||
parameter) | ||
* `True` : results will be broadcast to the original shape | ||
of the frame, the original index and columns will be retained. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this keyword is deprecated, I am not sure it is that important to add such a detailed description (I would rather expand the explanation of the new keyword result_type
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche OK. But <<results will be broadcast to the original shape of the frame, the original index and columns will be retained.>> is already copied from the explanation of the new keyword result_type
. So nothing to add in the paragraph of this new parameter.
Shall I revert to the previous wording for broadcast
parameter, i.e.:
'''
broadcast : boolean, optional
For aggregation functions, return object of same size with values
propagated
'''
(...which is rather fuzzy),
or not ?
pandas/core/frame.py
Outdated
passed function will receive ndarray objects instead. If you are | ||
just applying a NumPy reduction function this will achieve much | ||
better performance | ||
* `False` : passes each row or column into a `Series` to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"passes each row or column into a Series
" -> "passes each row or column as a Series
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit 8caf5f6
pandas/core/frame.py
Outdated
Final return type depends on the return type of the applied function, | ||
or on the `result_type` argument. | ||
Objects passed to the function are `Series` objects having as index | ||
either the DataFrame's index (`axis=0`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use double backticks around axis=0 (as ``axis=0``
) (because it is a small code snippet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit c1214b4
pandas/core/frame.py
Outdated
array/series | ||
Additional keyword arguments will be passed as keywords to the function | ||
array/series. | ||
kwds : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kwds -> **kwds
(and the colon is not needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit 2432ff5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks quite nice, added some comments.
pandas/core/frame.py
Outdated
@@ -4818,66 +4818,82 @@ def aggregate(self, func, axis=0, *args, **kwargs): | |||
|
|||
def apply(self, func, axis=0, broadcast=None, raw=False, reduce=None, | |||
result_type=None, args=(), **kwds): | |||
"""Applies function along an axis of the DataFrame. | |||
""" | |||
Apply a function along an axis of the `Series`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change DataFrame
by Series
? The file is frame.py
I think it's apply for DataFrame
, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit c593a70
pandas/core/frame.py
Outdated
Objects passed to the function are `Series` objects having as index | ||
either the DataFrame's index (`axis=0`) | ||
or the DataFrame's columns (`axis=1`). | ||
If `result_type` is None, the final return type is the return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None
in backticks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add PR review suggestions from @datapythonista into commit da64c86
Maybe result_type is None
is better there, because it is a small code snippet.
pandas/core/frame.py
Outdated
Axis along which the function is applied: | ||
|
||
* 0 or 'index': apply function to each column. | ||
* 1 or 'columns': apply function to each row. | ||
broadcast : boolean, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool
instead of boolean
(also in the next 2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit dbd90c1
pandas/core/frame.py
Outdated
are expanded to columns. | ||
|
||
.. versionadded:: 0.23.0 | ||
.. versionadded:: 0.23.0. | ||
|
||
args : tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
args
is always a tuple, so we remove it. The validation script will probably complain, but the convention is to use *args
and **kwargs
(with the stars)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we need to keep it as args
and the 'tuple' type description is correct. This is a special case because it actually is a keyword with the name args
, not *args
in the signature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I keep
args : tuple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, my fault, sorry.
pandas/core/frame.py
Outdated
Additional keyword arguments will be passed as keywords to the function | ||
array/series. | ||
kwds : | ||
Additional keyword arguments to pass as keywords to the function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A personal opinion, but for me it'd be a bit more explicit to say "to pass as keyword arguments to func
". I think saying "the function" should be clear enough for most people, but specifying which functions avoids any ambiguity. Also por args
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit 6cbadd8
pandas/core/frame.py
Outdated
@@ -4818,66 +4818,82 @@ def aggregate(self, func, axis=0, *args, **kwargs): | |||
|
|||
def apply(self, func, axis=0, broadcast=None, raw=False, reduce=None, | |||
result_type=None, args=(), **kwds): | |||
"""Applies function along an axis of the DataFrame. | |||
""" | |||
Apply a function along an axis of the `Series`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`Series` -> DataFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ai, we were all doing the review at the same time :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Sorry guys :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger I took into account your suggestion in the commit c593a70
pandas/core/frame.py
Outdated
either the DataFrame's index (axis=0) or the columns (axis=1). | ||
Final return type depends on the return type of the applied function, | ||
or on the `result_type` argument. | ||
Objects passed to the function are `Series` objects having as index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"having as" -> "whose index is either"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add PR review suggestions from @TomAugspurger into commit eaf1441
pandas/core/frame.py
Outdated
Final return type depends on the return type of the applied function, | ||
or on the `result_type` argument. | ||
Objects passed to the function are `Series` objects having as index | ||
either the DataFrame's index (`axis=0`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double backticks around axis=0
and axis=1
since they're code snippets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit c1214b4
pandas/core/frame.py
Outdated
passed function will receive ndarray objects instead. If you are | ||
just applying a NumPy reduction function this will achieve much | ||
better performance | ||
* `False` : passes each row or column into a `Series` to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "passes each row or column as a Series to the function"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit 8caf5f6
pandas/core/frame.py
Outdated
function. | ||
* `True` : the passed function will receive ndarray objects | ||
instead. | ||
If you are just applying a NumPy reduction function this will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This applies to raw=Ture
so keep it under the item formed by that *
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit 3794d19
pandas/core/frame.py
Outdated
DataFrame will always be returned. | ||
|
||
.. deprecated:: 0.23.0 | ||
Try to apply reduction procedures. If the `DataFrame` is empty, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need backticsk around Series / DataFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit c593a70
pandas/core/frame.py
Outdated
|
||
.. deprecated:: 0.23.0 | ||
Try to apply reduction procedures. If the `DataFrame` is empty, | ||
:meth:`apply` will use reduce to determine whether the result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this resolves to pandas.apply
, which doesn't exist. SEe if you can rewreite to avoid the reference to apply
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After running
python make.py html --single pandas.DataFrame.apply
the :meth:apply
link resolves to pandas.DataFrame.apply
@TomAugspurger, if it is not the case for you, I shall revert to a wording more like: apply()
or DataFrame.apply()
... Let me know.
pandas/core/frame.py
Outdated
.. deprecated:: 0.23.0 | ||
Try to apply reduction procedures. If the `DataFrame` is empty, | ||
:meth:`apply` will use reduce to determine whether the result | ||
should be a `Series` or a `DataFrame`. If reduce is None (the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backticsk around parameter names (reduce
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit da64c86
Maybe reduce is None
is better there, because it is a small code snippet
pandas/core/frame.py
Outdated
array/series | ||
Additional keyword arguments will be passed as keywords to the function | ||
array/series. | ||
kwds : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just updated our docs on this, we're doing **kwds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit 2432ff5
pandas/core/frame.py
Outdated
:meth:`apply` will use `reduce` to determine whether the result | ||
should be a Series or a DataFrame. If ``reduce is None`` (the | ||
default), :meth:`apply`'s return value will be guessed by calling | ||
`func` on an empty Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche can you look at this formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdeboc Although :meth:`apply`
works correctly in sphinx rendering, I would not use the referencing syntax, as it is then just adding a reference to itself ?
I would just do `apply`
in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche I add this PR review suggestions into commit ee6919e
pandas/core/frame.py
Outdated
|
||
The default behaviour (None) depends on the return value of the | ||
The default behaviour (`None`) depends on the return value of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no parens here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/core/frame.py
Outdated
or the DataFrame's columns (``axis=1``). | ||
If ``result_type is None``, the final return type is the return | ||
type of the applied function. | ||
Otherwise, it depends on the `result_type` argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reflow this paragraph so each line is more or less <=79 chars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit 23be01b
pandas/core/frame.py
Outdated
either the DataFrame's index (``axis=0``) | ||
or the DataFrame's columns (``axis=1``). | ||
If ``result_type is None``, the final return type is the return | ||
type of the applied function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final return type is not exactly the same, but it is "inferred from the return type of the applied function"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR review suggestions into commit 51ad0cb
pandas/core/frame.py
Outdated
:meth:`apply` will use `reduce` to determine whether the result | ||
should be a Series or a DataFrame. If ``reduce is None`` (the | ||
default), :meth:`apply`'s return value will be guessed by calling | ||
`func` on an empty Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdeboc Although :meth:`apply`
works correctly in sphinx rendering, I would not use the referencing syntax, as it is then just adding a reference to itself ?
I would just do `apply`
in this case
Codecov Report
@@ Coverage Diff @@
## master #20202 +/- ##
=========================================
Coverage ? 91.7%
=========================================
Files ? 150
Lines ? 49167
Branches ? 0
=========================================
Hits ? 45089
Misses ? 4078
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work, added some comments, mainly to parts of the original docstring that can be improved.
pandas/core/frame.py
Outdated
applied function: list-like results will be returned as a Series | ||
of those. However if the apply function returns a Series these | ||
are expanded to columns. | ||
|
||
.. versionadded:: 0.23.0 | ||
.. versionadded:: 0.23.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bug in the script, you don't need the period at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. see cf2f2d0
pandas/core/frame.py
Outdated
are expanded to columns. | ||
|
||
.. versionadded:: 0.23.0 | ||
.. versionadded:: 0.23.0. | ||
|
||
args : tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, my fault, sorry.
first column/row to decide whether it can take a fast or slow | ||
code path. This can lead to unexpected behavior if func has | ||
code path. This can lead to unexpected behavior if `func` has | ||
side-effects, as they will take effect twice for the first | ||
column/row. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you get rid of the "We use this DataFrame to illustrate" at the beginning of the examples? I don't think it adds any value.
Also, I'd personally prefer to create a smaller (e.g 3 rows, 2 cols) dataframe from a Python list, instead of all that numpy machinery. :) And if the example is with sqrt
, I'd use numbers like 9
so the user can quickly see what apply
does, which is the goal of this example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's simpler. See d52ac81
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista In commit 3160fb8, I suppress an unnecessary test, unintentionally introduced in previous commit (d52ac81)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor comments. Looking good for the rest!
pandas/core/frame.py
Outdated
`func` on an empty Series | ||
(note: while guessing, exceptions raised by `func` will be | ||
ignored). | ||
If ``reduce is True`` a Series will always be returned, and if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"reduce is True" -> "reduce=True" (same on line below and few lines above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche Changes in dfa0f55, but don't forget that this keyword is deprecated, as you mentioned Saturday. :)
pandas/core/frame.py
Outdated
instead. | ||
If you are just applying a NumPy reduction function this will | ||
achieve much better performance. | ||
reduce : bool or `None`, default `None` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no backticks are needed inside the type description part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes in 037c729
pandas/core/frame.py
Outdated
A B | ||
0 1 2 | ||
1 1 2 | ||
2 1 2 | ||
|
||
See also | ||
-------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the Returns and See Also sections to just after the Parameters section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
changes in d757960
@jorisvandenbossche maybe the teams working with functions: combine
, combine_first
, applymap
, join
and round
should do the same...
Hello @mdeboc! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on March 12, 2018 at 16:46 Hours UTC |
@mdeboc Thanks a lot! |
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.