Skip to content

DOC: DataFrame.ewm clean-up #32212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Mar 7, 2020
Merged

DOC: DataFrame.ewm clean-up #32212

merged 17 commits into from
Mar 7, 2020

Conversation

sursu
Copy link
Contributor

@sursu sursu commented Feb 23, 2020

Cosmetic change.

@sursu sursu changed the title Added missing spaces DOC: Added missing spaces Feb 23, 2020
@MarcoGorelli
Copy link
Member

Thanks @sursu !

I'd mentioned this and some other changes to this docstring in #31647, do you feel like taking any of the other points too? A few people commented "take" but nobody submitted anything :)

@sursu
Copy link
Contributor Author

sursu commented Feb 24, 2020

There's more there.
For instance, \text{ for } shouldn't be a LaTeX text, should be normal text, otherwise in a single line we have 3 types of symbols.

I am not familiar with :math:, and I don't know how to preview my changes.
Does it render the entire line or only what is take in ``?

I don't understand this invention. I don't see why using :math: was smarter than using LaTeX/MathJax syntax.

@WillAyd
Copy link
Member

WillAyd commented Feb 24, 2020

I am not familiar with :math:, and I don't know how to preview my changes.
Does it render the entire line or only what is take in ``?

I don't understand this invention. I don't see why using :math: was smarter than using LaTeX/MathJax syntax.

You will want to build the docs:

https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#how-to-build-the-pandas-documentation

If you can post screenshot afterwards would be helpful

We use sphinx to build restructured text. Math is a role provided therein:

https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#math

@WillAyd WillAyd added the Docs label Feb 24, 2020
@sursu sursu mentioned this pull request Feb 25, 2020
1 task
@WillAyd WillAyd closed this Feb 25, 2020
@WillAyd WillAyd reopened this Feb 25, 2020
@pep8speaks
Copy link

pep8speaks commented Feb 25, 2020

Hello @sursu! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-07 18:39:06 UTC

@sursu
Copy link
Contributor Author

sursu commented Feb 25, 2020

Screenshots:

image

image

The Notes have been expanded with a few details from the given link.

@sursu
Copy link
Contributor Author

sursu commented Feb 25, 2020

Ready for Review

@sursu sursu changed the title DOC: Added missing spaces DOC: DataFrame.ewm clean-up Feb 25, 2020
@WillAyd
Copy link
Member

WillAyd commented Feb 26, 2020

Looks like CI is red - can you take a look?

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Feb 26, 2020

Thanks @sursu , this is much nicer!

Though, I don't know what @MarcoGorelli meant in the 5'th issue.

Just to clarify - in the doc for the respective series method (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ewm.html), it says

"axis{0 or ‘index’, 1 or ‘columns’}, default 0".

However, Series don't have columns :)

But that's a recurring issue in many parts of the documentation, so IMO it's OK to keep it separate from this

@sursu
Copy link
Contributor Author

sursu commented Feb 26, 2020

Oh, so pandas.Series.ewm.html is also generated from ewm.py?

Well if someone tries to perform an EW function column-wise on a series they would bump into:

ValueError: No axis named columns for object type <class 'pandas.core.series.Series'>

which is a clear message.

Also, it seems that you can also use axis=rows. Should I specify this?

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this @sursu, nice stuff. Added some comments, most of them are related to the original docstring, and not your changes, but if you don't mind at having a look at them that would be very useful.

Not a problem if you think this should better be addressed in a follow up, or by someone else. Thanks!

Available EW (exponentially weighted) methods: ``mean()``, ``var()``, ``std()``,
``corr()``, ``cov()``.

When ``adjust=True`` (default), the EW function is calculated using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have this in the adjust parameter description instead?

Copy link
Contributor Author

@sursu sursu Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 cases there which require further details. So I think that is better addressed in the Notes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but I don't understand the reasoning. What I have in mind is that if I'm interested in what the adjust parameter does, I'll go to its description to understand what it does. And the description there is somehow poor, while here he have all the relevant information.

So, this feels misplaced to me. Not sure how the fact that this happens twice in this page changes this reasoning. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, the parameter descriptions are useful when I need to check briefly what parameters may be relevant, what they do, what possible values can they take (via Shift+Tab).
The description needs to be concise, if you want more details about each case, scroll a little below and look at the Notes.

To be able to read the equations there you will have to open the browser. When someone will want to quickly preview the description with Shift+Tab, it will just be cluttered with sphinx signs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand how you use the docs, but in the link I gave you explains what's our convention regarding the sections, so I prefer to follow that, and have the parameters sections for the parameters, and the notes for the implementation notes.

y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,
\end{split}

When ``ignore_na=False`` (default), weights are based on absolute positions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this in the ignore_na one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current parameter description of ignore_na is fine.


More details can be found at:
`Exponentially weighted windows
<https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx allow you to create links automatically with :ref:. We should use this, since we may be changing the path of the docs (may be using the domain pandas.io, or using pandas.pydata.org/docs/ instead of pandas-docs/stable...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried that, didn't succeed. If you can instruct me here, I'd be thankful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you grep for :ref: you can find examples, like:

:ref:`API Changes <whatsnew_0141.api>`

What's inside <> is a label somewhere in the code, defined as .. _whatsnew_0141.api:. If you don't have a label for the section you're linking here (it probably exists), you can just create it following the convention of the other labels in that page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not able to make :ref: work.
In the same file there is not label defined.
This is my first time working with Sphinx, that's why I'm asking for help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I can provide more information that what I said above. With that, and checking other examples, should be quite straight forward to get this working.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you tried, but this should be simply:

:ref:`Exponentially weighted windows <stats.moments.exponentially_weighted>`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:ref:`Exponentially weighted windows <stats.moments.exponentially_weighted>`
Does not produce a link. Does not give me an error either.
Could it be that I don't see it as a link because I build only the pandas.DataFrame.ewm page?


More details can be found at
https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows
Exactly one paramter: ``com``, ``span``, ``halflife``, or ``alpha``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be part of the extended summary, not the notes. And I think the extended summary should provide information on what it does. It's not a blocker to get this merged, but there is a lot of content in this page, but reading it doesn't really explain what this functionality does IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean after "Provide exponential weighted functions."?

I agree that Exactly one parameter: ``com``, ``span``, ``halflife``, or ``alpha`` must be provided. could be added there. Should I do that and then delete the first paragraph from the notes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. You have information on the different sections, and what should be in each at: https://pandas.pydata.org/pandas-docs/stable/development/contributing_docstring.html

Available EW (exponentially weighted) methods: ``mean()``, ``var()``, ``std()``,
``corr()``, ``cov()``.

When ``adjust=True`` (default), the EW function is calculated using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand how you use the docs, but in the link I gave you explains what's our convention regarding the sections, so I prefer to follow that, and have the parameters sections for the parameters, and the notes for the implementation notes.


More details can be found at:
`Exponentially weighted windows
<https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I can provide more information that what I said above. With that, and checking other examples, should be quite straight forward to get this working.

@sursu
Copy link
Contributor Author

sursu commented Feb 27, 2020

image

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sursu, looks great.

Couple of minor things that I think should make reading this easier.

@@ -52,9 +58,38 @@ class EWM(_Rolling):
Divide by decaying adjustment factor in beginning periods to account
for imbalance in relative weightings
(viewing EWMA as a moving average).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can merge this in the previous line.

@@ -52,9 +58,38 @@ class EWM(_Rolling):
Divide by decaying adjustment factor in beginning periods to account
for imbalance in relative weightings
(viewing EWMA as a moving average).
When ``adjust=True`` (default), the EW function is calculated using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave a blank line before, so it's more readable in the markdown version. And make the two When adjust=.. a bullet point, so it's easier to read in the html.

Copy link
Contributor Author

@sursu sursu Mar 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing bullet-points produces these warnings:
image

And the html file:

image

So, as I understand, if I want everything to look neat I should not break the line. However, if I do that there will be a conflict with the requirement that the line should be <= 88 chars.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the problem. I think should be as easy as using something like:

* When adjust=whatever
  continuation lines are indented
  if one is two long to be more than 88 just add
  a line break before it is.

And leave a blank line after the bullet point.

Is there any problem with this?

Copy link
Contributor Author

@sursu sursu Mar 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this works, sorry. I didn't indent properly the lines.

image

Ignore missing values when calculating weights; specify ``True`` to reproduce
pre-0.15.0 behavior.

When ``ignore_na=False`` (default), weights are based on absolute positions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same about making them a bullet points.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the link is pending, but looks good.


More details can be found at:
`Exponentially weighted windows
<https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you tried, but this should be simply:

:ref:`Exponentially weighted windows <stats.moments.exponentially_weighted>`

@datapythonista
Copy link
Member

You'll have to build all the docs, not that single page to see the link.

Also, if you can merge master please, so we make the CI green and we can merge...

@datapythonista datapythonista merged commit 20474f5 into pandas-dev:master Mar 7, 2020
@datapythonista
Copy link
Member

Thanks for the good docstring @sursu

@simonjayhawkins simonjayhawkins added this to the 1.1 milestone Mar 9, 2020
SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants