-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
DOC: Fix EX02 in pandas.Series.factorize #32901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Fix EX02 in pandas.Series.factorize #32901
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Unfortunately, this is failing CI. See my inline comment
pandas/core/algorithms.py
Outdated
@@ -556,7 +556,7 @@ def factorize( | |||
|
|||
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b']) | |||
>>> codes | |||
array([0, 0, 1, 2, 0]) | |||
array([0, 0, 1, 2, 0], dtype=int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether this is displayed by numpy depends on your machine / OS, I think (eg on my linux laptop, it is not displayed). So we cannot simply add it. If we want to have the doctests pass locally in such cases, it probably needs some ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @jorisvandenbossche!
Could you elaborate more on ...
?
So actually it is not really problem because the difference arises due to OS running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's still a problem for people who want to run it locally and then see errors (like you did I suppose?).
About the ..., see https://docs.python.org/3/library/doctest.html#doctest.ELLIPSIS
I am not fully sure how we deal with this in other places (cc @datapythonista ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's surprising that numpy changes the __repr__
depending on the platform. I would report it to them, may be there is a good reason, or may be it's just some old code.
For fixing our doctrsts, I'd check if the same happens with __str__
, and use print
if it doesn't. Otherwise, I'd simply add an ignore. It's not a great solution, since it's shown to the user, but it's what we've been doing. Better than leave failing tests IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a well known issue with numpy and doctests (I think, not fully sure, but eg in scikit-learn we regularly ran into similar issues as well, now I think back of it). AFAIU, it specifically comes up here because it are ints (and if you are on a platform where int64 is not the default, it shows the dtype).
Using
array([0, 0, 1, 2, 0]...)
should work I think. @farhanreynaldo can you try if what this it passes locally for you?
Using print(..)
is indeed another option (print never shows the dtype).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the changes using ellipsis but the test still throws an error. After doing a quick look, it seems like we need to enable optionflags=doctest.ELLIPSIS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add that to doctest_optionflags
in setup.cfg, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct, if you update this PR with the changes to setup.cfg
and adding the ...
, we can have a look and give it a try. But seems like it should be working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this works locally, but I am also on a 64bit machine, so that might not be a good indication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, and CI is green. Is this working for you locally @farhanreynaldo ?
Yes, it's working now. |
Thanks @farhanreynaldo ! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Related to #27977.
output of
python scripts/validate_docstrings.py pandas.Series.factorize
: