Skip to content

DOC: Fix encoding of docstring validation for Windows #25466

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 7, 2019

Conversation

kpapdac
Copy link
Contributor

@kpapdac kpapdac commented Feb 27, 2019

In Windows, the validate_docstrings.py script fails because an encoding error. It has been fixed here.

PR done in the London python sprints meetup.

CC: @datapythonista

@codecov
Copy link

codecov bot commented Feb 27, 2019

Codecov Report

Merging #25466 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25466      +/-   ##
==========================================
+ Coverage   91.74%   91.74%   +<.01%     
==========================================
  Files         173      173              
  Lines       52923    52923              
==========================================
+ Hits        48554    48555       +1     
+ Misses       4369     4368       -1
Flag Coverage Δ
#multiple 90.31% <ø> (ø) ⬆️
#single 41.73% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/util/testing.py 87.66% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fe1654f...639302e. Read the comment docs.

@codecov
Copy link

codecov bot commented Feb 27, 2019

Codecov Report

Merging #25466 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25466      +/-   ##
==========================================
- Coverage   91.98%   91.98%   -0.01%     
==========================================
  Files         175      175              
  Lines       52374    52374              
==========================================
- Hits        48178    48174       -4     
- Misses       4196     4200       +4
Flag Coverage Δ
#multiple 90.53% <ø> (ø) ⬆️
#single 40.72% <ø> (-0.15%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94c8c94...5edb1e8. Read the comment docs.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case for the behavior?

@WillAyd WillAyd added Docs CI Continuous Integration Windows Windows OS labels Feb 27, 2019
@kpapdac
Copy link
Contributor Author

kpapdac commented Feb 28, 2019

running

.\scripts\validate_docstrings.py --errors=GL08 --format=json

returns

UnicodeEncodeError: 'charmap' codec can't encode character '\u2155' in position 254: character maps to <undefined>

@WillAyd
Copy link
Member

WillAyd commented Feb 28, 2019

Sure but we need automated tests. You can place one in scripts/tests/test_validate_docstrings.py

@jreback jreback added this to the 0.25.0 milestone Mar 3, 2019
@WillAyd
Copy link
Member

WillAyd commented Mar 6, 2019

@kpapdac can you merge master to resolve build failure and add a test for this?

@kpapdac
Copy link
Contributor Author

kpapdac commented Mar 6, 2019

yes, i'm on it!

@kpapdac
Copy link
Contributor Author

kpapdac commented Mar 7, 2019

ok, i gave it a go, i've never done this before but my best guest is to add to the scripts/tests/test_validate_docstrings.py in TestDocstringClass class the following

    @pytest.mark.parametrize('name', 'pandas.Series.str.isdecimal')
    def test_exit_status_for_write_validation_results_to_json(self, name):
        msg = 'UnicodeEncodeError in "{}"'.format(name)
        with pytest.raises(UnicodeEncodeError, match=msg):
            validate_docstrings.Docstring.validate_pep8(name)

For this specific function the test should fail as there is an encoding issue. I doubt that it works though as it passes the test without making any change to scripts\validate_docstrings.py. I have no idea how to move on with this, any help appreciated!

@RjLi13
Copy link
Contributor

RjLi13 commented Mar 10, 2019

Hey @kpapdac sorry if I'm misunderstanding, but shouldn't your written test pass if you make no change to the scripts\validate_docstrings.py since it raises an UnicodeEncodeError on Windows? I think your test should be checking if UnicodeEncodeError does not get raised when calling the validate_pep8 function on Windows.

@@ -543,8 +543,8 @@ def validate_pep8(self):
application = flake8.main.application.Application()
application.initialize(["--quiet"])

with tempfile.NamedTemporaryFile(mode='w') as file:
file.write(content)
with tempfile.NamedTemporaryFile(mode='wb') as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer opening the NamedTemporaryFile with encoding='utf-8'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you address comment from @TomAugspurger here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it as @TomAugspurger suggested.

@kpapdac
Copy link
Contributor Author

kpapdac commented Mar 18, 2019

Hi @RjLi13 thank you for having a look at this and apologies for the delay. I think you're right. I changed the test to

    def missing_encoding_write_to_file(self):
        """
        Examples
        --------
        >>> try:
        ...   docstr = validate_docstrings.Docstring('pandas.Series.str.isdecimal')
        ...   result = docstr.validate_pep8()
        ...   next(result)
        ...   print(1)
        ... except:
        ...   0
        1
        """
        pass

so as to fail when the exception is raised. Now, when I change scripts\validate_docstrings.py as per the pull request, the test passes the file.write(content.encode('utf-8')) part but somehow fails on application.run_checks([file.name]). It gives me an Valueerror write to closed file. Do you have any idea what's going wrong here? I appreciate your help!

@RjLi13
Copy link
Contributor

RjLi13 commented Mar 19, 2019

@kpapdac Push what you have so we can take a look at the code.

@jreback
Copy link
Contributor

jreback commented Mar 19, 2019

can you merge master

kpapdac added a commit to kpapdac/pandas that referenced this pull request Mar 20, 2019
@kpapdac
Copy link
Contributor Author

kpapdac commented Mar 20, 2019

@jreback I did merge master 3 days ago, there should be no conflict now.
@RjLi13 I also pushed my changes in test_validate_docstrings.py

@kpapdac
Copy link
Contributor Author

kpapdac commented Mar 20, 2019

@jreback just to clarify I merged and pushed encode_error branch to my master and got a successful built from Travis. Is there anything else i should do to resolve this? Thanks

@WillAyd
Copy link
Member

WillAyd commented Apr 10, 2019

@kpapdac your comment above suggests you did but I don't see a test as part of this. Can you double check you included that?

@kpapdac
Copy link
Contributor Author

kpapdac commented Apr 14, 2019

hi @WillAyd, I added a test for the encoding issue in test_validate_docstrings.py in my master. The name of the test is test_encode_content_write_to_file.
Also apologies, I created a new pull request for this by mistake, so please ignore.
I'm a new contributor so please let me know if something doesn't make sense.

@WillAyd
Copy link
Member

WillAyd commented Apr 14, 2019

@kpapdac you'll need to add that test to this PR. Make it locally on your encode_error branch then do the following:

git fetch upstream
git merge upstream/master
git push origin encode_error

It should update this PR with your test

@pep8speaks
Copy link

pep8speaks commented Apr 14, 2019

Hello @kpapdac! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-05-06 19:50:57 UTC

@kpapdac
Copy link
Contributor Author

kpapdac commented Apr 16, 2019

thanks @WillAyd , does it now look ok?

@@ -6,7 +6,7 @@
import numpy as np
import pandas as pd

import validate_docstrings
from scripts import validate_docstrings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this is necessary - can you revert this (or let me know what I am missing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, my bad i wans't working for me when debugging, i reverted this.

@@ -1052,6 +1052,11 @@ def test_raises_for_invalid_attribute_name(self, invalid_name):
with pytest.raises(AttributeError, match=msg):
validate_docstrings.Docstring(invalid_name)

@pytest.mark.parametrize('name', ['pandas.Series.str.isdecimal'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only using one value no need to parametrize - just use this locally in function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end I added a second test case to check.

@@ -543,8 +543,8 @@ def validate_pep8(self):
application = flake8.main.application.Application()
application.initialize(["--quiet"])

with tempfile.NamedTemporaryFile(mode='w') as file:
file.write(content)
with tempfile.NamedTemporaryFile(mode='wb') as file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you address comment from @TomAugspurger here?

@@ -5,7 +5,6 @@
import pytest
import numpy as np
import pandas as pd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also revert this? Probably fails isort or flake8 without it

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor stuff. So both of the parametrized values were failing on windows before correct? Can't reproduce myself so just want to confirm the choice

@@ -1052,6 +1052,11 @@ def test_raises_for_invalid_attribute_name(self, invalid_name):
with pytest.raises(AttributeError, match=msg):
validate_docstrings.Docstring(invalid_name)

@pytest.mark.parametrize('name', ['pandas.Series.str.isdecimal', 'pandas.Series.str.islower'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I don't think our CI is hitting this directory but this would fail linting for being too long. Can you break lines here after the opening left bracket?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they both fail on windows having characters like '³', '⅕'. I'll break the lines.

@@ -1052,6 +1052,11 @@ def test_raises_for_invalid_attribute_name(self, invalid_name):
with pytest.raises(AttributeError, match=msg):
validate_docstrings.Docstring(invalid_name)

@pytest.mark.parametrize('name', ['pandas.Series.str.isdecimal', 'pandas.Series.str.islower'])
def test_encode_content_write_to_file(self, name):
docstr = validate_docstrings.Docstring(name).validate_pep8() # GH25466
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move comment one line up (OK to be standalone)

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - @datapythonista any chance you can take a look at this one?

@@ -1052,6 +1052,12 @@ def test_raises_for_invalid_attribute_name(self, invalid_name):
with pytest.raises(AttributeError, match=msg):
validate_docstrings.Docstring(invalid_name)

@pytest.mark.parametrize('name', [
'pandas.Series.str.isdecimal', 'pandas.Series.str.islower'])
def test_encode_content_write_to_file(self, name): # GH25466
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be on the next line. @kpapdac if you want to fix up otherwise can clean up before merging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'm sorry, you said that above but I didn't get it..I'll fix it.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, added couple of comments, but happy to get this merged as it is too

Thanks @kpapdac

@kpapdac
Copy link
Contributor Author

kpapdac commented Apr 27, 2019

thank you @datapythonista, @WillAyd for all the help

@WillAyd
Copy link
Member

WillAyd commented May 3, 2019

@kpapdac can you address comments from @datapythonista ? Should be able to get this in thereafter

@WillAyd WillAyd merged commit ca1a36a into pandas-dev:master May 7, 2019
@WillAyd
Copy link
Member

WillAyd commented May 7, 2019

Thanks @kpapdac !

@kpapdac
Copy link
Contributor Author

kpapdac commented May 7, 2019

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Docs Windows Windows OS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants