Skip to content

TST: Refactor test_parsers.py #12964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Apr 23, 2016

Refactored tests in test_parsers.py to increase coverage of the different types of parsers and remove nearly duplicate testing in some cases.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 23, 2016

Not sure if this is all the refactoring that needs to be done, but I think I got through a good portion of it. Further suggestions are welcome!

@gfyoung gfyoung force-pushed the test-parsers-refactor branch from 9846686 to 0a7f5e6 Compare April 23, 2016 03:09
@jreback jreback added Testing pandas testing functions or related to the test suite IO CSV read_csv, to_csv labels Apr 23, 2016
@gfyoung gfyoung force-pushed the test-parsers-refactor branch from 0a7f5e6 to ecf4cde Compare April 23, 2016 20:25
@jreback
Copy link
Contributor

jreback commented Apr 25, 2016

changing things like this:

# see gh-10022 are just not useful. you are the only one who uses this syntax AFAICT. leave them.

@jreback
Copy link
Contributor

jreback commented Apr 25, 2016

it is really quite difficult to tell what you did and comment on it.

I would split this up into multiple files (test_parsers) to make this more easily grokable.

e.g.

tests/parsers/....

@gfyoung
Copy link
Member Author

gfyoung commented Apr 25, 2016

# see gh-xxxx is standard from other open source work I have done because it clearly labels that the issue was brought up on GitHub (as it may have migrated from other hosting systems as this one was for example).

Ah, yes, unfortunately, I did not realize how much I had changed until I saw that GitHub could not show the diff at all. Also, your code block is blank if you were trying to demonstrate something...

@jreback
Copy link
Contributor

jreback commented Apr 25, 2016

@gfyoung what I was saying (I edited), was the test_parsers could/should just be splt up into multiple files under pandas/io/tests/parsers/ (like did for things like index/series/frame etc. That way easier to read/grok/diff. Don't 'change' things as much as reorg, then in a follow-on you can change. Doing 2 at once is very confusing.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 25, 2016

Hmmm...I'll see what I can do to break up the tests. It's not as straightforward as it is in frame.py because of that ParserTests monolith. Suggestions on how to / where to break it up (based on what you see in master)?

@gfyoung
Copy link
Member Author

gfyoung commented Apr 25, 2016

Also, this PR is pretty much just reorganization and removal of duplicate tests. If you can view it, there are no more tests under TestCParserHighMemory or TestCParserLowMemory, as they have all been moved in CParserTests and ParserTests.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 25, 2016

Also, one more thing: why do we have test_cparser.py? That to me is very confusing when we already have test_parsers.py.

@jreback
Copy link
Contributor

jreback commented Apr 25, 2016

@gfyoung my point is this is very hard to see. Let's break up first. Yes no realy need for a separate test_cparser.py, those could be folded.

This IS straightforward to breakup if you just have different test classes rather than a single monolith. The idea is to test different options in different files. Perfect example is compressions which is alredy split out. But you can split out parse_date testing. line terminator testsing etc. Lots of ways to do this.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 25, 2016

@jreback : No, I perfectly understand (I realized that it might be a problem when I saw GitHub couldn't show it)! I was just trying to get some feedback first on how to break up test_parsers.py. I'll see what I can see with this.

@gfyoung gfyoung force-pushed the test-parsers-refactor branch from ecf4cde to d439b47 Compare April 28, 2016 06:52
@codecov-io
Copy link

codecov-io commented Apr 28, 2016

Current coverage is 83.92%

Merging #12964 into master will decrease coverage by -0.13%

@@             master     #12964   diff @@
==========================================
  Files           136        136          
  Lines         49994      49918    -76   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42028      41889   -139   
- Misses         7966       8029    +63   
  Partials          0          0          
  1. 2 files (not in diff) in pandas/tools were modified. more
    • Hits -22
  2. 9 files (not in diff) in pandas/core were modified. more
    • Misses +66
    • Hits -101
  3. 2 files (not in diff) in pandas were modified. more
    • Misses -3
    • Hits -16

Powered by Codecov. Last updated by 84725fa...95c52b8

@gfyoung gfyoung force-pushed the test-parsers-refactor branch from d439b47 to b886475 Compare April 28, 2016 12:08
@@ -0,0 +1,471 @@
"""
Tests date parsing functionality for all of the
Copy link
Contributor

@jreback jreback Apr 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas/io/tests/parser/test_dates.py should be the name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then won't that file be run as a test suite? I don't want that (it needs to be imported as a test class).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well you need to fix it then if that's the way you want it. the correct way to do this is to split off the base class into common.py or something which is then imported into each file. Look at how tests/indexes was done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, I see. I'll take a look into that then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tricky part (compared to tests/indexes) is that there are really "three" base classes (CParserHighMemory, CParserLowMemory, PythonParser). How does one organize that then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea is then you can easily strip out related things and add as separeat files in a nice way. (and don't crows the namespace)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the main Base classes mostly empty. Then just sub-class them.

Copy link
Member Author

@gfyoung gfyoung Apr 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH it seems easier to work in the way of making the individual test suites part of a common package and sub-class them into each of the main parser tests as is done now. Otherwise, I fear there will be duplicate code IINM. For example, parse_date_tests.py needs to be run for each of three parsers. But I can't just throw them all in as super-classes because they will clash. So I would then have to make three different versions, one for each parser, which I don't think is preferable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simpliest easiest is best. you can also make them mix-in's might be easier. but still keep in separate files (just don't name then test_*)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I'll try to first separate them out and then we can see what is best afterwards (still WIP with pulling out test suites from test_parsers)

@gfyoung gfyoung force-pushed the test-parsers-refactor branch 2 times, most recently from 15f50f2 to 6a26224 Compare April 28, 2016 16:31
@@ -0,0 +1,155 @@
# -*- coding: utf-8 -*-

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think naming this compression.py is better because then its not runable on its own (its only mixins to others). name test_* if they are runnable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gfyoung gfyoung force-pushed the test-parsers-refactor branch 2 times, most recently from bfbbed2 to 4f9c911 Compare April 28, 2016 20:22
@jreback jreback added this to the 0.18.1 milestone Apr 28, 2016
# -*- coding: utf-8 -*-

"""
Tests parsers ability to read and parse non-local files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so some of the files are not in the pandas/io/tests/parser/* namespace e.g.
pandas/io/tests/test_network.py as an example.

[jreback-~/pandas] git diff master --stat
 pandas/io/tests/parser/__init__.py        |    0
 pandas/io/tests/parser/compression.py     |  155 +++++
 pandas/io/tests/parser/parse_dates.py     |  468 +++++++++++++++
 pandas/io/tests/parser/test_parsers.py    | 3342 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pandas/io/tests/parser/test_textreader.py |  402 +++++++++++++
 pandas/io/tests/test_cparser.py           |  401 -------------
 pandas/io/tests/test_network.py           |  195 +++++++
 pandas/io/tests/test_parsers.py           | 5055 --------------------------------------------------------------------------------------------------------------------------------------------------------------
 pandas/io/tests/test_read_fwf.py          |  337 +++++++++++
 pandas/io/tests/test_unsupported.py       |   88 +++
 pandas/tests/test_lib.py                  |   74 ++-

Copy link
Member Author

@gfyoung gfyoung Apr 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, I'll move all of those (excluding test_lib of course) into the parser directory as well

@jreback
Copy link
Contributor

jreback commented Apr 28, 2016

I think this also needs an edit to setup.py

e.g. add pandas.io.tests.parsers (also add pandas.io.tests.sas) which is missing. (pandas.io.tests.json) is already there.

you can do a

python setup.py sdist to make sure that the tests are generated (when not from a clone)

@gfyoung gfyoung force-pushed the test-parsers-refactor branch from 4f9c911 to edba8fa Compare April 29, 2016 00:31
assert_same_values_and_dtype(result, expected)


def assert_same_values_and_dtype(res, exp):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put at the top

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gfyoung gfyoung force-pushed the test-parsers-refactor branch 5 times, most recently from fc49f1a to 818f711 Compare April 29, 2016 16:50
@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

something wrong. lots of tests not running

[jreback-~/pandas] nosetests  pandas/io/tests/parser/
.............S...............S..........................
----------------------------------------------------------------------
Ran 56 tests in 15.089s

OK (SKIP=2)

but if its explict it works

[jreback-~/pandas] nosetests pandas/io/tests/parser/test_parsers.py 
...........................................S.........................................................................................................................................S............................................S.........................................................................................................................................S..S.............................................................................................................................................................S.
----------------------------------------------------------------------
Ran 527 tests in 47.291s

OK (SKIP=6)

I think that test_parsers.py permissions are messing this up, they should be 644 (but you have to force the checking)

[jreback-~/pandas] ls -ltr pandas/io/tests/*.py
-rw-rw-r--  1 jreback  staff    4757 Apr  5 19:04 pandas/io/tests/test_wb.py
-rw-rw-r--  1 jreback  staff    7936 Apr  5 19:04 pandas/io/tests/test_ga.py
-rw-rw-r--  1 jreback  staff    5892 Apr  5 19:04 pandas/io/tests/test_date_converters.py
-rw-rw-r--  1 jreback  staff    2755 Apr  5 19:04 pandas/io/tests/test_common.py
-rw-rw-r--  1 jreback  staff    4012 Apr  5 19:04 pandas/io/tests/test_clipboard.py
-rw-rw-r--  1 jreback  staff    9126 Apr  5 19:04 pandas/io/tests/generate_legacy_storage_files.py
-rw-rw-r--  1 jreback  staff       0 Apr  5 19:04 pandas/io/tests/__init__.py
-rw-rw-r--  1 jreback  staff   48615 Apr 27 09:48 pandas/io/tests/test_stata.py
-rw-rw-r--  1 jreback  staff    7505 Apr 27 09:48 pandas/io/tests/test_pickle.py
-rw-rw-r--  1 jreback  staff   20329 Apr 27 09:48 pandas/io/tests/test_data.py
-rw-rw-r--  1 jreback  staff  105456 Apr 28 13:21 pandas/io/tests/test_sql.py
-rw-rw-r--  1 jreback  staff   29769 Apr 28 13:21 pandas/io/tests/test_html.py
-rw-rw-r--  1 jreback  staff  202091 Apr 28 16:38 pandas/io/tests/test_pytables.py
-rw-rw-r--  1 jreback  staff   30648 Apr 28 16:38 pandas/io/tests/test_packers.py
-rw-rw-r--  1 jreback  staff   34349 Apr 28 16:38 pandas/io/tests/test_gbq.py
-rw-rw-r--  1 jreback  staff   84066 Apr 28 16:38 pandas/io/tests/test_excel.py
-rwxrwxr-x  1 jreback  staff  184802 Apr 29 12:59 pandas/io/tests/test_parsers.py
-rw-rw-r--  1 jreback  staff   12631 Apr 29 12:59 pandas/io/tests/test_cparser.py

@gfyoung gfyoung force-pushed the test-parsers-refactor branch from 818f711 to 225637f Compare April 29, 2016 17:06
@gfyoung
Copy link
Member Author

gfyoung commented Apr 29, 2016

@jreback : I think you're right about that (at least that's what Google says). I changed the mode of test_parsers.py to 644. That should work now hopefully.

@gfyoung gfyoung force-pushed the test-parsers-refactor branch from 225637f to 20389b1 Compare April 29, 2016 17:13
@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

I think need to add io/tests/parser to ci/lint.sh (and fix common.py). pls add io/tests/sas and io/tsests/json as well (ideally it would recursively descend .........)

@gfyoung
Copy link
Member Author

gfyoung commented Apr 29, 2016

flake8 is recursive AFAICT. I just tried doing flake8 pandas/io --filename '*.py' and it spewed out all of these errors with common.py, so adding them is not necessary.

@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

ci/lint.sh is NOT though. we are specifically checking certain dirs. so we can change this to cover cases like this maybe.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 29, 2016

Huh? What do you mean? That's the exact command ci/lint.sh will run.

@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

so why didn't your build fail then? common.py is NOT checked (well not on travis).

@gfyoung
Copy link
Member Author

gfyoung commented Apr 29, 2016

I haven't pushed common.py to Travis yet. I put [ci skip] explicitly for that reason. I'm trying to see if I can extract some more test classes from that module before I push again to Travis.

@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

oh, ok, then pls push. I think this is ready as is. (pls flake check)

@gfyoung gfyoung force-pushed the test-parsers-refactor branch from 20389b1 to c01b4bd Compare April 29, 2016 17:40
Refactored tests in test_parsers.py
to increase coverage of the different
types of parsers and remove nearly
duplicate testing in some cases.
@gfyoung gfyoung force-pushed the test-parsers-refactor branch from c01b4bd to 3ed01a0 Compare April 29, 2016 22:33
@gfyoung
Copy link
Member Author

gfyoung commented Apr 30, 2016

@jreback : Travis is giving the green light. Not sure what's going on with Codecov though. Is this good to merge?

@jreback
Copy link
Contributor

jreback commented Apr 30, 2016

hmm, yeah the codecov that is looking odd

@gfyoung
Copy link
Member Author

gfyoung commented Apr 30, 2016

So how to proceed? Can this be merged? Or is it a bug in codecov.yml? Something else? Similar Codecov strangeness is also happening in my fromnumeric PR FYI.

@jreback
Copy link
Contributor

jreback commented Apr 30, 2016

yeah will take another look. if you have insite into codecov, then ok!

@gfyoung
Copy link
Member Author

gfyoung commented Apr 30, 2016

Unfortunately, I do not :< . Let me know once you have more info!

@jreback jreback closed this in 9b030a1 Apr 30, 2016
@jreback
Copy link
Contributor

jreback commented Apr 30, 2016

thank you sir!

@jreback
Copy link
Contributor

jreback commented Apr 30, 2016

I had to do this: 879f217

to make the setup work in a clean env. Travis works from a clone so the installation is slightly different (it depends on setup).

Side issue, this came up because it was segfaulting on test_memory_map when the file didn't exist.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 30, 2016

Ah, I see. Thanks!

@gfyoung gfyoung deleted the test-parsers-refactor branch April 30, 2016 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants