-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
TST: Change test_html to use stored data + mark other #4009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: Change test_html to use stored data + mark other #4009
Conversation
thanks must have missed that when i wrote them. i think slow should be removed since either slow or network tests get run on travis iirc |
also there's already |
Okay, I'll make that change. Do you want to store data locally for these or leave with the network request? html file is 32K |
the banklist data set for example is updated every monday and so it's basically frozen in time and you cannot compare the data obtained with that that is stored locally. i get 28k (for spam) but that's splitting hairs... probably ok to leave local |
i did |
Huh? Currently it's not local for the html file test. (most of the other test cases are). Plus these are basically just checks that lxml and BeautifulSoup actually work as expected. |
1 similar comment
Huh? Currently it's not local for the html file test. (most of the other test cases are). Plus these are basically just checks that lxml and BeautifulSoup actually work as expected. |
it's ok to change to local if u want that's what i meant |
Sorry for the double comment there - poor connection. The html file that the lxml and bs4 tests check for is 32k. |
Anyways I don't have a strong opinion. I'll edit this to remove slow soon. |
oh u mean the one u added? or the one that is received by |
@cpcloud yeah, that does. just wondering if you think it's worth it - sorry I wasn't clear earlier |
Since these are essentially smoke tests for the backend you're right best to move to slow decorators and remove the network decorators. |
one big reason why these take so long is that they are processing a huge On Mon, Jun 24, 2013 at 9:32 AM, Phillip Cloud notifications@github.comwrote:
|
wanted to test on possibly invalid html from the wild. if you can find a smaller file that works that's great too |
@cpcloud hmm...well, just going to leave it as slow I guess. Sort of a bummer that it can't be both. |
if you come across another data set in the wild please put up a PR, would love to have both just don't have time to check right now for data sets from pages that are both small and rife with HTML errors. this scraping framework might be of use for finding all kinds of different HTML tables. |
@cpcloud Changed double decorated tests previously |
i'm confused..why not just use the spam data set that's already there? |
@cpcloud ohh, good point. Somehow I thought there was some decision to use a different file. |
changed it. |
url = ('http://ndb.nal.usda.gov/ndb/foods/show/1732?fg=&man=&' | ||
'lfacet=&format=&count=&max=25&offset=&sort=&qlookup=spam') | ||
assert get_lxml_elements(url, 'tbody') | ||
filepath = "file://" + os.path.join(DATA_PATH, "spam.html") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the explicit file protocol necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes...doesn't work otherwise. (because it gets passed directly to the functions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you know what can u change the get_lxml_elements
to take a kwarg called base_url='file://'
then pass that to the parse call as base_url
like so: parse(url, base_url=base_url)
then you don't have to concat with strings
@jreback look ok? i think this can be merged |
looks ok |
Does this need rebase? |
@cpcloud more fun with network failures! :D https://travis-ci.org/jtratner/pandas/jobs/8559725 [pretty sure this is off of master and not due to this commit] |
darn. that is off of master. i'm not even sure what the issue is there. i'll delve deeper. i really wish i had a way to reproduce this stuff. i may end up just setting up my own CI on my machines at home |
@cpcloud Maybe you could set up something that tracks all connections open at any one time (maybe mod urlopen to do it?) then, when it raises an error, have it print the entire set of open network connections. |
there is no hope. will just catch this exception...teh interwebz r tricky |
@cpcloud that seems like the best bet here. Given the confusing-ness of the error, definitely something that the datareaders should handle themselves. |
badstatusline exceptions are just like throwing |
I don't know what you are talking about. BadStatusLine exceptions indicate On Sat, Jun 29, 2013 at 10:58 AM, Phillip Cloud notifications@github.comwrote:
|
ha. they are so specific with their empty string error message too |
@cpcloud can this (or something like it) get merged soon? These tests failed loudly when no internet available... |
Sure maybe later tonight if not early tmrw morning |
@jtratner can u rebase? once travis passes i can merge. sorry this fell by the wayside a bit. |
@cpcloud I thought this was handled by your data.py edit, which is why I wasn't thinking about it - didn't even notice anything until I had the internet turned off :P |
No problem. I discovered yet another failure in data.py but I'm waiting to see if can reproduce it before making an issue. I feel like there has to be a better--meaning more reliable--approach. I wonder if yahoo and or Google has an api for this stuff. More so than just urlopening the CSV. |
+ Change up test decorators so no tests marked as network and slow.
rebased ftw |
TST: Change test_html to use stored data + mark other
thanks! |
Changed double decorated tests previously
network
andslow
so they actually get run + stored usda_data such that the tests don't fail on network access.