ENH: really really really fix the failing data.py tests #4085

Merged: 1 commit merged into pandas-dev:master on Jul 1, 2013

Conversation

@cpcloud (Member) commented Jun 29, 2013

I feel like a broken record

@cpcloud (Member, Author) commented Jun 29, 2013

@jtratner why wasn't (doesn't?) the network decorator catch the socket.error exceptions?

@ghost assigned cpcloud on Jun 29, 2013
@jtratner (Contributor)

I thought you'd added that to the decorator? Definitely wasn't there at first because I didn't anticipate it. That said, it's a subclass of IOError right?

@jtratner (Contributor)

(maybe you had added it in a PR that didn't get merged bc you went another way) but I swear we had this discussion before :P

@cpcloud (Member, Author) commented Jun 29, 2013

we did, but i never figured out what the issue was. i don't know wtf i was thinking. it's not there precisely because it is a subclass of IOError so ostensibly there's no need to catch it if u catch IOError. i'm going to catch them inside functions talking to the network and see if that helps

@jtratner (Contributor)

@cpcloud do you have an example build that failed with socket.error? At one point, a function seemed to be catching socket.error and then not outputting anything, thereby causing other errors later on.

@cpcloud (Member, Author) commented Jun 29, 2013

let me see if i can find one

@cpcloud (Member, Author) commented Jun 29, 2013

@jtratner for some reason the travis api only lets you look at the most recent 25 builds (which are the ones that show up when you look at them), and i'm not going to go through each errored build to see where the socket error was happening. i'll see if i can get more than just the most recent

@jtratner (Contributor)

maybe you linked to it in an earlier issue? I'll search around if I have time later. That said, probably won't be that important - just need to reproduce it consistently. What about trying something like:

import time

for i in range(10000):
    time.sleep(0.001)
    msg = str(i)
    try:
        run_network_test_func()  # the flaky network test to exercise
    except IOError:
        print(msg + ": caught IOError")
    except Exception:
        raise  # anything that isn't an IOError should propagate

This would at least help to see whether there's something other than IOError that crops up consistently.

@cpcloud (Member, Author) commented Jun 29, 2013

thing is, all network errors are subclasses of IOError, so if it's a network error it should be raising a subclass of IOError or IOError itself, hence the frustration/mystery.
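
for the record, that subclass relationship is easy to check (a minimal sketch; it holds on python 2.6+ and on python 3, where both names end up as OSError):

import socket

# socket.error is a subclass of IOError on Python 2.6+ and an alias of
# OSError (== IOError) on Python 3.3+; socket.timeout derives from it
print(issubclass(socket.error, IOError))    # True
print(issubclass(socket.timeout, IOError))  # True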

@cpcloud (Member, Author) commented Jun 29, 2013

that said i will try what you've posted

@jtratner (Contributor)

idea: Maybe it's raising it in the cleanup, which somehow happens after the function returns (and therefore passes the @network check), but before the test case completes?

@cpcloud (Member, Author) commented Jun 29, 2013

nice i will check that

@cpcloud (Member, Author) commented Jun 29, 2013

could be the fact that the closing is where the issue lies... python 3 builds the closing into the class, so you might be right

@jtratner (Contributor)

@cpcloud it's just weird that this would happen, because the entire idea of a context manager is that when you pass out of scope of the block, it should clean everything up.

You could also do something like this to avoid the double closing in Python 3:

from contextlib import closing, contextmanager

import urllib2  # Python 2 name; the Python 3 equivalent is urllib.request

from pandas.util import py3compat  # pandas' python-3 compat shim at the time

if py3compat.PY3:
    # Python 3: the object returned by urlopen is already a context manager
    compatible_urlopen = urllib2.urlopen
else:
    # Python 2: wrap with closing() so it can be used in a with statement
    @contextmanager
    def compatible_urlopen(*args, **kwargs):
        with closing(urllib2.urlopen(*args, **kwargs)) as f:
            yield f

# later on
with compatible_urlopen(url) as f:
    f.read()

@cpcloud (Member, Author) commented Jun 29, 2013

that might be the way to go

@cpcloud (Member, Author) commented Jun 29, 2013

i just wish i could have a way to repro it so that i could be sure that i'm fixing the problem and not adding more stuff to maintain

@cpcloud (Member, Author) commented Jun 29, 2013

hm well after 500 iterations no exceptions thrown by get_data_yahoo...

@cpcloud (Member, Author) commented Jun 29, 2013

probably not the best choice tho, now that i see that that's where ioerrors are caught, but then again that's somewhat reassuring

@jtratner (Contributor)

@cpcloud sorry :-/ I thought that potentially the issue was too many connections being open...

@cpcloud (Member, Author) commented Jun 29, 2013

it's ok

@cpcloud (Member, Author) commented Jun 30, 2013

@jreback @jtratner u guys want to look over this? Going to merge after passing

I ran this on travis a bunch of times overnight (before the most recent commit) to see if i could trigger any errors, and nada. (only changes in the latest commit were refactoring)

Here's a list of the important changes (in no particular order):

  • version-independent (down to python 2.6) context managers for ZipFile and urlopen (thanks @jtratner!)
  • refactored all repetitive code into single functions, e.g., _retry_read_url, _dl_mult_symbols (see the sketch after this list)
  • NaN is now returned when a symbol cannot be found in _dl_mult_symbols, so that the shape remains the same, a la the GIGO principle (thanks @jreback)
  • I've added a test that checks that if any symbols failed, the column with that symbol name contains all NaNs, and also checks that the failed symbols match up exactly with the symbol names in the warning message.
  • unified the API such that kwargs is no longer used, as there was no need for it
  • things that should be skipped are now skipped; I see the skips occasionally on my machine, we'll see about travis
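
for illustration, the retry helper might look roughly like this (a sketch only: the name _retry_read_url is from the list above, but the signature and body here are assumptions, not the actual code in this PR):

import time
from contextlib import closing

import urllib2  # Python 2 name; on Python 3 use urllib.request


def _retry_read_url(url, retry_count=3, pause=0.001):
    # sketch: retry a flaky URL read a few times before giving up
    for _ in range(retry_count):
        time.sleep(pause)
        try:
            with closing(urllib2.urlopen(url)) as resp:
                return resp.read()
        except IOError:  # socket.error is a subclass of IOError
            pass
    raise IOError("could not fetch %s after %d tries" % (url, retry_count))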

@jreback (Contributor) commented Jun 30, 2013

looks like a nice cleanup....when you have a chance, can you add to the wiki how to use ZipFile/urlopen (in testing and reality)?

@cpcloud (Member, Author) commented Jun 30, 2013

added to wiki
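
roughly, the pattern is the closing-based one (a minimal sketch of the usage; the url is a placeholder and this isn't a copy of the wiki page):

from contextlib import closing
from zipfile import ZipFile

import urllib2  # Python 2 name; on Python 3 use urllib.request

# closing() gives a with-statement interface even on python 2.6, where
# neither urlopen's return value nor ZipFile is a context manager
with closing(urllib2.urlopen("http://example.com/data.zip")) as resp:
    with open("data.zip", "wb") as f:
        f.write(resp.read())

with closing(ZipFile("data.zip")) as zf:
    data = zf.read(zf.namelist()[0])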

@jtratner (Contributor)

I can't look at this right now, but earlier versions looked good. +1 on the cleanup.

@cpcloud (Member, Author) commented Jul 1, 2013

3...2...1...

cpcloud added a commit that referenced this pull request Jul 1, 2013
ENH: really really really fix the failing data.py tests
@cpcloud merged commit 40be825 into pandas-dev:master on Jul 1, 2013
@cpcloud deleted the data-dot-py-cleanup branch on July 1, 2013 01:29
@nehalecky (Contributor)

Thanks for cleaning this up, this PR looks great.
