
ENH: Add support for s3 in read_excel #11447 #11714

Merged
jreback merged 1 commit into pandas-dev:master from the excel-s3 branch on Dec 1, 2015

Conversation

parsleyt (Contributor)

closes #11447

Updated excel.py to use pandas.io.common.get_filepath_or_buffer (which handles s3 URLs) instead of urlopen, but only when _is_s3_url returns True. This was the minimally invasive change.
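
As a rough illustration of the change described above: the io handling ends up doing something along the lines of the sketch below. This is a paraphrase, not the literal diff; only _is_s3_url and get_filepath_or_buffer are names taken from the description.

```python
# Paraphrased sketch of the io handling in pandas/io/excel.py after this
# change (not the literal diff). _is_s3_url and get_filepath_or_buffer are
# the pandas.io.common helpers named in the description above.
from pandas.io.common import _is_s3_url, get_filepath_or_buffer

def _resolve_excel_source(io):
    if _is_s3_url(io):
        # get_filepath_or_buffer opens s3 URLs (via boto) and returns the
        # readable buffer as the first element of its result tuple; the
        # tuple's exact length varies across pandas versions.
        io = get_filepath_or_buffer(io)[0]
    # other inputs (http/ftp URLs, local paths, open buffers) keep going
    # through the pre-existing urlopen / passthrough handling
    return io
```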

jreback (Contributor) commented Nov 27, 2015

tests pls

jreback (Contributor) commented Nov 29, 2015

pls add a test for this. you can post an .xls file and I will put it up on the pandas s3 bucket.

jreback added the IO Data and IO Excel labels on Nov 29, 2015
jreback added this to the 0.18.0 milestone on Nov 29, 2015

jreback (Contributor) commented Nov 29, 2015

pls add a whatsnew note as well (enhancements)

parsleyt (Contributor, Author)

Will do. What's the path to the s3 bucket?
Will use test1.xls and test1.xlsx for testing if you can put them in the bucket for me.

jreback (Contributor) commented Dec 1, 2015

ok I uploaded those 2 files to the standard bucket https://s3.amazonaws.com/pandas-test/test1.xls and .xlsx. lmk if you cannot access them.

parsleyt (Contributor, Author) commented Dec 1, 2015

Can you go ahead and add test1.xlsm to the s3 bucket as well? Looks like the existing excel tests use all three extensions and I want to use the same base class.

jreback (Contributor) commented Dec 1, 2015

ok, should be good to go.

parsleyt (Contributor, Author) commented Dec 1, 2015

ok, PR updated with test case and whatsnew note.
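
For readers following along, the added test presumably looks something like the sketch below: read each of the three uploaded files over s3 and compare against the matching local fixture. The s3:// URL form, the local data directory, and the function name are assumptions based on the discussion, not the exact test code.

```python
import os

import pandas.util.testing as tm
from pandas import read_excel

# Hypothetical location of the local copies of test1.xls/.xlsx/.xlsm;
# the real test would resolve this relative to the test module.
LOCAL_DATA_DIR = 'pandas/io/tests/data'

def test_read_excel_from_s3():
    # test1.xls/.xlsx/.xlsm were uploaded to the pandas-test bucket for
    # this PR; reading them over s3 should match the local fixtures.
    for ext in ('.xls', '.xlsx', '.xlsm'):
        url_table = read_excel('s3://pandas-test/test1' + ext)
        local_table = read_excel(os.path.join(LOCAL_DATA_DIR, 'test1' + ext))
        tm.assert_frame_equal(url_table, local_table)
```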

jreback (Contributor) commented Dec 1, 2015

perfect. needs a rebase (prob a doc conflict), then ping when green.

jreback (Contributor) commented Dec 1, 2015

also can you extend the doc-string for read_excel (similar to how read_csv is done) to mention s3?

parsleyt (Contributor, Author) commented Dec 1, 2015

Looks like the doc for read_excel already mentions s3 in the same way as read_csv. Does this work?
...Valid URL schemes include http, ftp, s3, and file....
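
From the user's side, the enhancement boils down to read_excel now accepting s3 URLs the same way read_csv does. A minimal usage sketch, with a placeholder bucket and key, and boto installed (as this code path required at the time):

```python
import pandas as pd

# 's3://my-bucket/reports/latest.xlsx' is a placeholder; any s3 URL that the
# machine's credentials can reach should work, provided boto is installed.
df = pd.read_excel('s3://my-bucket/reports/latest.xlsx')
print(df.head())
```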

parsleyt (Contributor, Author) commented Dec 1, 2015

Hmm, also looks like I broke the build. The unit test is failing when trying to import boto to handle the s3 url: ImportError: No module named 'boto'

Is there a config I need to update for the CI process to install boto?

parsleyt (Contributor, Author) commented Dec 1, 2015

Ahh, I see I can just implement a nose.SkipTest. Will add a _skip_if_no_boto()
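
A minimal version of such a guard, assuming the helper name from the comment above (the skip message is illustrative):

```python
import nose

def _skip_if_no_boto():
    # The s3 code path needs boto, so skip (rather than fail) on CI
    # builds that don't have it installed.
    try:
        import boto  # noqa
    except ImportError:
        raise nose.SkipTest('boto not installed, skipping s3 test')
```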

jreback (Contributor) commented Dec 1, 2015

yep - builds have different deps / versions of things to give code a workout

parsleyt force-pushed the excel-s3 branch 2 times, most recently from aea7f7c to 0b58d56, on December 1, 2015 18:01

parsleyt (Contributor, Author) commented Dec 1, 2015

Good stuff! Ok, I think everything is good to go now. Thanks for walking me through my first contribution.

jreback added a commit that referenced this pull request on Dec 1, 2015: ENH: Add support for s3 in read_excel #11447
jreback merged commit 7d2022a into pandas-dev:master on Dec 1, 2015

jreback (Contributor) commented Dec 1, 2015

@parsleyt thanks for the contribution!

Labels: IO Data, IO Excel
Closes: ENH read_excel error when accessing AWS S3 URL (#11447)
2 participants