ENH read_excel error when accessing AWS S3 URL

Summary: read_excel cannot read a file addressed with the same S3 URL syntax that read_csv accepts. read_excel should support accessing S3 data in the same manner as read_csv.

read_excel fails with the following error:

``` python
>>> import pandas as pd
>>> df = pd.read_excel("s3://my-bucket/my_file.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python2.6/site-packages/pandas/io/excel.py", line 163, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/usr/local/lib64/python2.6/site-packages/pandas/io/excel.py", line 206, in __init__
    self.book = xlrd.open_workbook(io)
  File "/usr/local/lib/python2.6/site-packages/xlrd/__init__.py", line 394, in open_workbook
    f = open(filename, "rb")
IOError: [Errno 2] No such file or directory: 's3://my-bucket/my_file.xlsx'
>>> 
```

read_csv, on the other hand, successfully reads a CSV file from the same S3 bucket using the same URL syntax:

``` python
>>> import pandas as pd
>>> df = pd.read_csv("s3://my-bucket/my_file.csv")
>>> len(df.index)
1187
>>>
```

For the record, read_csv can also reach the xlsx file over S3, but it raises a parse error when it tries to tokenize the data:

``` python
>>> import pandas as pd
>>> df = pd.read_csv("s3://my-bucket/my_file.xlsx")
Exception pandas.parser.CParserError: CParserError('Error tokenizing data. C error: Expected 9 fields in line 210, saw 10\n',) in 'pandas.parser.TextReader._tokenize_rows' ignored
>>> 
```
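The tokenizer error is most likely expected rather than a second bug: an .xlsx workbook is a ZIP container, so read_csv is handed compressed binary data instead of delimited text. A minimal sketch of that distinction, using a tiny in-memory stand-in archive (not the real my_file.xlsx) and a hypothetical helper name:

``` python
import io
import zipfile


def looks_like_xlsx_container(data):
    """Return True if the bytes form a ZIP archive, the container format used by .xlsx."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a small stand-in archive in memory (illustrative content, not a valid workbook).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
zip_bytes = buf.getvalue()

# Plain delimited text, which is what read_csv expects to receive.
csv_bytes = b"a,b,c\n1,2,3\n"

print(looks_like_xlsx_container(zip_bytes))  # ZIP container
print(looks_like_xlsx_container(csv_bytes))  # plain text
print(zip_bytes[:2])                         # ZIP local-file-header magic bytes
```

So the read_csv result only shows that the S3 object is reachable; it says nothing about whether the workbook itself is readable.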

read_excel successfully reads and parses a local copy of the xlsx file:

``` python
>>> import pandas as pd
>>> df = pd.read_excel("my_file.xlsx")
>>> len(df.index)
221
>>> 
```
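Since the local copy parses fine, a possible workaround until read_excel understands s3:// URLs is to download the object's bytes separately and hand read_excel a file-like object. This is only a sketch under some assumptions: boto3 is installed with working AWS credentials, the installed pandas accepts a file-like object in read_excel, and the code runs on Python 3. `split_s3_url` and `read_excel_from_s3` are hypothetical helper names, not part of pandas.

``` python
import io
from urllib.parse import urlparse  # Python 3; on Python 2 this lives in the urlparse module


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URL: %r" % (url,))
    return parsed.netloc, parsed.path.lstrip("/")


def read_excel_from_s3(url, **kwargs):
    """Download the S3 object into memory, then let read_excel parse the bytes."""
    # Imported lazily so the URL helper above works without these packages.
    import boto3
    import pandas as pd

    bucket, key = split_s3_url(url)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return pd.read_excel(io.BytesIO(body), **kwargs)


print(split_s3_url("s3://my-bucket/my_file.xlsx"))
```

With those assumptions in place, `df = read_excel_from_s3("s3://my-bucket/my_file.xlsx")` would be the equivalent of the failing call above. The whole file is held in memory, which is fine for small workbooks but may not suit very large ones.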

Pandas version string and dependencies:

``` python
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.6.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.14.48-33.39.amzn1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.4
pip: 6.1.1
setuptools: 12.2
Cython: None
numpy: 1.10.1
scipy: 0.16.0
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
>>> 
```


ENH read_excel error when accessing AWS S3 URL #11447
