pd.read_fwf fails with file pointer to url #26376

Closed
@leohaim

Description

Code Sample, a copy-pastable example if possible

import urllib.request
import pandas as pd

url = 'ftp://ftp.ncdc.noaa.gov/pub/data/igra/igra2-station-list.txt'
filename = url.split('/')[-1]
widths = (2, 1, 3, 5, 9, 10, 7, 4, 30, 5, 5, 7)
names = ('CC', 'Network', 'Code', 'StationId', 'Latitude', 'Longitude',
         'Elev', 'dummy', 'StationName', 'From', 'To', 'Nrec')

# Case 1: download to a local file first, then parse that file (works)
urllib.request.urlretrieve(url, filename)
igrainv = pd.read_fwf(filename, widths=widths, names=names)

# Case 2: pass the object returned by urlopen() directly (raises TypeError)
f = urllib.request.urlopen(url)
igrainv = pd.read_fwf(f, widths=widths, names=names)

Problem description

The first read_fwf call, which reads a local copy of the file downloaded from the internet, works. The second read_fwf call, which reads the same file through the object returned by urllib.request.urlopen, fails.
I get the following error:

Traceback (most recent call last):
  File "", line 2, in
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 803, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1132, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 3605, in __init__
    PythonParser.__init__(self, f, **kwds)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2238, in __init__
    self.unnamed_cols) = self._infer_columns()
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2614, in _infer_columns
    line = self._buffered_line()
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2689, in _buffered_line
    return self._next_line()
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2794, in _next_line
    orig_line = self._next_iter_line(row_num=self.pos + 1)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2854, in _next_iter_line
    return next(self.data)
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 3589, in __next__
    line = next(self.f)
TypeError: 'addinfourl' object is not an iterator
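
The traceback ends in next(self.f), so the fixed-width reader seems to need an object that supports next() directly, and the addinfourl wrapper returned for the ftp:// URL apparently does not. A possible workaround, sketched below and not verified against pandas 0.24.2, is to read the response fully and hand the parser an io.StringIO instead. The helper name to_text_buffer, the sample bytes, the example.invalid URL, and the ASCII encoding are all assumptions made for illustration; the response object is built locally so the sketch runs without network access.

```python
import io
from urllib.response import addinfourl

# Made-up sample data standing in for the downloaded station list.
PAYLOAD = b"AC 00001 first\nAC 00002 second\n"


def fake_ftp_response():
    """Build an addinfourl locally, like the one urlopen() returned in the report."""
    return addinfourl(io.BytesIO(PAYLOAD), headers={},
                      url="ftp://example.invalid/list.txt")


def to_text_buffer(binary_file, encoding="utf-8"):
    """Hypothetical helper: read a binary file-like object into an io.StringIO."""
    return io.StringIO(binary_file.read().decode(encoding))


# Check whether next() is accepted on the wrapper itself; in the report it was not.
try:
    next(fake_ftp_response())
    supports_next = True
except TypeError:
    supports_next = False

# A StringIO is a true iterator, so next() works on it line by line.
buf = to_text_buffer(fake_ftp_response(), encoding="ascii")
first_line = next(buf)
print("next() on addinfourl accepted:", supports_next)
print("first line via StringIO:", repr(first_line))
```

With the real response, the equivalent call would be pd.read_fwf(to_text_buffer(urllib.request.urlopen(url)), widths=widths, names=names). This loads the whole file into memory, which should be acceptable for a station list but may not suit very large files.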

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-754.10.1.el6.centos.plus.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.4.2
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.2.1
pyarrow: None
xarray: 0.12.1
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

    Labels

    Bug
    IO Fixed Width: read_fwf
    IO Network: Local or Cloud (AWS, GCS, etc.) IO Issues
    Needs Tests: Unit test(s) needed to prevent regressions
