read_csv python engine errors #10476

Closed
@michaelaye

The only thing I changed from my usually working reduction pipeline was to try engine="python". I did that because I wanted to use nrows for a smaller test read, but that fails as well, so I thought the python engine might currently be buggy:

$ python reduction.py ~/data/planet4/2015-06-21_planet_four_classifications.csv
INFO:Starting reduction.
Traceback (most recent call last):
  File "reduction.py", line 258, in <module>
    args.test_n_rows, args.remove_duplicates)
  File "reduction.py", line 182, in main
    data = [chunk for chunk in reader]
  File "reduction.py", line 182, in <listcomp>
    data = [chunk for chunk in reader]
  File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 697, in __iter__
    yield self.read(self.chunksize)
  File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 721, in read
    ret = self._engine.read(nrows)
  File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 1556, in read
    content = self._get_lines(rows)
  File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 2007, in _get_lines
    for _ in range(rows):
TypeError: 'float' object cannot be interpreted as an integer

My function call is this:

# As chunksize and nrows cannot be used together yet, I switch chunksize
# to None if I want test_n_rows for a small test database:
if test_n_rows:
    chunks = None
else:
    chunks = 1e6
# Create the reader object with the pandas interface for CSV parsing.
# Doing this in chunks as it's faster. Also, later I will split this
# into multiple processes.
reader = pd.read_csv(fname, chunksize=chunks, na_values=['null'],
                     usecols=analysis_cols, nrows=test_n_rows,
                     engine='c')

Using pandas-0.16.2_58_g01995b2-py3.4
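
A possible cause, judging only from the traceback and the snippet above: `1e6` is a float literal, and the last frame calls `range(rows)`, which does not accept floats. The sketch below reproduces that error with the standard library alone and shows one way to coerce the value before passing it as `chunksize`. The helper name `to_chunksize` is hypothetical and not part of pandas; whether pandas itself should accept whole-number floats is a separate question.

```python
def to_chunksize(value):
    """Hypothetical helper: turn a whole-number value such as 1e6 into an int."""
    if value is None:
        return None
    as_int = int(value)
    if as_int != value:
        # Reject values like 1.5 instead of silently truncating them.
        raise ValueError("chunksize must be a whole number, got %r" % (value,))
    return as_int


# Reproduce the error from the traceback without pandas:
# range() refuses a float argument, even a whole-number one.
message = None
try:
    range(1e6)
except TypeError as exc:
    message = str(exc)

# Coerce before use; the result is a plain int that range() accepts.
chunks = to_chunksize(1e6)
rows = range(chunks)
```

In the snippet from the report, this would amount to writing `chunks = int(1e6)` (or `10**6`) in the `else` branch.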
