read_sas with chunksize/iterator raises ValueError #14734

Closed
@pijucha

read_sas raises a ValueError when a read via the chunksize or iterator parameters requests more rows than remain in the file.

Code Sample and Problem Description

The following test data file in the repository has 32 rows.

sasfile = 'pandas/io/tests/sas/data/airline.sas7bdat'
pd.read_sas(sasfile).shape
Out[18]: (32, 6)

When each read with chunksize/iterator requests no more rows than remain in the file, all's well:

reader = pd.read_sas(sasfile, chunksize=16)
df = reader.read()
df.shape
Out[31]: (16, 6)
df = reader.read()
df.shape
Out[33]: (16, 6)

or

reader = pd.read_sas(sasfile, iterator=True)
df = reader.read(30)
df.shape
Out[37]: (30, 6)
df = reader.read(2)
df.shape
Out[39]: (2, 6)
df = reader.read(2)
type(df)
Out[41]: NoneType

But if we don't know the number of rows in advance, a read that requests more rows than remain raises an exception, so we can't read the whole file, which is painful with large files.

reader = pd.read_sas(sasfile, chunksize=20)
df = reader.read()
df.shape
Out[45]: (20, 6)
df = reader.read()
Traceback (most recent call last):
  File "/usr/local/lib64/python3.5/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-46-c5d811b93ac1>", line 1, in <module>
    df = reader.read()
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/sas/sas7bdat.py", line 604, in read
    rslt = self._chunk_to_dataframe()
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/sas/sas7bdat.py", line 646, in _chunk_to_dataframe
    dtype=self.byte_order + 'd')
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2419, in __setitem__
    self._set_item(key, value)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2485, in _set_item
    value = self._sanitize_column(key, value)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2656, in _sanitize_column
    value = _sanitize_index(value, self.index, copy=False)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/series.py", line 2793, in _sanitize_index
    raise ValueError('Length of values does not match length of ' 'index')
ValueError: Length of values does not match length of index

or

reader = pd.read_sas(sasfile, iterator=True)
reader.read(30).shape
Out[51]: (30, 6)
reader.read(30).shape
Traceback (most recent call last):
  File "/usr/local/lib64/python3.5/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-52-5d757f713808>", line 1, in <module>
    reader.read(30).shape
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/sas/sas7bdat.py", line 604, in read
    rslt = self._chunk_to_dataframe()
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/sas/sas7bdat.py", line 646, in _chunk_to_dataframe
    dtype=self.byte_order + 'd')
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2419, in __setitem__
    self._set_item(key, value)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2485, in _set_item
    value = self._sanitize_column(key, value)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2656, in _sanitize_column
    value = _sanitize_index(value, self.index, copy=False)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/series.py", line 2793, in _sanitize_index
    raise ValueError('Length of values does not match length of ' 'index')
ValueError: Length of values does not match length of index
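
Until this is fixed, one possible stopgap is to clamp each request to the number of rows that remain. The sketch below is only an illustration under stated assumptions: `read_in_chunks` and `FakeReader` are hypothetical names, `total_rows` is assumed to be known from somewhere else (which is exactly what the report says is often not the case), and `FakeReader` returns plain lists so the example runs without a SAS file. It mimics the behaviour shown above: `read(n)` works while `n` rows remain, raises ValueError when more are requested, and returns None once the data is exhausted.

```python
def read_in_chunks(reader, total_rows, chunksize):
    """Yield chunks from reader.read(n), never requesting more rows than remain."""
    rows_left = total_rows
    while rows_left > 0:
        # Clamp the request so the final, shorter chunk is read cleanly.
        chunk = reader.read(min(chunksize, rows_left))
        if chunk is None or len(chunk) == 0:
            break
        rows_left -= len(chunk)
        yield chunk


class FakeReader:
    """Hypothetical stand-in reproducing the behaviour reported in this issue."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def read(self, nrows):
        remaining = len(self._rows) - self._pos
        if remaining == 0:
            return None  # exhausted, as in Out[41] above
        if nrows > remaining:
            # Over-asking fails instead of returning the rows that are left.
            raise ValueError("Length of values does not match length of index")
        chunk = self._rows[self._pos:self._pos + nrows]
        self._pos += nrows
        return chunk


rows = [(i,) for i in range(32)]  # 32 rows, like airline.sas7bdat
sizes = [len(c) for c in read_in_chunks(FakeReader(rows), total_rows=32, chunksize=20)]
print(sizes)  # [20, 12]
```

With a real reader the same loop would be driven by `pd.read_sas(sasfile, iterator=True)`, but that only helps when the row count is available up front; the reader itself should return the remaining rows rather than raise.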

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: 75b606a
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)i5-2520M_CPU@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0+112.g75b606a
nose: 1.3.7
pip: 9.0.1
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
