OverflowError: Python int too large to convert to C long #20599

Open
@cscetbon

Code Sample, a copy-pastable example if possible

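import pandas

# Read only the first line of the file that triggers the error.
with open('failing_pandas.json') as f:
    content = f.readline()

# Parse it as line-delimited JSON; this call raises OverflowError on 0.21.1+.
df = pandas.read_json(content, lines=True)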

Problem description

The error occurs on pandas 0.21.1 and later but not on 0.21.0. I also reproduced it on the latest master branch (0.23.0):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 464, in read
    self._combine_lines(data.split('\n'))
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 484, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 582, in parse
    self._try_convert_types()
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 838, in _try_convert_types
    lambda col, c: self._try_convert_data(col, c, convert_dates=False))
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 818, in _process_converter
    new_data, result = f(col, c)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 838, in <lambda>
    lambda col, c: self._try_convert_data(col, c, convert_dates=False))
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/io/json/json.py", line 652, in _try_convert_data
    new_data = data.astype('int64')
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/util/_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/generic.py", line 4004, in astype
    **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 3462, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 3329, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 544, in astype
    **kwargs)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/internals.py", line 625, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/Users/cscetbon/.virtualenvs/pandas1/lib/python2.7/site-packages/pandas/core/dtypes/cast.py", line 692, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 854, in pandas._libs.lib.astype_intsafe
  File "pandas/_libs/src/util.pxd", line 91, in util.set_value_at_unsafe
OverflowError: Python int too large to convert to C long
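
The traceback ends in a cast to int64 (data.astype('int64')), so one plausible trigger is an integer in the JSON line that lies outside the signed 64-bit range. The contents of failing_pandas.json are not shown in this report, so that is an assumption, not a confirmed diagnosis. The following is a minimal sketch of how to check for such values using only the standard library. The helper name find_int64_overflows and the sample record are hypothetical, and the code targets Python 3 (on Python 2 large integers have type long rather than int).

```python
import json

# Bounds of a signed 64-bit integer, the target of astype('int64').
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def find_int64_overflows(line):
    """Return {key: value} for top-level integers that do not fit in int64.

    Assumes the line holds a single JSON object (one record of a
    line-delimited JSON file). Nested values are not inspected.
    """
    record = json.loads(line)
    return {
        key: value
        for key, value in record.items()
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, int)
        and not isinstance(value, bool)
        and not (INT64_MIN <= value <= INT64_MAX)
    }


# Hypothetical record: "id" is 2**64, which cannot be stored as int64.
sample = '{"id": 18446744073709551616, "count": 3}'
print(find_int64_overflows(sample))
```

Running the helper on the first line of the failing file would show whether any column holds a value that cannot be represented as int64. An empty result would point to a different cause.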

Expected Output

read_json should parse the line without raising, as it does on 0.21.0.

Output of pd.show_versions()

Working environment (pandas 0.21.0):

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.21.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Failing environment (pandas 0.21.1):

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.21.1
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Bug, IO JSON (read_json, to_json, json_normalize)