Int64Dtype in read_csv leads to unexpected values #26259

Closed
@alohr

Description

Code Sample, a copy-pastable example if possible

import pandas as pd
import io

t = io.StringIO('''\
event,timestamp
a,1556559573141592653
b,1556559573141592654
c,
d,1556559573141592655
''')

# Reading the timestamps as strings works fine
print("\nExpected output:")
print(pd.read_csv(t, dtype={'timestamp': object}))

# Now with Int64Dtype
t.seek(0)
print("\nActual output:")
print(pd.read_csv(t, dtype={'timestamp': pd.Int64Dtype()}))

Problem description

I would like to read CSV files with nullable (big) integers into a DataFrame. The integers represent nanoseconds since the UNIX epoch (1970-01-01). Using the Int64Dtype introduced in 0.24.0 seems like the way to go, but the values read back differ from those in the file in their last digits. I quote from the FAQ:

https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions

If you need to represent integers with possibly missing values, use one of the nullable-integer extension dtypes provided by pandas

Expected Output

  event            timestamp
0     a  1556559573141592653
1     b  1556559573141592654
2     c                  NaN
3     d  1556559573141592655

Actual Output

  event            timestamp
0     a  1556559573141592576
1     b  1556559573141592576
2     c                  NaN
3     d  1556559573141592576
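
The identical value in all three rows is what a round trip through a 64-bit float would produce. This is an assumption about the cause; the report itself does not confirm it. The minimal sketch below uses plain Python only, and the helper name `float64_roundtrip` is hypothetical, introduced just for illustration. Integers above 2**53 cannot all be represented exactly as float64, and at this magnitude adjacent floats are 256 apart, so the three timestamps collapse onto the same value.

```python
# Sketch only: assumes (not confirmed by the report) that the timestamps
# pass through float64 somewhere before being cast to Int64.

timestamps = [
    1556559573141592653,
    1556559573141592654,
    1556559573141592655,
]

# Above this bound, float64 can no longer represent every integer exactly.
EXACT_INTEGER_LIMIT = 2 ** 53


def float64_roundtrip(value: int) -> int:
    """Convert an int to a Python float (IEEE 754 double) and back."""
    return int(float(value))


roundtripped = [float64_roundtrip(ts) for ts in timestamps]

for original, result in zip(timestamps, roundtripped):
    print(original, "->", result, "(lost", original - result, "ns)")

# All three inputs exceed the exact-integer range of float64.
print(all(ts > EXACT_INTEGER_LIMIT for ts in timestamps))
```

Each input maps to 1556559573141592576, which matches the "Actual Output" table above.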

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None

pandas: 0.24.2
pytest: None
pip: 19.1
setuptools: 41.0.1
Cython: None
numpy: 1.16.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

    Labels

    Bug
    Dtype Conversions: Unexpected or buggy dtype conversions
    ExtensionArray: Extending pandas with custom dtypes or arrays.
    IO CSV: read_csv, to_csv
    NA - MaskedArrays: Related to pd.NA and nullable extension arrays