read_csv() doesn't parse correctly when usecols and parse_dates are both used #14792

Closed
@rubennj

Description

Code Sample, a copy-pastable example if possible

The session below assumes pandas is imported as pd and StringIO is imported from io (import pandas as pd; from io import StringIO).

In [22]: s = """a,b,c,d,e,f,g,h,i,j
    ...: 2016/09/21,1,1,2,3,4,5,6,7,8"""

In [23]: pd.read_csv(StringIO(s), parse_dates=[0], usecols=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 10 columns):
a    1 non-null datetime64[ns]
b    1 non-null int64
c    1 non-null int64
d    1 non-null int64
e    1 non-null int64
f    1 non-null int64
g    1 non-null int64
h    1 non-null int64
i    1 non-null object    <- !!
j    1 non-null int64
dtypes: datetime64[ns](1), int64(8), object(1)
memory usage: 160.0+ bytes

In [24]: pd.read_csv(StringIO(s), parse_dates=[[0, 1]], usecols=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 9 columns):
a_b    1 non-null object
c      1 non-null int64
d      1 non-null int64
e      1 non-null int64
f      1 non-null int64
g      1 non-null int64
h      1 non-null object    <- !!
i      1 non-null object    <- !!
j      1 non-null int64
dtypes: int64(6), object(3)
memory usage: 152.0+ bytes

Problem description

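Since v0.18.1, pd.read_csv() does not infer dtypes correctly when usecols and parse_dates are both passed: some integer columns come back as object instead of int64. Which columns are affected varies randomly from run to run (column i in the first call above, columns h and i in the second). The problem does not occur when only one of the two arguments is used.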
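
Because the failure is intermittent, a single call may not show it. The following is a minimal sketch of a repeat-and-tally check built on the sample data from this report. The helper names dtype_signature and tally_signatures are hypothetical, as is the choice of 20 repetitions, and the report does not say whether the variation appears between calls in one process or only between interpreter launches, so the script may need to be launched several times.

```python
from collections import Counter
from io import StringIO

import pandas as pd

# Sample data and column list copied from the report.
CSV_TEXT = """a,b,c,d,e,f,g,h,i,j
2016/09/21,1,1,2,3,4,5,6,7,8"""
COLUMNS = list("abcdefghij")


def dtype_signature():
    """Read the sample once and return the dtypes of columns b..j as strings."""
    df = pd.read_csv(StringIO(CSV_TEXT), parse_dates=[0], usecols=COLUMNS)
    return tuple(str(df[name].dtype) for name in COLUMNS[1:])


def tally_signatures(n_runs=20):
    """Count how often each distinct dtype signature appears over n_runs reads."""
    return Counter(dtype_signature() for _ in range(n_runs))


if __name__ == "__main__":
    # More than one distinct signature, or any signature containing
    # "object", would match the behavior described in this report.
    for signature, count in tally_signatures().items():
        print(count, signature)
```

Each printed line is a count followed by the dtypes of b through j for that outcome, which makes it easy to compare results across pandas versions.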

Expected Output

All the numeric columns (b through j) are parsed as int64 on every run, with none of them coming back as object.
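
The expected outcome can be phrased as a check. This is only a sketch: columns_not_int64 is a hypothetical helper, not part of pandas, and column a is skipped on the assumption that it holds the parsed date.

```python
from io import StringIO

import pandas as pd


def columns_not_int64(df, skip=("a",)):
    """Return the names of columns, other than those in skip, whose dtype is not int64."""
    return [
        name
        for name in df.columns
        if name not in skip and str(df[name].dtype) != "int64"
    ]


# Sample data copied from the report.
sample = """a,b,c,d,e,f,g,h,i,j
2016/09/21,1,1,2,3,4,5,6,7,8"""

df = pd.read_csv(StringIO(sample), parse_dates=[0], usecols=list("abcdefghij"))

# An empty list is the expected outcome; any names listed here are
# columns that were read with a dtype other than int64.
print(columns_not_int64(df))
```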

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 29.0.1.post20161201
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

    Labels

    Bug; IO CSV (read_csv, to_csv); Regression (functionality that used to work in a prior pandas version)
