Closed
Description
Code Sample, a copy-pastable example if possible
In [21]: import pandas as pd
...: from io import StringIO
In [22]: s = """a,b,c,d,e,f,g,h,i,j
...: 2016/09/21,1,1,2,3,4,5,6,7,8"""
In [23]: pd.read_csv(StringIO(s), parse_dates=[0], usecols=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 10 columns):
a 1 non-null datetime64[ns]
b 1 non-null int64
c 1 non-null int64
d 1 non-null int64
e 1 non-null int64
f 1 non-null int64
g 1 non-null int64
h 1 non-null int64
i 1 non-null object <- !!
j 1 non-null int64
dtypes: datetime64[ns](1), int64(8), object(1)
memory usage: 160.0+ bytes
In [24]: pd.read_csv(StringIO(s), parse_dates=[[0, 1]], usecols=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 9 columns):
a_b 1 non-null object
c 1 non-null int64
d 1 non-null int64
e 1 non-null int64
f 1 non-null int64
g 1 non-null int64
h 1 non-null object <- !!
i 1 non-null object <- !!
j 1 non-null int64
dtypes: int64(6), object(3)
memory usage: 152.0+ bytes
Problem description
Since v0.18.1, pd.read_csv() infers the wrong dtype for some columns: integer columns come back as object, and which columns are affected changes from run to run. This happens only when usecols and parse_dates are both used.
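To make separate runs easier to compare, here is a minimal sketch that lists the columns read_csv typed as object. The helper name object_columns and the constant CSV_TEXT are made up for illustration and are not part of pandas. The call uses the same arguments as In [23] above.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import is_object_dtype

# Same data as the report's sample.
CSV_TEXT = """a,b,c,d,e,f,g,h,i,j
2016/09/21,1,1,2,3,4,5,6,7,8"""


def object_columns(csv_text, **read_csv_kwargs):
    """Return the names of the columns that read_csv typed as object.

    Hypothetical helper for this report, not a pandas API.
    """
    frame = pd.read_csv(StringIO(csv_text), **read_csv_kwargs)
    return [name for name, dtype in frame.dtypes.items() if is_object_dtype(dtype)]


# Same arguments as In [23]: parse column 0 as a date and select every column.
suspects = object_columns(
    CSV_TEXT,
    parse_dates=[0],
    usecols=list("abcdefghij"),
)
print(suspects)
```

On an affected version the printed list should name the misparsed columns (i in the In [23] run above). An empty list means every column got a non-object dtype in that run.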
Expected Output
All non-date columns should be parsed as int64; none of them should come back as object at random.
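The expectation can be written as a small check, sketched here for the single-column parse_dates=[0] case. The function name check_expected_dtypes and its date_column parameter are illustrative, not taken from pandas or from its test suite.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Same data as the report's sample.
CSV_TEXT = """a,b,c,d,e,f,g,h,i,j
2016/09/21,1,1,2,3,4,5,6,7,8"""


def check_expected_dtypes(frame, date_column="a"):
    """Raise AssertionError unless date_column is datetime and the rest are int64.

    Hypothetical check written for this report.
    """
    assert is_datetime64_any_dtype(frame[date_column]), frame[date_column].dtype
    for name in frame.columns.drop(date_column):
        # The failure message names the offending column and its dtype.
        assert frame[name].dtype == "int64", (name, frame[name].dtype)


frame = pd.read_csv(
    StringIO(CSV_TEXT),
    parse_dates=[0],
    usecols=list("abcdefghij"),
)
check_expected_dtypes(frame)
```

Because the misparse is reported as random, a single passing run does not prove much on an affected version. Running the script several times gives a better picture.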
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 29.0.1.post20161201
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None