The following code:
import pandas
from io import StringIO
data = u'1,2,3\n4,5,6\n7,8,9'
df = pandas.read_csv(StringIO(data),
dtype={'B': int, 'C':float},
header=None,
names=['A', 'B', 'C'],
converters={'A': str},
)
print(df.dtypes)
print()
df = pandas.read_csv(StringIO(data), usecols=[0,2],
dtype={'B': int, 'C':float},
header=None,
names=['A', 'B', 'C'],
converters={'A': str},
)
print(df.dtypes)
print()
df = pandas.read_csv(StringIO(data), usecols=[0,1,2],
dtype={'B': int, 'C':float},
header=None,
names=['A', 'B', 'C'],
converters={'A': str},
)
print(df.dtypes)
outputs:
A     object
B      int32
C    float64
dtype: object

A     object
C    float64
dtype: object

A    object
B    object
C    object
dtype: object
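The dtypes from the last call should match those from the first. The problem occurs when a `converters` dictionary is passed together with a `usecols` list that selects every available column: the `dtype` argument is then ignored, and B and C come back as `object` instead of `int32` and `float64`.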
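A minimal regression check for this behaviour might look like the sketch below. It reads the sample data twice with identical options, once with the default `usecols=None` and once with `usecols=[0, 1, 2]`, and asserts that both frames end up with the same dtypes. The helper name `read` and the constant `DATA` are hypothetical, chosen only for illustration.

```python
from io import StringIO

import pandas as pd

DATA = "1,2,3\n4,5,6\n7,8,9"


def read(usecols=None):
    """Parse DATA with the options from the report, optionally passing usecols."""
    return pd.read_csv(
        StringIO(DATA),
        header=None,
        names=["A", "B", "C"],
        usecols=usecols,
        dtype={"B": int, "C": float},
        converters={"A": str},
    )


without_usecols = read()
with_all_usecols = read(usecols=[0, 1, 2])

# On an affected version, B and C come back as object in the second frame,
# so this comparison fails; once fixed, both frames should agree.
assert dict(without_usecols.dtypes) == dict(with_all_usecols.dtypes), (
    without_usecols.dtypes,
    with_all_usecols.dtypes,
)
print(with_all_usecols.dtypes)
```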
The commit https://github.com/tr11/pandas/commit/1ac3700e72a5861a7d8544a72d77a4d64c71f118 in my fork seems to fix this issue.
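On a version that still has this bug, one possible workaround (a sketch, not what the linked commit does) is to leave `dtype` out of `read_csv` and apply the requested dtypes afterwards with `DataFrame.astype`, so the result should not depend on how `usecols` and `converters` interact. The name `DTYPES` is illustrative only.

```python
from io import StringIO

import pandas as pd

DATA = "1,2,3\n4,5,6\n7,8,9"
DTYPES = {"B": int, "C": float}  # hypothetical name for the requested dtypes

df = pd.read_csv(
    StringIO(DATA),
    usecols=[0, 1, 2],
    header=None,
    names=["A", "B", "C"],
    converters={"A": str},
)
# Cast explicitly after parsing instead of relying on read_csv's dtype argument.
df = df.astype(DTYPES)
print(df.dtypes)
```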