Issue with CSV parser when using usecols #3192

Closed
@tr11

The following code:

import pandas
from io import StringIO

data = u'1,2,3\n4,5,6\n7,8,9'

# Case 1: no usecols
df = pandas.read_csv(StringIO(data), 
                     dtype={'B': int, 'C':float},
                     header=None,
                     names=['A', 'B', 'C'],
                     converters={'A': str},
)
print(df.dtypes)
print()

# Case 2: usecols selects a subset of the columns
df = pandas.read_csv(StringIO(data), usecols=[0,2],
                     dtype={'B': int, 'C':float},
                     header=None,
                     names=['A', 'B', 'C'],
                     converters={'A': str},
)
print(df.dtypes)
print()

# Case 3: usecols selects all of the columns
df = pandas.read_csv(StringIO(data), usecols=[0,1,2],
                     dtype={'B': int, 'C':float},
                     header=None,
                     names=['A', 'B', 'C'],
                     converters={'A': str},
)
print(df.dtypes)

outputs

A     object
B      int32
C    float64
dtype: object

A     object
C    float64
dtype: object

A    object
B    object
C    object
dtype: object

The dtypes in the last case should be the same as in the first one. The issue arises when passing in a converters dictionary together with a usecols that selects all the available columns: the dtype specification is then ignored and every column comes back as object.

The commit https://github.com/tr11/pandas/commit/1ac3700e72a5861a7d8544a72d77a4d64c71f118 in my fork seems to fix this issue.
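For anyone who wants to check this on their own install, here is a minimal sketch of a regression check. It is not taken from the linked commit; the helper name read_dtypes and the variable names are made up for illustration. It only compares the dtypes read without usecols against the dtypes read with usecols covering every column, which should match once the bug is fixed.

```python
from io import StringIO

import pandas as pd

DATA = "1,2,3\n4,5,6\n7,8,9"


def read_dtypes(usecols=None):
    """Hypothetical helper: parse DATA with the options from the report
    and return the resulting per-column dtypes."""
    df = pd.read_csv(
        StringIO(DATA),
        usecols=usecols,
        dtype={"B": int, "C": float},
        header=None,
        names=["A", "B", "C"],
        converters={"A": str},
    )
    return df.dtypes


# Baseline: no usecols at all.
baseline = read_dtypes()

# Case from the report: usecols lists every available column.
all_cols = read_dtypes(usecols=[0, 1, 2])

# The two results should agree column by column if the bug is not present.
dtypes_match = baseline.to_dict() == all_cols.to_dict()
print(baseline)
print(all_cols)
print("dtypes match:", dtypes_match)
```

The comparison deliberately avoids hard-coding int32 or int64 for column B, since the default integer width depends on the platform and the pandas version.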

Labels: Bug; Dtype Conversions (unexpected or buggy dtype conversions); IO CSV (read_csv, to_csv); IO Data (IO issues that don't fit into a more specific label)
