read_csv in combination with index_col and usecols #2654

Closed
@floux


Starting point:

http://pandas.pydata.org/pandas-docs/stable/io.html#index-columns-and-trailing-delimiters

If there is one more column of data than there are column names, usecols exhibits some (at least to me) unintuitive behavior:

>>> from io import StringIO
>>> import pandas as pd
>>> data = 'a,b,c\n4,apple,bat,5.7\n8,orange,cow,10'
>>> pd.read_csv(StringIO(data))
        a    b     c
4   apple  bat   5.7
8  orange  cow  10.0
>>> pd.read_csv(StringIO(data), usecols=['a', 'b'])
   a       b
0  4   apple
1  8  orange
>>>

I was expecting it to be equal to

>>> pd.read_csv(StringIO(data))[['a', 'b']]
        a    b
4   apple  bat
8  orange  cow

I am not sure whether my expectation is unfounded, though. Is this behavior intentional?
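
For reference, here is a minimal, self-contained sketch of the result I was expecting, written as a script rather than a REPL session. It only uses the select-after-parsing form shown above; what `usecols` itself returns here may depend on the pandas version, so the sketch does not rely on it. The variable names `full` and `subset` are just for illustration.

```python
from io import StringIO

import pandas as pd

# Header has three names, but each data row has four fields.
data = "a,b,c\n4,apple,bat,5.7\n8,orange,cow,10"

# With one more data field than header names, read_csv treats the
# first field of each row as the index (the implicit index column).
full = pd.read_csv(StringIO(data))

# Selecting columns after parsing keeps that implicit index.
subset = full[["a", "b"]]
print(subset)
```

Selecting after parsing means every column is read first, so on large files this costs more memory than a working `usecols` would.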

Labels: Bug, IO Data
