Strip whitespace from column names when usecols in read_csv #14480

Closed
@rahulporuri

A small, complete example of the issue

When loading a file like the following, where the headers have trailing whitespace,

a ,b ,c 
1,2,3
4,5,6

I would expect the following code to work and produce the expected output:

pandas.read_table('test_data.csv', sep=',', usecols=['a', 'b'])

Expected Output

   a  b
0  1  2
1  4  5

Actual Output

Neither the C nor the Python engine produces the expected result.
The tracebacks from both engines have been concatenated for brevity.

/Users/rahulporuri/Github/pandas/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1431 
   1432             if len(self.names) < len(self.usecols):
-> 1433                 raise ValueError("Usecols do not match names.")
   1434 
   1435         self._set_noconvert_columns()

ValueError: Usecols do not match names.
/Users/rahulporuri/Github/pandas/pandas/io/parsers.pyc in _handle_usecols(self, columns, usecols_key)
   2199                 for u in self.usecols:
   2200                     if isinstance(u, string_types):
-> 2201                         col_indices.append(usecols_key.index(u))
   2202                     else:
   2203                         col_indices.append(u)

ValueError: 'a' is not in list
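
For anyone who wants to reproduce this without a file on disk, here is a minimal sketch that feeds the same data through an in-memory buffer and tries both engines. It uses `pandas.read_csv` rather than `read_table`, and the names `CSV_TEXT` and `try_engine` are made up for illustration. The exact error message may differ between pandas versions, so the sketch only records the exception type.

```python
from io import StringIO

import pandas as pd

# Same data as in the report: every header has a trailing space.
CSV_TEXT = "a ,b ,c \n1,2,3\n4,5,6\n"


def try_engine(engine):
    """Hypothetical helper: return the exception type name, or None on success."""
    try:
        pd.read_csv(StringIO(CSV_TEXT), sep=",", usecols=["a", "b"], engine=engine)
    except ValueError as exc:
        # The message wording is version dependent, so it is only printed.
        print(f"{engine} engine: {exc}")
        return type(exc).__name__
    return None


# The parsed header keeps the trailing whitespace, so 'a' and 'b' are not found.
header = list(pd.read_csv(StringIO(CSV_TEXT), nrows=0).columns)
results = {engine: try_engine(engine) for engine in ("c", "python")}
```

Printing `header` shows the names as parsed, which makes the mismatch with `usecols=['a', 'b']` visible.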

This is related to the earlier issue #14460 about stripping whitespace from column names.

On a side note, if the file has column names with leading whitespace instead of trailing whitespace, adding the skipinitialspace=True kwarg to pandas.read_table produces the expected result.
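
A possible workaround, sketched under some assumptions: later pandas releases (0.20 onward, as far as I know, so not the 0.19.0 build reported here) accept a callable for `usecols`, which lets the selection compare stripped names. The helper `read_stripped` below is hypothetical, not part of pandas. The second call shows the `skipinitialspace=True` case for leading whitespace described above.

```python
from io import StringIO

import pandas as pd

TRAILING = "a ,b ,c \n1,2,3\n4,5,6\n"  # headers with trailing whitespace
LEADING = " a, b, c\n1,2,3\n4,5,6\n"   # headers with leading whitespace


def read_stripped(source, wanted, **kwargs):
    """Hypothetical helper: select columns by their whitespace-stripped names."""
    wanted = set(wanted)
    # The callable receives each raw header name and decides whether to keep it.
    df = pd.read_csv(source, usecols=lambda name: name.strip() in wanted, **kwargs)
    # Strip the kept names so later lookups such as df['a'] work.
    df.columns = df.columns.str.strip()
    return df


trailing_df = read_stripped(StringIO(TRAILING), ["a", "b"])

# Leading whitespace only: the existing skipinitialspace option is enough.
leading_df = pd.read_csv(StringIO(LEADING), skipinitialspace=True, usecols=["a", "b"])
```

The callable only changes which columns are selected. Stripping the names afterwards is a separate step, which is why the helper reassigns `df.columns`.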

Output of pd.show_versions()

## INSTALLED VERSIONS

commit: 794f792
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0+27.g794f792
nose: None
pip: 8.1.2
setuptools: 23.1.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.1
statsmodels: None
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.4.1
patsy: None
dateutil: 2.5.2
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Labels: IO CSV (read_csv, to_csv)