Description
A small, complete example of the issue
When loading a file like the following, where the header names have trailing whitespace,
a ,b ,c
1,2,3
4,5,6
I would expect the following code to work and return columns a and b:
pandas.read_table('test_data.csv', sep=',', usecols=['a', 'b'])
Expected Output
   a  b
0  1  2
1  4  5
Actual Output
Neither the c nor the python engine produces the expected result.
The tracebacks (c engine first, then python engine) have been truncated for brevity.
/Users/rahulporuri/Github/pandas/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1431
   1432         if len(self.names) < len(self.usecols):
-> 1433             raise ValueError("Usecols do not match names.")
   1434
   1435         self._set_noconvert_columns()
ValueError: Usecols do not match names.
/Users/rahulporuri/Github/pandas/pandas/io/parsers.pyc in _handle_usecols(self, columns, usecols_key)
   2199             for u in self.usecols:
   2200                 if isinstance(u, string_types):
-> 2201                     col_indices.append(usecols_key.index(u))
   2202                 else:
   2203                     col_indices.append(u)
ValueError: 'a' is not in list
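The python engine failure above comes down to an exact string lookup: the parsed header contains 'a ' (with the trailing space), so looking up 'a' fails. The following is a minimal pure-Python sketch of that lookup, not the pandas implementation; the helper resolve_usecols and its strip_names flag are hypothetical names introduced only for illustration.

```python
def resolve_usecols(header, usecols, strip_names=False):
    """Map requested column names to positions in the parsed header.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if strip_names:
        # Remove leading/trailing whitespace before matching names.
        header = [name.strip() for name in header]
    # list.index raises ValueError when the exact string is absent.
    return [header.index(name) for name in usecols]


# Header row of the sample file, split on the separator.
header = "a ,b ,c".split(",")  # the names keep their trailing spaces

try:
    resolve_usecols(header, ["a", "b"])
    error_message = None
except ValueError as exc:
    # Exact matching fails because 'a' != 'a '.
    error_message = str(exc)

# Matching against stripped names finds both columns.
stripped_indices = resolve_usecols(header, ["a", "b"], strip_names=True)
print(error_message)
print(stripped_indices)
```

Whether stripping should happen by default or behind an option is a design question for the parser; this sketch only shows why the exact match fails.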
This is related to an earlier issue, #14460, about stripping whitespace from columns/column names.
On a side note, if the file has column names with leading whitespace instead of trailing whitespace, adding the skipinitialspace=True kwarg to pandas.read_table produces the expected result.
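For trailing whitespace, one possible workaround (a sketch, not a fix for the underlying behaviour) is to read all columns, strip the column names, and select the subset afterwards. The example below assumes the sample file contents from the description, supplied through io.StringIO rather than test_data.csv, and uses pandas.read_csv, whose default separator is already a comma.

```python
import io

import pandas as pd

# Same contents as the sample file: header names have trailing whitespace.
raw = "a ,b ,c\n1,2,3\n4,5,6\n"

# Read every column first; usecols is deliberately not passed here.
df = pd.read_csv(io.StringIO(raw))

# Strip whitespace from the column names, then select the wanted columns.
df.columns = df.columns.str.strip()
subset = df[["a", "b"]]
print(subset)
```

This reads the unwanted columns into memory before dropping them, so it gives up the saving that usecols would otherwise provide on wide files.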
Output of pd.show_versions()
commit: 794f792
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 16.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.0+27.g794f792
nose: None
pip: 8.1.2
setuptools: 23.1.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.1
statsmodels: None
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.4.1
patsy: None
dateutil: 2.5.2
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None