ENH: allow usecols to be case insensitive

I have a wide delimited file that includes columns with predictable names but unpredictable capitalisation, as shown in this toy code.

``` python
import pandas as pd
from io import StringIO

data = """1,1,1,1
2,2,2,2
3,3,3,3
4,4,4,4
5,5,5,5"""
names = ['col1', 'col2', 'COL3', 'Col4']
df = pd.read_csv(StringIO(data), names=names, usecols=None)
df.head()
```

|  | col1 | col2 | COL3 | Col4 |
| --- | --- | --- | --- | --- |
| **0** | 1 | 1 | 1 | 1 |
| **1** | 2 | 2 | 2 | 2 |
| **2** | 3 | 3 | 3 | 3 |
| **3** | 4 | 4 | 4 | 4 |
| **4** | 5 | 5 | 5 | 5 |

I want to pass a list of column names to `usecols` when importing the file, but I won't be able to unless I match the names exactly:

``` python
df = pd.read_csv(StringIO(data), names=names, usecols=['col3', 'col4'])
```

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-92d038c5ac8c> in <module>()
      1 cols = ['col3', 'col4']
----> 2 df1 = pd.read_csv('test.csv', usecols=['col3', 'col4'])
      3 df1.head()

/Users/alexchase/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    560                     skip_blank_lines=skip_blank_lines)
    561 
--> 562         return _read(filepath_or_buffer, kwds)
    563 
    564     parser_f.__name__ = name

/Users/alexchase/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    313 
    314     # Create the parser.
--> 315     parser = TextFileReader(filepath_or_buffer, **kwds)
    316 
    317     if (nrows is not None) and (chunksize is not None):

/Users/alexchase/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    643             self.options['has_index_names'] = kwds['has_index_names']
    644 
--> 645         self._make_engine(self.engine)
    646 
    647     def close(self):

/Users/alexchase/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    797     def _make_engine(self, engine='c'):
    798         if engine == 'c':
--> 799             self._engine = CParserWrapper(self.f, **self.options)
    800         else:
    801             if engine == 'python':

/Users/alexchase/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1255 
   1256             if len(self.names) < len(self.usecols):
-> 1257                 raise ValueError("Usecols do not match names.")
   1258 
   1259         self._set_noconvert_columns()

ValueError: Usecols do not match names.
```

I see two problems here. First, the error text would be more helpful if it specified what column name(s) is/are wrong. That's pretty easy to fix. Second, if I know that capitalisation might be a problem, I have to check every column name individually before I can pass a list to `usecols`. One possible solution would be to modify [this part](https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L1421) and [this part](https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L2193) of `parsers.py` to case-transform the lists of column names before matching. This behaviour could be made the default, or it could be optional via an `ignore_col_case` keyword in the various `read` functions.

---

> ## INSTALLED VERSIONS
> 
> commit: None
> python: 3.5.2.final.0
> python-bits: 64
> OS: Darwin
> OS-release: 15.6.0
> machine: x86_64
> processor: i386
> byteorder: little
> LC_ALL: None
> LANG: en_GB.UTF-8
> 
> pandas: 0.18.1
> nose: 1.3.7
> pip: 8.1.2
> setuptools: 25.1.6
> Cython: 0.24.1
> numpy: 1.11.1
> scipy: 0.18.0
> statsmodels: 0.6.1
> xarray: None
> IPython: 5.1.0
> sphinx: 1.4.1
> patsy: 0.4.1
> dateutil: 2.5.3
> pytz: 2016.6.1
> blosc: None
> bottleneck: 1.1.0
> tables: 3.2.3.1
> numexpr: 2.6.1
> matplotlib: 1.5.1
> openpyxl: 2.3.2
> xlrd: 1.0.0
> xlwt: 1.1.2
> xlsxwriter: 0.9.2
> lxml: 3.6.4
> bs4: 4.4.1
> html5lib: None
> httplib2: None
> apiclient: None
> sqlalchemy: 1.0.13
> pymysql: 0.7.6.None
> psycopg2: None
> jinja2: 2.8
> boto: 2.40.0
> pandas_datareader: None


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: allow usecols to be case insensitive #14154

INSTALLED VERSIONS

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

ENH: allow usecols to be case insensitive #14154

Description

INSTALLED VERSIONS

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions