Description
This is a minor issue about error reporting to the mindless user (me...) who confuses the header and names arguments of read_csv. Basically, when calling read_csv with header=['a', 'b'] (whereas it should be names=['a', 'b']), the error message is cryptic:
TypeError: must be str, not int
(pandas 0.20.1, see details below)
Two issues:
- unhelpful, quite cryptic message that doesn't point in the right direction. E.g. it doesn't explain which argument causes the problem. Of course in the dummy example below, there is just one argument, but in the real case where I got bitten it was messier...
- it is impossible to debug with the %debug magic, because the error is raised in the compiled code (parsers.pyx)
Here is code to reproduce the error message, taken from an IPython session. (The first line may be a bit Unix specific, sorry. It's just to create a dummy CSV file.)
In [] !echo '1,2\n3,4' > 1234.csv
In [] pd.read_csv('1234.csv')
1 2
0 3 4
In [] pd.read_csv('1234.csv', names=['a', 'b']) # proper call
a b
0 1 2
1 3 4
In [] pd.read_csv('1234.csv', header=['a', 'b']) # beginner's mistake
TypeError Traceback (most recent call last)
<ipython-input-5-b065bd1f57c6> in <module>()
----> 1 pd.read_csv('1234.csv', header=['a', 'b'])
/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
653 skip_blank_lines=skip_blank_lines)
654
--> 655 return _read(filepath_or_buffer, kwds)
656
657 parser_f.__name__ = name
/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
403
404 # Create the parser.
--> 405 parser = TextFileReader(filepath_or_buffer, **kwds)
406
407 if chunksize or iterator:
/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
760 self.options['has_index_names'] = kwds['has_index_names']
761
--> 762 self._make_engine(self.engine)
763
764 def close(self):
/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
964 def _make_engine(self, engine='c'):
965 if engine == 'c':
--> 966 self._engine = CParserWrapper(self.f, **self.options)
967 else:
968 if engine == 'python':
/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1580 kwds['allow_leading_cols'] = self.index_col is not False
1581
-> 1582 self._reader = parsers.TextReader(src, **kwds)
1583
1584 # XXX
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:5996)()
TypeError: must be str, not int
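Since the first line of the session relies on a Unix shell, here is a minimal portable sketch for creating the same dummy file from Python. The helper name write_dummy_csv is hypothetical and only for illustration; the file name and contents are the ones used in the session above.

```python
from pathlib import Path


def write_dummy_csv(path="1234.csv"):
    """Write the two-row dummy CSV used in the session and return its path.

    Hypothetical helper for illustration; not part of pandas.
    """
    path = Path(path)
    # Same content as the shell one-liner: two rows, two columns, no header row.
    path.write_text("1,2\n3,4\n")
    return path
```

After calling write_dummy_csv(), the pd.read_csv calls from the session can be run unchanged.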
Expected Output
I'm not expecting a fancy AI-assistant-like error message. However, an early check of the header argument should verify, consistent with the docstring, that header should be int or list of ints.
What do you think? Is it overkill?
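To make the suggestion concrete, here is a rough sketch of what such an early check could look like. It is not the actual pandas code: the function name check_header_arg, the choice of ValueError, and the wording of the message are all assumptions made for illustration. It accepts None, the default 'infer', an int, or a list of ints, and rejects everything else before the compiled parser is reached.

```python
def _is_plain_int(value):
    # bool is a subclass of int in Python; treating it as invalid here
    # is an assumption of this sketch.
    return isinstance(value, int) and not isinstance(value, bool)


def check_header_arg(header):
    """Hypothetical early validation of read_csv's header argument.

    Illustrative only; not the pandas implementation.
    """
    if header is None:
        return
    if isinstance(header, str) and header == "infer":
        return
    if _is_plain_int(header):
        return
    if isinstance(header, (list, tuple)) and all(_is_plain_int(h) for h in header):
        return
    raise ValueError(
        "header must be an int or a list of ints, got {!r}. "
        "To supply column names, use the names argument instead.".format(header)
    )
```

With a check like this, header=['a', 'b'] would fail immediately in pure Python code with a message naming the offending argument, so %debug would also land somewhere useful.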
Output of pd.show_versions()
pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None