Description
I am trying to print a data frame as plain, strict TSV (i.e., no quoting and no escaping, because I know none of the fields will contain tabs). I wanted to use the "quoting" option, which is documented in pandas and passed through to csv, together with the "quotechar" option, which is not documented in pandas but is also a csv option. But it doesn't work:
In [1]: import sys, csv
In [2]: from pandas import DataFrame
In [3]: data = {'col1': ['contents of col1 row1', 'contents " of col1 row2'], 'col2': ['contents of col2 row1', 'contents " of col2 row2'] }
In [4]: df = DataFrame(data)
In [5]: df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)
col1 col2
0 contents of col1 row1 contents of col2 row1
---------------------------------------------------------------------------
Error Traceback (most recent call last)
<ipython-input-5-a30d32266fb4> in <module>()
----> 1 df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)
/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/frame.pyc in to_csv(self, path_or_buf, sep, na_rep, float_format, cols, header, index, index_label, mode, nanRep, encoding, quoting, line_terminator, chunksize, tupleize_cols, **kwds)
1409 tupleize_cols=tupleize_cols,
1410 )
-> 1411 formatter.save()
1412
1413 def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',
/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in save(self)
974
975 else:
--> 976 self._save()
977
978
/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in _save(self)
1080 break
1081
-> 1082 self._save_chunk(start_i, end_i)
1083
1084 def _save_chunk(self, start_i, end_i):
/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in _save_chunk(self, start_i, end_i)
1098 ix = data_index.to_native_types(slicer=slicer, na_rep=self.na_rep, float_format=self.float_format)
1099
-> 1100 lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)
1101
1102 # from collections import namedtuple
/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/lib.so in pandas.lib.write_csv_rows (pandas/lib.c:13871)()
Error: need to escape, but no escapechar set
Adding the parameter
quotechar=kwds.get("quotechar")
to the
formatter = fmt.CSVFormatter(...
call in to_csv(), and making the corresponding changes to format.CSVFormatter's __init__() and save(), produces the expected output:
In [1]: import sys, csv
In [2]: from pandas import DataFrame
In [3]: data = {'col1': ['contents of col1 row1', 'contents " of col1 row2'], 'col2': ['contents of col2 row1', 'contents " of col2 row2'] }
In [4]: df = DataFrame(data)
In [5]: df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)
col1 col2
0 contents of col1 row1 contents of col2 row1
1 contents " of col1 row2 contents " of col2 row2
i.e., unescaped, unquoted tsv.
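The same behaviour can be reproduced with the csv module alone, which suggests the error comes from quotechar never reaching the underlying writer. The following is a minimal sketch, assuming Python 3 (the session above is Python 2.6); the helper name write_strict_tsv and the ROWS data are made up for illustration and are not part of pandas.

```python
import csv
import io

# Same contents as the data frame above, written out as plain rows.
ROWS = [
    ["col1", "col2"],
    ["contents of col1 row1", "contents of col2 row1"],
    ['contents " of col1 row2', 'contents " of col2 row2'],
]


def write_strict_tsv(rows, **fmtparams):
    """Hypothetical helper: write rows as tab-separated text with quoting disabled."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
        lineterminator="\n",
        **fmtparams
    )
    writer.writerows(rows)
    return buf.getvalue()


# With the default quotechar ('"'), the embedded quote needs escaping,
# and since no escapechar is set, the csv module raises csv.Error.
try:
    write_strict_tsv(ROWS)
except csv.Error as exc:
    print("default quotechar:", exc)

# With quotechar=None, no character is treated as a quote,
# so the fields are written unchanged.
print(write_strict_tsv(ROWS, quotechar=None))
```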
More generally, there could be many reasons to want more control over the underlying csv writer, so a generic mechanism (as opposed to adding each parameter one by one) might be called for, e.g., allowing a csv dialect object, or at least a dictionary holding dialect attributes, to be passed in.
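As a rough illustration of what such a generic mechanism could look like, here is a sketch built directly on the csv module, assuming Python 3. The names StrictTSV and write_rows are hypothetical and do not exist in pandas; the point is only that csv.writer already accepts either a dialect or a set of keyword overrides, so a wrapper can forward both without naming each parameter.

```python
import csv
import io


class StrictTSV(csv.Dialect):
    """Hypothetical dialect: tab-separated, no quoting, no escaping."""
    delimiter = "\t"
    quoting = csv.QUOTE_NONE
    quotechar = None
    escapechar = None
    doublequote = False
    skipinitialspace = False
    lineterminator = "\n"


def write_rows(buf, rows, dialect="excel", **fmtparams):
    """Hypothetical wrapper: forward a dialect and/or overrides to csv.writer."""
    writer = csv.writer(buf, dialect=dialect, **fmtparams)
    writer.writerows(rows)


rows = [["col1", "col2"], ['contents " of col1 row2', 'contents " of col2 row2']]

# Option 1: pass a dialect object.
buf_dialect = io.StringIO()
write_rows(buf_dialect, rows, dialect=StrictTSV)

# Option 2: pass a dictionary of dialect attributes.
attrs = {
    "delimiter": "\t",
    "quoting": csv.QUOTE_NONE,
    "quotechar": None,
    "lineterminator": "\n",
}
buf_dict = io.StringIO()
write_rows(buf_dict, rows, **attrs)

print(buf_dialect.getvalue())
```

Either form leaves the choice of dialect attributes to the caller, which is the kind of pass-through this request is asking for.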