DOC: read_csv() ignores quotes when a regex is used in sep

When a regular expression is passed as the `sep` argument of `read_csv`, the Python parser ignores quoting in the input file.

In the following example, `read_csv` parses `example.csv` correctly when `sep` is the plain string `','`. With the regex `'[,]'`, which should behave exactly the same, it fails because the commas inside the quoted field are treated as separators.

`example2.csv`, which contains no quotes, is parsed correctly by the same regex call.

``` python
In [1]: import pandas

In [2]: !cat example.csv
a,b,c,d
q,w,e,r
a,s,d,f
"z,x,c,v",i,o,p

In [3]: pandas.read_csv('example.csv', engine = 'python', quotechar = '"', sep = ',')
Out[3]: 
         a  b  c  d
0        q  w  e  r
1        a  s  d  f
2  z,x,c,v  i  o  p

In [4]: pandas.read_csv('example.csv', engine = 'python', quotechar = '"', sep = '[,]')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-5f9ad4fdcd46> in <module>()
----> 1 pandas.read_csv('example.csv', engine = 'python', quotechar = '"', sep = '[,]')

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    489                     skip_blank_lines=skip_blank_lines)
    490 
--> 491         return _read(filepath_or_buffer, kwds)
    492 
    493     parser_f.__name__ = name

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    276         return parser
    277 
--> 278     return parser.read()
    279 
    280 _parser_defaults = {

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in read(self, nrows)
    738                 raise ValueError('skip_footer not supported for iteration')
    739 
--> 740         ret = self._engine.read(nrows)
    741 
    742         if self.options.get('as_recarray'):

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in read(self, rows)
   1593             content = content[1:]
   1594 
-> 1595         alldata = self._rows_to_cols(content)
   1596         data = self._exclude_implicit_index(alldata)
   1597 

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _rows_to_cols(self, content)
   1968             msg = ('Expected %d fields in line %d, saw %d' %
   1969                    (col_len, row_num + 1, zip_len))
-> 1970             raise ValueError(msg)
   1971 
   1972         if self.usecols:

ValueError: Expected 4 fields in line 4, saw 7

In [5]: !cat example2.csv
a,b,c,d
q,w,e,r
a,s,d,f
z,x,c,v

In [6]: pandas.read_csv('example2.csv', engine = 'python', quotechar = '"', sep = '[,]')
Out[6]: 
   a  b  c  d
0  q  w  e  r
1  a  s  d  f
2  z  x  c  v

In [7]: pandas.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-74-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.1
pip: 1.5.4
setuptools: 1.1.4
Cython: None
numpy: 1.10.1
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 3.1.0
sphinx: None
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: 4.2.1
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
```
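The `saw 7` in the traceback is what you would get if each line were split directly with the regex, with no quote handling at all. The sketch below illustrates that with the standard library only. It assumes (this is not confirmed here) that the Python engine effectively does a plain regex split when `sep` is a regex; `line`, `regex_fields` and `csv_fields` are names made up for this illustration.

``` python
import csv
import re

# The quoted row from example.csv
line = '"z,x,c,v",i,o,p'

# A plain regex split has no notion of quoting: every comma is a separator.
regex_fields = re.split('[,]', line)

# csv.reader honours quotechar, so the quoted field stays in one piece.
csv_fields = next(csv.reader([line], delimiter=',', quotechar='"'))

print(len(regex_fields))  # 7, matching "Expected 4 fields in line 4, saw 7"
print(csv_fields)         # ['z,x,c,v', 'i', 'o', 'p']
```

If that is indeed what happens, it would also explain why `example2.csv` parses fine: without quotes, both approaches yield the same four fields.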


DOC: read_csv() ignores quotes when a regex is used in sep #11989
