read_csv(...,mangle_dupe_cols=True) causes silent data loss for certain column names. Request introduction of mangle_dupe_cols_str

Lets say we have data with both column and row headers. Lets say this dataset also simply outputs a ```0``` in that csv file for the cell at the intersection of the column and row headers (A1 in excel notation, or cell (0,0) in the csv file). Additionally, both ```"0"``` and ```"0.1"``` are valid column names:

RH\\CH|-0.1|0|0.1
---|---|---|---
10.0|123|456|789
20.0|012|345|678

Thus, "RH\\CH" is replaced by ```"0"``` on export (of which I have no control). 

```python
import pandas as pd
from cStringIO import StringIO

data = '0,-0.1,0,0.1\n10.0,123,456,789\n20.0,012,345,678'
df = pd.read_csv(StringIO(data),index_col=0,header=0)
print(df)

      -0.1  0.1  0.1
0                   
10.0   123  789  789
20.0    12  678  678

```

The name mangling will change the duplicate ```"0"``` column containing data to ```"0.1"```, thus if a real data column has the name ```"0.1"```, this data will be copied (?) back to the mangled duplicate ```"0"``` column. Additionally, if the true ```"0.1"``` data name is missing, then this name mangling is frustrating as there is no obvious way to determine if the now present ```"0.1"``` column is a duplicate ```"0"``` that has been mangled or is a real ```"0.1"``` data series.

I propose the addition of a ```mangle_dupe_cols_str``` keyword option that defaults to ```'.'``` to preserve the current behavior. However, it can be passed as a kwarg to ```read_csv``` in cases where the period name mangling could result in further duplicate columns.

In https://github.com/pandas-dev/pandas/blob/v0.19.1/pandas/io/parsers.py#L2109-L2111:
```python
                        if cur_count > 0:
                            this_columns[i] = '%s.%d' % (col, cur_count)
                        counts[col] = cur_count + 1
```
should be adapted to 
```python
                        if cur_count > 0:
                            this_columns[i] = '%s%s%d' % (col, self.mangle_dupe_cols_str, cur_count)
                        counts[col] = cur_count + 1
```
and lines 1042-1043 in https://github.com/pandas-dev/pandas/blob/v0.19.1/pandas/io/parsers.py#L1042-L1043 changed to:
```python
        self.mangle_dupe_cols = kwds.get('mangle_dupe_cols', True)
        self.mangle_dupe_cols_str = kwds.get('mangle_dupe_cols_str', '.')        
        self.infer_datetime_format = kwds.pop('infer_datetime_format', False)
```

#### Expected Output
Then the corrected output would be:
```python
>>> data = '0,-0.1,0,0.1\n10.0,123,456,789\n20.0,012,345,678'
>>> df = pd.read_csv(StringIO(data),index_col=0,header=0,mangle_dupe_cols_str='__')
>>> print(df)

      -0.1  0__1  0.1
0                   
10.0   123  456  789
20.0    12  345  678

```
where subsequent operations could identify any columns with the mangle_dupe_cols_str, if needed. The current behavior silently causes data loss for certain column names.

<details>
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: None
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.3
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

</details>


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

read_csv(...,mangle_dupe_cols=True) causes silent data loss for certain column names. Request introduction of mangle_dupe_cols_str #14704

Expected Output

INSTALLED VERSIONS

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

read_csv(...,mangle_dupe_cols=True) causes silent data loss for certain column names. Request introduction of mangle_dupe_cols_str #14704

Description

Expected Output

INSTALLED VERSIONS

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions