Issue with Excel writers when column names are duplicated #5235

There appears to be an issue with the Excel writers when DataFrame column names are duplicated. It was initially reported on [StackOverflow](http://stackoverflow.com/questions/19363779/problems-writing-to-file-in-pandas).

For example, consider the following program:

``` python
import pandas as pd
from pandas import DataFrame

df = DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

df.columns = ['A', 'B', 'B']  # !!!

df.to_csv('output.csv')
df.to_excel('output.xlsx')
```

Note the duplicated column name. The resulting `df` looks like this:

``` python
>>> df
   A  B  B
0  1  2  3
1  1  2  3
2  1  2  3
```

The corresponding output of the CSV is as expected:

```
$ cat output.csv
,A,B,B
0,1,2,3
1,1,2,3
2,1,2,3
```

However, the output of any of the Excel writers is incorrect:

![screenshot](https://f.cloud.github.com/assets/94267/1338347/b42803a2-35e5-11e3-912e-4923d3382f3d.png)

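The issue appears to be in `pandas/core/format.py`. The output data is gathered by column name, as shown below. With duplicate names, `self.df[colname]` selects every column carrying that name rather than a single column, which would explain the incorrect output.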

``` python
    def _format_regular_rows(self):
        ...
        for colidx, colname in enumerate(self.columns):
            series = self.df[colname]
            ... 
```
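
To illustrate the lookup problem outside the writer, here is a minimal sketch. It is not the actual pandas code or a proposed patch; the names `by_label` and `columns_by_position` are made up for this example, and it assumes that selecting columns by position (`.iloc`) would be an acceptable alternative to selecting them by name.

``` python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]], columns=['A', 'B', 'B'])

# Label-based lookup: a duplicated name matches every column with that
# name, so the result is a DataFrame rather than a single Series.
by_label = df['B']
print(type(by_label).__name__, by_label.shape)  # DataFrame (3, 2)

# Position-based lookup (hypothetical alternative): each index selects
# exactly one column, regardless of duplicated names.
columns_by_position = []
for colidx, colname in enumerate(df.columns):
    series = df.iloc[:, colidx]
    columns_by_position.append(series.tolist())
    print(colidx, colname, series.tolist())
```

With positional access, each of the three columns is visited exactly once, which matches what the CSV writer produces.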

I initially thought that this might be the correct behaviour and that column names shouldn't be duplicated, but given that the output differs from that of the CSV writer, it looks like a bug.

I'll write a test case, but I'm not sure of the best way to fix the issue.

