
read_excel() modifies provided types dict when accessing file with duplicate column #42462

Closed
@cdol

Description

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

test.xlsx:

a  a  b   c
1  1  b1  c1
2  2  b2  c2
3  3  b3  c3
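
import pandas as pd


# dtype mapping passed to read_excel(); note there is no 'a.1' key here
types_dict = {'a': str,
              'b': str,
              'c': str,
              }


if __name__ == "__main__":
    df = pd.read_excel('./test.xlsx', dtype=types_dict)
    # The caller's dict now contains an extra key for the duplicate column
    print(list(types_dict.keys()))
>> ['a', 'b', 'c', 'a.1']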

Bug/Issue description:
When a .xlsx file with a duplicate column is loaded into a DataFrame using the dtype argument, read_excel() modifies the provided types_dict: it adds entries for the duplicate columns (here 'a.1').

It seems to me like the modification of the types_dict is an unwanted side effect.
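A possible caller-side workaround, sketched under the assumption that the side effect is limited to keys being added to the dict object that is passed in: hand the reader a shallow copy instead of the original. The helper name `read_with_protected_dtype` and the stand-in `fake_reader` are hypothetical and exist only for illustration; `fake_reader` merely imitates the behaviour reported above so the snippet runs without an Excel file.

```python
from typing import Any, Callable, Dict


def read_with_protected_dtype(reader: Callable[..., Any],
                              path: str,
                              dtype: Dict[str, Any],
                              **kwargs: Any) -> Any:
    """Call reader(path, dtype=<shallow copy>, **kwargs).

    Any keys the reader adds end up in the copy, not in the caller's dict.
    """
    return reader(path, dtype=dict(dtype), **kwargs)


def fake_reader(path: str, dtype: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in that imitates the reported side effect:
    # it adds an entry for the renamed duplicate column to the dict it receives.
    dtype["a.1"] = dtype["a"]
    return dtype


types_dict = {"a": str, "b": str, "c": str}
seen_by_reader = read_with_protected_dtype(fake_reader, "./test.xlsx", types_dict)

print(list(types_dict.keys()))      # caller's dict is unchanged
print(list(seen_by_reader.keys()))  # the copy received the extra key
```

With pandas, the same helper would be called as `read_with_protected_dtype(pd.read_excel, './test.xlsx', types_dict)`. This only shields the caller; it does not change what read_excel() does internally.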

Labels: Bug; IO Excel (read_excel, to_excel); Regression (functionality that used to work in a prior pandas version)
