API: different default for header kwarg in read_csv vs read_excel #11889

Open
@jorisvandenbossche

From #11874 (comment)

The default for header in read_csv is 'infer', while for read_excel it is 0. In most cases this does not matter, I think (as 'infer' behaves like header=0 when no names are passed). But when the names are specified explicitly, it makes a difference:

For read_csv, you need to explicitly pass header=0 to replace the header that exists in the file:

In [35]: s = """a,b,c
   ....: 1,2,3
   ....: 4,5,6"""

In [48]: pd.read_csv(StringIO(s), names=list('ABC'))
Out[48]:
   A  B  C
0  a  b  c
1  1  2  3
2  4  5  6

In [49]: pd.read_csv(StringIO(s), header=None, names=list('ABC'))
Out[49]:
   A  B  C
0  a  b  c
1  1  2  3
2  4  5  6

In [51]: pd.read_csv(StringIO(s), header=0, names=list('ABC'))
Out[51]:
   A  B  C
0  1  2  3
1  4  5  6

while for read_excel, you need to explicitly pass header=None if the file contains no header row:

In [31]: pd.read_excel('test_names.xlsx')
Out[31]:
   a  b  c
0  1  2  3
1  4  5  6

In [46]: pd.read_excel('test_names.xlsx', names=['A', 'B', 'C'])     # <--- the first row of the file is discarded
Out[46]:
   A  B  C
0  1  2  3
1  4  5  6

In [45]: pd.read_excel('test_names.xlsx', header=False, names=['A', 'B', 'C'])
Out[45]:
   A  B  C
0  1  2  3
1  4  5  6

In [47]: pd.read_excel('test_names.xlsx', header=None, names=['A', 'B', 'C'])
Out[47]:
   A  B  C
0  a  b  c
1  1  2  3
2  4  5  6
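
For anyone who wants to reproduce the read_csv side without an IPython session, here is a minimal self-contained sketch of the three cases above. It relies on the documented rule that header='infer' acts like header=0 when no names are passed and like header=None when names are passed. The variable names (inferred, kept, replaced) are only illustrative.

```python
from io import StringIO

import pandas as pd

s = """a,b,c
1,2,3
4,5,6"""

# No names: 'infer' acts like header=0, so the first line becomes the columns.
inferred = pd.read_csv(StringIO(s))

# names given, header left at 'infer': acts like header=None,
# so the file's own header line stays in the data as row 0.
kept = pd.read_csv(StringIO(s), names=list("ABC"))

# names given together with header=0: the file's header line is replaced.
replaced = pd.read_csv(StringIO(s), header=0, names=list("ABC"))

print(list(inferred.columns))
print(kept.shape, replaced.shape)
```

The kept frame has three rows because the a,b,c line is treated as data, which is exactly the case where the two readers diverge.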

Would it make sense to change the read_excel behaviour to make it more consistent with that of read_csv?
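
Independent of how this is decided, one way to avoid relying on either default is to always pass header explicitly whenever names is passed. The sketch below shows this for read_csv. The helper read_csv_named and its has_header flag are hypothetical, not part of pandas. Going by the session above, the same header=0 / header=None values should apply to read_excel as well.

```python
from io import StringIO

import pandas as pd


def read_csv_named(buf, names, has_header):
    """Hypothetical helper: read a CSV with explicit names and explicit header handling."""
    # header=0 discards the file's own header row;
    # header=None keeps every row of the file as data.
    header = 0 if has_header else None
    return pd.read_csv(buf, header=header, names=names)


with_header = read_csv_named(StringIO("a,b,c\n1,2,3\n4,5,6"), list("ABC"), has_header=True)
without_header = read_csv_named(StringIO("1,2,3\n4,5,6"), list("ABC"), has_header=False)

# Both frames hold the same two data rows under columns A, B, C.
print(with_header.equals(without_header))
```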

Labels: API - Consistency, Enhancement, IO Excel, Needs Discussion