DOC: Key order after dataframe inner merge #53157

Closed
@lucdem

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

Documentation problem

The documentation for the 'how' parameter says:

left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.

From this description, you would expect df1.merge(df2, on='column_name', how='inner') and df1.merge(df2, on='column_name', how='left') to return rows in the same order. However, that is not what happens.

Example below, tested on version 2.0.1:

import pandas as pd

# Left frame: key 'n' repeats; 'i' records the original row position
df = pd.DataFrame({
    'n': [1, 2, 3, 1, 2, 3],
    'i': [0, 1, 2, 3, 4, 5]
})

# Right frame: one row per key
df2 = pd.DataFrame({
    'n': [1, 2, 3],
    'str': ['1', '2', '3']
})

print(df.merge(df2, on='n', how='inner'))
print('--------')
print(df.merge(df2, on='n', how='left'))

Output:

   n  i str
0  1  0   1
1  1  3   1
2  2  1   2
3  2  4   2
4  3  2   3
5  3  5   3
--------
   n  i str
0  1  0   1
1  2  1   2
2  3  2   3
3  1  3   1
4  2  4   2
5  3  5   3
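
The difference can also be checked programmatically instead of by eye. The following is a minimal sketch using the same frames as above. The helper name keeps_left_order is hypothetical (not part of pandas), and it relies on column 'i' recording the original row position in the left frame. The result for how='inner' may depend on the installed pandas version; the output above was observed on 2.0.1.

```python
import pandas as pd

df = pd.DataFrame({"n": [1, 2, 3, 1, 2, 3], "i": [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"n": [1, 2, 3], "str": ["1", "2", "3"]})


def keeps_left_order(merged: pd.DataFrame) -> bool:
    """Return True if the merged rows follow the left frame's row order.

    Hypothetical helper: assumes column 'i' holds the original left row position.
    """
    return bool(merged["i"].is_monotonic_increasing)


inner_keeps = keeps_left_order(df.merge(df2, on="n", how="inner"))
left_keeps = keeps_left_order(df.merge(df2, on="n", how="left"))

print("inner keeps left row order:", inner_keeps)
print("left keeps left row order: ", left_keeps)
```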

Suggested fix for documentation

Either clarify that the merge operation will sort the results based on the left key when using 'inner' for the 'how' parameter, rather than "preserve the order", or explicitly state that the operation does not guarantee any order.
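
Until the wording is settled, code that needs the left row order with inner-join semantics can avoid relying on how='inner' at all. The following is only a sketch of one possible workaround, not a statement of how merge is meant to be used: it performs a left merge with indicator=True and keeps the rows that matched in both frames. The variable name inner_like is made up for illustration. Note that if some left rows have no match, columns coming from the right frame may be upcast during the left merge (for example int to float) before the filter is applied.

```python
import pandas as pd

df = pd.DataFrame({"n": [1, 2, 3, 1, 2, 3], "i": [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"n": [1, 2, 3], "str": ["1", "2", "3"]})

# A left merge keeps the left frame's row order; indicator=True adds a
# '_merge' column whose value is 'both' for rows matched in both frames.
merged = df.merge(df2, on="n", how="left", indicator=True)

# Keep only matched rows (inner-join semantics), then drop the helper column.
inner_like = (
    merged[merged["_merge"] == "both"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

print(inner_like)
```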
