Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
Documentation problem
The documentation for the 'how' parameter says:
left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
From this description you could expect that df1.merge(df2, on='column_name', how='inner') and df1.merge(df2, on='column_name', how='left') would both return rows in the same order. However, that's not what happens.
Example below, tested on version 2.0.1:
import pandas as pd
df = pd.DataFrame({
'n': [1, 2, 3, 1, 2, 3],
'i': [0, 1, 2, 3, 4, 5]
})
df2 = pd.DataFrame({
'n': [1, 2, 3],
'str': ['1', '2', '3']
})
print(df.merge(df2, on='n', how='inner'))
print('--------')
print(df.merge(df2, on='n', how='left'))
Output:
   n  i str
0  1  0   1
1  1  3   1
2  2  1   2
3  2  4   2
4  3  2   3
5  3  5   3
--------
   n  i str
0  1  0   1
1  2  1   2
2  3  2   3
3  1  3   1
4  2  4   2
5  3  5   3
Suggested fix for documentation
Either clarify that the merge operation will sort the results based on the left key when using 'inner' for the 'how' parameter, rather than "preserve the order", or explicitly state that the operation does not guarantee any order.
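For anyone who needs the left frame's row order with how='inner' regardless of what the docs end up promising, here is a minimal sketch of a workaround. The helper name inner_merge_keep_left_order and the temporary column name _left_pos are my own inventions, not pandas API, and the sketch assumes that column name does not already exist in either frame.

```python
import pandas as pd


def inner_merge_keep_left_order(left, right, on):
    """Hypothetical helper (not part of pandas): inner merge that
    restores the row order of the left frame afterwards."""
    pos = "_left_pos"  # assumed not to clash with an existing column
    # Tag every left row with its original position.
    tagged = left.assign(**{pos: range(len(left))})
    merged = tagged.merge(right, on=on, how="inner")
    # A stable sort keeps the relative order of multiple right-hand matches.
    merged = merged.sort_values(pos, kind="stable")
    return merged.drop(columns=pos).reset_index(drop=True)


df = pd.DataFrame({"n": [1, 2, 3, 1, 2, 3], "i": [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"n": [1, 2, 3], "str": ["1", "2", "3"]})

result = inner_merge_keep_left_order(df, df2, on="n")
print(result)
```

With the frames from the example above, the 'i' column of the result comes back in the original left order, which is what the current wording of the docs led me to expect from how='inner' directly.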