Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import sys
print("pandas version is", pd.__version__)
print("Python version is", sys.version)
# Left frame: the key column 'a' alternates between 1 and 2
x = pd.DataFrame({'a': [1, 2, 1, 2, 1, 2, 1, 2], 'b': range(8)})
# Right frame: one row per key
y = pd.DataFrame({'a': [1, 2]})
print("original x")
print(x)
print("\n\nmerged dataframe after inner merge")
print(pd.merge(x, y, how='inner', on=['a']))
print("\n\n***merged dataframe after left merge")
print(pd.merge(x, y, how='left', on=['a']))
Problem description
Based on the pandas documentation, I was expecting the order of the keys in the left dataframe (x) to be preserved in both cases. The documentation says:
left: use only keys from left frame, similar to a SQL left outer join; preserve key order
inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys
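Instead, the inner merge returns the rows grouped by key value (all a == 1 rows first, then all a == 2 rows), while the left merge keeps the original row order of x. The output is: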
pandas version is 0.23.4
Python version is 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
original x
   a  b
0  1  0
1  2  1
2  1  2
3  2  3
4  1  4
5  2  5
6  1  6
7  2  7


merged dataframe after inner merge
   a  b
0  1  0
1  1  2
2  1  4
3  1  6
4  2  1
5  2  3
6  2  5
7  2  7


***merged dataframe after left merge
   a  b
0  1  0
1  2  1
2  1  2
3  2  3
4  1  4
5  2  5
6  1  6
7  2  7
Is this intended behavior? If so, the documentation is confusing, since it says an inner merge preserves the order of the left keys.
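In case it helps anyone hitting the same thing, here is a minimal sketch of one possible workaround, not a statement of how pandas is meant to be used. It tags each row of x with its position before the merge and sorts on that tag afterwards, so the result follows the row order of x whatever order the inner merge returns. The column name `_left_pos` is a hypothetical helper name I made up; it is assumed not to clash with any existing column.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'a': [1, 2, 1, 2, 1, 2, 1, 2], 'b': range(8)})
y = pd.DataFrame({'a': [1, 2]})

# '_left_pos' is a hypothetical helper column (not part of pandas);
# it records each row's original position in x.
tagged = x.assign(_left_pos=np.arange(len(x)))

merged = (
    tagged.merge(y, how='inner', on=['a'])
    # mergesort is stable, so rows sharing a position keep their relative order
    .sort_values('_left_pos', kind='mergesort')
    .drop(columns='_left_pos')
    .reset_index(drop=True)
)

print(merged)
```

With the frames above, every row of x has exactly one match in y, so the result has the same eight rows as x, in the same order.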