Description
I do not understand the sort order of the pandas DataFrame merge function with how="inner". Example:
import pandas as pd
df2 = pd.DataFrame({'a': (6, 7, 8, 6), 'b': ("w", "x", "y", "z")})
print(df2)
print("left:")
dfMerge2 = pd.merge(df2, df2, on='a', how="left")
print(dfMerge2)
dfMerge = pd.merge(df2, df2, on='a', how="inner")
print("inner:")
print(dfMerge)
Result:
   a  b
0  6  w
1  7  x
2  8  y
3  6  z
left:
   a b_x b_y
0  6   w   w
1  6   w   z
2  7   x   x
3  8   y   y
4  6   z   w
5  6   z   z
inner:
   a b_x b_y
0  6   w   w
1  6   w   z
2  6   z   w
3  6   z   z
4  7   x   x
5  8   y   y
I would expect that for how="inner" the order of the resulting rows with 6 z w and 6 z z would be the same as with how="left", as the documentation https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html says:
- left: use only keys from left frame, similar to a SQL left outer join; preserve key order
- inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys
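For comparison, here is a minimal sketch of one way to get the row order I expected from how="inner", independent of how merge orders its output. The helper columns _left_pos and _right_pos are hypothetical names introduced only for this illustration, and drop(columns=...) assumes a pandas version newer than the 0.20.3 reported below. This is a possible workaround, not a statement of how merge is meant to behave.

```python
import pandas as pd

df2 = pd.DataFrame({'a': (6, 7, 8, 6), 'b': ("w", "x", "y", "z")})

# Hypothetical helper columns that record each row's original position.
left = df2.assign(_left_pos=range(len(df2)))
right = df2.assign(_right_pos=range(len(df2)))

inner_ordered = (
    pd.merge(left, right, on='a', how="inner")
    # Restore left-row order first, then right-row order within each left row.
    .sort_values(['_left_pos', '_right_pos'])
    .drop(columns=['_left_pos', '_right_pos'])
    .reset_index(drop=True)
)
print(inner_ordered)
```

Sorting on both position columns makes the result depend only on the input frames, so for this example the rows come out in the same order as the how="left" result shown above.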
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-103-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.1.13
pymysql: 0.7.9.None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None