Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
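```python
import pandas as pd

df_b = pd.DataFrame(dict(p=[1, 2, 3], q=[2, 3, 4]))
df_a = pd.DataFrame(dict(p=[2, 3, 4], q=[1, 2, 3]))
# suffixes is passed as a set, so the order '_b', '_a' is not preserved
pd.merge(left=df_b, right=df_a, on=['p'], how='outer', suffixes={'_b', '_a'})
```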
Current output:
| | p | q_a | q_b |
|---|---|---|---|
| 0 | 1 | 2.0 | NaN |
| 1 | 2 | 3.0 | 1.0 |
| 2 | 3 | 4.0 | 2.0 |
| 3 | 4 | NaN | 3.0 |
Expected output:

| | p | q_b | q_a |
|---|---|---|---|
| 0 | 1 | 2.0 | NaN |
| 1 | 2 | 3.0 | 1.0 |
| 2 | 3 | 4.0 | 2.0 |
| 3 | 4 | NaN | 3.0 |
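Passing `suffixes` as a tuple, which keeps its element order, should avoid the problem. A minimal sketch, assuming the same two frames as in the reproduction; it is expected to yield the columns `p`, `q_b`, `q_a`, with `_b` applied to the left frame and `_a` to the right one:

```python
import pandas as pd

df_b = pd.DataFrame(dict(p=[1, 2, 3], q=[2, 3, 4]))
df_a = pd.DataFrame(dict(p=[2, 3, 4], q=[1, 2, 3]))

# A tuple is ordered, so "_b" always goes to the left frame (df_b)
# and "_a" always goes to the right frame (df_a).
result = pd.merge(left=df_b, right=df_a, on=["p"], how="outer", suffixes=("_b", "_a"))
print(result)
```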
Problem description
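This line in pandas' merge code is causing the issue:

```python
lsuf, rsuf = self.suffixes
```

Here `suffixes` is a set, and a set has no defined element order, so unpacking it does not necessarily return the elements in the order they were written. For small integers CPython happens to iterate in ascending order; for strings such as `'_b'` and `'_a'` the order depends on the string hashes, which are randomized per interpreter run, so the left and right suffixes can end up swapped:

```python
l, r = (2, 1); print(l, r)  # l=2, r=1
l, r = [2, 1]; print(l, r)  # l=2, r=1
l, r = {2, 1}; print(l, r)  # l=1, r=2
```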
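One way to prevent this silently is to reject unordered containers before unpacking. The sketch below uses a hypothetical helper, `unpack_suffixes`, which is not part of pandas; it only illustrates the kind of check that could sit in front of `lsuf, rsuf = self.suffixes`:

```python
from collections.abc import Sequence


def unpack_suffixes(suffixes):
    """Hypothetical helper (not part of pandas): split a merge-style
    ``suffixes`` argument into (left_suffix, right_suffix)."""
    # Sets and frozensets iterate in an order determined by hashing, not the
    # order the caller wrote, so unpacking them could silently swap suffixes.
    if isinstance(suffixes, (set, frozenset)):
        raise TypeError("suffixes must be an ordered sequence such as a tuple, not a set")
    # A plain string is also a Sequence, but "ab" would unpack to ("a", "b").
    if isinstance(suffixes, str) or not isinstance(suffixes, Sequence):
        raise TypeError("suffixes must be a tuple or list of two items")
    if len(suffixes) != 2:
        raise ValueError("suffixes must contain exactly two items")
    left, right = suffixes
    return left, right


print(unpack_suffixes(("_b", "_a")))  # ('_b', '_a')
try:
    unpack_suffixes({"_b", "_a"})
except TypeError as exc:
    print(exc)
```

Raising an error, rather than sorting the set, seems preferable here: any fixed ordering of a set would still be a guess about which suffix the caller meant for which side.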
Output of `pd.show_versions()`

```
pandas : 1.0.3
numpy : 1.18.3
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
```