pd.merge regression when doing a left-join with missing data on the right. Result has a Float64Index #28220

Open
@dworvos

Code Sample, a copy-pastable example if possible

import pandas as pd

X = pd.DataFrame({ "count": [1, 2] }, index=["A", "B"])
Y = pd.DataFrame({"name": ["A", "C"], "value": [100, 200]})
Z = pd.merge(X, Y, left_index=True, right_on="name", how='left')
print(Z.to_string())
# in 0.23.4
#    count name  value
# 0      1    A  100.0
# 1      2    B    NaN

# in 0.24.2
#    count name  value
# 0      1    A  100.0
# 1      2    B    NaN

# in 0.25.1
#      count name  value
# 0.0      1    A  100.0
# NaN      2    B    NaN

assert isinstance(Z.index, pd.Int64Index)

Problem description

I searched the GitHub tracker for similar issues, but the closest I found was #24897. Previous versions of pandas returned an Int64Index for this merge, but 0.25.1 returns a Float64Index. I didn't see this behaviour documented in the 0.25 release notes, but please let me know if I've missed it. The bug is easy to reproduce in a virtualenv.
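The index type can also be checked without naming a specific index class, which may be convenient when comparing several pandas versions. The following is a minimal sketch; `has_integer_index` is a hypothetical helper name, not part of pandas, and the two frames are built by hand to mimic the index shapes shown in the outputs above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype


def has_integer_index(df: pd.DataFrame) -> bool:
    """Return True if the frame's index has an integer dtype (hypothetical helper)."""
    return bool(is_integer_dtype(df.index.dtype))


# Hand-built frames that mimic the two index shapes from the report.
expected_like = pd.DataFrame({"count": [1, 2]}, index=[0, 1])
regressed_like = pd.DataFrame({"count": [1, 2]}, index=[0.0, np.nan])

print(has_integer_index(expected_like))
print(has_integer_index(regressed_like))
```

Checking the dtype rather than the class avoids depending on which concrete index class a given pandas version returns.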

Expected Output

   count name  value
0      1    A  100.0
1      2    B    NaN
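One possible workaround, offered as a sketch rather than a confirmed fix, is to move the left index into a regular column and join column-to-column, which should give the result a default integer index. The `rename_axis("name")` step and the final column reordering are assumptions made here to match the layout above; they are not taken from the original report.

```python
import pandas as pd

X = pd.DataFrame({"count": [1, 2]}, index=["A", "B"])
Y = pd.DataFrame({"name": ["A", "C"], "value": [100, 200]})

# Turn the left index into a regular "name" column so both sides join on columns.
left = X.rename_axis("name").reset_index()

# Column-on-column left join, then reorder columns to match the expected layout.
Z = pd.merge(left, Y, on="name", how="left")[["count", "name", "value"]]
print(Z.to_string())
```

Note that in this version the `name` values come from the left frame, so the unmatched row keeps its `B` label while `value` is missing.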

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.3.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-957.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.utf-8
LANG : en_US.utf-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.1
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)