BUG: get_indexer_non_unique() does not handle targets of dtype='object' with NaNs correctly #44482

Closed
@johannes-mueller

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

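# Float index with unique values
idx = pd.Index([1.0, 2.0])
# Object-dtype target that contains a NaN
target = pd.Index([1, np.nan], dtype='object')

print(idx.get_indexer(target))             # gives the expected result
print(idx.get_indexer_non_unique(target))  # gives the wrong result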

Issue Description

The script outputs

[ 0 -1]
(array([0, 0, 1]), array([], dtype=int64))

It boils down to the fact that object arrays containing NaNs are not sorted as expected, as discussed in numpy/numpy#15499.
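To illustrate why NaNs are awkward to sort, here is a minimal sketch of the comparison semantics involved. It only demonstrates the general NaN behavior, not the code path inside pandas. The variable names are made up for this example, and the order of the object-dtype result is deliberately not asserted because it is not guaranteed.

```python
import numpy as np

nan = float("nan")

# Every ordering or equality comparison that involves NaN evaluates to False,
# so NaN is neither smaller nor larger than, nor equal to, any number.
comparisons = [nan < 1, nan > 1, 1 < nan, 1 > nan, nan == nan]
print(comparisons)

# For float dtype, NumPy documents that NaN values are sorted to the end.
float_sorted = np.sort(np.array([np.nan, 2.0, 1.0]))
print(float_sorted)

# For object dtype, sorting relies on the Python comparisons above,
# so where the NaN ends up is not something to rely on.
object_sorted = np.sort(np.array([np.nan, 2, 1], dtype=object))
print(object_sorted)
```

Any algorithm that assumes a sorted object array, for example a binary search, can therefore give wrong answers once a NaN is present.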

This is one aspect of #44465, which needs to be split because it has two different root causes.

Expected Behavior

Expected output

[ 0 -1]
(array([ 0, -1]), array([1]))
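The expected result can be cross-checked with a small pure-Python reference, sketched here under the assumption that a NaN in the target should only match a NaN in the index. The helper names (`is_nan`, `matches`, `reference_indexer_non_unique`) are hypothetical and not part of pandas. The function returns the positions of all matches (or -1 when there is none) together with the target positions that were not found.

```python
import math


def is_nan(value):
    # Only floats can be NaN; other objects are never treated as NaN here.
    return isinstance(value, float) and math.isnan(value)


def matches(a, b):
    # Assumption: NaN matches NaN, everything else uses ordinary equality.
    return (is_nan(a) and is_nan(b)) or a == b


def reference_indexer_non_unique(index_values, target_values):
    indexer, missing = [], []
    for target_pos, target in enumerate(target_values):
        hits = [i for i, v in enumerate(index_values) if matches(v, target)]
        if hits:
            indexer.extend(hits)
        else:
            indexer.append(-1)          # no match in the index
            missing.append(target_pos)  # remember which target was missing
    return indexer, missing


result = reference_indexer_non_unique([1.0, 2.0], [1, float("nan")])
print(result)  # ([0, -1], [1])
```

This linear scan is quadratic and only meant as a check of the expected values, not as a replacement for the indexing engine.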

Installed Versions

INSTALLED VERSIONS
------------------
commit : 700be61
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-90-lowlatency
Version : #101-Ubuntu SMP PREEMPT Fri Oct 15 20:57:56 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8

pandas : 1.4.0.dev0+1132.g700be617eb
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.5.3
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.24.2
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.4
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.23.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.11.0
fastparquet : 0.7.1
gcsfs : 2021.11.0
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.0
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.2
sqlalchemy : 1.4.26
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

Labels: Bug, Index (Related to the Index class or subclasses)