BUG: get_indexer methods return int64 instead of intp arrays #36359

Closed
@alexhlim

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>> ax1 = pd.Index([1, 2, 3])
>>> ax2 = pd.Index([1, 1, 2])
>>> ans1 = ax1.get_indexer([1])
>>> ans2 = ax2.get_indexer_non_unique([1])
>>> print(ans1, ans1.dtype)
[0] int64
>>> print(ans2[0], ans2[0].dtype, ans2[1], ans2[1].dtype)
[0 1] int64 [] int64

Problem description

Found in #35498. Looking at the implementations of get_indexer and get_indexer_non_unique in pandas/_libs/index.pyx, I noticed that the returned arrays always have dtype int64. Since these methods return arrays of indices, I believe intp is the more appropriate dtype: its size follows ssize_t, which is guaranteed to be large enough to represent every possible index into an array.
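
To illustrate the reasoning above, here is a small sketch using only NumPy and the standard library. It compares the width of intp with the platform's ssize_t and checks the dtype that NumPy's own index-returning functions produce. The variable names are made up for this example, and the size comparison is expected to hold on common platforms rather than being a formal guarantee for every build.

```python
import ctypes

import numpy as np

# Width in bytes of NumPy's index type and of the C ssize_t on this platform.
intp_size = np.dtype(np.intp).itemsize
ssize_t_size = ctypes.sizeof(ctypes.c_ssize_t)
print("intp bytes:", intp_size, "ssize_t bytes:", ssize_t_size)

# NumPy functions that return positions hand back intp arrays.
order = np.argsort(np.array([3, 1, 2]))
(nonzero_idx,) = np.nonzero(np.array([0, 5, 0, 7]))
print(order.dtype == np.intp, nonzero_idx.dtype == np.intp)

# On a 64-bit build intp and int64 have the same width, so the difference
# reported in this issue is only visible on 32-bit builds.
same_width_as_int64 = intp_size == np.dtype(np.int64).itemsize
print("intp has the same width as int64 here:", same_width_as_int64)
```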

Expected Output

[0] intp
[0 1] intp [] intp
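
For callers that need platform-sized indices regardless of which dtype the pandas version in use returns, one option is to normalize the result on the caller side. This is a minimal sketch; `as_intp` is a hypothetical helper name introduced for this example and is not part of pandas.

```python
import numpy as np
import pandas as pd


def as_intp(indexer):
    """Return the indexer as an intp array, copying only if the dtype differs."""
    # Hypothetical helper for illustration; not a pandas API.
    return np.asarray(indexer).astype(np.intp, copy=False)


ax1 = pd.Index([1, 2, 3])
ax2 = pd.Index([1, 1, 2])

# get_indexer returns a single array of positions.
ans1 = as_intp(ax1.get_indexer([1]))

# get_indexer_non_unique returns a tuple: (indexer, missing).
indexer, missing = ax2.get_indexer_non_unique([1])
indexer, missing = as_intp(indexer), as_intp(missing)

print(ans1.dtype == np.intp, indexer.dtype == np.intp, missing.dtype == np.intp)
```

Because `astype(..., copy=False)` returns the input unchanged when the dtype already matches, the helper costs nothing on builds where the arrays are already intp.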

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 7b14cf6b0b9dbcddce7b9bb22a81c73bdebc1be8
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.19.76-linuxkit
Version          : #1 SMP Tue May 26 11:42:35 UTC 2020
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : C.UTF-8
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.0rc0+406.g7b14cf6b0
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 45.2.0.post20200210
Cython           : 0.29.21
pytest           : 5.4.3
hypothesis       : 5.20.2
sphinx           : 3.1.1
blosc            : None
feather          : None
xlsxwriter       : 1.2.9
lxml.etree       : 4.4.1
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.16.1
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : 1.3.2
fsspec           : 0.7.4
fastparquet      : 0.4.1
gcsfs            : 0.6.2
matplotlib       : 3.2.1
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : None
pyarrow          : 0.16.0
pytables         : None
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.1
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : 0.16.0
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : 0.50.1

Labels

Bug, Indexing (Related to indexing on series/frames, not to indexes themselves)
