PERF: regression in CategoricalIndex.get_indexer #42249

The fix in #42089 (or the PR that this fix was correcting) seems to have caused a large slowdown on the `get_indexer` benchmarks: https://pandas.pydata.org/speed/pandas/#indexing.CategoricalIndexIndexing.time_get_indexer_list?python=3.8&Cython=0.29.21&p-index='monotonic_incr'&commits=cf5852bf-fce7f9eb

The regression overview (https://pandas.pydata.org/speed/pandas/#regressions?sort=1&dir=desc) lists it as a 1000x slowdown, but that's only because https://github.com/pandas-dev/pandas/pull/42042 first improved the performance a lot (which might be a bit suspicious?). Compared to the timing before that PR, it's only a 4-5x slowdown. With the code below, I see a ~8x slowdown locally on master compared to 1.2.5 (417 ms vs. 52.8 ms).


```python
import itertools
import string

import pandas as pd

# ~970k unique three-character categories
data_unique = pd.CategoricalIndex(
    ["".join(perm) for perm in itertools.permutations(string.printable, 3)]
)
cat_list = ["a", "c"]

%timeit data_unique.get_indexer(cat_list)
# pandas 1.2.5: 52.8 ms ± 5.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# master:       417 ms ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

I _think_ this is because we previously called `Engine.get_indexer` on the codes, whereas the base class version now calls it on the `.categories`. In this case that means both `self` and `target` are cast to object dtype, so the lookup goes through the object-dtype `Engine.get_indexer`.

_Originally posted by @jorisvandenbossche in https://github.com/pandas-dev/pandas/issues/42089#issuecomment-868970458_
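To make the suspected difference concrete, here is a minimal sketch of a codes-based lookup built only from public pandas API. It is an illustration of the idea, not the actual pandas internals: the variable names (`target_codes`, `codes_index`, `positions`) are made up for this example, and it assumes a unique `CategoricalIndex` without missing values.

```python
import numpy as np
import pandas as pd

ci = pd.CategoricalIndex(["b", "a", "c", "d"])
target = ["a", "c", "z"]

# Step 1: translate the target values into category codes.
# Values that are not among the categories get -1.
target_codes = ci.categories.get_indexer(target)

# Step 2: look the codes up in an integer index built from the codes
# (hypothetical stand-in for an integer engine; assumes unique codes).
codes_index = pd.Index(ci.codes)
positions = codes_index.get_indexer(target_codes)

# Targets that are not categories must stay "not found".
result = np.where(target_codes == -1, -1, positions)

print(result.tolist())
print(ci.get_indexer(target).tolist())
```

Both printed lists should agree. The point of the sketch is only that the second step operates on integers rather than on object-dtype values.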
