PERF: Categorical indexing performance regression

Recent regression in the `categoricals.CategoricalSlicing.time_getitem_list` benchmark: https://pandas.pydata.org/speed/pandas/#categoricals.CategoricalSlicing.time_getitem_list?commits=6efc2379-b9de33e3

Reproducible example for this benchmark:

```
N = 10 ** 6
categories = ["a", "b", "c"]
values = [0] * N + [1] * N + [2] * N
data = pd.Categorical.from_codes(values, categories=categories)

list_ = list(range(10000))

%timeit data[list_]
```

Now, this slowdown is due to the changes in https://github.com/pandas-dev/pandas/pull/30308. Categorical `__getitem__` now checks if the key is a boolean indexer: https://github.com/pandas-dev/pandas/pull/30308/files#diff-f3b2ea15ba728b55cab4a1acd97d996d

So this slowdown is of course expected, and also only for Categorical itself (eg pd.Series indexing already handles this boolean checking). So in that light, we can certainly ignore this regression. 
But, this led me think: maybe the ExtensionArrays are a good place to start not supporting object dtype as boolean indexer? (and so not add support for it now, which also avoids this performance regression)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

PERF: Categorical indexing performance regression #30744

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

PERF: Categorical indexing performance regression #30744

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions