Description
Recent regression in the categoricals.CategoricalSlicing.time_getitem_list
benchmark: https://pandas.pydata.org/speed/pandas/#categoricals.CategoricalSlicing.time_getitem_list?commits=6efc2379-b9de33e3
Reproducible example for this benchmark:
import pandas as pd

N = 10 ** 6
categories = ["a", "b", "c"]
values = [0] * N + [1] * N + [2] * N
data = pd.Categorical.from_codes(values, categories=categories)
list_ = list(range(10000))
# Run in IPython / Jupyter (%timeit is an IPython magic):
%timeit data[list_]
Now, this slowdown is due to the changes in #30308. Categorical __getitem__
now checks if the key is a boolean indexer: https://github.com/pandas-dev/pandas/pull/30308/files#diff-f3b2ea15ba728b55cab4a1acd97d996d
So this slowdown is of course expected, and it only affects Categorical itself (e.g. pd.Series indexing already does this boolean checking). In that light, we can certainly ignore this regression.
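To illustrate where the extra time can come from, here is a minimal sketch of a boolean-indexer check. It is not the pandas implementation: `looks_like_bool_indexer` is a hypothetical helper, and the assumption is that a plain list has to be converted to an array before its dtype can be inspected, which costs time proportional to the length of the key.

```python
import numpy as np


def looks_like_bool_indexer(key):
    """Hypothetical helper (not pandas API): is `key` a boolean mask?"""
    if isinstance(key, np.ndarray):
        if key.dtype == bool:
            return True
        if key.dtype == object:
            # Object dtype says nothing about the contents, so every
            # element has to be inspected individually.
            return len(key) > 0 and all(
                isinstance(x, (bool, np.bool_)) for x in key
            )
        return False
    if isinstance(key, list):
        # A list carries no dtype: converting it is an O(len(key)) pass
        # that is paid even when the key turns out to hold integers.
        arr = np.asarray(key)
        return arr.dtype == bool and len(arr) == len(key)
    return False


is_mask_for_bool_list = looks_like_bool_indexer([True, False, True])
is_mask_for_int_list = looks_like_bool_indexer(list(range(10000)))
is_mask_for_object_array = looks_like_bool_indexer(
    np.array([True, False], dtype=object)
)
```

With an integer list such as `list_` above, the check returns False, but only after the whole list has been converted once.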
But this led me to think: maybe the ExtensionArrays are a good place to start not supporting object dtype as a boolean indexer? (And so not add support for it now, which also avoids this performance regression.)
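For comparison, plain NumPy already draws this line. The sketch below uses only NumPy (no pandas) and made-up example values to show the kind of behaviour being proposed: a bool-dtype mask is accepted, an object-dtype array holding the same values is rejected, and the caller can opt in with an explicit cast.

```python
import numpy as np

values = np.array([10, 20, 30])
bool_mask = np.array([True, False, True])                  # dtype=bool
object_mask = np.array([True, False, True], dtype=object)  # same values, object dtype

# A bool-dtype mask is recognised from the dtype alone.
selected = values[bool_mask]

# NumPy refuses to guess what an object-dtype index array contains.
try:
    values[object_mask]
    object_mask_accepted = True
except IndexError:
    object_mask_accepted = False

# The explicit cast makes the intent clear and keeps the cost visible.
converted = values[object_mask.astype(bool)]
```

Whether ExtensionArrays should follow the same rule is exactly the open question here; the sketch only shows that an explicit `astype(bool)` is a cheap workaround for callers.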