Closed
In Categorical.set_categories, we allocate an array of the values, which may be expensive:
pandas/pandas/core/categorical.py
Line 774 in 83436af
It should be possible to do this operation by just manipulating the codes.
In [6]: c = pd.Categorical(['a'] * 100000)
In [7]: c.set_categories(['a', 'b'])
Out[7]:
[a, a, a, a, a, ..., a, a, a, a, a]
Length: 100000
Categories (2, object): [a, b]
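A rough sketch of what "just manipulating the codes" could look like, using plain NumPy. The helper name `recode_for_new_categories` is made up for illustration and is not the pandas implementation. It assumes categories are hashable and unique, and that missing values use code -1, as they do in Categorical:

```python
import numpy as np


def recode_for_new_categories(codes, old_categories, new_categories):
    """Map codes that index old_categories onto new_categories.

    Hypothetical helper, not part of pandas. Values whose category is
    absent from new_categories get code -1 (missing).
    """
    # Position of each new category, for O(1) lookups.
    new_position = {cat: i for i, cat in enumerate(new_categories)}
    # One entry per old category giving its new code, plus a trailing -1
    # so that an old code of -1 (missing) indexes the last slot and stays -1.
    lookup = np.array(
        [new_position.get(cat, -1) for cat in old_categories] + [-1],
        dtype=np.intp,
    )
    # A single integer take over the codes; the values are never materialized.
    return lookup[np.asarray(codes, dtype=np.intp)]


codes = np.zeros(100000, dtype=np.int8)  # codes for ['a'] * 100000
new_codes = recode_for_new_categories(codes, ['a'], ['a', 'b'])
```

The lookup table has one entry per old category, so the only large allocation is the new codes array. The result could then be wrapped with something like `pd.Categorical.from_codes(new_codes, ['a', 'b'])`, though a real implementation would also want to pick the smallest integer dtype that fits the new categories.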
See 5ab0123 for how this might work. That commit will probably be squashed, but it's the implementation of Categorical._set_dtype in #16015.
I may get to this as a follow-up to that PR.