In the PR implementing .str/.dt on Categoricals, #11582, we perform the string op on the uniques. The routine below produces a boolean result, so we return a boolean Series. This is perfectly reasonable.
In [2]: s = pd.Series(list('aabb')).astype('category')
In [3]: s
Out[3]:
0 a
1 a
2 b
3 b
dtype: category
Categories (2, object): [a, b]
In [4]: s.str.contains("a")
Out[4]:
0 True
1 True
2 False
3 False
dtype: bool
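For reference, a minimal sketch of what "perform the string op on the uniques" amounts to. It is not the code from the PR, only an illustration that assumes there are no missing values (a code of -1 would need separate handling): the op runs once per category and the result is broadcast back to the rows through the codes.

```python
import numpy as np
import pandas as pd

s = pd.Series(list("aabb")).astype("category")

# Run the string op once per category rather than once per row.
per_category = np.asarray(s.cat.categories.str.contains("a"), dtype=bool)

# Broadcast the per-category result back to the rows via the integer codes.
# Assumption: no missing values, i.e. no code equal to -1.
codes = s.cat.codes.to_numpy()
result = pd.Series(per_category[codes], index=s.index)

print(result)
```

With two categories and four rows this evaluates `contains` twice instead of four times; the saving grows with the ratio of rows to categories.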
However, I don't recall the rationale for performing the op on the uniques (as it's a categorical) but then returning an object dtype.
In [5]: s.str.upper()
Out[5]:
0 A
1 A
2 B
3 B
dtype: object
These are by definition pure transforms, so returning a new categorical makes sense. For example, in this case:
In [6]: pd.Series(pd.Categorical.from_codes(s.cat.codes, s.cat.categories.str.upper()))
Out[6]:
0 A
1 A
2 B
3 B
dtype: category
Categories (2, object): [A, B]
This would be far more efficient than actually converting to object, since the op only touches the categories and the existing codes are reused as-is.
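A rough sketch of how this could be generalized, purely for discussion. `transform_categories` is a hypothetical helper name, not an existing pandas function, and it assumes the transform takes and returns an Index. One case the one-liner above does not cover is a transform that maps two categories to the same value (e.g. `'a'` and `'A'` under `upper`), since categories must be unique; the sketch simply raises there.

```python
import pandas as pd


def transform_categories(s, func):
    """Hypothetical helper: apply `func` to the categories and reuse the codes.

    `func` takes the categories Index and returns a new Index of equal length.
    """
    new_categories = func(s.cat.categories)
    if not new_categories.is_unique:
        # Two categories collapsed into one; the codes would need remapping,
        # which is out of scope for this sketch.
        raise ValueError("transform produced duplicate categories")
    values = pd.Categorical.from_codes(
        s.cat.codes.to_numpy(),  # -1 codes are kept and stay missing
        categories=new_categories,
        ordered=s.cat.ordered,
    )
    return pd.Series(values, index=s.index)


s = pd.Series(list("aabb")).astype("category")
upper = transform_categories(s, lambda cats: cats.str.upper())
print(upper)
```

Whether collisions should fall back to object dtype or merge the affected categories is an open design question here.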