API: .str ops on category should return category if result is non-boolean #15198

This concerns the PR that implemented ``.str/.dt`` on Categoricals, https://github.com/pandas-dev/pandas/pull/11582.

The following is perfectly reasonable. We perform the string op on the uniques; the op produces a boolean result, so we return a boolean Series.
```
In [2]: s = pd.Series(list('aabb')).astype('category')

In [3]: s
Out[3]: 
0    a
1    a
2    b
3    b
dtype: category
Categories (2, object): [a, b]

In [4]: s.str.contains("a")
Out[4]: 
0     True
1     True
2    False
3    False
dtype: bool
```
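As a rough illustration of "perform the string op on the uniques", the boolean case can be sketched as evaluating the predicate once per category and then broadcasting through the codes. This is only a sketch and not necessarily how pandas does it internally; it assumes the Series has no missing values (a code of `-1` would need separate handling).

```python
import numpy as np
import pandas as pd

s = pd.Series(list("aabb")).astype("category")

# Evaluate the predicate once per category (2 values here), not once per row.
per_category = np.asarray(s.cat.categories.str.contains("a"), dtype=bool)

# Broadcast back to the rows by indexing with the integer codes.
# Assumption: no missing values, so every code is >= 0.
result = pd.Series(per_category[s.cat.codes.to_numpy()], index=s.index)
```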

However, I don't recall the rationale for performing the op on the uniques (as it's a categorical) but then returning an ``object`` dtype:
```
In [5]: s.str.upper()
Out[5]: 
0    A
1    A
2    B
3    B
dtype: object
```

These are by definition pure transforms, so returning a new categorical makes sense, e.g. in this case:

```
In [6]: pd.Series(pd.Categorical.from_codes(s.cat.codes, s.cat.categories.str.upper()))
Out[6]: 
0    A
1    A
2    B
3    B
dtype: category
Categories (2, object): [A, B]
```

This will be far more efficient than actually converting to object, since the op only touches the categories and the existing codes are reused.
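A minimal sketch of what this could look like as a helper. The name `str_op_on_categories` is hypothetical and not part of pandas. One wrinkle the one-liner above does not cover: a transform can map two categories to the same value (e.g. `'a'` and `'A'` under `upper`), and `Categorical.from_codes` rejects non-unique categories, so the sketch collapses duplicates with `pd.factorize` and remaps the codes.

```python
import numpy as np
import pandas as pd


def str_op_on_categories(s, op):
    """Hypothetical helper: apply the no-argument ``.str`` method named
    ``op`` to the categories only and rebuild a categorical Series."""
    codes = s.cat.codes.to_numpy()
    # Run the string op once per category rather than once per row.
    transformed = getattr(s.cat.categories.str, op)()
    # Collapse categories that became equal after the transform;
    # remap[i] is the new code for old category i.
    remap, uniques = pd.factorize(transformed)
    # Keep -1 (missing) as -1; translate every other code.
    new_codes = np.where(codes == -1, -1, remap[codes])
    return pd.Series(pd.Categorical.from_codes(new_codes, uniques), index=s.index)


s = pd.Series(list("aabb")).astype("category")
upper = str_op_on_categories(s, "upper")
```

Here `upper` has ``category`` dtype with categories ``A`` and ``B``, matching the `from_codes` example above.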
