Closed
Description
A small, complete example of the issue
This is a proposal to allow something like
df.astype({'A': pd.CategoricalDtype(['a', 'b', 'c', 'd'], ordered=True)})
Currently, it can be awkward to convert many columns in a DataFrame to a Categorical with control over the categories and orderedness. If you just want to use the defaults, it's not so bad with .astype:
In [5]: df = pd.DataFrame({"A": list('abc'), 'B': list('def')})
In [6]: df
Out[6]:
   A  B
0  a  d
1  b  e
2  c  f
In [8]: df.astype({"A": 'category', 'B': 'category'}).dtypes
Out[8]:
A    category
B    category
dtype: object
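To make "the defaults" concrete, here is a small sketch that inspects the result of the conversion above. It only uses the same two-column frame; the variable name converted is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": list("abc"), "B": list("def")})
converted = df.astype({"A": "category", "B": "category"})

# With the string alias, categories are inferred from the observed values
# and the resulting categorical is unordered.
print(list(converted["A"].cat.categories))
print(converted["A"].cat.ordered)
```

So the string alias is enough only when the inferred categories and the lack of ordering are what you want.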
If you need to control categories or ordered, you're best off with
In [20]: mapping = {'A': lambda x: x.A.astype('category').cat.set_categories(['a', 'b'], ordered=True),
...: 'B': lambda x: x.B.astype('category').cat.set_categories(['d', 'f', 'e'], ordered=False)}
In [21]: df.assign(**mapping)
Out[21]:
     A  B
0    a  d
1    b  e
2  NaN  f
By expanding astype to accept instances of Categorical, you remove the need for the lambdas and you can do conversions of other types at the same time.
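As a rough sketch of what the proposed call could look like, the lambda-based example can be rewritten with one dtype instance per column. This assumes a pandas release in which pd.CategoricalDtype accepts categories and ordered and astype accepts such instances in a dict; the names dtypes and result are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": list("abc"), "B": list("def")})

# One dtype instance per column carries both the categories and the orderedness.
dtypes = {
    "A": pd.CategoricalDtype(["a", "b"], ordered=True),
    "B": pd.CategoricalDtype(["d", "f", "e"], ordered=False),
}
result = df.astype(dtypes)

# Values absent from the requested categories (here "c" in column A) become NaN,
# which matches what set_categories does in the lambda-based version.
print(result)
print(result["A"].cat.ordered)
```

Because the mapping is an ordinary astype dict, entries for non-categorical columns could sit alongside these in the same call.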
This would mirror the semantics in #14503
Updated to change pd.Categorical(...) to a new/modified pd.CategoricalDtype(...) based on the discussion below.