API: Expand DataFrame.astype to allow Categorical(categories, ordered) #14676

Closed
@TomAugspurger

A small, complete example of the issue

This is a proposal to allow something like

df.astype({'A': pd.CategoricalDtype(['a', 'b', 'c', 'd'], ordered=True})

Currently, it can be awkward to convert many columns in a DataFrame to a Categorical with control over the categories and orderedness. If you just want to use the defaults, it's not so bad with .astype:

In [5]: df = pd.DataFrame({"A": list('abc'), 'B': list('def')})

In [6]: df
Out[6]:
   A  B
0  a  d
1  b  e
2  c  f

In [8]: df.astype({"A": 'category', 'B': 'category'}).dtypes
Out[8]:
A    category
B    category
dtype: object

If you need to control categories or ordered, you're best off with

In [20]: mapping = {'A': lambda x: x.A.astype('category').cat.set_categories(['a', 'b'], ordered=True),
    ...:            'B': lambda x: x.B.astype('category').cat.set_categories(['d', 'f', 'e'], ordered=False)}

In [21]: df.assign(**mapping)
Out[21]:
     A  B
0    a  d
1    b  e
2  NaN  f

By expanding astype to accept instances of CategoricalDtype, you remove the need for the lambdas, and you can convert other columns to other types in the same call.
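
For comparison, here is a rough sketch of what the proposed call could look like. It assumes a pandas version in which pd.CategoricalDtype accepts categories and ordered and in which astype accepts such instances in a dict; the extra column C and the name `dtypes` are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Same frame as above, plus a hypothetical integer column "C"
# to show a non-categorical conversion in the same call.
df = pd.DataFrame({"A": list("abc"), "B": list("def"), "C": [1, 2, 3]})

# One dtype per column; no lambdas needed.
dtypes = {
    "A": pd.CategoricalDtype(["a", "b"], ordered=True),
    "B": pd.CategoricalDtype(["d", "f", "e"], ordered=False),
    "C": "float64",
}

result = df.astype(dtypes)

# Values of "A" that are not among the listed categories become NaN,
# matching the set_categories behaviour in the lambda version.
print(result)
print(result.dtypes)
```

The dict keeps each column's categories and orderedness next to its name, which is the main readability gain over the assign-with-lambdas approach.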

This would mirror the semantics in #14503

Updated to change pd.Categorical(...) to a new/modified pd.CategoricalDtype(...) based on the discussion below.
