ENH: support .astype('category') on DataFrame / aka co-factorization

xref to #10696, #8709

We don't currently allow calling `astype('category')` on a DataFrame directly:

```
In [44]: df.astype('category')
NotImplementedError: > 1 ndim Categorical are not supported at this time
```

Instead, you can apply the astype per column:

```
In [35]: df = DataFrame({'A' : list('aabcda'), 'B' : list('bcdaae')})

In [36]: df
Out[36]: 
   A  B
0  a  b
1  a  c
2  b  d
3  c  a
4  d  a
5  a  e

In [37]: df.apply(lambda x: x.astype('category'))
Out[37]: 
   A  B
0  a  b
1  a  c
2  b  d
3  c  a
4  d  a
5  a  e

In [38]: df.apply(lambda x: x.astype('category')).B
Out[38]: 
0    b
1    c
2    d
3    a
4    a
5    e
Name: B, dtype: category
Categories (5, object): [a, b, c, d, e]

In [39]: df.apply(lambda x: x.astype('category')).A
Out[39]: 
0    a
1    a
2    b
3    c
4    d
5    a
Name: A, dtype: category
Categories (4, object): [a, b, c, d]
```

But if the columns hold 'similar' categories, then you would usually want this instead: every column astyped with the same uniques. That is what a DataFrame-level astype should do automatically.

```
In [41]: uniques = np.sort(pd.unique(df.values.ravel()))

In [42]: df.apply(lambda x: x.astype('category', categories=uniques)).A
Out[42]: 
0    a
1    a
2    b
3    c
4    d
5    a
Name: A, dtype: category
Categories (5, object): [a, b, c, d, e]

In [43]: df.apply(lambda x: x.astype('category', categories=uniques)).B
Out[43]: 
0    b
1    c
2    d
3    a
4    a
5    e
Name: B, dtype: category
Categories (5, object): [a, b, c, d, e]
```
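For readers on newer pandas releases, where the `categories=` keyword to `astype` is no longer accepted, the same idea can be sketched with a shared `CategoricalDtype`. This is only an illustration of the workaround above, not the proposed implementation; the names `shared` and `result` are made up for the example.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'A': list('aabcda'), 'B': list('bcdaae')})

# Collect the sorted unique values across every column of the frame.
uniques = np.sort(pd.unique(df.values.ravel()))

# Build one dtype object and reuse it for each column
# ('shared' and 'result' are illustrative names).
shared = CategoricalDtype(categories=uniques)
result = df.apply(lambda x: x.astype(shared))

print(result['A'].cat.categories)
print(result['B'].cat.categories)
```

Because both columns use the same dtype object, they carry identical categories even though column A never contains `'e'`.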

This is fairly straightforward to implement, and I think it is a nice, easy way of coding, without having to support 2-D categoricals internally (and we are moving away from internal 2-D structures anyhow).
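To make the proposal concrete, here is a rough sketch of what such a co-factorization could look like as a standalone function. `cofactorize` is a hypothetical helper name, not an existing pandas function, and the sketch assumes the values are sortable and the column labels are unique.

```python
import pandas as pd


def cofactorize(df):
    """Hypothetical helper: cast every column to one shared categorical dtype."""
    # Stack all column values into a single Series.
    values = pd.concat([df[col] for col in df.columns], ignore_index=True)
    # Sorted uniques across the whole frame (missing values are not categories).
    categories = sorted(values.dropna().unique())
    dtype = pd.CategoricalDtype(categories=categories)
    # Each column stays a 1-D categorical; no 2-D categorical is needed.
    return df.astype(dtype)


df = pd.DataFrame({'A': list('aabcda'), 'B': list('bcdaae')})
out = cofactorize(df)

# With shared categories, the integer codes are comparable across columns.
codes = out.apply(lambda s: s.cat.codes)
print(codes)
```

The point of the design is that each column remains an ordinary 1-D categorical, while the shared categories mean a given code refers to the same value in every column.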


ENH: support .astype('category') on DataFrame / aka co-factorization #12860
