Closed
Description
We currently don't allow a direct astype of a whole DataFrame to category:
In [44]: df.astype('category')
NotImplementedError: > 1 ndim Categorical are not supported at this time
Instead, you can apply astype per column:
In [35]: df = DataFrame({'A' : list('aabcda'), 'B' : list('bcdaae')})
In [36]: df
Out[36]:
A B
0 a b
1 a c
2 b d
3 c a
4 d a
5 a e
In [37]: df.apply(lambda x: x.astype('category'))
Out[37]:
A B
0 a b
1 a c
2 b d
3 c a
4 d a
5 a e
In [38]: df.apply(lambda x: x.astype('category')).B
Out[38]:
0 b
1 c
2 d
3 a
4 a
5 e
Name: B, dtype: category
Categories (5, object): [a, b, c, d, e]
In [39]: df.apply(lambda x: x.astype('category')).A
Out[39]:
0 a
1 a
2 b
3 c
4 d
5 a
Name: A, dtype: category
Categories (4, object): [a, b, c, d]
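For anyone who wants to reproduce the per-column workaround as a standalone script, here is a minimal sketch. It assumes a reasonably recent pandas; the name `converted` is only illustrative. Each column is converted on its own, so each ends up with its own set of categories.

```python
import pandas as pd

df = pd.DataFrame({'A': list('aabcda'), 'B': list('bcdaae')})

# Convert each column independently; the categories are inferred
# separately from the values present in that column.
converted = df.apply(lambda col: col.astype('category'))

print(converted.dtypes)
print(list(converted['A'].cat.categories))
print(list(converted['B'].cat.categories))
```

Note that column A never contains 'e', so its categories differ from those of column B.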
But if the columns hold 'similar' categories, you would usually want this instead:
automatically astype every column with the same set of uniques.
In [41]: uniques = np.sort(pd.unique(df.values.ravel()))
In [42]: df.apply(lambda x: x.astype('category', categories=uniques)).A
Out[42]:
0 a
1 a
2 b
3 c
4 d
5 a
Name: A, dtype: category
Categories (5, object): [a, b, c, d, e]
In [43]: df.apply(lambda x: x.astype('category', categories=uniques)).B
Out[43]:
0 b
1 c
2 d
3 a
4 a
5 e
Name: B, dtype: category
Categories (5, object): [a, b, c, d, e]
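The session above passes `categories=` straight to `astype`. In later pandas versions, the same shared-categories idea is usually written with an explicit `CategoricalDtype`. The following is a sketch under that assumption, not necessarily how a built-in DataFrame-level astype would do it; the names `shared` and `converted` are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'A': list('aabcda'), 'B': list('bcdaae')})

# Collect the unique values across the whole frame and sort them,
# as in the session above.
uniques = np.sort(pd.unique(df.values.ravel()))

# One dtype object shared by every column, so all columns get
# exactly the same categories.
shared = CategoricalDtype(categories=uniques)
converted = df.apply(lambda col: col.astype(shared))

print(list(converted['A'].cat.categories))
print(list(converted['B'].cat.categories))
```

Because both columns share one dtype, their category codes are directly comparable, which is typically the point of using the same uniques.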
This is fairly straightforward to implement, and I think it is a nice, easy way of coding this without having to support 2-D categoricals internally (and we are moving away from internal 2-D structures anyhow).