
BUG: .values for objects containing categoricals with box-able categories #21658

Closed
@jorisvandenbossche


A bit of a cryptic title, but this follows from my investigation in #21390 (comment).

When a Categorical has "box-able" categories (e.g. datetimes, which are returned as Timestamp objects):

In [1]: cat = pd.Categorical(pd.date_range("2012-01-01", periods=3, freq='H'))

In [3]: cat.tolist()
Out[3]: 
[Timestamp('2012-01-01 00:00:00', freq='H'),
 Timestamp('2012-01-01 01:00:00', freq='H'),
 Timestamp('2012-01-01 02:00:00', freq='H')]
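For context, a Categorical stores one small integer code per element plus an array of unique categories. Boxing to Timestamp happens when the codes are looked up in the categories. The minimal sketch below shows that layout. It builds the same three hourly timestamps with pd.to_datetime instead of date_range, which is only an assumption made so the sketch does not depend on a particular freq alias spelling.

```python
import pandas as pd

# Three hourly timestamps, equivalent to the date_range call above.
cat = pd.Categorical(pd.to_datetime(
    ["2012-01-01 00:00", "2012-01-01 01:00", "2012-01-01 02:00"]))

# One integer code per element, pointing into the categories.
print(cat.codes)

# The unique values themselves, held here as a DatetimeIndex.
print(cat.categories)

# tolist() looks each code up in the categories and boxes it as a Timestamp.
boxed = cat.tolist()
print(type(boxed[0]))
```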

the boxing to Timestamps works for the Categorical itself, but once it is combined into a MultiIndex or DataFrame, the boxing fails and raw integers are returned instead:

In [7]: midx = pd.MultiIndex.from_product([['a', 'b', 'c'], cat])

In [8]: midx.values
Out[8]: 
array([('a', 1325376000000000000), ('a', 1325379600000000000),
       ('a', 1325383200000000000), ('b', 1325376000000000000),
       ('b', 1325379600000000000), ('b', 1325383200000000000),
       ('c', 1325376000000000000), ('c', 1325379600000000000),
       ('c', 1325383200000000000)], dtype=object)
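The integers above are the timestamps' underlying int64 nanoseconds since the Unix epoch. NumPy produces exactly these integers when a datetime64[ns] array is cast to object, because Python's datetime cannot represent nanoseconds. The sketch below demonstrates this. It suggests, though the issue itself does not confirm it, that the categorical code path casts raw datetime64[ns] values to object instead of boxing them.

```python
import datetime

import numpy as np

stamps = np.array(["2012-01-01T00:00", "2012-01-01T01:00", "2012-01-01T02:00"],
                  dtype="datetime64[ns]")

# Casting nanosecond datetimes to object yields plain Python ints
# (ns since epoch), the same numbers that appear in midx.values above.
as_object = stamps.astype(object)
print(as_object)

# At microsecond resolution NumPy returns datetime.datetime objects instead.
as_datetime = stamps.astype("datetime64[us]").astype(object)
print(as_datetime)

# Decoding the first integer by hand confirms it is 2012-01-01 00:00:00.
epoch = datetime.datetime(1970, 1, 1)
decoded = epoch + datetime.timedelta(microseconds=int(as_object[0]) // 1000)
print(decoded)
```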

In [9]: df = pd.DataFrame({'a':['a', 'b', 'c'], 'b': cat, 'c': np.array(cat)})

In [10]: df.dtypes
Out[10]: 
a            object
b          category
c    datetime64[ns]
dtype: object

In [11]: df.values
Out[11]: 
array([['a', 1325376000000000000, Timestamp('2012-01-01 00:00:00')],
       ['b', 1325379600000000000, Timestamp('2012-01-01 01:00:00')],
       ['c', 1325383200000000000, Timestamp('2012-01-01 02:00:00')]], dtype=object)
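On affected pandas versions, one possible workaround is to densify categorical columns before calling .values. This is a sketch, not an officially recommended API. The helper values_with_boxed_categoricals below is hypothetical. It replaces each categorical column with np.asarray of that column, which yields an array of the categories' dtype, just like column c above. With that change, .values boxes the datetimes as Timestamps.

```python
import numpy as np
import pandas as pd


def values_with_boxed_categoricals(df):
    """Hypothetical helper: like df.values, but categorical columns are
    first converted to dense arrays of their categories' dtype."""
    out = df.copy()
    for name in out.columns:
        if isinstance(out[name].dtype, pd.CategoricalDtype):
            # For datetime categories this produces a datetime64[ns] array.
            out[name] = np.asarray(out[name])
    return out.values


cat = pd.Categorical(pd.to_datetime(
    ["2012-01-01 00:00", "2012-01-01 01:00", "2012-01-01 02:00"]))
df = pd.DataFrame({"a": ["a", "b", "c"], "b": cat, "c": np.array(cat)})

result = values_with_boxed_categoricals(df)
print(result)
```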
