
BUG: Categorical.from_codes casts nullable int categories to object  #39649

Closed
@arw2019

Description


With int64 categories, the construction yields:

In [18]: import numpy as np
    ...: import pandas as pd
    ...: import pandas._testing as tm
    ...: 
    ...: integer_dtype="int64"
    ...: 
    ...: cats = pd.array(range(5), dtype=integer_dtype)
    ...: codes = np.random.randint(5, size=3)
    ...: categorical_dtype = pd.CategoricalDtype(cats)
    ...: arr = pd.Categorical.from_codes(codes, dtype=categorical_dtype)
    ...: arr
Out[18]: 
[0, 2, 0]
Categories (5, int64): [0, 1, 2, 3, 4]

whereas with nullable Int64 categories, the categories are cast to object:

In [19]: import numpy as np
    ...: import pandas as pd
    ...: import pandas._testing as tm
    ...: 
    ...: integer_dtype="Int64"
    ...: 
    ...: cats = pd.array(range(5), dtype=integer_dtype)
    ...: codes = np.random.randint(5, size=3)
    ...: categorical_dtype = pd.CategoricalDtype(cats)
    ...: arr = pd.Categorical.from_codes(codes, dtype=categorical_dtype)
    ...: arr
Out[19]: 
[3, 3, 4]
Categories (5, object): [0, 1, 2, 3, 4]

In the nullable int case I would instead expect the categories to keep the Int64 dtype:

Out[19]: 
[3, 3, 4]
Categories (5, Int64): [0, 1, 2, 3, 4]
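A minimal reproduction sketch follows, assuming pandas and NumPy are installed. It uses fixed codes in place of np.random.randint so the output is repeatable, and it wraps the construction in a hypothetical helper, build_categorical, which is not part of pandas. The dtype printed for Int64 depends on the installed pandas version. On affected versions it is object, as reported above. Because this issue is marked Closed, newer releases may print Int64 instead, but check this against your own version.

```python
import numpy as np
import pandas as pd


def build_categorical(integer_dtype: str, codes) -> pd.Categorical:
    """Hypothetical helper: build a Categorical from explicit codes whose
    categories are 0..4 stored with the given integer dtype."""
    cats = pd.array([0, 1, 2, 3, 4], dtype=integer_dtype)
    categorical_dtype = pd.CategoricalDtype(cats)
    return pd.Categorical.from_codes(codes, dtype=categorical_dtype)


# Fixed codes instead of np.random.randint so every run gives the same output.
codes = np.array([3, 3, 4])

for integer_dtype in ["int64", "Int64"]:
    arr = build_categorical(integer_dtype, codes)
    # The bug concerns this dtype: "object" for "Int64" on affected versions,
    # where "Int64" is expected.
    print(integer_dtype, "->", arr.categories.dtype)
```

Checking arr.categories.dtype directly, rather than reading the repr, makes it straightforward to turn this reproduction into a regression test.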

Labels: Bug, Categorical (Categorical Data Type), Dtype Conversions (Unexpected or buggy dtype conversions), NA - MaskedArrays (Related to pd.NA and nullable extension arrays)
