API: return "correct" missing value scalar from Categorical? #29962

Open
@jorisvandenbossche

From #27929 (comment). In __getitem__ or Categorical.min(..), we always return np.nan as the scalar missing value, regardless of the dtype of the categories:

In [7]: cat = pd.Categorical([pd.Timestamp("2012"), None], ordered=True)

In [8]: cat  
Out[8]: 
[2012-01-01, NaT]
Categories (1, datetime64[ns]): [2012-01-01]

In [9]: cat[1]   
Out[9]: nan

In [10]: cat.min(skipna=False) 
Out[10]: nan

In the example above, could both of these return pd.NaT instead?
(A similar issue will come up once categoricals can hold extension arrays (EAs) that use the new NA scalar.)

However, CategoricalDtype.na_value currently also returns np.nan, and it should stay consistent with what we return in the cases above:

In [13]: cat.dtype.na_value 
Out[13]: nan

We could of course make CategoricalDtype.na_value depend on the na_value of the dtype of the categories. But I am not fully sure we want such values-dependent behaviour.
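
To make the trade-off concrete, here is a minimal sketch of what values-dependent behaviour might look like from user code. The helpers categories_na_value and getitem_with_na are hypothetical names invented for illustration; they are not pandas API and not a proposed implementation. The sketch assumes datetime-like categories should give pd.NaT, extension dtypes should give their own na_value, and everything else should fall back to np.nan.

```python
import numpy as np
import pandas as pd


def categories_na_value(cat: pd.Categorical):
    """Hypothetical helper: choose a missing-value scalar from the categories' dtype."""
    cat_dtype = cat.categories.dtype
    # Extension dtypes (e.g. tz-aware datetimes, nullable Int64) carry their own na_value.
    if isinstance(cat_dtype, pd.api.extensions.ExtensionDtype):
        return cat_dtype.na_value
    # NumPy datetime64 ("M") and timedelta64 ("m") categories use NaT.
    if cat_dtype.kind in "mM":
        return pd.NaT
    # Assumption: all remaining dtypes keep the current np.nan behaviour.
    return np.nan


def getitem_with_na(cat: pd.Categorical, i: int):
    """Hypothetical scalar lookup that returns the dtype-dependent missing value."""
    code = cat.codes[i]
    # A code of -1 marks a missing entry in a Categorical.
    if code == -1:
        return categories_na_value(cat)
    return cat.categories[code]


cat = pd.Categorical([pd.Timestamp("2012"), None], ordered=True)
missing = getitem_with_na(cat, 1)    # pd.NaT under this sketch, not np.nan
present = getitem_with_na(cat, 0)    # the Timestamp for 2012-01-01
```

The sketch also shows the concern raised above: the scalar returned for a missing entry now depends on what the categories happen to contain, so two Categoricals with the same CategoricalDtype class can report different missing values.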

Labels: Bug, Categorical, Missing-data
