From #27929 (comment). In __getitem__ or Categorical.min(..), we always return np.nan as the scalar missing value, regardless of the dtype:
In [7]: cat = pd.Categorical([pd.Timestamp("2012"), None], ordered=True)
In [8]: cat
Out[8]:
[2012-01-01, NaT]
Categories (1, datetime64[ns]): [2012-01-01]
In [9]: cat[1]
Out[9]: nan
In [10]: cat.min(skipna=False)
Out[10]: nan
In the above, could this return pd.NaT instead?
(A similar issue will come up once categoricals can hold the EAs that use the new NA scalar.)
However, CategoricalDtype.na_value currently also returns np.nan (and this should be consistent with what we return in the cases above):
In [13]: cat.dtype.na_value
Out[13]: nan
We can of course let CategoricalDtype.na_value depend on the na_value of the dtype of the categories. But I am not fully sure we want such values-dependent behaviour?
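To make the option concrete, here is a rough sketch of what a categories-dependent missing value could look like. The helper name categories_na_value is hypothetical and not part of pandas; it only illustrates the lookup rule under discussion (use the na_value of an extension dtype, NaT for datetime-like NumPy dtypes, np.nan otherwise) and is not a proposed implementation.

```python
import numpy as np
import pandas as pd


def categories_na_value(cat):
    """Hypothetical helper: pick a missing-value scalar from the categories' dtype.

    This is NOT existing pandas behaviour; it only sketches the idea of
    making the scalar depend on the dtype of the categories.
    """
    dtype = cat.categories.dtype
    # Extension dtypes (tz-aware datetimes, periods, nullable dtypes, ...)
    # already declare their own missing-value scalar.
    if isinstance(dtype, pd.api.extensions.ExtensionDtype):
        return dtype.na_value
    # NumPy datetime64 ("M") and timedelta64 ("m") dtypes use NaT.
    if dtype.kind in "mM":
        return pd.NaT
    # Everything else falls back to the current default.
    return np.nan


dt_cat = pd.Categorical([pd.Timestamp("2012"), None], ordered=True)
int_cat = pd.Categorical([1, None])

print(categories_na_value(dt_cat))
print(categories_na_value(int_cat))
```

With a rule like this, cat[1] and cat.min(skipna=False) in the example above would be expected to return the same scalar as the helper, which is exactly the values-dependent behaviour the question is about.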