Description
Code Sample
>>> pd.Series(['Pandas', 'is', np.nan], dtype='category').map(lambda x: len(x) if x == x else -1)
0    6.0
1    2.0
2    NaN
dtype: category
Categories (2, int64): [6, 2]
>>> pd.Series(['Pandas', 'is', np.nan], dtype='category').astype(object).map(lambda x: len(x) if x == x else -1)
0    6
1    2
2   -1
dtype: int64
Problem description
On a categorical Series, Series.map calls its function argument once for each category, but never calls it on NaN, even when NaN is present in the series. This is inconsistent with how Series.map behaves for other dtypes, and is very surprising!
I'm raising this issue even though #15706 already exists, because that issue asks for something different: it wants the argument to .map to be called once per value in the series, rather than once per unique value.
Another related issue is #20714.
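To make the behaviour described above easier to observe, here is a small probe that records every value Series.map passes to its function. The helper name values_seen_by_map is made up for this illustration and is not part of pandas. The report is against pandas 0.23.1, and the categorical result may differ on other versions, so no output is asserted for it here.

```python
import numpy as np
import pandas as pd


def values_seen_by_map(s):
    """Return the values that s.map passed to its function (hypothetical helper)."""
    seen = []

    def record(x):
        seen.append(x)
        return x

    s.map(record)
    return seen


cat = pd.Series(['Pandas', 'is', np.nan], dtype='category')
seen_cat = values_seen_by_map(cat)
seen_obj = values_seen_by_map(cat.astype(object))

# Was the function ever called with a missing value?
print("categorical:", any(pd.isna(x) for x in seen_cat))
print("object:     ", any(pd.isna(x) for x in seen_obj))
```

On the object-dtype series the function is called on every element, NaN included. If the categorical line prints False, the function never saw the missing value, which is the behaviour reported here.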
Expected Output
Categorical map should give values equal to those obtained by first converting to object. For any series s and function f, we should have the invariant that:
s.map(f).astype(object).equals(s.astype(object).map(f).astype(object))
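As a rough sketch, the invariant can be wrapped in a helper and tried on the series from the code sample. The names map_matches_object_path and length_or_minus_one are invented for this example. Whether the categorical case passes depends on the installed pandas version, so only the object-dtype case carries an expected result.

```python
import numpy as np
import pandas as pd


def map_matches_object_path(s, f):
    """Check the proposed invariant for one series s and one function f."""
    direct = s.map(f).astype(object)
    via_object = s.astype(object).map(f).astype(object)
    return direct.equals(via_object)


def length_or_minus_one(x):
    # x == x is False only for NaN, so missing values map to -1.
    return len(x) if x == x else -1


obj = pd.Series(['Pandas', 'is', np.nan], dtype=object)
cat = obj.astype('category')

# Both sides are the same computation for an object series, so this holds.
print(map_matches_object_path(obj, length_or_minus_one))
# This is the case the issue is about; the result is version dependent.
print(map_matches_object_path(cat, length_or_minus_one))
```

A check like this could serve as the basis for a regression test over several dtypes and functions, though that is only a suggestion and not something the pandas test suite is known to contain in this form.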
Output of pd.show_versions()
pandas: 0.23.1
pytest: 3.1.2
pip: 18.0
setuptools: 39.0.1
Cython: 0.27.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None