
Series.apply on categorical with NaN has wrong behavior #24241

Closed
@tchklovski

Description

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> print(pd.isna(pd.Series(['A', 'B', pd.np.nan], dtype='category')))
0    False
1    False
2     True
dtype: bool
>>> print(pd.Series(['A', 'B', pd.np.nan], dtype='category').apply(pd.isna))
0    False
1    False
2    False
dtype: object
>>> print(pd.Series(['A', 'A', pd.np.nan], dtype='category').apply(pd.isna))
0    False
1    False
2      NaN
dtype: category
Categories (1, object): [False]

Problem description

I would expect case 2 (['A', 'B', pd.np.nan]) to behave like either case 1 or case 3.
I think case 1 shows the correct behavior.
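
One possible reading of cases 2 and 3, offered as an assumption rather than a confirmed diagnosis: if `apply` maps the function over the categories instead of over the values, the NaN entry (stored as code -1) never reaches `pd.isna`. The sketch below mimics that idea with plain Python lists. `is_na`, `map_over_categories`, and `map_over_values` are hypothetical helpers for illustration, not pandas internals.

```python
import math


def is_na(value):
    """Stand-in for pd.isna on scalars (assumption: only None and float NaN are missing)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def map_over_categories(categories, codes, func):
    """Hypothetical model: call func once per category, then look results up by code."""
    mapped = [func(c) for c in categories]
    # A code of -1 marks a missing value; negative indexing silently wraps
    # to the last mapped category instead of signalling "missing".
    return [mapped[code] for code in codes]


def map_over_values(categories, codes, func):
    """What the report expects: func sees every value, including the missing one."""
    values = [categories[code] if code >= 0 else float("nan") for code in codes]
    return [func(v) for v in values]


categories = ["A", "B"]
codes = [0, 1, -1]  # the third entry is missing

by_category = map_over_categories(categories, codes, is_na)
by_value = map_over_values(categories, codes, is_na)
print(by_category)  # [False, False, False]
print(by_value)     # [False, False, True]
```

In this model the missing entry comes back as `False`, which matches case 2. It does not attempt to reproduce the categorical result of case 3.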

Issue #21565 looks similar but is not the same.

Expected Output

0    False
1    False
2     True
dtype: bool

in all cases.

The behavior is as expected if dtype='category' is omitted.
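
Until `apply` behaves consistently, two workarounds follow from the observations above: call the vectorized `isna` directly (case 1), or drop the categorical dtype before calling `apply`. This is a minimal sketch, not a fix. It uses `np.nan` in place of `pd.np.nan` on the assumption that the `pd.np` alias is unavailable in newer pandas releases.

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', np.nan], dtype='category')

# Option 1: the vectorized method handles the categorical directly.
vectorized = s.isna()

# Option 2: convert to object first so apply sees every value, NaN included.
via_object = s.astype(object).apply(pd.isna)

print(vectorized.tolist())
print(via_object.tolist())
```

Both options should give `False, False, True`, the expected output stated above.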

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 18.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 4.0.1
pip: 18.1
setuptools: 39.0.1
Cython: 0.29.1
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.7.3
bs4: 4.6.3
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Categorical, Missing-data
