Description
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
import pandas as pd
import numpy as np
from pandas.api.types import CategoricalDtype
# with categorical column and use_inf_as_na = True -> ERROR
pd.options.mode.use_inf_as_na = True
df1 = pd.DataFrame([['a1', 'good'], ['b1', 'good'], ['c1', 'good'], ['d1', 'bad']], columns=['C1', 'C2'])
df2 = pd.DataFrame([['a1', 'good'], ['b1', np.inf], ['c1', np.NaN], ['d1', 'bad']], columns=['C1', 'C2'])
categories = CategoricalDtype(categories=['good', 'bad'], ordered=True)
df1.loc[:, 'C2'] = df1['C2'].astype(categories)
df2.loc[:, 'C2'] = df2['C2'].astype(categories)
df1.dropna(axis=0) # ERROR
df2.dropna(axis=0) # ERROR
Problem description
With the latest version of pandas (1.0.3, installed via pip on python 3.6.8), DataFrame.dropna returns an error if a column is of type CategoricalDtype AND pd.options.mode.use_inf_as_na = True.
Exception with traceback:
Traceback (most recent call last):
File "", line 1, in
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/frame.py", line 4751, in dropna
count = agg_obj.count(axis=agg_axis)
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/frame.py", line 7800, in count
result = notna(frame).sum(axis=axis)
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 376, in notna
res = isna(obj)
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 126, in isna
return _isna(obj)
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 185, in _isna_old
return obj._constructor(obj._data.isna(func=_isna_old))
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 555, in isna
return self.apply("apply", func=func)
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 442, in apply
applied = getattr(b, f)(**kwargs)
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 390, in apply
result = func(self.values, **kwargs)
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 183, in _isna_old
return _isna_ndarraylike_old(obj)
File "/home/sbucc/miniconda3/envs/tf114/lib/python3.6/site-packages/pandas/core/dtypes/missing.py", line 283, in _isna_ndarraylike_old
vec = libmissing.isnaobj_old(values.ravel())
TypeError: Argument 'arr' has incorrect type (expected numpy.ndarray, got Categorical)
This doesn't happen with pandas 0.24.0 or if pd.options.mode.use_inf_as_na = False (default).
Expected Output
no error
Output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-96-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None