Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
# With pandas 1.5
from pandas import DataFrame, Categorical
df = DataFrame({"x": Categorical([1, 2], categories=[1, 2, 3]), "y": [3, 4]})
df.groupby("x", observed=False).grouper.result_index
# CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
df.groupby("x", observed=False, dropna=False).grouper.result_index
# CategoricalIndex([1, 2], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
# ------------------------------------------------------------------------------------------
# Unexpected result ↑
df.groupby("x", observed=False, dropna=True).grouper.result_index
# CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
# With pandas 1.4.4 and earlier (same df as above)
df.groupby("x", observed=False).grouper.result_index
# CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
df.groupby("x", observed=False, dropna=False).grouper.result_index
# CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
df.groupby("x", observed=False, dropna=True).grouper.result_index
# CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x')
Issue Description
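Passing dropna=False to DataFrame.groupby() should not affect the result when observed=False. With observed=False, every category, including unobserved ones, should appear in the result index. dropna should only control whether a group for missing values is kept, and the key column here contains no missing values.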
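To illustrate why the two parameters are independent, here is a minimal sketch that assumes a pandas release where this regression is fixed. It adds a missing value to the key column. The variable names are illustrative only. observed=False keeps the unobserved category 3 in both calls. dropna only decides whether an extra NaN group appears.

```python
import numpy as np
import pandas as pd

# Categorical key with an unobserved category (3) and one missing value
df = pd.DataFrame(
    {"x": pd.Categorical([1, 2, np.nan], categories=[1, 2, 3]), "y": [3, 4, 5]}
)

def labels(dropna: bool) -> list:
    # observed=False: all categories should be present regardless of dropna
    return list(df.groupby("x", observed=False, dropna=dropna).size().index)

with_nan_dropped = labels(dropna=True)   # categories only
with_nan_kept = labels(dropna=False)     # categories plus one NaN group

print(with_nan_dropped)
print(with_nan_kept)
```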
Expected Behavior
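The behavior of pandas 1.4.4 and earlier is expected. For every combination above, including dropna=False, result_index should be CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category', name='x').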
Installed Versions
INSTALLED VERSIONS
commit : 87cfe4e
python : 3.9.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.57.1-microsoft-standard-WSL2
Version : #1 SMP Wed Jul 27 02:20:31 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.0
numpy : 1.23.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 58.0.0
pip : 22.2.2
Cython : None
pytest : 6.2.5
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.1.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
brotli :
fastparquet : None
fsspec : 2022.02.0
gcsfs : None
matplotlib : 3.5.1
numba : 0.53.1
numexpr : None
odfpy : None
openpyxl : 3.0.8
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : 1.4.28
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None