Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
df_test = pd.DataFrame()
df_test['A'] = pd.Series(np.arange(0,2), dtype='category').cat.set_categories(list(range(0,3)))
df_test['B'] = pd.Series(np.arange(10,12), dtype='category').cat.set_categories(list(range(10,13)))
print("Test DF:")
print(df_test)
print("\nThe following are as expected, unobserved categories have size = 0:")
print(df_test.groupby('A').size())
print(df_test.groupby('B').size())
print("\nThe following does not consider categories, I would expect 9 result lines here:")
print(df_test.groupby(['A','B']).size())
print("\nExpected:")
print(pd.DataFrame({'A':np.repeat(np.arange(0,3), 3), 'B':list(range(10,13))*3, '':[1,0,0,0,1,0,0,0,0] }).set_index(['A','B']))
Problem description
groupby([cols]) returns a result for every category, including unobserved ones, when only one categorical column is provided (e.g. ['A']). When multiple categorical columns are provided (e.g. ['A', 'B']), it returns only the observed combinations, regardless of the setting of observed. I would expect a result for every combination of the categories of the groupby columns, with size 0 for the unobserved combinations.
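Until this is resolved, one possible workaround is to count the observed combinations and then reindex onto the full product of the categories. This is only a sketch under stated assumptions, not the library's intended behaviour: it assumes the categories are integers and the columns contain no missing values, and the names plain, observed_sizes, full_index and all_sizes are illustrative.

```python
import pandas as pd

# Same data as the report: two observed rows, three categories per column.
df_test = pd.DataFrame({
    "A": pd.Categorical([0, 1], categories=[0, 1, 2]),
    "B": pd.Categorical([10, 11], categories=[10, 11, 12]),
})

# Assumption: integer categories and no missing values, so the categorical
# columns can be cast to plain integers before grouping.
plain = df_test.astype({"A": "int64", "B": "int64"})
observed_sizes = plain.groupby(["A", "B"]).size()

# Build every combination of the declared categories and fill the
# combinations that never occur with a size of 0.
full_index = pd.MultiIndex.from_product(
    [df_test["A"].cat.categories, df_test["B"].cat.categories],
    names=["A", "B"],
)
all_sizes = observed_sizes.reindex(full_index, fill_value=0)
print(all_sizes)
```

The reindex step is what adds the unobserved combinations; the groupby itself only ever sees the two observed rows.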
Expected Output
A result for every combination of the categories of the groupby columns. For the example above:
A B
0 10 1
0 11 0
0 12 0
1 10 0
1 11 1
1 12 0
2 10 0
2 11 0
2 12 0
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.2
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.7.0
gcsfs: None