Closed
Description
Code
import pandas as pd
from datetime import datetime
mi = pd.MultiIndex.from_product([pd.date_range(datetime.today(), periods=2),
                                 ["C", "D"]], names=["alpha", "beta"])
df = pd.DataFrame({"foo": [1, 2, 1, 2], "bar": [1, 2, 3, 4]}, index=mi)
result = df.groupby([pd.Grouper(level="alpha", freq='D'), "beta"])
print(len(result), result.ngroups)
# 2 4
print(result.groups)
# {(Timestamp('2020-04-05 00:00:00'), 'C'): MultiIndex([('2020-04-05 15:04:51.580573', 'C')], names=['alpha', 'beta']),
# (Timestamp('2020-04-06 00:00:00'), 'D'): MultiIndex([('2020-04-05 15:04:51.580573', 'D')], names=['alpha', 'beta'])}
Problem description
This issue is an extension of the bug reported in #26326. PR #26374 resolved the bug for the case of a nested BaseGrouper. Nonetheless, a nested BinGrouper still results in wrong behavior, as the code above shows: len(result) is 2 while result.ngroups is 4.
Note that len(result) is based on len(result.groups), and that result.groups should return the following:
# {(Timestamp('2020-04-05 00:00:00'), 'C'): MultiIndex([('2020-04-05 15:04:51.580573', 'C')], names=['alpha', 'beta']),
# (Timestamp('2020-04-05 00:00:00'), 'D'): MultiIndex([('2020-04-05 15:04:51.580573', 'D')], names=['alpha', 'beta']),
# (Timestamp('2020-04-06 00:00:00'), 'C'): MultiIndex([('2020-04-06 15:04:51.580573', 'C')], names=['alpha', 'beta']),
# (Timestamp('2020-04-06 00:00:00'), 'D'): MultiIndex([('2020-04-06 15:04:51.580573', 'D')], names=['alpha', 'beta'])}
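The expected set of group keys can be cross-checked without going through result.groups, for example by flooring the alpha level to the day by hand and pairing it with beta. The following is only a sketch: the fixed start timestamp and the names start, expected_keys and actual_keys are assumptions made for reproducibility and illustration (the report itself uses datetime.today()).

```python
import pandas as pd

# Assumption: a fixed start timestamp instead of datetime.today(), so the run is reproducible.
start = pd.Timestamp("2020-04-05 15:04:51")
mi = pd.MultiIndex.from_product(
    [pd.date_range(start, periods=2), ["C", "D"]], names=["alpha", "beta"]
)
df = pd.DataFrame({"foo": [1, 2, 1, 2], "bar": [1, 2, 3, 4]}, index=mi)

# Build the expected keys by hand: floor "alpha" to the day and pair it with "beta".
day = df.index.get_level_values("alpha").floor("D")
beta = df.index.get_level_values("beta")
expected_keys = sorted(set(zip(day, beta)))

result = df.groupby([pd.Grouper(level="alpha", freq="D"), "beta"])

# The index of size() lists one (alpha, beta) key per group, independently of .groups.
actual_keys = sorted(result.size().index)

print(len(expected_keys), result.ngroups)
print(actual_keys == expected_keys)
# On an affected build, len(result.groups) can be compared against these counts.
print(len(result.groups))
```

Comparing len(result.groups) with result.ngroups in this way gives a small regression check for the nested BinGrouper case.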
INSTALLED VERSIONS
------------------
commit : 7673357191709036faad361cbb5f31a802703249
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.5.8-1-MANJARO
Version : #1 SMP PREEMPT Thu Mar 5 20:29:51 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0.dev0+1027.g767335719.dirty
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.6.0
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.15.0
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0