Description
groupby().rolling() in master currently constructs the resulting MultiIndex manually by inserting groupby keys as the first level(s) and then the original object's Index as the second level(s).
However, groupby().rolling() behaves similarly to groupby().transform() (i.e. maintains the original shape), so should the resulting index align with results of groupby().transform()?
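To make the construction described above concrete, here is a rough sketch that rebuilds the same kind of index by hand. `prepend_group_keys` is a hypothetical helper written for illustration only, not the pandas internals, and it assumes the rows are already ordered by the group key so that the result lines up with the grouped output.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 1, 2], "b": [2, 3, 4]},
    index=pd.Index(["x", "y", "z"], name="orig"),
)


def prepend_group_keys(obj, key):
    """Hypothetical helper: group key as the first level, then the original index levels.

    Assumes the rows of ``obj`` are already sorted by ``key``.
    """
    arrays = [obj[key].to_numpy()]
    arrays += [obj.index.get_level_values(i) for i in range(obj.index.nlevels)]
    names = [key] + list(obj.index.names)
    return pd.MultiIndex.from_arrays(arrays, names=names)


expected_index = prepend_group_keys(df, "a")
rolled = df.groupby("a").rolling(1).sum()

# Compare the hand-built index with the one groupby().rolling() returns.
print(list(rolled.index) == list(expected_index))
print(list(rolled.index.names))
```

Only the index is compared here, since which columns appear in the rolling result can differ between pandas versions.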
In [4]: pd.__version__
Out[4]: '1.3.0.dev0+238.g1d196295c'
In [6]: df = pd.DataFrame({'a': [1], 'b': [2]})
In [7]: df
Out[7]:
   a  b
0  1  2
# Example of the original DataFrame having a regular Index
In [8]: df.groupby('a').rolling(1).sum()
Out[8]:
       a    b
a
1 0  1.0  2.0
In [16]: df.groupby('a').transform(lambda x: np.sum(x))
Out[16]:
   b
0  2
In [9]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2']))
In [9]: df
Out[9]:
               a  b
label1 label2
idx1   idx2    1  2
# Examples of the original DataFrame having a MultiIndex
In [10]: df.groupby('label1').rolling(1).sum()
Out[10]:
                        a    b
label1 label1 label2
idx1   idx1   idx2    1.0  2.0
In [20]: df.groupby('label1').transform(lambda x: np.sum(x))
Out[20]:
               a  b
label1 label2
idx1   idx2    1  2
In [11]: df.groupby('a').rolling(1).sum()
Out[11]:
                   a    b
a label1 label2
1 idx1   idx2    1.0  2.0
In [21]: df.groupby('a').transform(lambda x: np.sum(x))
Out[21]:
               b
label1 label2
idx1   idx2    2
In [12]: df.groupby(['label1', 'a']).rolling(1).sum()
Out[12]:
                          b
label1 a label1 label2
idx1   1 idx1   idx2    2.0
In [22]: df.groupby(['label1', 'a']).transform(lambda x: np.sum(x))
Out[22]:
               b
label1 label2
idx1   idx2    2
As shown, when the original object has a MultiIndex, the resulting MultiIndex for the groupby().rolling() result is built consistently (group keys first, then the original levels), but this can lead to redundant levels, e.g. label1 appearing twice. The index of the groupby().transform() result lacks that consistency but looks more convenient.
IMO I prefer the consistent result we have today in groupby().rolling(), but I'm open to thoughts.
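For anyone who wants the transform-style index today, one possible workaround is to drop the prepended group-key levels by position. This is only a sketch under the assumption that the group keys always occupy the first len(keys) levels, as in the outputs above; the two-row frame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 1], "b": [2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("idx1", "idx2"), ("idx1", "idx3")], names=["label1", "label2"]
    ),
)

keys = ["label1", "a"]
rolled = df.groupby(keys).rolling(1).sum()

# The group keys are assumed to sit in the first len(keys) levels.
# Drop them by position, because the name 'label1' appears twice.
aligned = rolled.droplevel(list(range(len(keys))))

print(list(aligned.index.names))
print(list(aligned.index) == list(df.index))
```

Note that the rolling result is ordered by group, so on data that is not already sorted by the keys the rows may come back in a different order than the original; if the original index is unique, aligned.reindex(df.index) should restore it.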