API: Preferred MultiIndex result for groupby().rolling() for an object with a MultiIndex #38787

Closed
@mroeschke

groupby().rolling() on master currently builds the resulting MultiIndex manually: the groupby keys become the first level(s), followed by the level(s) of the original object's Index.

However, groupby().rolling() behaves similarly to groupby().transform() (i.e. maintains the original shape), so should the resulting index align with results of groupby().transform()?

In [4]: pd.__version__
Out[4]: '1.3.0.dev0+238.g1d196295c'

In [6]: df = pd.DataFrame({'a': [1], 'b': [2]})

In [7]: df
Out[7]:
   a  b
0  1  2

# Example of the original DataFrame having a regular Index
In [8]: df.groupby('a').rolling(1).sum()
Out[8]:
       a    b
a
1 0  1.0  2.0

In [16]: df.groupby('a').transform(lambda x: np.sum(x))
Out[16]:
   b
0  2

In [9]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2']))

In [9]: df
Out[9]:
               a  b
label1 label2
idx1   idx2    1  2

# Examples of the original DataFrame having a MultiIndex
In [10]: df.groupby('label1').rolling(1).sum()
Out[10]:
                        a    b
label1 label1 label2
idx1   idx1   idx2    1.0  2.0

In [20]: df.groupby('label1').transform(lambda x: np.sum(x))
Out[20]:
               a  b
label1 label2
idx1   idx2    1  2

In [11]: df.groupby('a').rolling(1).sum()
Out[11]:
                   a    b
a label1 label2
1 idx1   idx2    1.0  2.0

In [21]: df.groupby('a').transform(lambda x: np.sum(x))
Out[21]:
               b
label1 label2
idx1   idx2    2

In [12]: df.groupby(['label1', 'a']).rolling(1).sum()
Out[12]:
                          b
label1 a label1 label2
idx1   1 idx1   idx2    2.0

In [22]: df.groupby(['label1', 'a']).transform(lambda x: np.sum(x))
Out[22]:
               b
label1 label2
idx1   idx2    2

As shown, when the original object has a MultiIndex, the resulting MultiIndex of groupby().rolling() is built consistently, but this can lead to redundant levels (see Out[10] and Out[12], where label1 appears twice). The resulting MultiIndex of groupby().transform() lacks that consistency but looks more convenient.

I prefer the consistent result we have today in groupby().rolling(), but I'm open to thoughts.
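
For anyone who wants the transform-style index from the current groupby().rolling() result, here is a minimal sketch of one possible workaround. It assumes the original MultiIndex is unique and that the grouping key is an index level; the four-row frame and the names `rolled` and `aligned` are made up for illustration and are not part of the examples above.

```python
import pandas as pd

# Hypothetical frame with a unique two-level MultiIndex (illustration only).
index = pd.MultiIndex.from_tuples(
    [("idx1", "x"), ("idx2", "x"), ("idx1", "y"), ("idx2", "y")],
    names=["label1", "label2"],
)
df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [2, 3, 4, 5]}, index=index)

# Current behaviour: the groupby key is prepended to the original levels,
# so label1 shows up twice in the result index.
rolled = df.groupby("label1").rolling(1).sum()

# Drop the prepended key level by position (the names are duplicated, so a
# positional drop is unambiguous), then restore the original row order.
aligned = rolled.droplevel(0).reindex(df.index)
```

Dropping by position rather than by name avoids any ambiguity from the duplicated label1 name. The reindex step is there because the rolling result is ordered by group rather than by the original row order.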

Labels: API - Consistency (Internal Consistency of API/Behavior), Groupby, Window (rolling, ewma, expanding)
