ENH: Add .pipe to GroupBy objects #17863

Closed
@topper-123

I propose adding a pipe method to GroupBy objects, having the same interface as DataFrame.pipe/Series.pipe.
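To make the proposed interface concrete, here is a minimal sketch of how such a method might behave, written as a free function. The name `pipe_groupby` and the helper `ratio` are hypothetical and exist only for illustration. The sketch assumes the same calling convention as `DataFrame.pipe`: either a callable that takes the object as its first argument, or a `(callable, data_keyword)` tuple.

```python
import pandas as pd


def pipe_groupby(grouped, func, *args, **kwargs):
    """Hypothetical stand-in for the proposed GroupBy.pipe (illustration only)."""
    if isinstance(func, tuple):
        # (callable, data_keyword) form: pass the GroupBy object by keyword
        func, target = func
        if target in kwargs:
            raise ValueError(f"{target} is both the pipe target and a keyword argument")
        kwargs[target] = grouped
        return func(*args, **kwargs)
    # Plain callable form: the GroupBy object is the first positional argument
    return func(grouped, *args, **kwargs)


def ratio(grp, num, den):
    # Two vectorized aggregations on the same GroupBy object
    return grp[num].sum() / grp[den].sum()


df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1.0, 2.0, 3.0], "w": [1, 1, 2]})
result = pipe_groupby(df.groupby("key"), ratio, num="x", den="w")
```

The point of the design is that the GroupBy object is created once and handed to the function, so the function can run several aggregations on it without regrouping.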

The use case is reusing a GroupBy object across several calculations; see the example below.

Use case

A pipe is useful for succinctly reusing GroupBy objects in calculations, for example calculating prices given a column of revenue and a column of quantity sold:

>>> import numpy as np
>>> import pandas as pd
>>> n = 100_000
>>> df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
...                    'Year': np.random.choice(['Year_1', 'Year_2', 'Year_3', 'Year_4'], n),
...                    'Revenue': (np.random.random(n)*50+10).round(2),
...                    'Quantity': np.random.randint(1, 10, size=n)})
>>> df.head(2)
   Quantity  Revenue    Store    Year
0         2    14.69  Store_1  Year_1
1         9    25.89  Store_2  Year_4

With .pipe, we could then get prices per store/year like so:

>>> (df.groupby(['Store', 'Year'])
...    .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
...    .unstack().round(2))
Year     Year_1  Year_2  Year_3  Year_4
Store
Store_1    6.99    6.99    7.01    6.92
Store_2    6.95    6.98    6.97    6.96

Note that the above is vectorized, and piping makes the code succinct and clear.

Alternatives to .pipe

The alternatives would be:

  1. Use .apply
  2. Create a function and call it with a GroupBy object as its argument
  3. Create a price column

Option 1 is not good because .apply is slow: it calls the function once per group instead of using vectorized aggregations.

A pipe is just syntactic sugar for option 2, but piping would be more readable, especially if you're piping other stuff already.

Creating a concrete calculated column is in some instances the right approach, but in other cases it is better to calculate the values on the fly.
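The three alternatives can be sketched side by side on a smaller frame. The function name `price` and the seed are illustrative choices. How option 3 is read here (a per-row price column averaged per group) is an assumption, and under that reading it answers a different question than total revenue divided by total quantity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 1_000
df = pd.DataFrame({
    "Store": rng.choice(["Store_1", "Store_2"], n),
    "Year": rng.choice(["Year_1", "Year_2"], n),
    "Revenue": (rng.random(n) * 50 + 10).round(2),
    "Quantity": rng.integers(1, 10, size=n),
})
grouped = df.groupby(["Store", "Year"])

# Option 1: .apply runs the lambda once per group (a Python-level loop)
via_apply = grouped[["Revenue", "Quantity"]].apply(
    lambda g: g["Revenue"].sum() / g["Quantity"].sum()
)


# Option 2: a plain function taking the GroupBy object; both sums are vectorized
def price(grp):
    return grp["Revenue"].sum() / grp["Quantity"].sum()


via_function = price(grouped)

# Option 3 (assumed reading): materialize a per-row price column, then average it.
# This is a mean of ratios, not a ratio of sums, so the values generally differ.
via_column = (
    df.assign(Price=df["Revenue"] / df["Quantity"])
      .groupby(["Store", "Year"])["Price"]
      .mean()
)
```

Options 1 and 2 give the same numbers; option 2 is what a pipe would wrap, with the difference that the call would sit inside the method chain.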
