API: Add pipe method to GroupBy objects #10353

Closed
@ghl3

Description

Extend the new "pipe" protocol to GroupBy objects to allow for piping of a wider class of functions. Currently, one can only create pipes that chain together objects inheriting from NDFrame. But the concept of piping is general and could be extended to other pandas objects, specifically anything inheriting from GroupBy.

The use case is to write pipes that allow one to freely transform back and forth between NDFrames and GroupBy objects. Example:

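from pandas import DataFrame

df = DataFrame({'A': [...], 'B': [...]})

# GroupBy -> Series
def f(dfgb):
    return dfgb['B'].value_counts()

# Series -> Series
def g(srs):
    return srs * 2

grouped = df.groupby('A')

# Proposed: GroupBy.pipe calls f, then the resulting Series uses NDFrame.pipe
grouped.pipe(f).pipe(g)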

Note that these transformations are

  • GroupBy -> Series
  • Series -> Series

and the chain seamlessly switches from a GroupBy.pipe to an NDFrame.pipe.
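For comparison, here is a minimal sketch of how the same chain can be written when only NDFrame objects have a pipe method: the GroupBy step has to be an ordinary function call, and only the Series step can be piped. The values in columns A and B are made up for illustration, since the example above elides them.

```python
import pandas as pd

# Illustrative data only; the original example does not specify values.
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 1, 2]})


def f(dfgb):
    # GroupBy -> Series: count occurrences of each B value within each A group
    return dfgb["B"].value_counts()


def g(srs):
    # Series -> Series
    return srs * 2


grouped = df.groupby("A")

# The first step cannot be piped without GroupBy.pipe, so the chain
# starts with a plain call and only then continues with Series.pipe.
result = f(grouped).pipe(g)
print(result)
```

The proposal would let the first step read as `grouped.pipe(f)`, so the whole chain has the same shape from left to right.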

There are a few ways to implement this. A simple way is to break out the core functionality of "pipe" into a pure function and then call that function from each class's pipe method. Another way is to treat piping as a mix-in trait: define pipe once as a method on a base class, and mix that base class into any class that should be pipeable. I have no strong preference between these options, and I'm open to other implementations that may be more in line with pandas' design goals or the long-term vision of the "pipe" concept.
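The two options can be sketched together as follows. This is not the strawman branch and not pandas code: `pipe`, `PipeMixin`, `ToySeries`, and `ToyGroupBy` are hypothetical names, and the toy classes only stand in for Series and GroupBy so that the sketch runs without pandas. The `(callable, keyword)` tuple form is assumed to mirror what NDFrame.pipe accepts.

```python
from collections import Counter


def pipe(obj, func, *args, **kwargs):
    """Option 1: the core of pipe as a pure function (hypothetical helper)."""
    if isinstance(func, tuple):
        # (callable, keyword) form: pass obj under the named keyword.
        func, target = func
        if target in kwargs:
            raise ValueError(f"{target} is both the pipe target and a keyword argument")
        kwargs[target] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


class PipeMixin:
    """Option 2: a mix-in that gives any class a pipe method."""

    def pipe(self, func, *args, **kwargs):
        # The method simply delegates to the pure function above.
        return pipe(self, func, *args, **kwargs)


class ToySeries(PipeMixin):
    """Stand-in for a Series: a mapping from labels to numbers."""

    def __init__(self, values):
        self.values = dict(values)

    def __mul__(self, factor):
        return ToySeries({k: v * factor for k, v in self.values.items()})


class ToyGroupBy(PipeMixin):
    """Stand-in for a GroupBy: rows (dicts) grouped by one key."""

    def __init__(self, rows, key):
        self.rows = list(rows)
        self.key = key

    def value_counts(self, column):
        # Count (group, value) pairs, loosely like GroupBy[column].value_counts().
        return ToySeries(Counter((r[self.key], r[column]) for r in self.rows))


rows = [{"A": "x", "B": 1}, {"A": "x", "B": 1}, {"A": "y", "B": 2}]
grouped = ToyGroupBy(rows, "A")

# GroupBy -> Series -> Series, switching pipe implementations transparently.
result = grouped.pipe(lambda gb: gb.value_counts("B")).pipe(lambda s: s * 2)
print(result.values)
```

Because the mix-in delegates to the pure function, the two options are not mutually exclusive; the main design question is whether pipeable classes should share a base class or each define a one-line pipe method.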

A strawman implementation of the first suggestion can be found here:
master...ghl3:groupby-pipe

CC
@TomAugspurger
@shoyer

Labels: API Design, Needs Discussion (requires discussion from core team before further action)
