Skip to content

REF: Refactor Grouping class to use delegates/polymorphism #46203

Closed
@NickCrews

Description

@NickCrews

Hi there! Thanks for the work you're doing.

I started trying to write a PR to fix #36327, and that got me into the groupby code. Specifically, the Grouping class in core.groupby.grouper. I was thinking that this class could be ripe for a refactor.

Currently, this class handles lots of different cases, for example if self.grouping_vector is a categorical, or a Grouper object, or a general array-like. Also it deals with multi-indexes. With one class handling all these cases, the methods are littered with if-elses to decide the right code path.

Possible solution: Define an interface for a Grouping, and then each of these different cases implement them differently (eg a CategoricalGrouping, MultiIndexGrouping, etc) Then, instead of calling Grouping.__init__() to make your grouping, you create them with a factory method. I think this would greatly simplify reasoning about how a Grouping works.

This seemed likely that someone has already thought of this, so I wanted to check. I haven't tried implementing it yet, so I might be missing something that makes this a bad idea. If people think this seems like a good idea then I could write a PR for it. Tagging @jbrockmendel since it looks like they have done a lot of refactoring in there already. Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions