Description
Hi there! Thanks for the work you're doing.
I started trying to write a PR to fix #36327, and that got me into the groupby code. Specifically, the Grouping
class in core.groupby.grouper. I was thinking that this class could be ripe for a refactor.
Currently, this class handles lots of different cases, for example if self.grouping_vector
is a categorical, or a Grouper
object, or a general array-like. Also it deals with multi-indexes. With one class handling all these cases, the methods are littered with if-elses to decide the right code path.
Possible solution: Define an interface for a Grouping, and then each of these different cases implement them differently (eg a CategoricalGrouping, MultiIndexGrouping, etc) Then, instead of calling Grouping.__init__()
to make your grouping, you create them with a factory method. I think this would greatly simplify reasoning about how a Grouping works.
This seemed likely that someone has already thought of this, so I wanted to check. I haven't tried implementing it yet, so I might be missing something that makes this a bad idea. If people think this seems like a good idea then I could write a PR for it. Tagging @jbrockmendel since it looks like they have done a lot of refactoring in there already. Thank you!