Skip to content

TST/CLN: Consolidate creation of groupby method args #47973

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 8, 2022

Conversation

rhshadrach
Copy link
Member

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • [] Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Wasn't sure if it was better to put the helper in __init__ or create common.

@rhshadrach rhshadrach added Testing pandas testing functions or related to the test suite Groupby Clean labels Aug 4, 2022
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rhshadrach lgtm

As a rule I do not see changing tests as necessarily a benefit. Parameterizing most definitely so that we can test more for consistency.

If it was not for the passing of obj to the helper function, I would not even have mentioned it as it could potentially have made a good composite fixture.

@@ -0,0 +1,10 @@
def get_method_args(name, obj):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a one-liner docstring here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - good idea. I ended up doing a full docstring.

@simonjayhawkins simonjayhawkins added this to the 1.5 milestone Aug 5, 2022
@simonjayhawkins
Copy link
Member

Thanks @rhshadrach lgtm. and no documentation is too much (imo) so that we are more casual contributor friendly. merge when green.

@rhshadrach
Copy link
Member Author

As a rule I do not see changing tests as necessarily a benefit. Parameterizing most definitely so that we can test more for consistency.

no documentation is too much (imo) so that we are more casual contributor friendly

For me, sometimes one of the difficult parts of doing a PR is just figuring out where a test should go. In the same vein as documentation, I think making tests easier to write would be a way to make pandas more contributor friendly. So I do think cleaning up tests where possible would be a benefit.

In fact, I've spent quite some effort trying to figure out ways to better organize and write groupby tests because of this, but rarely do I find a change that's worth doing. The trouble is there are so many ways to parametrize, and how best to do it really depends on the test:

  • the type of object being grouped - i.e. SeriesGroupBy and DataFrameGroupBy
  • the methods being used (e.g. sum, mean, fillna)
  • the arguments (e.g. .groupby(axis=0) and .groupby(axis=1)), especially by
  • the dtypes of the input (including single vs multiindex, and especially categorical)

Various tests parametrize over one or multiple of these, and many more don't parametrize at all but probably should. I've never been able to find a good way to organize tests for parametrizations, but would strongly feel we should refactor if there was one.

@mroeschke mroeschke merged commit dc229e6 into pandas-dev:main Aug 8, 2022
@mroeschke
Copy link
Member

mroeschke commented Aug 8, 2022

Thanks @rhshadrach (failure unrelated).

My 2c: I think test cleanup can be worth it if it's:

  1. Test reorganization/renaming such that it's more intuitive to know where to place things
  2. Simplifying tests & using more pytest idioms such that debugging a failing test is more direct.

I think creating a "groupby testing parameterization matrix" of grouping objects x object dtypes x groupby methods x method arguments would be really nice to have running (although maybe redundant with some existing tests), but it would probably be best for it to be a new testing file instead of augmenting what already exists.

@rhshadrach rhshadrach deleted the groupby_args branch August 9, 2022 21:12
@rhshadrach
Copy link
Member Author

Thanks @simonjayhawkins and @mroeschke for sharing your thoughts.

noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
* TST/CLN: Consolidate creation of groupby method args

* Rename to get_groupby_method_args

Co-authored-by: asv-bot <pandas.benchmarks@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Groupby Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants