CLN: Clean groupby/test_function.py #32027

dsaxton · 2020-02-15T15:56:52Z

Some small cleanups (removing unnecessary for loops / adding parameterization)

simonjayhawkins

@dsaxton lgtm pending green. One comment, not a blocker, but I think it may help reduce the cognitive load.

simonjayhawkins · 2020-02-15T16:12:00Z

pandas/tests/groupby/test_function.py

+@pytest.mark.parametrize("dtype", [np.int32, np.int64, np.float32, np.float64])
+def test_cummin_cummax(dtype):
+    min_val = (
+        np.iinfo(dtype).min if np.dtype(dtype).kind == "i" else np.finfo(dtype).min


for readability, could just include num_mins and num_max in parameterisation.

Yes, I think this could be better

This would now be more difficult if a fixture is used for dtype (as requested). We could create composable min and max fixtures, but that is probably OTT for one test.

jreback · 2020-02-15T16:21:20Z

pandas/tests/groupby/test_function.py

@@ -685,59 +685,53 @@ def test_numpy_compat(func):
        getattr(g, func)(foo=1)


-def test_cummin_cummax():
+@pytest.mark.parametrize("dtype", [np.int32, np.int64, np.float32, np.float64])


we have fixture for these

Which fixture could I use?

I think you could try any_real_dtype and see if the tests pass

Looks like it breaks for the unsigned real dtypes

OK, not sure we want to create another fixture, add a skip in the test for the unsigned or keep the dtype as direct parameterisation and include the min and max in the parameterisation. @jreback?
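For reference, the "add a skip in the test for the unsigned" option could be sketched roughly as below. This is only an illustration: `is_unsigned`, `test_negative_values_survive`, and the dtype list are hypothetical names and values, not code from this PR.

```python
import numpy as np
import pytest


def is_unsigned(dtype):
    """Return True for unsigned integer dtypes (NumPy kind code "u")."""
    return np.dtype(dtype).kind == "u"


# Hypothetical dtype list for illustration; it mixes signed, unsigned and float.
@pytest.mark.parametrize("dtype", [np.int32, np.uint8, np.float64])
def test_negative_values_survive(dtype):
    if is_unsigned(dtype):
        # Unsigned dtypes cannot hold the negative value used below.
        pytest.skip("unsigned dtypes are not covered by this test")
    arr = np.array([-1, 0, 1], dtype=dtype)
    assert arr.min() == -1
```

The trade-off is that skipped cases still appear in the test report, whereas direct parameterisation over the signed and float dtypes never generates them.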

jreback · 2020-02-15T16:22:22Z

pandas/tests/groupby/test_function.py

-        tm.assert_frame_equal(result, expected)
-        result = df.groupby("A").B.apply(lambda x: x.cummin()).to_frame()
-        tm.assert_frame_equal(result, expected)
+    # cummin


this test might be simpler if you break it into multiple tests

Yes, the test is doing quite a lot. Would it be better to break up in this or a future PR?

just as easy to do here

Broke into two tests, I'm not sure if this looks cleaner though. With a fixture on exact configuration of dtypes it would be nice to split into multiple but without that it gets pretty messy.

jreback · 2020-02-16T16:38:22Z

pandas/tests/groupby/test_function.py


-        result = f(numeric_only=False)
-        tm.assert_index_equal(result.columns, expected_columns)
+    f = getattr(df.groupby("group"), "sum")


just call this directly
df.groupby('group').sum()

jreback · 2020-02-16T16:38:28Z

pandas/tests/groupby/test_function.py

+    result = f()
+    tm.assert_index_equal(result.columns, expected_columns_numeric)
+
+    result = f(numeric_only=False)


Done, I think it's a bit easier to read / understand if we just call these functions directly instead of assigning to f then calling, so changed that as well
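A small sketch of the two call styles being compared, assuming a made-up frame (the column names and values are illustrative only and do not come from the test file):

```python
import pandas as pd

# Small illustrative frame; the column names are made up for this sketch.
df = pd.DataFrame({"group": [1, 1, 2], "int": [1, 2, 3], "float": [4.0, 5.0, 6.0]})

# Indirect form: look the method up by name, then call it.
f = getattr(df.groupby("group"), "sum")
indirect = f()

# Direct form: the same call with one fewer name to track.
direct = df.groupby("group").sum()

# Both forms produce the same result.
pd.testing.assert_frame_equal(indirect, direct)
```

The `getattr` form is useful when the method name is itself a test parameter; once the name is a fixed string, the direct call reads more plainly.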

jreback · 2020-02-16T16:39:37Z

pandas/tests/groupby/test_function.py

+@pytest.mark.parametrize(
+    "dtype, min_val, max_val",
+    [
+        (np.int32, np.iinfo(np.int32).min, np.iinfo(np.int32).max),


min_val, max_val can all be computed inside the function
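One way to read this suggestion is sketched below: parametrize on the dtype alone and derive the extremes in the test body. The test name and the assertions are hypothetical and serve only to show the shape, not the actual cummin/cummax checks.

```python
import numpy as np
import pytest


@pytest.mark.parametrize("dtype", [np.int32, np.int64, np.float32, np.float64])
def test_extremes_round_trip(dtype):
    # Pick the matching info object: iinfo for signed ints, finfo for floats.
    info = np.iinfo(dtype) if np.dtype(dtype).kind == "i" else np.finfo(dtype)
    min_val, max_val = info.min, info.max

    # Hypothetical check: the extremes are representable in the dtype.
    arr = np.array([min_val, 0, max_val], dtype=dtype)
    assert arr.min() == min_val
    assert arr.max() == max_val
```

This keeps the parameter list short, at the cost of a couple of setup lines at the top of each test that needs the extremes.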

jreback · 2020-02-16T16:40:59Z

pandas/tests/groupby/test_function.py

@@ -751,6 +729,65 @@ def test_cummin_cummax():
    expected = base_df.groupby("A").B.apply(lambda x: x.cummin()).to_frame()
    tm.assert_frame_equal(result, expected)

+    # Test nan in entire column


can you split to another test rather than make this test very long

jreback · 2020-02-16T16:41:21Z

pandas/tests/groupby/test_function.py

+    tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize(


simply make this a fixture at the top of the file rather than repeating

jreback

comment, otherwise lgtm. ping on green.

jreback · 2020-02-17T17:11:30Z

pandas/tests/groupby/test_function.py

+    params=[np.int32, np.int64, np.float32, np.float64],
+    ids=["np.int32", "np.int64", "np.float32", "np.float64"],
+)
+def numpy_dtypes(request):


can you name this something else, maybe numpy_dtypes_for_minmax; also you can include in the return the iinfo or finfo result (e.g. you would return a tuple); i find this a bit more obvious than doing it in the test function.

Updated based on comments and green

not a fan of returning tuples and unpacking in tests in scenarios where composable fixtures could be used instead and could have more utility. However, naming the fixture numpy_dtypes_for_minmax somewhat limits the reuse of composable fixtures.

diff --git a/pandas/tests/groupby/test_function.py b/pandas/tests/groupby/test_function.py
index 6205dfb87..348a6c189 100644
--- a/pandas/tests/groupby/test_function.py
+++ b/pandas/tests/groupby/test_function.py
@@ -35,15 +35,31 @@ def numpy_dtypes_for_minmax(request):
     Fixture of numpy dtypes with min and max values used for testing
     cummin and cummax
     """
-    dtype = request.param
-    min_val = (
-        np.iinfo(dtype).min if np.dtype(dtype).kind == "i" else np.finfo(dtype).min
-    )
-    max_val = (
-        np.iinfo(dtype).max if np.dtype(dtype).kind == "i" else np.finfo(dtype).max
+    return request.param
+
+
+@pytest.fixture()
+def dtype_min(numpy_dtypes_for_minmax):
+    """
+    Fixture to return minimum value of dtype.
+    """
+    return (
+        np.iinfo(numpy_dtypes_for_minmax).min
+        if np.dtype(numpy_dtypes_for_minmax).kind == "i"
+        else np.finfo(numpy_dtypes_for_minmax).min
     )
 
-    return (dtype, min_val, max_val)
+
+@pytest.fixture()
+def dtype_max(numpy_dtypes_for_minmax):
+    """
+    Fixture to return maximum value of dtype.
+    """
+    return (
+        np.iinfo(numpy_dtypes_for_minmax).max
+        if np.dtype(numpy_dtypes_for_minmax).kind == "i"
+        else np.finfo(numpy_dtypes_for_minmax).max
+    )
 
 
 @pytest.mark.parametrize("agg_func", ["any", "all"])
@@ -704,9 +720,8 @@ def test_numpy_compat(func):
     reason="https://github.com/pandas-dev/pandas/issues/31992",
     strict=False,
 )
-def test_cummin(numpy_dtypes_for_minmax):
-    dtype = numpy_dtypes_for_minmax[0]
-    min_val = numpy_dtypes_for_minmax[1]
+def test_cummin(numpy_dtypes_for_minmax, dtype_min):
+    dtype = numpy_dtypes_for_minmax
 
     # GH 15048
     base_df = pd.DataFrame(
@@ -723,8 +738,8 @@ def test_cummin(numpy_dtypes_for_minmax):
     tm.assert_frame_equal(result, expected)
 
     # Test w/ min value for dtype
-    df.loc[[2, 6], "B"] = min_val
-    expected.loc[[2, 3, 6, 7], "B"] = min_val
+    df.loc[[2, 6], "B"] = dtype_min
+    expected.loc[[2, 3, 6, 7], "B"] = dtype_min
     result = df.groupby("A").cummin()
     tm.assert_frame_equal(result, expected)
     expected = df.groupby("A").B.apply(lambda x: x.cummin()).to_frame()
@@ -772,9 +787,8 @@ def test_cummin_all_nan_column():
     reason="https://github.com/pandas-dev/pandas/issues/31992",
     strict=False,
 )
-def test_cummax(numpy_dtypes_for_minmax):
-    dtype = numpy_dtypes_for_minmax[0]
-    max_val = numpy_dtypes_for_minmax[2]
+def test_cummax(numpy_dtypes_for_minmax, dtype_max):
+    dtype = numpy_dtypes_for_minmax
 
     # GH 15048
     base_df = pd.DataFrame(
@@ -791,8 +805,8 @@ def test_cummax(numpy_dtypes_for_minmax):
     tm.assert_frame_equal(result, expected)
 
     # Test w/ max value for dtype
-    df.loc[[2, 6], "B"] = max_val
-    expected.loc[[2, 3, 6, 7], "B"] = max_val
+    df.loc[[2, 6], "B"] = dtype_max
+    expected.loc[[2, 3, 6, 7], "B"] = dtype_max
     result = df.groupby("A").cummax()
     tm.assert_frame_equal(result, expected)
     expected = df.groupby("A").B.apply(lambda x: x.cummax()).to_frame()

If we take the min and max out of the fixture I'd just say compute inside the test, since the min and max are each only used in one test respectively, so there isn't much value in having them in a fixture anyways

Sorry I wasn't clear. I'm not a fan but no need to change. see #32027 (comment)

WillAyd · 2020-02-20T04:09:27Z

Thanks @dsaxton

CLN: Clean test_function.py

803a166

simonjayhawkins approved these changes Feb 15, 2020

simonjayhawkins added the Clean and Testing (pandas testing functions or related to the test suite) labels Feb 15, 2020

simonjayhawkins added this to the 1.1 milestone Feb 15, 2020

jreback requested changes Feb 15, 2020

Daniel Saxton added 8 commits February 15, 2020 12:35

Move into params

5f56e75

Merge branch 'master' into clean-test

cc6b42c

Merge branch 'master' into clean-test

507bad8

Break into two tests

761c7e6

Remove some more loops

bac1eb5

Break into own test, parametrize

23ad133

Revert "Break into own test, parametrize"

7b2cfcb

This reverts commit 23ad133.

Revert "Remove some more loops"

e8b37c1

This reverts commit bac1eb5.

jreback requested changes Feb 16, 2020

Daniel Saxton added 6 commits February 16, 2020 11:44

Call functions directly

fd84996

Make fixture and split out more tests

c3bc093

Fix

6a01a6d

Update fixture

4e02e07

Merge branch 'master' into clean-test

20c1aa7

xfail not strict

a00df9a

jreback requested changes Feb 17, 2020

Daniel Saxton added 2 commits February 17, 2020 12:05

Update fixture

7cf1ae7

Merge branch 'master' into clean-test

b923f5e

WillAyd merged commit 60b8f05 into pandas-dev:master Feb 20, 2020

dsaxton deleted the clean-test branch February 20, 2020 14:21

roberthdevries pushed a commit to roberthdevries/pandas that referenced this pull request Mar 2, 2020

CLN: Clean groupby/test_function.py (pandas-dev#32027)

b4a0b92
