Skip to content

CLN: Clean groupby/test_function.py #32027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Feb 20, 2020
Merged

CLN: Clean groupby/test_function.py #32027

merged 17 commits into from
Feb 20, 2020

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Feb 15, 2020

Some small cleanups (removing unnecessary for loops / adding parameterization)

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsaxton lgtm pending green. one comment, not blocker but I think may be beneficial to reduce the cognitive load.

@pytest.mark.parametrize("dtype", [np.int32, np.int64, np.float32, np.float64])
def test_cummin_cummax(dtype):
min_val = (
np.iinfo(dtype).min if np.dtype(dtype).kind == "i" else np.finfo(dtype).min
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for readability, could just include num_mins and num_max in parameterisation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this could be better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would now be more difficult if a fixture is used for dtype. (as requested) ( could create a composable min and max fixtures but probably OTT for one test)

@simonjayhawkins simonjayhawkins added Clean Testing pandas testing functions or related to the test suite labels Feb 15, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1 milestone Feb 15, 2020
@@ -685,59 +685,53 @@ def test_numpy_compat(func):
getattr(g, func)(foo=1)


def test_cummin_cummax():
@pytest.mark.parametrize("dtype", [np.int32, np.int64, np.float32, np.float64])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have fixture for these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which fixture could I use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could try any_real_dtype and see if the tests pass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it breaks for the unsigned real dtypes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, not sure we want to create another fixture, add a skip in the test for the unsigned or keep the dtype as direct parameterisation and include the min and max in the parameterisation. @jreback?

tm.assert_frame_equal(result, expected)
result = df.groupby("A").B.apply(lambda x: x.cummin()).to_frame()
tm.assert_frame_equal(result, expected)
# cummin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test might be simpler if you break it into multiple tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the test is doing quite a lot. Would it be better to break up in this or a future PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just as easy to do here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broke into two tests, I'm not sure if this looks cleaner though. With a fixture on exact configuration of dtypes it would be nice to split into multiple but without that it gets pretty messy.


result = f(numeric_only=False)
tm.assert_index_equal(result.columns, expected_columns)
f = getattr(df.groupby("group"), "sum")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just call this directly
df.groupby('group').sum()

result = f()
tm.assert_index_equal(result.columns, expected_columns_numeric)

result = f(numeric_only=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I think it's a bit easier to read / understand if we just call these functions directly instead of assigning to f then calling, so changed that as well

@pytest.mark.parametrize(
"dtype, min_val, max_val",
[
(np.int32, np.iinfo(np.int32).min, np.iinfo(np.int32).max),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min_val, max_val can all be computed inside the function

@@ -751,6 +729,65 @@ def test_cummin_cummax():
expected = base_df.groupby("A").B.apply(lambda x: x.cummin()).to_frame()
tm.assert_frame_equal(result, expected)

# Test nan in entire column
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you split to another test rather than make this test very long

tm.assert_series_equal(result, expected)


@pytest.mark.parametrize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simply make this a fixture at the top of the file rather than repeating

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment, otherwsie lgtm. ping on green.

params=[np.int32, np.int64, np.float32, np.float64],
ids=["np.int32", "np.int64", "np.float32", "np.float64"],
)
def numpy_dtypes(request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you name this something else, maybe numpy_dtypes_for_minmax; also you can include in the return the iinfo or finfo result (e.g. you would return a tuple); i find this a bit more obvious than doing it in the test function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated based on comments and green

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of returning tuples and unpacking in tests in scenarios where composable fixtures could be used instead and could have more utility. However, naming the fixture numpy_dtypes_for_minmax somewhat limits the reuse of composible fixtures.

diff --git a/pandas/tests/groupby/test_function.py b/pandas/tests/groupby/test_function.py
index 6205dfb87..348a6c189 100644
--- a/pandas/tests/groupby/test_function.py
+++ b/pandas/tests/groupby/test_function.py
@@ -35,15 +35,31 @@ def numpy_dtypes_for_minmax(request):
     Fixture of numpy dtypes with min and max values used for testing
     cummin and cummax
     """
-    dtype = request.param
-    min_val = (
-        np.iinfo(dtype).min if np.dtype(dtype).kind == "i" else np.finfo(dtype).min
-    )
-    max_val = (
-        np.iinfo(dtype).max if np.dtype(dtype).kind == "i" else np.finfo(dtype).max
+    return request.param
+
+
+@pytest.fixture()
+def dtype_min(numpy_dtypes_for_minmax):
+    """
+    Fixture to return minimum value of dtype.
+    """
+    return (
+        np.iinfo(numpy_dtypes_for_minmax).min
+        if np.dtype(numpy_dtypes_for_minmax).kind == "i"
+        else np.finfo(numpy_dtypes_for_minmax).min
     )
 
-    return (dtype, min_val, max_val)
+
+@pytest.fixture()
+def dtype_max(numpy_dtypes_for_minmax):
+    """
+    Fixture to return maximum value of dtype.
+    """
+    return (
+        np.iinfo(numpy_dtypes_for_minmax).max
+        if np.dtype(numpy_dtypes_for_minmax).kind == "i"
+        else np.finfo(numpy_dtypes_for_minmax).max
+    )
 
 
 @pytest.mark.parametrize("agg_func", ["any", "all"])
@@ -704,9 +720,8 @@ def test_numpy_compat(func):
     reason="https://github.com/pandas-dev/pandas/issues/31992",
     strict=False,
 )
-def test_cummin(numpy_dtypes_for_minmax):
-    dtype = numpy_dtypes_for_minmax[0]
-    min_val = numpy_dtypes_for_minmax[1]
+def test_cummin(numpy_dtypes_for_minmax, dtype_min):
+    dtype = numpy_dtypes_for_minmax
 
     # GH 15048
     base_df = pd.DataFrame(
@@ -723,8 +738,8 @@ def test_cummin(numpy_dtypes_for_minmax):
     tm.assert_frame_equal(result, expected)
 
     # Test w/ min value for dtype
-    df.loc[[2, 6], "B"] = min_val
-    expected.loc[[2, 3, 6, 7], "B"] = min_val
+    df.loc[[2, 6], "B"] = dtype_min
+    expected.loc[[2, 3, 6, 7], "B"] = dtype_min
     result = df.groupby("A").cummin()
     tm.assert_frame_equal(result, expected)
     expected = df.groupby("A").B.apply(lambda x: x.cummin()).to_frame()
@@ -772,9 +787,8 @@ def test_cummin_all_nan_column():
     reason="https://github.com/pandas-dev/pandas/issues/31992",
     strict=False,
 )
-def test_cummax(numpy_dtypes_for_minmax):
-    dtype = numpy_dtypes_for_minmax[0]
-    max_val = numpy_dtypes_for_minmax[2]
+def test_cummax(numpy_dtypes_for_minmax, dtype_max):
+    dtype = numpy_dtypes_for_minmax
 
     # GH 15048
     base_df = pd.DataFrame(
@@ -791,8 +805,8 @@ def test_cummax(numpy_dtypes_for_minmax):
     tm.assert_frame_equal(result, expected)
 
     # Test w/ max value for dtype
-    df.loc[[2, 6], "B"] = max_val
-    expected.loc[[2, 3, 6, 7], "B"] = max_val
+    df.loc[[2, 6], "B"] = dtype_max
+    expected.loc[[2, 3, 6, 7], "B"] = dtype_max
     result = df.groupby("A").cummax()
     tm.assert_frame_equal(result, expected)
     expected = df.groupby("A").B.apply(lambda x: x.cummax()).to_frame()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we take the min and max out of the fixture I'd just say compute inside the test, since the min and max are each only used in one test respectively, so there isn't much value in having them in a fixture anyways

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I wasn't clear. I'm not a fan but no need to change. see #32027 (comment)

@WillAyd WillAyd merged commit 60b8f05 into pandas-dev:master Feb 20, 2020
@WillAyd
Copy link
Member

WillAyd commented Feb 20, 2020

Thanks @dsaxton

@dsaxton dsaxton deleted the clean-test branch February 20, 2020 14:21
roberthdevries pushed a commit to roberthdevries/pandas that referenced this pull request Mar 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants