-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REGR: ufunc with DataFrame input not passing all kwargs #40878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 12 commits
138d9d2
3b492e4
625ce36
6bf0da0
7c9a63f
cfb0bcd
58f7399
4590926
d58677e
9c0b96c
e711789
a2da4fc
5c57727
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -357,7 +357,7 @@ def reconstruct(result): | |||||
# * len(inputs) > 1 is doable when we know that we have | ||||||
# aligned blocks / dtypes. | ||||||
inputs = tuple(np.asarray(x) for x in inputs) | ||||||
result = getattr(ufunc, method)(*inputs) | ||||||
result = getattr(ufunc, method)(*inputs, **kwargs) | ||||||
elif self.ndim == 1: | ||||||
# ufunc(series, ...) | ||||||
inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs) | ||||||
|
@@ -367,7 +367,7 @@ def reconstruct(result): | |||||
if method == "__call__": | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
(untested, but if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or maybe more simply check for any There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I quickly pushed this edit myself, as it fixes the xfail case in the tests you added There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for adding this! |
||||||
# for np.<ufunc>(..) calls | ||||||
mgr = inputs[0]._mgr | ||||||
result = mgr.apply(getattr(ufunc, method)) | ||||||
result = mgr.apply(getattr(ufunc, method), **kwargs) | ||||||
else: | ||||||
# otherwise specific ufunc methods (eg np.<ufunc>.accumulate(..)) | ||||||
# Those can have an axis keyword and thus can't be called block-by-block | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
from functools import partial | ||
|
||
import numpy as np | ||
import pytest | ||
|
||
|
@@ -60,6 +62,46 @@ def test_binary_input_dispatch_binop(dtype): | |
tm.assert_frame_equal(result, expected) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"func,arg,expected", | ||
[ | ||
(np.add, 1, [2, 3, 4, 5]), | ||
( | ||
partial(np.add, where=[[False, True], [True, False]]), | ||
np.array([[1, 1], [1, 1]]), | ||
[0, 3, 4, 0], | ||
), | ||
(np.power, np.array([[1, 1], [2, 2]]), [1, 2, 9, 16]), | ||
(np.subtract, 2, [-1, 0, 1, 2]), | ||
( | ||
partial(np.negative, where=np.array([[False, True], [True, False]])), | ||
None, | ||
[0, -2, -3, 0], | ||
), | ||
], | ||
) | ||
def test_ufunc_passes_args(func, arg, expected, request): | ||
# GH#40662 | ||
arr = np.array([[1, 2], [3, 4]]) | ||
df = pd.DataFrame(arr) | ||
result_inplace = np.zeros_like(arr) | ||
# 1-argument ufunc | ||
if arg is None: | ||
mark = pytest.mark.xfail( | ||
reason="Result becomes transposed with block-wise apply of ufunc, #40878" | ||
) | ||
request.node.add_marker(mark) | ||
result = func(df, out=result_inplace) | ||
else: | ||
result = func(df, arg, out=result_inplace) | ||
|
||
expected = np.array(expected).reshape(2, 2) | ||
tm.assert_numpy_array_equal(result_inplace, expected) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we have any documentation that out is actually a ndarray? this is a very strange result. At the very least document this, and let's open an issue. This should work with out=DataFrame, or simply raise (preferred) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yep. I thought this was strange. #40662 (comment) should we defer to 1.2.5 to allow more discussion? or put 1.2.4 release on hold for a day or two? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah let's defer this. i dont' think this is the correct behavior and we should actually fix it (which may mean that we simply do this for 1.3) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. moving to 1.2.5 (if we do it) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This does raise when passing a DataFrame to (to me this seems correct behaviour, and thus I don't think this needs to hold up merging this for the release) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
how is this correct in any way? this was a bug from the original impl. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That was not a bug, that's how numpy functions work: they coerce array-like input to arrays. So whether one of the arguments was a DataFrame vs an ndarray, did not have any effect on allowing an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok fine. how about a follow to make this really clear (in docs) on master. I would however also deprecate / remove this as it not intuitive at all (e.g. we do not have an out argument anywhere else) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This keyword is not in a pandas function or method, but in a NumPy function (which has There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i know, but this is very confusing if someone is doing this (and doesnt; realize it). |
||
|
||
expected = pd.DataFrame(expected) | ||
tm.assert_frame_equal(result, expected) | ||
|
||
|
||
@pytest.mark.parametrize("dtype_a", dtypes) | ||
@pytest.mark.parametrize("dtype_b", dtypes) | ||
def test_binary_input_aligns_columns(request, dtype_a, dtype_b): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the else clause is reached for a DataFrame when not
(len(inputs) > 1 or ufunc.nout > 1)
.The
result = mgr.apply(getattr(ufunc, method))
there for__call__
also fails to pass along **kwargsIs it straightforward to add to the paramterised here to exercise that path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I think so...will let you know if I run into any issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test, but had to xfail because
out
is not written to correctly. The written result becomes transposed with the block-wise apply, I'm guessing because of the following relationship:gives
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, basically when
out
is specified, I think we should simply not callmgr.apply
, see #39275, #39260 for a recent similar case where certain additional keyword arguments cannot be handled on a block-by-block case.