DEPR: raise deprecation warning in numpy ufuncs on DataFrames if not aligned + fallback to <1.2.0 behaviour #39239
Changes from 5 commits
f5e9871
e02392a
c6f6898
8700321
1a6f257
64b9430
3b66b14
4dcde0e
20be3c7
097de71
f80b780
dabd47f
eaa83ed
81e7c84
4703410
5ed00bb
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,79 @@ Fixed regressions | |
|
||
.. --------------------------------------------------------------------------- | ||
|
||
.. _whatsnew_121.ufunc_deprecation: | ||
|
||
Calling NumPy ufuncs on non-aligned DataFrames | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Before pandas 1.2.0, calling a NumPy ufunc on non-aligned DataFrames (or | ||
DataFrame / Series combination) would ignore the indices, only match | ||
the inputs by shape, and use the index/columns of the first DataFrame for | ||
the result: | ||
|
||
.. code-block:: python | ||
|
||
>>> df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1]) | ||
Review comment: this is an incorrect format |
||
>>> df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2]) | ||
>>> df1 | ||
a b | ||
0 1 3 | ||
1 2 4 | ||
>>> df2 | ||
a b | ||
1 1 3 | ||
2 2 4 | ||
|
||
>>> np.add(df1, df2) | ||
a b | ||
0 2 6 | ||
1 4 8 | ||
|
||
This contrasts with how other pandas operations work, which first align | ||
the inputs: | ||
|
||
.. code-block:: python | ||
|
||
>>> df1 + df2 | ||
Review comment: make an actual ipython block
Reply: I need to use some plain code-blocks since part of the example is showing old behaviour (or behaviour that will change in the future), so I prefer to use code-blocks for all examples, for consistency within this section
Reply: we use ipython blocks everywhere, pls do this
Reply: would like to change these to be consistent |
||
a b | ||
0 NaN NaN | ||
1 3.0 7.0 | ||
2 NaN NaN | ||
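The contrast between the two behaviours above can be reproduced without relying on version-specific ufunc dispatch. The following is a minimal sketch, not part of the PR: it emulates the pre-1.2.0 shape-only matching by adding the raw values and re-attaching the labels of the first frame, and compares that with label-based alignment. The variable names `aligned` and `positional` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])

# Label-based: rows are matched on index labels, and only label 1
# exists in both inputs, so labels 0 and 2 become NaN.
aligned = df1 + df2

# Position-based (emulation of the old ufunc behaviour): raw values are
# added element by element and the labels of df1 are reused for the result.
positional = pd.DataFrame(
    df1.to_numpy() + df2.to_numpy(), index=df1.index, columns=df1.columns
)

print(aligned)
print(positional)
```

The two results differ in both shape and values, which is why switching from one behaviour to the other is a breaking change for non-aligned inputs.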
|
||
In pandas 1.2.0, we refactored how NumPy ufuncs are called on DataFrames, and | ||
|
||
this started to align the inputs first, as happens in other pandas operations | ||
and as it happens for ufuncs called on Series objects. | ||
|
||
For pandas 1.2.1, we restored the previous behaviour to avoid a breaking | ||
change, but the above example of ``np.add(df1, df2)`` with non-aligned inputs | ||
will now raise a ``FutureWarning``, and a future pandas 2.0 release will start | ||
aligning the inputs first (:issue:`39184`). Calling a NumPy ufunc on Series | ||
objects (e.g. ``np.add(s1, s2)``) already aligns and continues to do so. | ||
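For comparison, the Series case mentioned above can be illustrated with a short sketch. The Series `s1` and `s2` are made-up inputs, chosen so that they share only one index label.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2], index=[0, 1])
s2 = pd.Series([10, 20], index=[1, 2])

# Series inputs are aligned on their index before the ufunc is applied,
# so only the shared label 1 produces a non-missing value.
result = np.add(s1, s2)
print(result)
```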
|
||
To avoid the warning and keep the current behaviour of ignoring the indices, | ||
convert one of the arguments to a NumPy array: | ||
|
||
.. code-block:: python | ||
|
||
>>> np.add(df1, np.asarray(df2)) | ||
Review comment: use an actual ipython format |
||
a b | ||
0 2 6 | ||
1 4 8 | ||
|
||
To obtain the future behaviour and silence the warning, you can align manually | ||
before passing the arguments to the ufunc: | ||
|
||
.. code-block:: python | ||
Review comment: pls do not use code-blocks except to show older code. these are so error prone |
||
|
||
>>> df1, df2 = df1.align(df2) | ||
>>> np.add(df1, df2) | ||
a b | ||
0 NaN NaN | ||
1 3.0 7.0 | ||
2 NaN NaN | ||
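As a variation on the manual alignment above, the sketch below shows how the `join` argument of `DataFrame.align` controls which labels survive. The default outer join matches the future behaviour described in this section; the inner join is shown only as an option for callers who want to avoid introducing missing values, and is an assumption about what a reader might need rather than something the PR recommends.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])

# Default join="outer": both frames are reindexed to the union of labels,
# so rows present in only one input become NaN in the result.
left, right = df1.align(df2)
outer_sum = np.add(left, right)

# join="inner": both frames are restricted to the shared labels,
# so no missing values are introduced.
left_in, right_in = df1.align(df2, join="inner")
inner_sum = np.add(left_in, right_in)

print(outer_sum)
print(inner_sum)
```

Because the inputs are already aligned when they reach the ufunc, the result does not depend on whether the ufunc itself aligns.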
|
||
.. --------------------------------------------------------------------------- | ||
|
||
.. _whatsnew_121.bug_fixes: | ||
|
||
Bug fixes | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -157,11 +157,67 @@ def array_ufunc(self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any) | |
-------- | ||
numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array_ufunc__ | ||
""" | ||
from pandas.core.frame import DataFrame | ||
|
||
from pandas.core.generic import NDFrame | ||
from pandas.core.internals import BlockManager | ||
|
||
cls = type(self) | ||
|
||
is_ndframe = [isinstance(x, NDFrame) for x in inputs] | ||
Review comment: why would you do this? simply check is_series. this is amazingly confusing.
Reply: What is
Reply: we have dataframes and series
Reply: Yes, and
Reply: yes i think its more clear
Reply: Note that below in this
Reply: so rename this to is_series_or_frame i think is more clear
Reply: I renamed it now to
|
||
is_frame = [isinstance(x, DataFrame) for x in inputs] | ||
|
||
if (sum(is_ndframe) >= 2) and (sum(is_frame) >= 1): | ||
Review comment: this condition is impossible to reason about. pls make it simpler. you just want to know if you have 2 or more dataframes right? (or series)? if so, just say that
Reply: No, I want to know if there are at least two alignable objects (DataFrame or Series) and at least one DataFrame, which is what the above line does, and which is what is explained on the line just below. I can try to clarify that comment if something is not clear about that?
Reply: try to simplify.
Reply: Sorry, Jeff, if you don't give me a clue about what exactly is unclear for you or about how you would do it differently, I have no idea how to improve this. The code reflects exactly what I just explained it needs checking, and it is explained in the line below as well. Would eg change
Reply: well, the problem is that this is getting so complicated that you need to comment. I honestly don't think this is worth doing this much change at this late hour. if you want to do for 1.2.2 or better yet 1.3, ok. waiting for the nth change is extremely painful and disruptive. these are supposed to be lightweight backports. this is turning in to a nightmare. this is likely going to be extremely fragile and break again. and will then have to be patched again.
Reply: Waiting for 1.2.2 or 1.3 is not going to make this change any simpler, if you don't help me find out what you don't like about it
What is this about?
The changes in this PR is a rather clean additional check in the |
||
# if there are 2 alignable inputs, of which at least 1 is a | ||
# DataFrame -> we would have had no alignment before -> warn that this | ||
|
||
# will align in the future | ||
|
||
# the first frame is what determines the output index/columns in pandas < 1.2 | ||
for x in inputs: | ||
|
||
if isinstance(x, DataFrame): | ||
first_frame = x | ||
break | ||
|
||
# check if the objects are aligned or not | ||
|
||
def is_aligned(frame, other): | ||
if isinstance(other, DataFrame): | ||
return frame._indexed_same(other) | ||
else: | ||
# Series -> match index | ||
return frame.columns.equals(other.index) | ||
|
||
non_aligned = sum( | ||
not is_aligned(first_frame, x) for x in inputs if isinstance(x, NDFrame) | ||
) | ||
|
||
# if at least one is not aligned -> warn and fallback to array behaviour | ||
if non_aligned: | ||
warnings.warn( | ||
"Calling a ufunc on non-aligned DataFrames/Series. Currently, the " | ||
Review comment: because the Series behavior is different, this warning could be misleading?
Reply: Hmm, yes. The most explicit is "non-aligned DataFrames or DataFrame/Series combination" or something like that, but wanted to keep it shorter .. |
||
"indices are ignored and the result takes the index/columns of the " | ||
"first DataFrame. In the future (pandas 2.0), the DataFrames/Series " | ||
"will be aligned before applying the ufunc.\nConvert one of the " | ||
"arguments to a NumPy array (eg 'ufunc(df1, np.asarray(df2))') to keep " | ||
"the current behaviour, or align manually (eg " | ||
"'df1, df2 = df1.align(df2)') before passing to the ufunc to obtain " | ||
"the future behaviour and silence this warning.", | ||
FutureWarning, | ||
stacklevel=3, | ||
) | ||
|
||
# keep the first dataframe of the inputs, other DataFrame/Series is | ||
# converted to array for fallback behaviour | ||
new_inputs = [] | ||
for x in inputs: | ||
if x is first_frame: | ||
new_inputs.append(x) | ||
elif isinstance(x, NDFrame): | ||
new_inputs.append(np.asarray(x)) | ||
else: | ||
new_inputs.append(x) | ||
|
||
# call the ufunc on those transformed inputs | ||
return getattr(ufunc, method)(*new_inputs, **kwargs) | ||
|
||
# for binary ops, use our custom dunder methods | ||
result = maybe_dispatch_ufunc_to_dunder_op(self, ufunc, method, *inputs, **kwargs) | ||
if result is not NotImplemented: | ||
|
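The condition discussed in the review thread above (at least two alignable inputs, of which at least one is a DataFrame, and at least one input not aligned with the first DataFrame) can be sketched outside of pandas internals as follows. This is an illustrative standalone version, not the code from the PR: the helper names `needs_fallback` and `fallback_inputs` are hypothetical, and the private `DataFrame._indexed_same` is replaced by the public `Index.equals` on both axes, which is assumed to be equivalent for this purpose.

```python
import numpy as np
import pandas as pd

ALIGNABLE = (pd.DataFrame, pd.Series)


def is_aligned(frame, other):
    """Return True if `other` carries the same labels as `frame`."""
    if isinstance(other, pd.DataFrame):
        # public-API stand-in for the private DataFrame._indexed_same
        return frame.index.equals(other.index) and frame.columns.equals(
            other.columns
        )
    # Series operand: its index is matched against the frame's columns
    return frame.columns.equals(other.index)


def needs_fallback(inputs):
    """True if shape-only matching would differ from label alignment."""
    alignable = [x for x in inputs if isinstance(x, ALIGNABLE)]
    frames = [x for x in alignable if isinstance(x, pd.DataFrame)]
    if len(alignable) < 2 or not frames:
        return False
    first_frame = frames[0]
    return any(not is_aligned(first_frame, x) for x in alignable)


def fallback_inputs(inputs):
    """Keep the first DataFrame, convert other alignable inputs to arrays."""
    first_frame = next(x for x in inputs if isinstance(x, pd.DataFrame))
    return [
        np.asarray(x) if isinstance(x, ALIGNABLE) and x is not first_frame else x
        for x in inputs
    ]


df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
print(needs_fallback((df1, df2)))  # indices differ
print(needs_fallback((df1, 2)))    # only one alignable input
```

Writing the check as two small predicates keeps the "two alignable inputs, one of them a DataFrame" rule separate from the per-input alignment test, which is one possible way to make the condition easier to read.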