Implement DataFrame.__array_ufunc__ #36955

Merged
merged 20 commits into from
Nov 25, 2020
Commits
0d725e8 Implement DataFrame.__array_ufunc__ (TomAugspurger, Oct 6, 2020)
c4c1470 remove unnecessary decorator (TomAugspurger, Oct 7, 2020)
4fcb1a4 Fixup (TomAugspurger, Oct 7, 2020)
971659e fixup finalize (TomAugspurger, Oct 7, 2020)
6bd73dc whatsnew (TomAugspurger, Oct 7, 2020)
1085be4 Merge remote-tracking branch 'upstream/master' into frame-array-ufunc (TomAugspurger, Oct 7, 2020)
0afdf49 fixup (TomAugspurger, Oct 7, 2020)
2260c83 fixup (TomAugspurger, Oct 8, 2020)
b3239e2 Merge remote-tracking branch 'upstream/master' into frame-array-ufunc (TomAugspurger, Oct 8, 2020)
b1d93f5 Merge remote-tracking branch 'upstream/master' into frame-array-ufunc (TomAugspurger, Oct 9, 2020)
9cfcba1 Merge remote-tracking branch 'upstream/master' into frame-array-ufunc (TomAugspurger, Oct 18, 2020)
919ebb5 Move to arraylike (TomAugspurger, Oct 18, 2020)
c73ab12 Merge remote-tracking branch 'upstream/master' into frame-array-ufunc (TomAugspurger, Nov 12, 2020)
99acb86 Merge remote-tracking branch 'upstream/master' into frame-array-ufunc (TomAugspurger, Nov 13, 2020)
acfe434 union (TomAugspurger, Nov 13, 2020)
9a35023 Merge remote-tracking branch 'upstream/master' into frame-array-ufunc (TomAugspurger, Nov 14, 2020)
2371499 Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Nov 20, 2020)
d283446 Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Nov 22, 2020)
816c6dc Merge branch 'master' of https://github.com/pandas-dev/pandas into fr… (jbrockmendel, Nov 25, 2020)
a6b120a docstring, typo fixup (jbrockmendel, Nov 25, 2020)
4 changes: 4 additions & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -189,6 +189,8 @@ Other enhancements
 - :meth:`Rolling.mean()` and :meth:`Rolling.sum()` use Kahan summation to calculate the mean to avoid numerical problems (:issue:`10319`, :issue:`11645`, :issue:`13254`, :issue:`32761`, :issue:`36031`)
 - :meth:`DatetimeIndex.searchsorted`, :meth:`TimedeltaIndex.searchsorted`, :meth:`PeriodIndex.searchsorted`, and :meth:`Series.searchsorted` with datetimelike dtypes will now try to cast string arguments (listlike and scalar) to the matching datetimelike type (:issue:`36346`)
 - Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`)
+- Calling a NumPy ufunc on a ``DataFrame`` with extension types now presrves the extension types when possible (:issue:`23743`).
Review comment (Member): presrves -> preserves

Review comment (Contributor): can you update :->

+- Calling a binary-input NumPy ufunc on multiple ``DataFrame`` objects now aligns, matching the behavior of binary operations and ufuncs on ``Series`` (:issue:`23743`).
 - Where possible :meth:`RangeIndex.difference` and :meth:`RangeIndex.symmetric_difference` will return :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`36564`)

 .. _whatsnew_120.api_breaking.python:
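
The two enhancement entries in this hunk can be illustrated with a short sketch. It assumes a pandas version in which ufuncs on DataFrames align their inputs as this PR describes; some released versions warned about non-aligned frames instead of aligning, so the second result may be version-dependent. The frame contents and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Extension dtypes: a unary ufunc should keep the nullable "Int64" dtype
# instead of falling back to object or float64.
df = pd.DataFrame(
    {
        "A": pd.array([1, 2], dtype="Int64"),
        "B": pd.array([3, None], dtype="Int64"),
    }
)
negated = np.negative(df)

# Alignment: a two-input ufunc on frames with different columns is expected
# to operate on the union of the labels, filling the gaps with NaN.
left = pd.DataFrame({"A": [3.0], "B": [1.0]})
right = pd.DataFrame({"A": [4.0], "C": [1.0]})
aligned = np.hypot(left, right)
```

Under those assumptions `negated` keeps `Int64` for both columns (with the missing value still missing), and `aligned` has columns `A`, `B` and `C`, where only `A` holds a number.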
@@ -289,6 +291,7 @@ Deprecations
 - Deprecated :meth:`Index.is_all_dates` (:issue:`27744`)
 - Deprecated automatic alignment on comparison operations between :class:`DataFrame` and :class:`Series`, do ``frame, ser = frame.align(ser, axis=1, copy=False)`` before e.g. ``frame == ser`` (:issue:`28759`)
 - :meth:`Rolling.count` with ``min_periods=None`` will default to the size of the window in a future version (:issue:`31302`)
+- Using "outer" ufuncs on DataFrames to return 4d ndarray is now deprecated. Convert to an ndarray first (:issue:`23743`)
 - Deprecated slice-indexing on timezone-aware :class:`DatetimeIndex` with naive ``datetime`` objects, to match scalar indexing behavior (:issue:`36148`)
 - :meth:`Index.ravel` returning a ``np.ndarray`` is deprecated, in the future this will return a view on the same index (:issue:`19956`)
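
A minimal sketch of the migration that the "outer" deprecation entry suggests: convert the frame to an array first, then call the ufunc's `outer` method on plain ndarrays. The one-column frame is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2]})

# Convert explicitly, then use the ufunc's "outer" method on ndarrays.
arr = df.to_numpy()  # shape (2, 1)
outer = np.subtract.outer(arr, arr)  # shape (2, 1, 2, 1), i.e. 4-dimensional
```

Each entry `outer[i, 0, j, 0]` is `arr[i, 0] - arr[j, 0]`, which is the 4d result the deprecated DataFrame call used to return.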

@@ -473,6 +476,7 @@ Other
 - Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly raising ``AssertionError`` instead of ``ValueError`` when invalid parameter combinations are passed (:issue:`36045`)
 - Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` with numeric values and string ``to_replace`` (:issue:`34789`)
 - Fixed metadata propagation in the :class:`Series.dt` accessor (:issue:`28283`)
+- Fixed metadata propagation in :meth:`Series.abs` and ufuncs called on Series (:issue:`28283`)
 - Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)

 .. ---------------------------------------------------------------------------
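
The metadata-propagation fix listed in this hunk can be checked with a small sketch that mirrors the `np.abs` case added to `test_duplicate_labels.py` in this PR. The Series contents are illustrative.

```python
import numpy as np
import pandas as pd

# A Series that forbids duplicate labels; the flag is metadata that should
# survive both Series.abs and a ufunc call.
ser = pd.Series([-1, 2], index=["a", "b"]).set_flags(allows_duplicate_labels=False)

via_method = ser.abs()
via_ufunc = np.abs(ser)
```

Both results are expected to report `flags.allows_duplicate_labels` as `False`, the same as the input.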
1 change: 1 addition & 0 deletions pandas/core/frame.py
@@ -410,6 +410,7 @@ class DataFrame(NDFrame):

     _internal_names_set = {"columns", "index"} | NDFrame._internal_names_set
     _typ = "dataframe"
+    _HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)

     @property
     def _constructor(self) -> Type[DataFrame]:
132 changes: 130 additions & 2 deletions pandas/core/generic.py
@@ -87,11 +87,11 @@
 from pandas.core.dtypes.missing import isna, notna

 import pandas as pd
-from pandas.core import missing, nanops
+from pandas.core import missing, nanops, ops
 import pandas.core.algorithms as algos
 from pandas.core.base import PandasObject, SelectionMixin
 import pandas.core.common as com
-from pandas.core.construction import create_series_with_explicit_dtype
+from pandas.core.construction import create_series_with_explicit_dtype, extract_array
 from pandas.core.flags import Flags
 from pandas.core.indexes import base as ibase
 from pandas.core.indexes.api import Index, MultiIndex, RangeIndex, ensure_index
@@ -1912,6 +1912,134 @@ def __array_wrap__(
             self, method="__array_wrap__"
         )

+    def __array_ufunc__(
+        self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any
+    ):
+        cls = type(self)
+
+        # for binary ops, use our custom dunder methods
+        result = ops.maybe_dispatch_ufunc_to_dunder_op(
+            self, ufunc, method, *inputs, **kwargs
+        )
+        if result is not NotImplemented:
+            return result
+
+        # Determine if we should defer.
+        no_defer = (np.ndarray.__array_ufunc__, cls.__array_ufunc__)
+
+        for item in inputs:
+            higher_priority = (
+                hasattr(item, "__array_priority__")
+                and item.__array_priority__ > self.__array_priority__
+            )
+            has_array_ufunc = (
+                hasattr(item, "__array_ufunc__")
+                and type(item).__array_ufunc__ not in no_defer
+                and not isinstance(item, self._HANDLED_TYPES)
+            )
+            if higher_priority or has_array_ufunc:
+                return NotImplemented
+
+        # align all the inputs.
+        types = tuple(type(x) for x in inputs)
+        alignable = [x for x, t in zip(inputs, types) if issubclass(t, NDFrame)]
+
+        if len(alignable) > 1:
+            # This triggers alignment.
+            # At the moment, there aren't any ufuncs with more than two inputs
+            # so this ends up just being x1.index | x2.index, but we write
+            # it to handle *args.
+
+            if len(set(types)) > 1:
+                # We currently don't handle ufunc(DataFrame, Series)
+                # well. Previously this raised an internal ValueError. We might
+                # support it someday, so raise a NotImplementedError.
+                raise NotImplementedError(
+                    "Cannot apply ufunc {} to mixed DataFrame and Series "
+                    "inputs.".format(ufunc)
+                )
+            axes = self.axes
+            for obj in alignable[1:]:
+                # this relies on the fact that we aren't handling mixed
+                # series / frame ufuncs.
+                for i, (ax1, ax2) in enumerate(zip(axes, obj.axes)):
+                    axes[i] = ax1 | ax2
+
+            reconstruct_axes = dict(zip(self._AXIS_ORDERS, axes))
+            inputs = tuple(
+                x.reindex(**reconstruct_axes) if issubclass(t, NDFrame) else x
+                for x, t in zip(inputs, types)
+            )
+        else:
+            reconstruct_axes = dict(zip(self._AXIS_ORDERS, self.axes))
+
+        if self.ndim == 1:
+            names = [getattr(x, "name") for x in inputs if hasattr(x, "name")]
+            name = names[0] if len(set(names)) == 1 else None
+            reconstruct_kwargs = {"name": name}
+        else:
+            reconstruct_kwargs = {}
+
+        def reconstruct(result):
+            if lib.is_scalar(result):
+                return result
+            if result.ndim != self.ndim:
+                if method == "outer":
+                    if self.ndim == 2:
+                        # we already deprecated for Series
+                        msg = (
+                            "outer method for ufunc {} is not implemented on "
+                            "pandas objects. Returning an ndarray, but in the "
+                            "future this will raise a 'NotImplementedError'. "
+                            "Consider explicitly converting the DataFrame "
+                            "to an array with '.to_numpy()' first."
+                        )
+                        warnings.warn(msg.format(ufunc), FutureWarning, stacklevel=3)
+                        return result
+                    raise NotImplementedError
+                return result
+            if isinstance(result, BlockManager):
+                # we went through BlockManager.apply
+                result = self._constructor(result, **reconstruct_kwargs, copy=False)
+            else:
+                # we converted an array, lost our axes
+                result = self._constructor(
+                    result, **reconstruct_axes, **reconstruct_kwargs, copy=False
+                )
+            # TODO: When we support multiple values in __finalize__, this
+            # should pass alignable to `__finalize__` instead of self.
+            # Then `np.add(a, b)` would consider attrs from both a and b
+            # when a and b are NDFrames.
+            if len(alignable) == 1:
+                result = result.__finalize__(self)
+            return result
+
+        if self.ndim > 1 and (
+            len(inputs) > 1 or ufunc.nout > 1  # type: ignore[attr-defined]
+        ):
+            # Just give up on preserving types in the complex case.
+            # In theory we could preserve them in these cases.
+            # * nout>1 is doable if BlockManager.apply took nout and
+            #   returned a Tuple[BlockManager].
+            # * len(inputs) > 1 is doable when we know that we have
+            #   aligned blocks / dtypes.
+            inputs = tuple(np.asarray(x) for x in inputs)
+            result = getattr(ufunc, method)(*inputs)
+        elif self.ndim == 1:
+            # ufunc(series, ...)
+            inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
+            result = getattr(ufunc, method)(*inputs, **kwargs)
+        else:
+            # ufunc(dataframe)
+            mgr = inputs[0]._mgr
+            result = mgr.apply(getattr(ufunc, method))
+
+        if ufunc.nout > 1:  # type: ignore[attr-defined]
+            result = tuple(reconstruct(x) for x in result)
+        else:
+            result = reconstruct(result)
+        return result
+
     # ideally we would define this to avoid the getattr checks, but
     # is slower
     # @property
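
The deferral check at the top of `__array_ufunc__` can be tried outside pandas. The sketch below copies that check into a hypothetical `Wrapper` class and pairs it with a hypothetical `Greedy` class that advertises a higher `__array_priority__`. Neither class exists in pandas; they only stand in for `NDFrame` and for a third-party container.

```python
import numpy as np


class Wrapper:
    """Hypothetical array container using the same deferral rule as the diff."""

    __array_priority__ = 1000
    _HANDLED_TYPES = (np.ndarray,)

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        no_defer = (np.ndarray.__array_ufunc__, type(self).__array_ufunc__)
        for item in inputs:
            higher_priority = (
                hasattr(item, "__array_priority__")
                and item.__array_priority__ > self.__array_priority__
            )
            has_array_ufunc = (
                hasattr(item, "__array_ufunc__")
                and type(item).__array_ufunc__ not in no_defer
                and not isinstance(item, self._HANDLED_TYPES)
            )
            if higher_priority or has_array_ufunc:
                # Let NumPy try the other operand's __array_ufunc__.
                return NotImplemented
        arrays = tuple(x.values if isinstance(x, Wrapper) else x for x in inputs)
        return Wrapper(getattr(ufunc, method)(*arrays, **kwargs))


class Greedy:
    """Hypothetical third-party type that outranks Wrapper."""

    __array_priority__ = 2000

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return "handled by Greedy"


own_result = np.sqrt(Wrapper([4.0, 9.0]))          # Wrapper handles it itself
deferred = np.add(Wrapper([1, 2]), Greedy())       # Wrapper defers to Greedy
```

Returning `NotImplemented` does not raise by itself: NumPy moves on to the next operand that defines `__array_ufunc__`, which is why `deferred` ends up holding the value produced by `Greedy`.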
77 changes: 1 addition & 76 deletions pandas/core/series.py
@@ -176,6 +176,7 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
     """

     _typ = "series"
+    _HANDLED_TYPES = (Index, ExtensionArray, np.ndarray)

     _name: Label
     _metadata: List[str] = ["name"]
@@ -681,82 +682,6 @@ def view(self, dtype=None) -> "Series":

     # ----------------------------------------------------------------------
     # NDArray Compat
-    _HANDLED_TYPES = (Index, ExtensionArray, np.ndarray)
-
-    def __array_ufunc__(
-        self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any
-    ):
-        # TODO: handle DataFrame
-        cls = type(self)
-
-        # for binary ops, use our custom dunder methods
-        result = ops.maybe_dispatch_ufunc_to_dunder_op(
-            self, ufunc, method, *inputs, **kwargs
-        )
-        if result is not NotImplemented:
-            return result
-
-        # Determine if we should defer.
-        no_defer = (np.ndarray.__array_ufunc__, cls.__array_ufunc__)
-
-        for item in inputs:
-            higher_priority = (
-                hasattr(item, "__array_priority__")
-                and item.__array_priority__ > self.__array_priority__
-            )
-            has_array_ufunc = (
-                hasattr(item, "__array_ufunc__")
-                and type(item).__array_ufunc__ not in no_defer
-                and not isinstance(item, self._HANDLED_TYPES)
-            )
-            if higher_priority or has_array_ufunc:
-                return NotImplemented
-
-        # align all the inputs.
-        names = [getattr(x, "name") for x in inputs if hasattr(x, "name")]
-        types = tuple(type(x) for x in inputs)
-        # TODO: dataframe
-        alignable = [x for x, t in zip(inputs, types) if issubclass(t, Series)]
-
-        if len(alignable) > 1:
-            # This triggers alignment.
-            # At the moment, there aren't any ufuncs with more than two inputs
-            # so this ends up just being x1.index | x2.index, but we write
-            # it to handle *args.
-            index = alignable[0].index
-            for s in alignable[1:]:
-                index |= s.index
-            inputs = tuple(
-                x.reindex(index) if issubclass(t, Series) else x
-                for x, t in zip(inputs, types)
-            )
-        else:
-            index = self.index
-
-        inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
-        result = getattr(ufunc, method)(*inputs, **kwargs)
-
-        name = names[0] if len(set(names)) == 1 else None
-
-        def construct_return(result):
-            if lib.is_scalar(result):
-                return result
-            elif result.ndim > 1:
-                # e.g. np.subtract.outer
-                if method == "outer":
-                    # GH#27198
-                    raise NotImplementedError
-                return result
-            return self._constructor(result, index=index, name=name, copy=False)
-
-        if type(result) is tuple:
-            # multiple return values
-            return tuple(construct_return(x) for x in result)
-        elif method == "at":
-            # no return value
-            return None
-        else:
-            return construct_return(result)
-
     def __array__(self, dtype=None) -> np.ndarray:
         """
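
The rule for naming a Series result is the same in the code removed here and in the generic version: keep the name only when every input that has a `name` agrees on it. A standalone sketch of that rule follows; `resolve_name` and the `SimpleNamespace` stand-ins are hypothetical helpers for illustration, not pandas API.

```python
from types import SimpleNamespace
from typing import Any, Hashable, Optional, Sequence


def resolve_name(inputs: Sequence[Any]) -> Optional[Hashable]:
    """Return the shared name of the named inputs, or None if they disagree."""
    names = [getattr(x, "name") for x in inputs if hasattr(x, "name")]
    return names[0] if len(set(names)) == 1 else None


# Stand-ins for Series objects: anything with a ``name`` attribute.
a = SimpleNamespace(name="price")
b = SimpleNamespace(name="price")
c = SimpleNamespace(name="volume")

same = resolve_name([a, b, 2.0])    # both named inputs agree
mixed = resolve_name([a, c])        # names conflict
unnamed = resolve_name([1.0, 2.0])  # no input carries a name
```

Inputs without a `name` attribute, such as scalars, are ignored rather than treated as a conflict.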
111 changes: 111 additions & 0 deletions pandas/tests/frame/test_ufunc.py
@@ -0,0 +1,111 @@
+import numpy as np
+import pytest
+
+import pandas as pd
+import pandas._testing as tm
+
+dtypes = [
+    "int64",
+    "Int64",
+    dict(A="int64", B="Int64"),
+]
+
+
+@pytest.mark.parametrize("dtype", dtypes)
+def test_unary_unary(dtype):
+    # unary input, unary output
+    values = np.array([[-1, -1], [1, 1]], dtype="int64")
+    df = pd.DataFrame(values, columns=["A", "B"], index=["a", "b"]).astype(dtype=dtype)
+    result = np.positive(df)
+    expected = pd.DataFrame(
+        np.positive(values), index=df.index, columns=df.columns
+    ).astype(dtype)
+    tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("dtype", dtypes)
+def test_unary_binary(dtype):
+    # unary input, binary output
+    if pd.api.types.is_extension_array_dtype(dtype) or isinstance(dtype, dict):
+        pytest.xfail(reason="Extension / mixed with multiple outputs not implemented.")
+
+    values = np.array([[-1, -1], [1, 1]], dtype="int64")
+    df = pd.DataFrame(values, columns=["A", "B"], index=["a", "b"]).astype(dtype=dtype)
+    result_pandas = np.modf(df)
+    assert isinstance(result_pandas, tuple)
+    assert len(result_pandas) == 2
+    expected_numpy = np.modf(values)
+
+    for result, b in zip(result_pandas, expected_numpy):
+        expected = pd.DataFrame(b, index=df.index, columns=df.columns)
+        tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("dtype", dtypes)
+def test_binary_input_dispatch_binop(dtype):
+    # binop ufuncs are dispatched to our dunder methods.
+    values = np.array([[-1, -1], [1, 1]], dtype="int64")
+    df = pd.DataFrame(values, columns=["A", "B"], index=["a", "b"]).astype(dtype=dtype)
+    result = np.add(df, df)
+    expected = pd.DataFrame(
+        np.add(values, values), index=df.index, columns=df.columns
+    ).astype(dtype)
+    tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("dtype_a", dtypes)
+@pytest.mark.parametrize("dtype_b", dtypes)
+def test_binary_input_aligns_columns(dtype_a, dtype_b):
+    if (
+        pd.api.types.is_extension_array_dtype(dtype_a)
+        or isinstance(dtype_a, dict)
+        or pd.api.types.is_extension_array_dtype(dtype_b)
+        or isinstance(dtype_b, dict)
+    ):
+        pytest.xfail(reason="Extension / mixed with multiple inputs not implemented.")
+
+    df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}).astype(dtype_a)
+
+    if isinstance(dtype_a, dict) and isinstance(dtype_b, dict):
+        dtype_b["C"] = dtype_b.pop("B")
+
+    df2 = pd.DataFrame({"A": [1, 2], "C": [3, 4]}).astype(dtype_b)
+    result = np.heaviside(df1, df2)
+    expected = np.heaviside(
+        np.array([[1, 3, np.nan], [2, 4, np.nan]]),
+        np.array([[1, np.nan, 3], [2, np.nan, 4]]),
+    )
+    expected = pd.DataFrame(expected, index=[0, 1], columns=["A", "B", "C"])
+    tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("dtype", dtypes)
+def test_binary_input_aligns_index(dtype):
+    if pd.api.types.is_extension_array_dtype(dtype) or isinstance(dtype, dict):
+        pytest.xfail(reason="Extension / mixed with multiple inputs not implemented.")
+    df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"]).astype(dtype)
+    df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "c"]).astype(dtype)
+    result = np.heaviside(df1, df2)
+    expected = np.heaviside(
+        np.array([[1, 3], [3, 4], [np.nan, np.nan]]),
+        np.array([[1, 3], [np.nan, np.nan], [3, 4]]),
+    )
+    # TODO(FloatArray): this will be Float64Dtype.
+    expected = pd.DataFrame(expected, index=["a", "b", "c"], columns=["A", "B"])
+    tm.assert_frame_equal(result, expected)
+
+
+def test_binary_frame_series_raises():
+    # We don't currently implement ufunc(DataFrame, Series)
+    df = pd.DataFrame({"A": [1, 2]})
+    with pytest.raises(NotImplementedError, match="logaddexp"):
+        np.logaddexp(df, df["A"])
+
+    with pytest.raises(NotImplementedError, match="logaddexp"):
+        np.logaddexp(df["A"], df)
+
+
+def test_frame_outer_deprecated():
+    df = pd.DataFrame({"A": [1, 2]})
+    with tm.assert_produces_warning(FutureWarning):
+        np.subtract.outer(df, df)
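
For a concrete picture of the "unary input, binary output" case exercised by `test_unary_binary`, here is a minimal sketch with plain float data (values chosen for illustration). `np.modf` has two outputs, so the call is expected to return a tuple of two DataFrames that share the input's labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.5, -2.25]}, index=["a", "b"])

# modf splits each value into its fractional and integral parts; with a
# DataFrame input each part comes back wrapped as a DataFrame.
fractional, integral = np.modf(df)
```

As in the test, this only covers NumPy dtypes; the PR marks the extension-dtype and mixed-dtype variants of this case as expected failures.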
4 changes: 2 additions & 2 deletions pandas/tests/generic/test_duplicate_labels.py
@@ -37,8 +37,8 @@ def test_construction_ok(self, cls, data):
             operator.methodcaller("add", 1),
             operator.methodcaller("rename", str.upper),
             operator.methodcaller("rename", "name"),
-            pytest.param(operator.methodcaller("abs"), marks=not_implemented),
-            # TODO: test np.abs
+            operator.methodcaller("abs"),
+            np.abs,
         ],
     )
     def test_preserved_series(self, func):