Skip to content

ENH: Support ExtensionArray operators via a mixin #21261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Jun 29, 2018
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,27 @@ New features

.. _whatsnew_0240.enhancements.other:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this label belongs to the title below your added content

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a ref here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

``ExtensionArray`` operator support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An ``ExtensionArray`` subclass consisting of objects that have arithmetic and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include most of this in extending.rst as well?

It'd be good to mention the performance implications of using these mixins.

comparison operators defined on the underlying objects can easily support
those operators on the ``ExtensionArray``, and therefore the operators
on ``Series`` built on those ``ExtensionArray`` classes will work as expected.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above (use bullets / numbers)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


Two new mixin classes, :class:`ExtensionScalarArithmeticMixin` and
:class:`ExtensionScalarComparisonMixin`, support this capability.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put a ref to the extending.rst doc section for this (e.g. see docs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

If developing an ``ExtensionArray`` subclass, for example ``MyExtensionArray``,
simply include ``ExtensionScalarArithmeticMixin`` and/or
``ExtensionScalarComparisonMixin`` as parent classes of ``MyExtensionArray``
as follows:

.. code-block:: python

class MyExtensionArray(ExtensionArray, ExtensionScalarArithmeticMixin,
ExtensionScalarComparisonMixin):


Other Enhancements
^^^^^^^^^^^^^^^^^^
- :func:`to_datetime` now supports the ``%Z`` and ``%z`` directive when passed into ``format`` (:issue:`13486`)
Expand Down
5 changes: 4 additions & 1 deletion pandas/api/extensions/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,5 +3,8 @@
register_index_accessor,
register_series_accessor)
from pandas.core.algorithms import take # noqa
from pandas.core.arrays.base import ExtensionArray # noqa
from pandas.core.arrays.base import (ExtensionArray, # noqa
ExtensionScalarArithmeticMixin,
ExtensionScalarComparisonMixin,
ExtensionScalarOpsMixin)
from pandas.core.dtypes.dtypes import ExtensionDtype # noqa
12 changes: 11 additions & 1 deletion pandas/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,8 @@ def observed(request):
'__mul__', '__rmul__',
'__floordiv__', '__rfloordiv__',
'__truediv__', '__rtruediv__',
'__pow__', '__rpow__']
'__pow__', '__rpow__',
'__mod__', '__rmod__']
if not PY3:
_all_arithmetic_operators.extend(['__div__', '__rdiv__'])

Expand All @@ -96,6 +97,15 @@ def all_arithmetic_operators(request):
return request.param


@pytest.fixture(params=['__eq__', '__ne__', '__le__',
'__lt__', '__ge__', '__gt__'])
def all_compare_operators(request):
"""
Fixture for dunder names for common compare operations
"""
return request.param


@pytest.fixture(params=[None, 'gzip', 'bz2', 'zip',
pytest.param('xz', marks=td.skip_if_no_lzma)])
def compression(request):
Expand Down
5 changes: 4 additions & 1 deletion pandas/core/arrays/__init__.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,5 @@
from .base import ExtensionArray # noqa
from .base import (ExtensionArray, # noqa
ExtensionScalarArithmeticMixin,
ExtensionScalarComparisonMixin,
ExtensionScalarOpsMixin)
from .categorical import Categorical # noqa
128 changes: 128 additions & 0 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,15 @@
"""
import numpy as np

import operator

from pandas.errors import AbstractMethodError
from pandas.compat.numpy import function as nv
from pandas.compat import set_function_name, PY3
from pandas.core.dtypes.common import (
is_extension_array_dtype,
is_list_like)
from pandas.core import ops

_not_implemented_message = "{} does not implement {}."

Expand Down Expand Up @@ -610,3 +617,124 @@ def _ndarray_values(self):
used for interacting with our indexers.
"""
return np.array(self)


class ExtensionScalarOpsMixin(object):
"""
A base class for the mixins for different operators.
Can also be used to define an individual method for a specific
operator using the class method create_method()
"""

@classmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line above this for PEP8?

Copy link
Contributor Author

@Dr-Irv Dr-Irv Jun 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It passed the PEP8 tests....... But I will add it in.

def _create_method(cls, op, coerce_to_dtype=True):
"""
A class method that returns a method that will correspond to an
operator for an ExtensionArray subclass, by dispatching to the
relevant operator defined on the individual elements of the
ExtensionArray.

Parameters
----------
op: An operator that takes arguments op(a, b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use numpydoc standard formatting?

op : function
    An operator that takes arguments op(a, b)
coerce_to_dtype : bool
    boolean indicating whether to attempt to convert ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

coerce_to_dtype: boolean indicating whether to attempt to convert
the result to the underlying ExtensionArray dtype
(default True)

Returns
-------
A method that can be bound to a method of a class

Example
-------
Given an ExtensionArray subclass called MyClass, use

>>> __add__ = ExtensionScalarOpsMixin.create_method(operator.add)

in the class definition of MyClass to create the operator
for addition.

"""

op_name = ops._get_op_name(op, False)

def _binop(self, other):
def convert_values(parm):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor detail, but I find 'values' (or 'param') is clearer than 'parm'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

if isinstance(parm, ExtensionArray) or is_list_like(parm):
ovalues = parm
elif is_extension_array_dtype(parm):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Series is also 'list-like', so I think this elif will not be reached.
But given that you simply iterate through it (which will be the same for series or extensionarray), I think you can simply remove that part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ovalues = parm.values
else: # Assume its an object
ovalues = [parm] * len(self)
return ovalues
lvalues = convert_values(self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed as the calling object will always be already an ExtensionArray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed

rvalues = convert_values(other)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably do alignment as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to, for a few reasons:

  • The _binop method from here is a method on ExtensionArray (i.e., isinstance(self, ExtensionArray) == True
  • This method is called by dispatch_to_extension_ops in ops.py, which is called after alignment has already been done in _arith_method_SERIES and alignment is ensured in _comp_method_SERIES

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK, this is also tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Yes, this is tested because I am using the same tests that are used for general operators on Series, which tests things that are misaligned, etc.

See pandas/tests/extension/tests/decimal/test_decimal.py that leverages the TestSeriesOperators.test_operators method from pandas\tests\series\test_operators .

As an aside, doing it this way uncovered the issues with .get and .combine that I submitted.


# If the operator is not defined for the underlying objects,
# a TypeError should be raised
res = [op(a, b) for (a, b) in zip(lvalues, rvalues)]

if coerce_to_dtype:
try:
res = self._from_sequence(res)
except TypeError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this fails, I think we should still convert it to an array instead of keeping it as a list?

Or does that happen on another level?
(anyway, also for usability with the ExtensionArray itself, I think coercing it to an array makes more sense)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we convert to an array, then we could have a dtype problem. This allows the result to be of any type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we convert to an array, then we could have a dtype problem

It will be converted to an array anyhow, if not here, then at the level above when the res is passed to the series constructor. So personally I would already do the conversion here (we can use the same logic / code as what is done in the series constructor to coerce a passed list)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche But why repeat that logic? If we leave it as a list, then the Series constructor will do the inference on the dtype.


return res

name = '__{name}__'.format(name=op_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you defining create_method as anything other than NotImplementedError

this is way too opionated - if u want to define a function to do this and an author can use this ok
but it should not be the default

maybe if u want to create an ExtensionOpsMixinForScalars (long name though)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a closer look at the diff if you give such an opinionated (pun intended :-)) comment

  • this is not the default, it's an optional mixin extension authors can reuse (so what you suggest?)
  • so it is exactly the point to be opinionated and fallback to the scalars (we might try to think of a better name that reflects this, as you suggested)
  • it is what we discussed in the other PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche maybe I wasn't clear.

ExtensionOpsBase should simply have NotImplementedError for the actual implementation of the arithmethic and comaprison ops. Scalar implementations of this will IMHO rarely be used and simply cluttering things up. This can be put as a separate MixIn class if you wish for inclusion by an author, including it here is forcing a pretty bad idiom: coupling the interface and implementation, and is quite error prone because its not opt in. If an author extends and adds ops, they immediate get an implementation, which could very easiy be buggy / not work / work in unexpected ways.

I think the ops implementation is so important that this must be an explicit choice.

The point of the mixin is to avoid the boilerplate of writing __div__ - .... and so on. NOT the implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be put as a separate MixIn class if you wish for inclusion by an author,

Again, this is what this PR is doing.
To repeat from above, is might be it is just the name of ExtensionOpsBase which is not clear? (with which I agree)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback @jorisvandenbossche Maybe the naming is misleading. So let me clarify the intent of the PR.

The goal here is that if someone extends ExtensionArray (let's call it MyArray for sake of discussion) where they are storing objects (let's call them MyClass) that already have the arithmetic and/or comparison operators defined for the class MyClass, they can use these mixins to get those operators defined on the MyArray subclass by simply adding the mixin as one of the base classes of MyArray.

An example is Decimal. The operators are defined there, so the mixin gives you the operators on DecimalArray "for free". I also have my own application that extends ExtensionArray and it uses the same principal.

The use of these 3 classes (ExtensionOpsBase, ExtensionArithmeticMixin and ExtensionComparisonMixin) is optional for people who are writing their own subclass of ExtensionArray. The ExtensionOpsBase class is just a base class for the 2 mixin classes. It contains the create_method() method that allows one to create an individual operator, under the assumption that MyClass has the operator defined. Aside from the use of the two mixin classes that define all of the operators, ExtensionOpsBase.createmethod() can be used to define individual methods. So, for example, if the MyClass class only supports addition, you can choose to not use the mixin, and just do something like the following in MyArray to create the operator.

    __add__ = ExtensionOpsBase.create_method(operator.add)

IMHO, it seems that @jorisvandenbossche understands what I'm trying to do, but @jreback may not fully understand it.

If you think there is a better name for ExtensionOpsBase, I'm open to suggestions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche I've tried a variety of implementations without the base class, and couldn't get it to work. I'll rename as you suggest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming looks ok to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next commit will use the renaming

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the naming conventions. Maybe the NotImplemented versions of the methods belong on the base ExtensionArray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel The NotImplemented versions are there by default for any class, so I don't think we need to do anything. In other words, if you write a class MyClass that extends ExtensionArray and you don't define the methods for operators, they are by default NotImplemented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also pass True to _get_op_name, and then that function does this formatting for you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

return set_function_name(_binop, name, cls)


class ExtensionScalarArithmeticMixin(ExtensionScalarOpsMixin):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not clear to why this inherits ExtensionScalarOpsMixin. If I want only the ops definitions, and want to define my own create_method this seems a bit odd to do. Rather I would like to have a _create_method that just raises NotImplementedError, so things fail as soon as I inherit this base class.

the way this is written it is entirely too error prone. You drop in these classes, then it might work incorrectly! e.g. maybe I happen to have a scalar ops defined. This should be very explict. I dont' see ANY advantage of defaulting to a scalar ops schema here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I think I see what you are saying, so I have pushed a commit that I think does what you suggest. I've created infrastructure that allows a developer to define their own create_method. I've written up docs that explains how it works. The underlying implementation is a little complex, because there are lots of inner classes flying around, but now people can use the ExtensionScalarArithmeticMixin if they have the underlying objects with operators defined, or they can use ExtensionArithmeticMixin(MyMixin) to define all of the operators, and then the class MyMixin has the definition of create_method in it.

I hope this addresses your concern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I find this much more complicated than what you had before, and I am also not really sure if it will be that useful in practice (i.e. the usecase of wanting to implement your own _create_method but that works for all operators. Which actually was also possible with your previous approach by inheriting eg ExtensionScalarArithmeticMixin and overriding the _create_method method).
I would rather wait and see if there is need for supporting this usecase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont' see ANY advantage of defaulting to a scalar ops schema here.

@jreback again, 1) it's not defaulting for the generic ExtensionArray, the scalar ops is only implemented for this optional mixin 2) this is entirely the point of this PR.
Yes, you may well never want to use such fallback to the scalars. But then you just don't use those mixins from this PR.

This PR is not intending to solve the general ops issue for ExtensionArrays, its purpose was to provide a easy solution for a very specific use case.

"""A mixin for defining the arithmetic operations on an ExtensionArray
class, where it assumed that the underlying objects have the operators
already defined.

Usage
------
If you have defined a subclass MyClass(ExtensionArray), then
use MyClass(ExtensionArray, ExtensionScalarArithmeticMixin) to
get the arithmetic operators
"""

__add__ = ExtensionScalarOpsMixin._create_method(operator.add)
__radd__ = ExtensionScalarOpsMixin._create_method(ops.radd)
__sub__ = ExtensionScalarOpsMixin._create_method(operator.sub)
__rsub__ = ExtensionScalarOpsMixin._create_method(ops.rsub)
__mul__ = ExtensionScalarOpsMixin._create_method(operator.mul)
__rmul__ = ExtensionScalarOpsMixin._create_method(ops.rmul)
__pow__ = ExtensionScalarOpsMixin._create_method(operator.pow)
__rpow__ = ExtensionScalarOpsMixin._create_method(ops.rpow)
__mod__ = ExtensionScalarOpsMixin._create_method(operator.mod)
__rmod__ = ExtensionScalarOpsMixin._create_method(ops.rmod)
__floordiv__ = ExtensionScalarOpsMixin._create_method(operator.floordiv)
__rfloordiv__ = ExtensionScalarOpsMixin._create_method(ops.rfloordiv)
__truediv__ = ExtensionScalarOpsMixin._create_method(operator.truediv)
__rtruediv__ = ExtensionScalarOpsMixin._create_method(ops.rtruediv)
if not PY3:
__div__ = ExtensionScalarOpsMixin._create_method(operator.div)
__rdiv__ = ExtensionScalarOpsMixin._create_method(ops.rdiv)

__divmod__ = ExtensionScalarOpsMixin._create_method(divmod)
__rdivmod__ = ExtensionScalarOpsMixin._create_method(ops.rdivmod)


class ExtensionScalarComparisonMixin(ExtensionScalarOpsMixin):
"""A mixin for defining the comparison operations on an ExtensionArray
class, where it assumed that the underlying objects have the operators
already defined.

Usage
------
If you have defined a subclass MyClass(ExtensionArray), then
use MyClass(ExtensionArray, ExtensionComparisonMixin) to
get the comparison operators
"""

__eq__ = ExtensionScalarOpsMixin._create_method(operator.eq, False)
__ne__ = ExtensionScalarOpsMixin._create_method(operator.ne, False)
__lt__ = ExtensionScalarOpsMixin._create_method(operator.lt, False)
__gt__ = ExtensionScalarOpsMixin._create_method(operator.gt, False)
__le__ = ExtensionScalarOpsMixin._create_method(operator.le, False)
__ge__ = ExtensionScalarOpsMixin._create_method(operator.ge, False)
10 changes: 7 additions & 3 deletions pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -2972,16 +2972,20 @@ def get_value(self, series, key):
# use this, e.g. DatetimeIndex
s = getattr(series, '_values', None)
if isinstance(s, (ExtensionArray, Index)) and is_scalar(key):
# GH 20825
# GH 20882, 21257
# Unify Index and ExtensionArray treatment
# First try to convert the key to a location
# If that fails, see if key is an integer, and
# If that fails, raise a KeyError if an integer
# index, otherwise, see if key is an integer, and
# try that
try:
iloc = self.get_loc(key)
return s[iloc]
except KeyError:
if is_integer(key):
if (len(self) > 0 and
self.inferred_type in ['integer', 'boolean']):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes this necessary? I'm not clear on how it relates to the Mixin classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this you can comment on the relevant other PR (#21260)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel the other PR (#21260) was needed to make the tests here work right.

raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls move all non-ops code to another PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same answer as above: this is needed for the test to pass untill the other PRs are merged (to say: it is already moved to separate PRs)

elif is_integer(key):
return s[key]

s = com._values_from_object(series)
Expand Down
21 changes: 21 additions & 0 deletions pandas/core/ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
is_bool_dtype,
is_list_like,
is_scalar,
is_extension_array_dtype,
_ensure_object)
from pandas.core.dtypes.cast import (
maybe_upcast_putmask, find_common_type,
Expand Down Expand Up @@ -990,6 +991,20 @@ def _construct_divmod_result(left, result, index, name, dtype):
)


def dispatch_to_extension_op(op, left, right):
"""
Assume that left is a Series backed by an ExtensionArray,
apply the operator defined by op.
"""

# This will raise TypeError if the op is not defined on the ExtensionArray
res_values = op(left.values, right)

res_name = get_op_result_name(left, right)
return left._constructor(res_values, index=left.index,
name=res_name)


def _arith_method_SERIES(cls, op, special):
"""
Wrapper function for Series arithmetic operations, to avoid
Expand Down Expand Up @@ -1058,6 +1073,9 @@ def wrapper(left, right):
raise TypeError("{typ} cannot perform the operation "
"{op}".format(typ=type(left).__name__, op=str_rep))

elif is_extension_array_dtype(left):
return dispatch_to_extension_op(op, left, right)

lvalues = left.values
rvalues = right
if isinstance(rvalues, ABCSeries):
Expand Down Expand Up @@ -1208,6 +1226,9 @@ def wrapper(self, other, axis=None):
return self._constructor(res_values, index=self.index,
name=res_name)

elif is_extension_array_dtype(self):
return dispatch_to_extension_op(op, self, other)

elif isinstance(other, ABCSeries):
# By this point we have checked that self._indexed_same(other)
res_values = na_op(self.values, other.values)
Expand Down
27 changes: 18 additions & 9 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -2203,23 +2203,22 @@ def _binop(self, other, func, level=None, fill_value=None):
result.name = None
return result

def combine(self, other, func, fill_value=np.nan):
def combine(self, other, func, fill_value=None):
"""
Perform elementwise binary operation on two Series using given function
with optional fill value when an index is missing from one Series or
the other

Parameters
----------
other : Series or scalar value
func : function
Function that takes two scalars as inputs and return a scalar
fill_value : scalar value

The default specifies to use the appropriate NaN value for
the underlying dtype of the Series
Returns
-------
result : Series

Examples
--------
>>> s1 = Series([1, 2])
Expand All @@ -2228,26 +2227,36 @@ def combine(self, other, func, fill_value=np.nan):
0 0
1 2
dtype: int64

See Also
--------
Series.combine_first : Combine Series values, choosing the calling
Series's values first
"""
self_is_ext = is_extension_array_dtype(self.values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move all of this (combine stuff) to another PR on top of this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if fill_value is None:
fill_value = na_value_for_dtype(self.dtype, False)

if isinstance(other, Series):
new_index = self.index.union(other.index)
new_name = ops.get_op_result_name(self, other)
new_values = np.empty(len(new_index), dtype=self.dtype)
for i, idx in enumerate(new_index):
new_values = []
for idx in new_index:
lv = self.get(idx, fill_value)
rv = other.get(idx, fill_value)
with np.errstate(all='ignore'):
new_values[i] = func(lv, rv)
new_values.append(func(lv, rv))
else:
new_index = self.index
with np.errstate(all='ignore'):
new_values = func(self._values, other)
new_values = [func(lv, other) for lv in self._values]
new_name = self.name

if self_is_ext and not is_categorical_dtype(self.values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use shorthands like this

make this simpler, something like

if is_categorical_dtype(...):
    pass
elif is_extension_dtype(...):
   ....

much more readable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment that in #21183

try:
new_values = self._values._from_sequence(new_values)
except TypeError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you catching a TypeError?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is also for the other PR, but: because the result might not necessarily be an ExtensionArray.

This is of course a bit an unclear area of combine, but currently there is no restriction on the function that the dtype needs to be preserved in the result of the function.

pass

return self._constructor(new_values, index=new_index, name=new_name)

def combine_first(self, other):
Expand Down
1 change: 1 addition & 0 deletions pandas/tests/extension/base/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ class TestMyDtype(BaseDtypeTests):
from .groupby import BaseGroupbyTests # noqa
from .interface import BaseInterfaceTests # noqa
from .methods import BaseMethodsTests # noqa
from .ops import BaseOpsTests # noqa
from .missing import BaseMissingTests # noqa
from .reshaping import BaseReshapingTests # noqa
from .setitem import BaseSetitemTests # noqa
2 changes: 1 addition & 1 deletion pandas/tests/extension/base/getitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ def test_get(self, data):
expected = s.iloc[[0, 1]]
self.assert_series_equal(result, expected)

assert s.get(-1) == s.iloc[-1]
assert s.get(-1) is None
assert s.get(s.index.max() + 1) is None

s = pd.Series(data[:6], index=list('abcdef'))
Expand Down
Loading