[DISC][BUG] DataFrame broadcasting behavior #22355

Closed
@jbrockmendel

At the moment some DataFrame arithmetic/comparison operations have unintuitive or ad-hoc broadcasting behavior.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])
  • 1-D lists and 1-D np.arrays are treated differently, and which of the two raises depends on the length:
>>> df == [1, 2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1815, in f
    try_cast=False)
  File "pandas/core/frame.py", line 4848, in _combine_const
    try_cast=try_cast)
  File "pandas/core/internals/managers.py", line 529, in eval
    return self.apply('eval', **kwargs)
  File "pandas/core/internals/managers.py", line 423, in apply
    applied = getattr(b, f)(**kwargs)
  File "pandas/core/internals/blocks.py", line 1437, in eval
    'block values'.format(other=other))
ValueError: Invalid broadcasting comparison [[1, 2]] with block values

>>> df == np.array([1, 2])
       A      B
0  False  False
1  False  False
2  False  False

>>> df == [1, 2, 3]
       A      B
0  False   True
1   True  False
2  False  False

>>> df == np.array([1, 2, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1815, in f
    try_cast=False)
  File "pandas/core/frame.py", line 4848, in _combine_const
    try_cast=try_cast)
  File "pandas/core/internals/managers.py", line 529, in eval
    return self.apply('eval', **kwargs)
  File "pandas/core/internals/managers.py", line 423, in apply
    applied = getattr(b, f)(**kwargs)
  File "pandas/core/internals/blocks.py", line 1437, in eval
    'block values'.format(other=other))
ValueError: Invalid broadcasting comparison [array([1, 2, 3])] with block values

I think the non-raising behavior is correct in both cases. If operating against a list/np.array/Index/EA and there is a unique axis with the correct length, we should broadcast against the other axis.
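
The proposed rule can be sketched with plain NumPy. The helper name `eq_broadcast` is hypothetical (it is not a pandas API), and raising when the length matches both axes is an assumption, since the rule above only covers a unique matching axis:

```python
import numpy as np
import pandas as pd


def eq_broadcast(df, other):
    # Hypothetical helper, not part of pandas: compare `df` against a 1-D
    # array-like, aligning it with the unique axis whose length matches.
    values = np.asarray(other)
    if values.ndim != 1:
        raise ValueError("expected a 1-D array-like")
    nrows, ncols = df.shape
    fits_rows = len(values) == nrows
    fits_cols = len(values) == ncols
    if fits_rows and fits_cols:
        # Assumption: the square case is ambiguous, so refuse to guess.
        raise ValueError("length matches both axes")
    if fits_cols:
        # One value per column; NumPy repeats it down the rows.
        result = df.values == values
    elif fits_rows:
        # One value per row; reshape to a column so it repeats across columns.
        result = df.values == values[:, None]
    else:
        raise ValueError("length matches neither axis")
    return pd.DataFrame(result, index=df.index, columns=df.columns)


df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])
by_columns = eq_broadcast(df, [1, 2])            # length 2 matches the columns
by_rows = eq_broadcast(df, np.array([1, 2, 3]))  # length 3 matches the index
```

In this sketch a list and an np.array of the same length give the same result, which is the consistency being asked for.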

  • Operating against a Series sharing df.index is counter-intuitive:
>>> df + df['A']
    0   1   2   A   B
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN

Since df['A'].index matches df.index, I think it makes much more sense for this to add df['A'] to each column and return (@shoyer IIRC you've suggested this before):

   A  B
0  0  1
1  4  5
2  8  9
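
For comparison, the existing flex method `DataFrame.add` gives this result when told to align on the index with `axis=0`. A short check with the same `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

# axis=0 matches the Series against df.index instead of df.columns,
# so df['A'] is added to every column.
result = df.add(df['A'], axis=0)
print(result)
```

The difference from `df + df['A']` is only which axis the Series labels are matched against; the operator always matches them against the columns.
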
  • Operations against 2D np.arrays do not broadcast the way np.arrays do (which I expected they would):
>>> df + df[['A']].values
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1734, in f
    other = _align_method_FRAME(self, other, axis)
  File "pandas/core/ops.py", line 1690, in _align_method_FRAME
    given_shape=right.shape))
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (3, 1)

>>> df.values + df[['A']].values
array([[0, 1],
       [4, 5],
       [8, 9]])

>>> df + df.iloc[:1].values
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1734, in f
    other = _align_method_FRAME(self, other, axis)
  File "pandas/core/ops.py", line 1690, in _align_method_FRAME
    given_shape=right.shape))
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)

>>> df.values + df.iloc[:1].values
array([[0, 2],
       [2, 4],
       [4, 6]])
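
In the meantime, one way to get the NumPy result while keeping the labels is to do the arithmetic on `.values` and re-wrap it. A minimal sketch; the helper name `add_like_numpy` is hypothetical, and it assumes the broadcast result has the same shape as `df`:

```python
import numpy as np
import pandas as pd


def add_like_numpy(df, arr):
    # Hypothetical helper: let NumPy broadcast the 2-D array, then restore
    # the original index and columns on the result.
    return pd.DataFrame(df.values + np.asarray(arr),
                        index=df.index, columns=df.columns)


df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])
col_result = add_like_numpy(df, df[['A']].values)    # (3, 1) broadcasts across columns
row_result = add_like_numpy(df, df.iloc[:1].values)  # (1, 2) broadcasts down rows
```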
