At the moment some DataFrame arithmetic/comparison operations have unintuitive or ad-hoc broadcasting behavior.
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])
- A 1-D list and a 1-D np.array are treated differently, and which of the two raises depends on the length:
>>> df == [1, 2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/core/ops.py", line 1815, in f
try_cast=False)
File "pandas/core/frame.py", line 4848, in _combine_const
try_cast=try_cast)
File "pandas/core/internals/managers.py", line 529, in eval
return self.apply('eval', **kwargs)
File "pandas/core/internals/managers.py", line 423, in apply
applied = getattr(b, f)(**kwargs)
File "pandas/core/internals/blocks.py", line 1437, in eval
'block values'.format(other=other))
ValueError: Invalid broadcasting comparison [[1, 2]] with block values
>>> df == np.array([1, 2])
       A      B
0  False  False
1  False  False
2  False  False
>>> df == [1, 2, 3]
       A      B
0  False   True
1   True  False
2  False  False
>>> df == np.array([1, 2, 3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/core/ops.py", line 1815, in f
try_cast=False)
File "pandas/core/frame.py", line 4848, in _combine_const
try_cast=try_cast)
File "pandas/core/internals/managers.py", line 529, in eval
return self.apply('eval', **kwargs)
File "pandas/core/internals/managers.py", line 423, in apply
applied = getattr(b, f)(**kwargs)
File "pandas/core/internals/blocks.py", line 1437, in eval
'block values'.format(other=other))
ValueError: Invalid broadcasting comparison [array([1, 2, 3])] with block values
I think the non-raising behavior is correct in both cases. When operating against a list/np.array/Index/EA, if there is a unique axis with the correct length, we should broadcast against the other axis.
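To make the proposed rule concrete, here is a minimal sketch done at the NumPy level so it does not depend on which pandas version is installed. The helper name `broadcast_1d` is hypothetical, and raising when both axes have the matching length is my assumption; the proposal only covers the case where exactly one axis matches.

```python
import numpy as np
import pandas as pd


def broadcast_1d(df, values):
    """Reshape a 1-D array-like so it broadcasts against the unique axis of matching length.

    Hypothetical helper for illustration only; not part of pandas.
    """
    arr = np.asarray(values)
    if arr.ndim != 1:
        raise ValueError("expected a 1-D array-like")
    nrows, ncols = df.shape
    matches_rows = len(arr) == nrows
    matches_cols = len(arr) == ncols
    if matches_rows and matches_cols:
        # Assumption: an ambiguous (square) case is rejected rather than guessed.
        raise ValueError("length matches both axes; broadcasting is ambiguous")
    if matches_cols:
        return arr.reshape(1, -1)  # one value per column, repeated down the rows
    if matches_rows:
        return arr.reshape(-1, 1)  # one value per row, repeated across the columns
    raise ValueError("length matches neither axis")


df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

# Length 2 matches the columns, length 3 matches the rows.
row_result = pd.DataFrame(df.values == broadcast_1d(df, [1, 2]),
                          index=df.index, columns=df.columns)
col_result = pd.DataFrame(df.values == broadcast_1d(df, [1, 2, 3]),
                          index=df.index, columns=df.columns)
```

With this rule a list and an np.array give the same answer for a given length, and the two results agree with the two non-raising outputs shown above.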
- Operating against a Series sharing df.index is counter-intuitive:
>>> df + df['A']
    0   1   2   A   B
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
Since df['A'].index matches df.index, I think it makes much more sense for this to add df['A'] to each column and return (@shoyer IIRC you've suggested this before):
   A  B
0  0  1
1  4  5
2  8  9
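For comparison, the result above can already be requested explicitly with the flex method DataFrame.add and its axis argument. This sketch only shows the existing explicit spelling; it does not change what the + operator does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

# axis=0 aligns the Series on df.index and adds it to every column.
explicit = df.add(df['A'], axis=0)
```

Here `explicit` holds the values [[0, 1], [4, 5], [8, 9]], the same frame the operator form is proposed to return.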
- Operations against 2-D np.arrays do not behave like operations between np.arrays (which I expected they would):
>>> df + df[['A']].values
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/core/ops.py", line 1734, in f
other = _align_method_FRAME(self, other, axis)
File "pandas/core/ops.py", line 1690, in _align_method_FRAME
given_shape=right.shape))
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (3, 1)
>>> df.values + df[['A']].values
array([[0, 1],
       [4, 5],
       [8, 9]])
>>> df + df.iloc[:1].values
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/core/ops.py", line 1734, in f
other = _align_method_FRAME(self, other, axis)
File "pandas/core/ops.py", line 1690, in _align_method_FRAME
given_shape=right.shape))
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)
>>> df.values + df.iloc[:1].values
array([[0, 2],
       [2, 4],
       [4, 6]])
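Until the DataFrame operators accept these shapes, one possible workaround is to let NumPy do the broadcasting and re-wrap the result with the original labels. The helper name `add_ndarray` is hypothetical and the sketch assumes a single-dtype frame, so that df.values is a plain 2-D array.

```python
import numpy as np
import pandas as pd


def add_ndarray(df, arr):
    """Add a 2-D ndarray to df using NumPy broadcasting, keeping df's labels.

    Hypothetical helper for illustration only; not part of pandas.
    """
    result = df.values + np.asarray(arr)
    if result.shape != df.shape:
        # Broadcasting must not grow the frame, e.g. (3, 2) + (3, 1) stays (3, 2).
        raise ValueError("operand does not broadcast to the frame's shape")
    return pd.DataFrame(result, index=df.index, columns=df.columns)


df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

by_column = add_ndarray(df, df[['A']].values)   # (3, 1) operand
by_row = add_ndarray(df, df.iloc[:1].values)    # (1, 2) operand
```

Both calls reproduce the two ndarray results shown above, returned as DataFrames with df's index and columns.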