-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
Fix for #12723: Unexpected behavior with binary operators and fill… #12791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if self.empty: | ||
return self | ||
|
||
new_data = self._data.eval(func=func, other=other, | ||
raise_on_error=raise_on_error) | ||
if fill_value is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to do the eval first, THEN, fill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that df2.add(2, fill_value=0)
should act like df2.add(2).fillna(0)
, instead of df2.fillna(0).add(2)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the steps are align
, arith
then fill
(with mask).
align
is moot in this case. yes you are right, fill then arith.
In [7]: Series([np.nan,1,np.nan]).add(Series([2,3],index=[2,3]),fill_value=10)
Out[7]:
0 NaN
1 11.0
2 12.0
3 13.0
dtype: float64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have modified the API, DataFrame._combine_const
, is it OK?
if yes, for consistency, it seems that we need to modify _combine_const
method of sparse and series and their caller .
new_data = self._data.eval(func=func, other=other, | ||
raise_on_error=raise_on_error) | ||
if fill_value is not None: | ||
self_copy = self.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just do
if fill_value is not None:
self = self.fillna(fill_value)
data = self._data.eval(......)
....
we don't directly access .values
like this as you can have multiple dtypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jreback.
Since the _combine_const
of all classes are designed without fill_value
parameter, I worry whether modifying the API is a good idea. At least, Series and some member of sparse need do something to keep consistent. Could you give me some advice?
@@ -1086,7 +1089,7 @@ def f(self, other, axis=default_axis, level=None, fill_value=None): | |||
raise ValueError("Incompatible argument shape: %s" % | |||
(other.shape, )) | |||
else: | |||
return self._combine_const(other, na_op) | |||
return self._combine_const(other, na_op, fill_value=fill_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about use fillna
here, instead of modifying the _combine_const
method? @jreback
If it is OK, I'd like to add a whatsnew and rebase. |
|
||
def test_fill_value_missing_with_binary_operators(self): | ||
# GH12723 | ||
dat = np.arange(10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specify this as np.arange(10, dtype='int64')
or the comparison will fail on windows
pls test for the sparse cases as well |
|
||
exp = df2.fillna(0).add(df1) | ||
res = df2.add(df1, fill_value=0) | ||
tm.assert_sp_frame_equal(res, exp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with sparse dataframe. And the sparse item and NaN are quite similar and confuse for me.
Traceback (most recent call last):
File "/Users/facaiyan/Workshop/pandas/pandas/sparse/tests/test_sparse.py", line 966, in test_fill_value_missing_with_binary_operators
tm.assert_sp_frame_equal(res, exp)
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 1257, in assert_sp_frame_equal
assert_sp_series_equal(series, right[col])
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 1229, in assert_sp_series_equal
assert_sp_array_equal(left.block.values, right.block.values)
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 1205, in assert_sp_array_equal
assert_almost_equal(left.sp_values, right.sp_values)
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 124, in assert_almost_equal
return _testing.assert_almost_equal(left, right, **kwargs)
File "pandas/src/testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3809)
File "pandas/src/testing.pyx", line 120, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2101)
File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 913, in raise_assert_detail
raise AssertionError(msg)
AssertionError: numpy array are different
numpy array shapes are different
[left]: (10,)
[right]: (9,)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback if I misunderstand, please tell me.
for sparse dataframe:
df.fillna(x)
: convert NaN entries ofdf
tox
.- In
df.add(a, fill_value=x)
:fill_value
, convert sparse entries ofdf
tox
.
d2bf6c7
to
9e0835b
Compare
The work is done if all tests are passed. |
9a41b79
to
34b0922
Compare
…_value (GH12723) fix series test for sparse dataframe remove wrong test: fillna() is not equivalent of fill_value
If it is ok, |
issues are closed when the PRs are merged - I'll have a look soon |
thanks for the fix! |
git diff upstream/master | flake8 --diff
closes #12723