Skip to content

Fix for #12723: Unexpected behavior with binary operators and fill… #12791

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

facaiy
Copy link
Contributor

@facaiy facaiy commented Apr 4, 2016

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

closes #12723

@jreback jreback added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Apr 4, 2016
if self.empty:
return self

new_data = self._data.eval(func=func, other=other,
raise_on_error=raise_on_error)
if fill_value is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to do the eval first, THEN, fill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that df2.add(2, fill_value=0) should act like df2.add(2).fillna(0), instead of df2.fillna(0).add(2)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the steps are align, arith then fill (with mask).

align is moot in this case. yes you are right, fill then arith.

In [7]: Series([np.nan,1,np.nan]).add(Series([2,3],index=[2,3]),fill_value=10)
Out[7]: 
0     NaN
1    11.0
2    12.0
3    13.0
dtype: float64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified the API, DataFrame._combine_const, is it OK?
if yes, for consistency, it seems that we need to modify _combine_const method of sparse and series and their caller .

@sinhrks sinhrks changed the title Fix for #12723: Unexpected behavior with behavior with binary operators and fill… Fix for #12723: Unexpected behavior with binary operators and fill… Apr 4, 2016
new_data = self._data.eval(func=func, other=other,
raise_on_error=raise_on_error)
if fill_value is not None:
self_copy = self.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just do

if fill_value is not None:
    self = self.fillna(fill_value)
data = self._data.eval(......)
....

we don't directly access .values like this as you can have multiple dtypes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jreback.

Since the _combine_const of all classes are designed without fill_value parameter, I worry whether modifying the API is a good idea. At least, Series and some member of sparse need do something to keep consistent. Could you give me some advice?

@facaiy facaiy force-pushed the fill_value_miss branch from 2e0ed3d to b0e4803 Compare April 8, 2016 06:42
@@ -1086,7 +1089,7 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):
raise ValueError("Incompatible argument shape: %s" %
(other.shape, ))
else:
return self._combine_const(other, na_op)
return self._combine_const(other, na_op, fill_value=fill_value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about use fillna here, instead of modifying the _combine_const method? @jreback

@facaiy
Copy link
Contributor Author

facaiy commented Apr 8, 2016

If it is OK, I'd like to add a whatsnew and rebase.


def test_fill_value_missing_with_binary_operators(self):
# GH12723
dat = np.arange(10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specify this as np.arange(10, dtype='int64') or the comparison will fail on windows

@jreback
Copy link
Contributor

jreback commented Apr 8, 2016

pls test for the sparse cases as well


exp = df2.fillna(0).add(df1)
res = df2.add(df1, fill_value=0)
tm.assert_sp_frame_equal(res, exp)
Copy link
Contributor Author

@facaiy facaiy Apr 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with sparse dataframe. And the sparse item and NaN are quite similar and confuse for me.

Traceback (most recent call last):
  File "/Users/facaiyan/Workshop/pandas/pandas/sparse/tests/test_sparse.py", line 966, in test_fill_value_missing_with_binary_operators
    tm.assert_sp_frame_equal(res, exp)
  File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 1257, in assert_sp_frame_equal
    assert_sp_series_equal(series, right[col])
  File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 1229, in assert_sp_series_equal
    assert_sp_array_equal(left.block.values, right.block.values)
  File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 1205, in assert_sp_array_equal
    assert_almost_equal(left.sp_values, right.sp_values)
  File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 124, in assert_almost_equal
    return _testing.assert_almost_equal(left, right, **kwargs)
  File "pandas/src/testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3809)
  File "pandas/src/testing.pyx", line 120, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2101)
  File "/Users/facaiyan/Workshop/pandas/pandas/util/testing.py", line 913, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: numpy array are different

numpy array shapes are different
[left]:  (10,)
[right]: (9,)

Copy link
Contributor Author

@facaiy facaiy Apr 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback if I misunderstand, please tell me.

for sparse dataframe:

  1. df.fillna(x): convert NaN entries of df to x.
  2. In df.add(a, fill_value=x): fill_value, convert sparse entries of df to x.

@facaiy facaiy force-pushed the fill_value_miss branch 3 times, most recently from d2bf6c7 to 9e0835b Compare April 16, 2016 12:33
@facaiy
Copy link
Contributor Author

facaiy commented Apr 16, 2016

The work is done if all tests are passed.
Thanks, everyone.

@facaiy facaiy force-pushed the fill_value_miss branch 2 times, most recently from 9a41b79 to 34b0922 Compare April 16, 2016 22:07
…_value (GH12723)

fix series

test for sparse dataframe

remove wrong test: fillna() is not equivalent of fill_value
@facaiy
Copy link
Contributor Author

facaiy commented Apr 17, 2016

If it is ok,
might we close #12723 ?

@jreback
Copy link
Contributor

jreback commented Apr 17, 2016

issues are closed when the PRs are merged - I'll have a look soon

@jreback jreback added this to the 0.18.1 milestone Apr 17, 2016
@jreback jreback closed this in 0734df0 Apr 17, 2016
@jreback
Copy link
Contributor

jreback commented Apr 17, 2016

thanks for the fix!

@facaiy facaiy deleted the fill_value_miss branch April 17, 2016 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected behavior with binary operators and fill_value
2 participants