ENH: interpolate.limit_area() 16284 #16513
@@ -330,6 +330,10 @@ Interpolation

The ``limit_direction`` keyword argument was added.

.. versionadded:: 0.21.0

The ``limit_area`` keyword argument was added.

Both Series and Dataframe objects have an ``interpolate`` method that, by default,
performs linear interpolation at missing datapoints.

@@ -458,29 +462,48 @@ Interpolation Limits
^^^^^^^^^^^^^^^^^^^^

Like other pandas fill methods, ``interpolate`` accepts a ``limit`` keyword
argument. Use this argument to limit the number of consecutive interpolations,
keeping ``NaN`` values for interpolations that are too far from the last valid
observation:
argument. Use this argument to limit the number of consecutive ``NaN`` values
filled since the last valid observation:

.. ipython:: python

ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13])
ser.interpolate(limit=2)
ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13, np.nan, np.nan])

By default, ``limit`` applies in a forward direction, so that only ``NaN``
values after a non-``NaN`` value can be filled. If you provide ``'backward'`` or
``'both'`` for the ``limit_direction`` keyword argument, you can fill ``NaN``
values before non-``NaN`` values, or both before and after non-``NaN`` values,
respectively:
# fill all consecutive values in a forward direction
ser.interpolate()

.. ipython:: python
# fill one consecutive value in a forward direction
ser.interpolate(limit=1)

ser.interpolate(limit=1) # limit_direction == 'forward'
By default, ``NaN`` values are filled in a ``forward`` direction. Use the
``limit_direction`` parameter to fill ``backward`` or from ``both`` directions.

.. ipython:: python

# fill one consecutive value backwards
ser.interpolate(limit=1, limit_direction='backward')

# fill one consecutive value in both directions
ser.interpolate(limit=1, limit_direction='both')

# fill all consecutive values in both directions
ser.interpolate(limit_direction='both')

By default, ``NaN`` values are filled whether they are inside (surrounded by)
Review comment: need to update this
existing valid values, or outside existing valid values. Introduced in pandas 0.21,
Review comment: "Introduced in v0.21" -> "Introduced in pandas 0.21, "
Review comment: done
the ``limit_area`` parameter restricts filling to either inside values (interpolation) or outside values (extrapolation).
Review comment: maybe add some wording about interpolation vs extrapolation here.
Review comment: maybe also when you would want to use / do this.

.. ipython:: python

# fill one consecutive inside value in both directions
ser.interpolate(limit=1, limit_direction='both', limit_area='inside')
Review comment: can you put limit_area here also after limit_direction (to have it consistent with the other examples)?
Review comment: done

# fill all consecutive outside values backward
ser.interpolate(limit_direction='backward', limit_area='outside')

# fill all consecutive outside values in both directions
ser.interpolate(limit_direction='both', limit_area='outside')

.. _missing_data.replace:

Replacing Generic Values
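The inside/outside distinction in the ``limit_area`` documentation above depends only on where each ``NaN`` sits relative to the first and last valid values: ``NaN`` values between them can be interpolated, while those before the first or after the last valid value can only be extrapolated. The pure-Python sketch below illustrates that split, mirroring the ``start_nans`` / ``mid_nans`` / ``end_nans`` sets built in ``interpolate_1d``. The helper name ``nan_regions`` is hypothetical, and reporting an all-``NaN`` input as entirely leading is an assumption of this sketch, not pandas behaviour.

```python
import math

nan = math.nan


def nan_regions(values):
    """Split NaN positions into leading, interior and trailing index sets.

    Hypothetical helper. Filling interior NaNs is interpolation; filling
    leading or trailing NaNs is extrapolation.
    """
    all_nans = {i for i, v in enumerate(values) if math.isnan(v)}
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not valid:
        # Assumption of this sketch: with no valid value at all, every NaN
        # is reported as leading (nothing can be interpolated).
        return all_nans, set(), set()
    start = set(range(valid[0]))                  # before the first valid value
    end = set(range(valid[-1] + 1, len(values)))  # after the last valid value
    mid = all_nans - start - end                  # surrounded by valid values
    return start, mid, end


ser = [nan, nan, 5, nan, nan, nan, 13, nan, nan]
start, mid, end = nan_regions(ser)
print(sorted(start), sorted(mid), sorted(end))  # [0, 1] [3, 4, 5] [7, 8]
```

With ``limit_area='inside'`` only the middle set is eligible for filling; with ``limit_area='outside'`` only the leading and trailing sets are.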

@@ -24,6 +24,7 @@ New features
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)
- Added `limit_area` parameter to `DataFrame.interpolate()` method, allowing further control of which NaNs are replaced (:issue:`16284`)
Review comment: show a small sub-section example here of why this parameter is useful (take the examples from the docs you wrote above). and provide a pointer to the docs (again which you wrote)

.. _whatsnew_0210.enhancements.other:
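The review comment above asks why ``limit_area`` is useful. One plausible case is a series of measurements with missing readings: gaps between two observations can reasonably be interpolated, but values before the first or after the last observation are unknown, and filling them would be extrapolation. The pure-Python sketch below shows the result that ``limit_area='inside'`` (with no ``limit``) is meant to produce on the series from the documentation, using positions as x-coordinates. The helper ``fill_inside_linear`` is hypothetical and not a pandas API; per the tests in this PR, the corresponding pandas call is ``s.interpolate(method='linear', limit_area='inside')``.

```python
import math

nan = math.nan


def fill_inside_linear(values):
    """Linearly interpolate NaNs that lie between two valid values.

    Hypothetical sketch of limit_area='inside' with no limit: leading and
    trailing NaNs are left alone instead of being extrapolated.
    """
    result = list(values)
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    # walk each pair of neighbouring valid positions and fill the gap between
    for left, right in zip(valid, valid[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


ser = [nan, nan, 5, nan, nan, nan, 13, nan, nan]
print(fill_inside_linear(ser))
# [nan, nan, 5, 7.0, 9.0, 11.0, 13, nan, nan]
```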

@@ -3883,10 +3883,13 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,
limit : int, default None.
Maximum number of consecutive NaNs to fill. Must be greater than 0.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
If limit is specified, consecutive NaNs will be filled in this
direction.

Consecutive NaNs will be filled in this direction.
.. versionadded:: 0.17.0
limit_area : {'inside', 'outside'}, default None
* 'inside': Only fill NaNs surrounded by valid values (interpolate).
Review comment: I would put a colon (
* 'outside': Only fill NaNs outside valid values (extrapolate).
* None: (default) fill NaNs both inside and outside valid values.

.. versionadded:: 0.21.0
Review comment: put the
Review comment: @jreback I also noticed and corrected the old .. versionadded tag on 3887 which was not being properly replaced. It needed the blank lines to stop it from being combined with the normal paragraph above.
Review comment: Can you put a blank line above this one

inplace : bool, default False
Update the NDFrame in place if possible.

@@ -3919,7 +3922,8 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,

@Appender(_shared_docs['interpolate'] % _shared_doc_kwargs)
def interpolate(self, method='linear', axis=0, limit=None, inplace=False,
limit_direction='forward', downcast=None, **kwargs):
limit_direction='forward', limit_area=None,
downcast=None, **kwargs):
"""
Interpolate values according to different methods.
"""

@@ -3968,6 +3972,7 @@ def interpolate(self, method='linear', axis=0, limit=None, inplace=False,
new_data = data.interpolate(method=method, axis=ax, index=index,
values=_maybe_transposed_self, limit=limit,
limit_direction=limit_direction,
limit_area=limit_area,
inplace=inplace, downcast=downcast,
**kwargs)
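The three documented ``limit_area`` options can be read as "which group of ``NaN`` values is forced to stay ``NaN``". The sketch below encodes that reading together with the case-insensitive validation this PR adds to ``interpolate_1d``; the error message format is taken from that check. The function name ``regions_kept_as_nan`` and the region labels ``'start'``, ``'mid'`` and ``'end'`` are hypothetical, chosen to echo the ``start_nans`` / ``mid_nans`` / ``end_nans`` sets in the implementation.

```python
VALID_LIMIT_AREAS = ('inside', 'outside')


def regions_kept_as_nan(limit_area=None):
    """Which NaN regions limit_area forces to stay NaN (illustrative sketch).

    'inside'  -> fill only NaNs surrounded by valid values (interpolate),
                 so leading ('start') and trailing ('end') NaNs are kept.
    'outside' -> fill only NaNs outside the valid values (extrapolate),
                 so interior ('mid') NaNs are kept.
    None      -> limit_area adds no restriction.
    """
    if limit_area is None:
        return frozenset()
    # mirrors the case-insensitive check added in interpolate_1d
    area = limit_area.lower()
    if area not in VALID_LIMIT_AREAS:
        raise ValueError('Invalid limit_area: expecting one of %r, got %r.'
                         % (list(VALID_LIMIT_AREAS), area))
    if area == 'inside':
        return frozenset({'start', 'end'})
    return frozenset({'mid'})


print(sorted(regions_kept_as_nan('inside')))  # ['end', 'start']
```

Any ``limit`` / ``limit_direction`` restriction is applied on top of this: the regions returned here are added to whatever the direction logic already keeps as ``NaN``.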

@@ -111,7 +111,7 @@ def clean_interp_method(method, **kwargs):


def interpolate_1d(xvalues, yvalues, method='linear', limit=None,
limit_direction='forward', fill_value=None,
limit_direction='forward', limit_area=None, fill_value=None,
bounds_error=False, order=None, **kwargs):
"""
Logic for the 1-d interpolation. The result should be 1-d, inputs

@@ -155,28 +155,12 @@ def _interp_limit(invalid, fw_limit, bw_limit):
raise ValueError('Invalid limit_direction: expecting one of %r, got '
Review comment: can you use
'%r.' % (valid_limit_directions, limit_direction))

from pandas import Series
ys = Series(yvalues)
start_nans = set(range(ys.first_valid_index()))
end_nans = set(range(1 + ys.last_valid_index(), len(valid)))

# violate_limit is a list of the indexes in the series whose yvalue is
# currently NaN, and should still be NaN after the interpolation.
# Specifically:
#
# If limit_direction='forward' or None then the list will contain NaNs at
# the beginning of the series, and NaNs that are more than 'limit' away
# from the prior non-NaN.
#
# If limit_direction='backward' then the list will contain NaNs at
# the end of the series, and NaNs that are more than 'limit' away
# from the subsequent non-NaN.
#
# If limit_direction='both' then the list will contain NaNs that
# are more than 'limit' away from any non-NaN.
#
# If limit=None, then use default behavior of filling an unlimited number
# of NaNs in the direction specified by limit_direction
if limit_area is not None:
valid_limit_areas = ['inside', 'outside']
limit_area = limit_area.lower()
if limit_area not in valid_limit_areas:
raise ValueError('Invalid limit_area: expecting one of %r, got %r.'
% (valid_limit_areas, limit_area))

# default limit is unlimited GH #16282
if limit is None:

@@ -186,15 +170,43 @@ def _interp_limit(invalid, fw_limit, bw_limit):
elif limit < 1:
raise ValueError('Limit must be greater than 0')

# each possible limit_direction
from pandas import Series
ys = Series(yvalues)

# These are sets of index pointers to invalid values, e.g. {0, 1, ...}
all_nans = set(np.flatnonzero(invalid))
start_nans = set(range(ys.first_valid_index()))
end_nans = set(range(1 + ys.last_valid_index(), len(valid)))
mid_nans = all_nans - start_nans - end_nans

# Like the sets above, preserve_nans contains indices of invalid values,
# but in this case, it is the final set of indices that need to be
# preserved as NaN after the interpolation.

# For example, if limit_direction='forward' then preserve_nans will
# contain indices of NaNs at the beginning of the series, and NaNs that
# are more than 'limit' away from the prior non-NaN.

# set preserve_nans based on direction using _interp_limit
if limit_direction == 'forward':
violate_limit = sorted(start_nans |
set(_interp_limit(invalid, limit, 0)))
preserve_nans = start_nans | set(_interp_limit(invalid, limit, 0))
elif limit_direction == 'backward':
violate_limit = sorted(end_nans |
set(_interp_limit(invalid, 0, limit)))
elif limit_direction == 'both':
violate_limit = sorted(_interp_limit(invalid, limit, limit))
preserve_nans = end_nans | set(_interp_limit(invalid, 0, limit))
else:
# both directions... just use _interp_limit
preserve_nans = set(_interp_limit(invalid, limit, limit))

# if limit_area is set, add either mid or outside indices
# to preserve_nans GH #16284
if limit_area == 'inside':
# preserve NaNs on the outside
preserve_nans |= start_nans | end_nans
elif limit_area == 'outside':
# preserve NaNs on the inside
preserve_nans |= mid_nans

# sort preserve_nans and convert to list
preserve_nans = sorted(preserve_nans)

xvalues = getattr(xvalues, 'values', xvalues)
yvalues = getattr(yvalues, 'values', yvalues)

@@ -211,7 +223,7 @@ def _interp_limit(invalid, fw_limit, bw_limit):
else:
inds = xvalues
result[invalid] = np.interp(inds[invalid], inds[valid], yvalues[valid])
result[violate_limit] = np.nan
result[preserve_nans] = np.nan
return result

sp_methods = ['nearest', 'zero', 'slinear', 'quadratic', 'cubic',

@@ -230,7 +242,7 @@ def _interp_limit(invalid, fw_limit, bw_limit):
fill_value=fill_value,
bounds_error=bounds_error,
order=order, **kwargs)
result[violate_limit] = np.nan
result[preserve_nans] = np.nan
return result
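The ``preserve_nans`` logic above can be traced without pandas. The sketch below re-implements it in plain Python under stated assumptions: ``interp_limit`` is a hypothetical stand-in for the private ``_interp_limit`` helper (a ``NaN`` at ``x`` stays ``NaN`` when every position from ``x - fw_limit`` to ``x + bw_limit`` is ``NaN``), ``limit=None`` is modelled as a window as long as the data, and the input must contain at least one valid value. It returns the positions left as ``NaN``, which can be compared with the expected Series in ``test_interp_limit_area``.

```python
import math

nan = math.nan


def interp_limit(invalid, fw_limit, bw_limit):
    """NaN indices with no valid value within fw_limit before or bw_limit after.

    Hypothetical stand-in for the private _interp_limit helper.
    """
    return {x for x, bad in enumerate(invalid)
            if bad and all(invalid[max(0, x - fw_limit):x + bw_limit + 1])}


def preserve_nans(values, limit=None, limit_direction='forward',
                  limit_area=None):
    invalid = [math.isnan(v) for v in values]
    valid_pos = [i for i, bad in enumerate(invalid) if not bad]
    if not valid_pos:
        # assumption of this sketch: at least one valid value is required
        raise ValueError('need at least one valid value')
    all_nans = {i for i, bad in enumerate(invalid) if bad}
    start_nans = set(range(valid_pos[0]))
    end_nans = set(range(valid_pos[-1] + 1, len(values)))
    mid_nans = all_nans - start_nans - end_nans

    if limit is None:
        # "unlimited": a window as long as the data never exceeds the limit
        limit = len(values)

    # NaNs kept because of the direction and the limit
    if limit_direction == 'forward':
        keep = start_nans | interp_limit(invalid, limit, 0)
    elif limit_direction == 'backward':
        keep = end_nans | interp_limit(invalid, 0, limit)
    elif limit_direction == 'both':
        keep = interp_limit(invalid, limit, limit)
    else:
        raise ValueError('invalid limit_direction: %r' % limit_direction)

    # NaNs additionally kept because of limit_area
    if limit_area == 'inside':
        keep |= start_nans | end_nans
    elif limit_area == 'outside':
        keep |= mid_nans
    return sorted(keep)


s = [nan, nan, 3, nan, nan, nan, 7, nan, nan]
print(preserve_nans(s, limit=1, limit_direction='both', limit_area='inside'))
# [0, 1, 4, 7, 8]
```

The order of operations matches the diff: the direction and ``limit`` are resolved first, and ``limit_area`` can only add positions to the preserved set, never remove them.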

@@ -959,6 +959,45 @@ def test_interp_limit_bad_direction(self):
pytest.raises(ValueError, s.interpolate, method='linear',
limit_direction='abc')

def test_interp_limit_area(self):
# limit_area introduced GH #16284
Review comment: can you put the comment inside the function
s = Series([nan, nan, 3, nan, nan, nan, 7, nan, nan])

expected = Series([nan, nan, 3., 4., 5., 6., 7., nan, nan])
result = s.interpolate(method='linear', limit_area='inside')
assert_series_equal(result, expected)

expected = Series([nan, nan, 3., 4., nan, nan, 7., nan, nan])
result = s.interpolate(method='linear', limit_area='inside',
limit=1)
assert_series_equal(result, expected)

expected = Series([nan, nan, 3., 4., nan, 6., 7., nan, nan])
result = s.interpolate(method='linear', limit_area='inside',
limit_direction='both', limit=1)
assert_series_equal(result, expected)

expected = Series([nan, nan, 3., nan, nan, nan, 7., 7., 7.])
result = s.interpolate(method='linear', limit_area='outside')
assert_series_equal(result, expected)

expected = Series([nan, nan, 3., nan, nan, nan, 7., 7., nan])
result = s.interpolate(method='linear', limit_area='outside',
limit=1)
assert_series_equal(result, expected)

expected = Series([nan, 3., 3., nan, nan, nan, 7., 7., nan])
result = s.interpolate(method='linear', limit_area='outside',
limit_direction='both', limit=1)
assert_series_equal(result, expected)

expected = Series([3., 3., 3., nan, nan, nan, 7., nan, nan])
result = s.interpolate(method='linear', limit_area='outside',
limit_direction='backward')
assert_series_equal(result, expected)

# raises an error if limit_area is invalid
pytest.raises(ValueError, s.interpolate, method='linear',
limit_area='abc')

def test_interp_limit_direction(self):
# These tests are for issue #9218 -- fill NaNs in both directions.
s = Series([1, 3, np.nan, np.nan, np.nan, 11])
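The expected values for the ``'outside'`` cases in ``test_interp_limit_area`` (leading ``3.`` and trailing ``7.``) follow from the linear branch of ``interpolate_1d`` calling ``np.interp``, which by default holds ``fp[0]`` to the left of the data and ``fp[-1]`` to the right. The pure-Python sketch below reproduces that edge behaviour so the test expectations can be checked by hand. ``interp_flat`` is a hypothetical helper written for illustration, not part of NumPy or pandas.

```python
import bisect


def interp_flat(x, xp, fp):
    """Piecewise-linear interpolation that holds the edge values constant.

    Hypothetical sketch of np.interp's default behaviour (left=fp[0],
    right=fp[-1]); xp must be increasing.
    """
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    j = bisect.bisect_right(xp, x)  # xp[j - 1] <= x < xp[j]
    x0, x1 = xp[j - 1], xp[j]
    y0, y1 = fp[j - 1], fp[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# valid points of Series([nan, nan, 3, nan, nan, nan, 7, nan, nan])
xp, fp = [2, 6], [3.0, 7.0]
print([interp_flat(x, xp, fp) for x in range(9)])
# [3.0, 3.0, 3.0, 4.0, 5.0, 6.0, 7.0, 7.0, 7.0]
```

Masking this fully filled result with the preserved ``NaN`` positions yields each expected Series in the test; for example, keeping positions 3, 4, 5, 7 and 8 as ``NaN`` gives ``[3., 3., 3., nan, nan, nan, 7., nan, nan]`` for the ``'backward'`` / ``'outside'`` case.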

Review comment: this doesn't make sense w/o an example
Review comment: Hi Jeff,
The examples for both limit_direction and limit_area are below in the "interpolation limits" sub-section.
I'm mostly trying to get the correct style from inference, so I basically reproduced what had been done in the past for limit_direction.
There is a location (.. _missing_data.interp_limits:) below these versionadded references to which both limit_direction and limit_area can be linked if that is the right style.
Honestly, since versionadded is part of the docstrings, I'm not sure it needs to be reproduced here at all, but again, that is a bigger style question above my pay grade. :-)
Review comment: A link to below sounds good. You can make a new one specifically for
_missing_data.interp_limit_area
Review comment: I agree with this, I would just remove it here.