DOC: update pandas.core.resample.Resampler.nearest docstring #20381

Merged 4 commits on Nov 20, 2018.
Changes from 1 commit
131 changes: 125 additions & 6 deletions pandas/core/resample.py
@@ -498,23 +498,142 @@ def pad(self, limit=None):

def nearest(self, limit=None):
"""
Fill values with nearest neighbor starting from center
Fill the new missing values with their nearest neighbor value, based

Review comment (Contributor): This should fit on a single line. Can you try rephrasing?

on index.

When resampling data, missing values may appear (e.g., when the
resampling frequency is higher than the original frequency).
The nearest fill will replace ``NaN`` values that appeared in

Review comment (Member): "nearest fill" refers to this method, right? Maybe make that explicit with "nearest fill method"?

the resampled data with the value from the nearest member of the
sequence, based on the index value.
Missing values that existed in the original data will not be modified.
If `limit` is given, fill only `limit` values in each direction for

Review comment (Member): Maybe the second "limit" could be "this many"? It is probably worthwhile to get a non-native speaker to weigh in on what is clearest.

each of the original values.

Parameters
----------
limit : integer, optional
limit of how many values to fill
Limit of how many values to fill.

.. versionadded:: 0.21.0

Returns
-------
an upsampled Series
Series, DataFrame
An upsampled Series or DataFrame with ``NaN`` values filled with
their closest neighbor value.

See Also
--------
Series.fillna
DataFrame.fillna
backfill: Backward fill the new missing values in the resampled data.
fillna : Fill ``NaN`` values using the specified method, which can be
'backfill'.
pad : Forward fill ``NaN`` values.
pandas.Series.fillna : Fill ``NaN`` values in the Series using the
specified method, which can be 'backfill'.
pandas.DataFrame.fillna : Fill ``NaN`` values in the DataFrame using

Review comment (Member): The thoroughness is good, but this seems excessive. @datapythonista what's the convention for how much to write in this section?

the specified method, which can be 'backfill'.

Examples
--------

Resampling a Series:

>>> s = pd.Series([1, 2, 3],
... index=pd.date_range('20180101', periods=3,
... freq='1h'))
>>> s
2018-01-01 00:00:00 1
2018-01-01 01:00:00 2
2018-01-01 02:00:00 3
Freq: H, dtype: int64

>>> s.resample('20min').nearest()
2018-01-01 00:00:00 1
2018-01-01 00:20:00 1
2018-01-01 00:40:00 2
2018-01-01 01:00:00 2
2018-01-01 01:20:00 2
2018-01-01 01:40:00 3
2018-01-01 02:00:00 3
Freq: 20T, dtype: int64

Resample in the middle:

Review comment (Contributor): What do you mean by "in the middle"?


>>> s.resample('30min').nearest()
2018-01-01 00:00:00 1
2018-01-01 00:30:00 2
2018-01-01 01:00:00 2
2018-01-01 01:30:00 3
2018-01-01 02:00:00 3
Freq: 30T, dtype: int64

Limited fill:

>>> s.resample('10min').nearest(limit=1)

Review comment (Contributor): This is a tad long. Can you change it to '20min'? I think that'll still make the point, as the first will be filled and the second will be NaN.

Review comment (Member): +1

2018-01-01 00:00:00 1.0
2018-01-01 00:10:00 1.0
2018-01-01 00:20:00 NaN
2018-01-01 00:30:00 NaN
2018-01-01 00:40:00 NaN
2018-01-01 00:50:00 2.0
2018-01-01 01:00:00 2.0
2018-01-01 01:10:00 2.0
2018-01-01 01:20:00 NaN
2018-01-01 01:30:00 NaN
2018-01-01 01:40:00 NaN
2018-01-01 01:50:00 3.0
2018-01-01 02:00:00 3.0
Freq: 10T, dtype: float64

Resampling a DataFrame that has missing values:

Review comment (Member): I would add a small explanation that the initial NaN is preserved.


>>> df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
... index=pd.date_range('20180101', periods=3,
... freq='h'))
>>> df
a b
2018-01-01 00:00:00 2.0 1
2018-01-01 01:00:00 NaN 3
2018-01-01 02:00:00 6.0 5

>>> df.resample('20min').nearest()
a b
2018-01-01 00:00:00 2.0 1
2018-01-01 00:20:00 2.0 1
2018-01-01 00:40:00 NaN 3
2018-01-01 01:00:00 NaN 3
2018-01-01 01:20:00 NaN 3
2018-01-01 01:40:00 6.0 5
2018-01-01 02:00:00 6.0 5

Resampling a DataFrame with shuffled indexes:

Review comment (Member): I am not fully sure this example is needed, as it is something general to resample and not specific to the nearest method.


>>> df = pd.DataFrame({'a': [2, 6, 4]},
... index=pd.date_range('20180101', periods=3,
... freq='h'))
>>> df
a
2018-01-01 00:00:00 2
2018-01-01 01:00:00 6
2018-01-01 02:00:00 4

>>> sorted_df = df.sort_values(by=['a'])
>>> sorted_df
a
2018-01-01 00:00:00 2
2018-01-01 02:00:00 4
2018-01-01 01:00:00 6

>>> sorted_df.resample('20min').nearest()
a
2018-01-01 00:00:00 2
2018-01-01 00:20:00 2
2018-01-01 00:40:00 6
2018-01-01 01:00:00 6
2018-01-01 01:20:00 6
2018-01-01 01:40:00 4
2018-01-01 02:00:00 4
"""
return self._upsample('nearest', limit=limit)
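
The fill rule the docstring describes can be sketched without pandas. The helper below, `nearest_upsample`, is hypothetical and is not how `Resampler.nearest` is implemented. It assumes a sorted index and breaks ties toward the later point, which is what the '30min' example output in the docstring suggests. It ignores `limit` entirely.

```python
from datetime import datetime, timedelta


def nearest_upsample(index, values, step):
    """Upsample onto a regular grid, copying the value of the closest
    original timestamp. Illustrative helper only, not pandas code."""
    # Build the new, denser grid from the first to the last original stamp.
    grid = []
    t = index[0]
    while t <= index[-1]:
        grid.append(t)
        t += step

    result = []
    for t in grid:
        # Choose the original position with the smallest time distance.
        # The -i term breaks ties in favour of the later original point
        # (assumption taken from the '30min' docstring example).
        best = min(range(len(index)), key=lambda i: (abs(index[i] - t), -i))
        result.append((t, values[best]))
    return result


start = datetime(2018, 1, 1)
index = [start + timedelta(hours=h) for h in range(3)]

# Mirrors the Series example: values 1, 2, 3 at hourly stamps.
filled = nearest_upsample(index, [1, 2, 3], timedelta(minutes=20))
print([v for _, v in filled])  # [1, 1, 2, 2, 2, 3, 3]

# A NaN already present in the original data is copied like any other
# value, so it is preserved rather than filled.
with_nan = nearest_upsample(index, [2.0, float("nan"), 6.0],
                            timedelta(minutes=20))
print([v for _, v in with_nan])  # [2.0, 2.0, nan, nan, nan, 6.0, 6.0]
```

Because the helper only ever copies an original value, the new grid points around 01:00 inherit the original NaN, which matches the DataFrame example in the docstring.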

@@ -527,7 +646,7 @@ def backfill(self, limit=None):
appear (e.g., when the resampling frequency is higher than the original
frequency). The backward fill will replace NaN values that appeared in
the resampled data with the next value in the original sequence.
Missing values that existed in the orginal data will not be modified.
Missing values that existed in the original data will not be modified.

Parameters
----------