ENH: DataFrame.interpolate limit to support all-or-none filling #42291

Open
@aaronsl-hku

Currently, with df.interpolate(limit, limit_direction), I must choose one side or both sides to fill when limiting the interpolation. In every case, a gap longer than the limit is partially filled, up to the limit count. What I would find more useful is an all-or-none strategy: gaps no longer than the limit are filled completely, and longer gaps are left untouched. That way I can fill short-term missing data and keep long-term missing data to be filtered out afterwards. Demonstrated here:

>>> df = pd.DataFrame([[0,1,2,3],[1,np.nan,np.nan,np.nan],[np.nan,np.nan,np.nan,5],[3,4,5,6]],columns=list('abcd'))
>>> df
     a    b    c    d
0  0.0  1.0  2.0  3.0
1  1.0  NaN  NaN  NaN
2  NaN  NaN  NaN  5.0
3  3.0  4.0  5.0  6.0
>>> # Current options
>>> df.interpolate(axis=0,limit=1,limit_direction='forward')
     a    b    c    d
0  0.0  1.0  2.0  3.0
1  1.0  2.0  3.0  4.0
2  2.0  NaN  NaN  5.0
3  3.0  4.0  5.0  6.0
>>> df.interpolate(axis=0,limit=1,limit_direction='backward')
     a    b    c    d
0  0.0  1.0  2.0  3.0
1  1.0  NaN  NaN  4.0
2  2.0  3.0  4.0  5.0
3  3.0  4.0  5.0  6.0
>>> df.interpolate(axis=0,limit=1,limit_direction='both')
     a    b    c    d
0  0.0  1.0  2.0  3.0
1  1.0  2.0  3.0  4.0
2  2.0  3.0  4.0  5.0
3  3.0  4.0  5.0  6.0
>>> # What is desired
>>> interpolated_df = pd.DataFrame([[0,1,2,3],[1,np.nan,np.nan,4],[2,np.nan,np.nan,5],[3,4,5,6]],columns=list('abcd'))
>>> interpolated_df  # NaNs in columns b and c are left unfilled because the gap length (2) exceeds limit=1
   a    b    c  d
0  0  1.0  2.0  3
1  1  NaN  NaN  4
2  2  NaN  NaN  5
3  3  4.0  5.0  6
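
Until something like this is supported natively, the desired output can be approximated with existing pandas calls. The following is a minimal sketch, not a proposed implementation: the helper name interpolate_all_or_none is hypothetical, and it assumes linear interpolation along axis 0 and that leading or trailing NaN runs should stay unfilled (hence limit_area="inside").

```python
import numpy as np
import pandas as pd


def interpolate_all_or_none(df: pd.DataFrame, limit: int) -> pd.DataFrame:
    """Fill interior NaN runs of length <= limit; leave longer runs untouched.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Interpolate every interior gap without a limit, then keep only the
    # values that belong to gaps short enough to qualify.
    filled = df.interpolate(axis=0, limit_area="inside")
    result = df.copy()
    for col in df.columns:
        isna = df[col].isna()
        # Give each run of consecutive NaN / non-NaN values its own id.
        run_id = (isna != isna.shift()).cumsum()
        # Number of NaNs in the run each row belongs to (0 for non-NaN runs).
        run_len = isna.groupby(run_id).transform("sum")
        short_gap = isna & (run_len <= limit)
        # Take the interpolated value only inside short gaps.
        result[col] = df[col].where(~short_gap, filled[col])
    return result


df = pd.DataFrame(
    [[0, 1, 2, 3],
     [1, np.nan, np.nan, np.nan],
     [np.nan, np.nan, np.nan, 5],
     [3, 4, 5, 6]],
    columns=list("abcd"),
)
result = interpolate_all_or_none(df, limit=1)
print(result)
```

With limit=1, the single-row gaps in columns a and d are filled, while the two-row gaps in columns b and c stay NaN, which matches the desired frame above except that all columns are float.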

Labels: Enhancement; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Needs Discussion (requires discussion from core team before further action)
