
BUG: reindex() and reindex_like() fill behavior is different in pandas 12.0 and 13.1? #6418

Closed
@meelmaar


I just came across an issue that has caused me serious trouble since upgrading from pandas 12.0 to 13.1. It happens when using a fill method with reindex() or reindex_like(). Moreover, these methods no longer give consistent results! I have not tested whether the issue originates from changes to .ffill() and similar methods, but I can see that it propagates to resample(). I could not find any recent mention of this strange behavior, nor any hint in the docs or the What's New section.

This is the problem I encounter with pandas 12.0 (numpy 1.7.1, on both 32-bit Python 2.7.5 from Python(x,y) and 64-bit WinPython-64bit-2.7.4.1; Windows 7) and with pandas 13.1 (D:\PortableApps\WinPython-64bit-2.7.6.2, numpy 1.8.0). The pandas 12.0 behavior is the same in the 32-bit and 64-bit versions, so the platform cannot explain the difference.

Code:

import numpy as np
import pandas as pd

# Make a low-frequency (30-minute) time series with one missing value:
i30 = pd.date_range('2002-02-02', periods=4, freq='30T')
s = pd.Series(np.arange(4.), index=i30)
s.iloc[2] = np.nan  # the value at 01:00 is missing

# Upsample by factor 3 with reindex() and resample() methods:
i10 = pd.date_range(i30[0], i30[-1], freq='10T')
s10 = s.reindex(index=i10, method='bfill')
s10_2 = s.reindex(index=i10, method='bfill', limit=2)
r10 = s.resample('10Min', fill_method='bfill')
r10_2 = s.resample('10Min', fill_method='bfill', limit=2)

In pandas 12.0, s10, s10_2, r10 and r10_2 are all equal:

s10
Out[60]: 
2002-02-02 00:00:00     0
2002-02-02 00:10:00     1
2002-02-02 00:20:00     1
2002-02-02 00:30:00     1
2002-02-02 00:40:00   NaN
2002-02-02 00:50:00   NaN
2002-02-02 01:00:00   NaN
2002-02-02 01:10:00     3
2002-02-02 01:20:00     3
2002-02-02 01:30:00     3
Freq: 10T, dtype: float64
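The 12.0 result above is what a fill driven purely by index matching produces: each new label takes the value stored at the next original label, whether or not that value is NaN. A minimal sketch of that mechanism, assuming a recent pandas (not 0.12/0.13) where `Index.get_indexer(..., method='bfill')` performs the matching and the '30min'/'10min' aliases replace the older '30T'/'10T':

```python
import numpy as np
import pandas as pd

# Rebuild the series from the report (modern frequency aliases assumed)
i30 = pd.date_range('2002-02-02', periods=4, freq='30min')
s = pd.Series(np.arange(4.), index=i30)
s.iloc[2] = np.nan

i10 = pd.date_range(i30[0], i30[-1], freq='10min')

# For every new label, find the position of the next original label
positions = i30.get_indexer(i10, method='bfill')
by_index = pd.Series(s.to_numpy()[positions], index=i10)

# reindex(method=...) matches on the index the same way, so the NaN stored
# at 01:00 is copied to 00:40 and 00:50 instead of being filled
s10 = s.reindex(i10, method='bfill')
```

Both series carry three NaNs (00:40, 00:50, 01:00), as in the 12.0 output.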

In pandas 13.1, s10 no longer equals s10_2; in s10 all NaNs are filled:

s10
Out[120]: 
2002-02-02 00:00:00    0
2002-02-02 00:10:00    1
2002-02-02 00:20:00    1
2002-02-02 00:30:00    1
2002-02-02 00:40:00    3
2002-02-02 00:50:00    3
2002-02-02 01:00:00    3
2002-02-02 01:10:00    3
2002-02-02 01:20:00    3
2002-02-02 01:30:00    3
Freq: 10T, dtype: float64
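The 13.1 output, by contrast, matches a value-based fill: insert the new labels as NaN, then back-fill every NaN, including the one that was already in the original data. A sketch contrasting the two semantics, again assuming a recent pandas with `Series.bfill()`:

```python
import numpy as np
import pandas as pd

i30 = pd.date_range('2002-02-02', periods=4, freq='30min')
s = pd.Series(np.arange(4.), index=i30)
s.iloc[2] = np.nan
i10 = pd.date_range(i30[0], i30[-1], freq='10min')

# Value-based fill: reindex first, then back-fill every NaN in the result
by_value = s.reindex(i10).bfill()

# Index-based fill: only labels created by the reindex are filled
by_index = s.reindex(i10, method='bfill')

# The two differ exactly where the original NaN propagates (00:40 to 01:00)
diff = by_value.index[by_value.ne(by_index).to_numpy()]
```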

In pandas 13.1 the same holds for the resampled series r10.
Conclusion: in pandas 13.1, every NaN is filled when limit=None, which breaks with the pandas 12.0 behavior. I think the 12.0 behavior is more sensible: only fill the gaps created by upsampling.
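The `fill_method` keyword of resample() used above no longer exists in current pandas; upsampling is spelled `s.resample('10min').bfill()`. A sketch, assuming a recent pandas, checking that this modern spelling keeps the original NaN, i.e. follows the 12.0 behavior of filling only the upsampling gaps:

```python
import numpy as np
import pandas as pd

i30 = pd.date_range('2002-02-02', periods=4, freq='30min')
s = pd.Series(np.arange(4.), index=i30)
s.iloc[2] = np.nan
i10 = pd.date_range(i30[0], i30[-1], freq='10min')

# Modern spelling of s.resample('10Min', fill_method='bfill'[, limit=2])
r10 = s.resample('10min').bfill()
r10_2 = s.resample('10min').bfill(limit=2)

# Reference: fill only the labels created by upsampling
s10 = s.reindex(i10, method='bfill')
```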
The problem is even more serious for reindex_like(), because in pandas 13.1 its limit keyword cannot restrict which gaps are filled:

s.reindex_like(s10, method='bfill', limit=2)
Out[121]: 
2002-02-02 00:00:00    0
2002-02-02 00:10:00    1
2002-02-02 00:20:00    1
2002-02-02 00:30:00    1
2002-02-02 00:40:00    3
2002-02-02 00:50:00    3
2002-02-02 01:00:00    3
2002-02-02 01:10:00    3
2002-02-02 01:20:00    3
2002-02-02 01:30:00    3
Freq: 10T, dtype: float64
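In current pandas, `reindex_like(other, ...)` is documented as equivalent to reindexing on `other`'s index, so `method` and `limit` behave as in reindex(). A sketch under that assumption (recent pandas): the stored NaN survives, and limit=2 changes nothing here because each gap created by upsampling is exactly two labels wide:

```python
import numpy as np
import pandas as pd

i30 = pd.date_range('2002-02-02', periods=4, freq='30min')
s = pd.Series(np.arange(4.), index=i30)
s.iloc[2] = np.nan
i10 = pd.date_range(i30[0], i30[-1], freq='10min')
s10 = s.reindex(i10, method='bfill')

# reindex_like takes its target index from another object (here s10)
like = s.reindex_like(s10, method='bfill', limit=2)

# Spelled out with reindex on the other object's index
explicit = s.reindex(s10.index, method='bfill', limit=2)
```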

I hope this is clear and can be reproduced, and that it can be fixed soon. Of course, if you can reproduce this behavior and it has indeed changed from 12.0 to 13.1, the change should be documented.

Metadata

Labels: Bug; Indexing (related to indexing on series/frames, not to indexes themselves); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)