BUG: Incorrect behavior of window aggregation functions on disjoint windows skipping overflowing elements #45647

Closed
@rtpsw

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
from pandas.core.indexers.objects import BaseIndexer

# Largest finite float64; the sum of two of these overflows.
f64max = 1.7976931348623158E+308
values = np.array([1, f64max, f64max, 3], dtype=np.float64)
# Two disjoint single-element windows, [0, 1) and [3, 4), each used for two rows.
start = np.array([0, 0, 3, 3], dtype=np.int64)
end = start + 1

class WinInd(BaseIndexer):
    # Returns the fixed (start, end) bounds defined above, ignoring the arguments.
    def get_window_bounds(self, num_values: 'int' = 0, min_periods: 'int | None' = None, center: 'bool | None' = None, closed: 'str | None' = None) -> 'tuple[np.ndarray, np.ndarray]':
        return start, end

pd.Series(values).rolling(WinInd()).sum()
pd.Series(values).rolling(WinInd()).mean()

Issue Description

The example outputs

0    1.0
1    1.0
2    NaN
3    NaN
dtype: float64

for both sum and mean. The correct output would be 3.0 instead of NaN in the last two rows, since each of those windows contains only the final element, 3. This shows that at least these rolling window aggregations produce incorrect output on disjoint windows when the elements skipped between the windows overflow when summed. The incorrect behavior originates in the window aggregation functions in pandas._libs.window.aggregations, which process the elements lying outside the disjoint windows rather than skipping over them; many of these functions have this problem. The following code shows this for sum and mean:

>>> import numpy as np
>>> import pandas as pd
>>> import pandas._libs.window.aggregations as wa
>>> values = np.array([1, np.inf, 3], dtype=np.float64)
>>> start = np.array([0, 2], dtype=np.int64)
>>> end = start + 1
>>> wa.roll_sum(values, start, end, 0)
array([ 1., nan, nan])
>>> wa.roll_mean(values, start, end, 0)
array([ 1., nan, nan])

Because these window aggregation functions are not exposed to the user, and are wrapped by defensive code within Series.rolling that handles np.inf values, the reproducible example at the top is more involved: it induces an overflow to expose the behavior.
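To illustrate why processing out-of-window elements matters, here is a simplified pure-Python model of an add/remove sliding sum. It is only a sketch: `sliding_sum` and its `reset_on_gap` flag are hypothetical names made up for this illustration, and the code is not the Cython implementation in pandas._libs.window.aggregations.

```python
import numpy as np


def sliding_sum(values, start, end, reset_on_gap):
    """Toy add/remove sliding sum (hypothetical helper, not pandas code)."""
    vals = [float(v) for v in values]  # plain Python floats
    out = np.empty(len(start), dtype=np.float64)
    total = 0.0
    for i in range(len(start)):
        s, e = int(start[i]), int(end[i])
        if i == 0 or (reset_on_gap and s >= int(end[i - 1])):
            # No overlap with the previous window: sum this window from scratch.
            total = sum(vals[s:e])
        else:
            # Slide: subtract everything between the old and new start, then
            # add everything between the old and new end. When the windows are
            # disjoint, both loops touch elements that belong to neither window.
            for j in range(int(start[i - 1]), s):
                total -= vals[j]
            for j in range(int(end[i - 1]), e):
                total += vals[j]
        out[i] = total
    return out


values = np.array([1, np.inf, 3], dtype=np.float64)
start = np.array([0, 2], dtype=np.int64)
end = start + 1

naive = sliding_sum(values, start, end, reset_on_gap=False)
fixed = sliding_sum(values, start, end, reset_on_gap=True)
print(naive)
print(fixed)
```

In this model the sliding path subtracts and then re-adds the skipped np.inf element, so the running total becomes inf - inf, which is NaN, and the NaN persists for the second window. Resetting the accumulator whenever a window starts at or after the previous window's end avoids touching the skipped element at all.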

Disjoint windows arise, for example, in GH-15354, where the use case is a step size larger than the window size. The current issue is a precursor to handling GH-15354.
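As a rough sketch of how a step larger than the window size leads to disjoint windows, the following builds window bounds for a given step and lists the elements that fall between consecutive windows. The helper `stepped_bounds` and the forward-looking window layout are assumptions made for illustration; they are not taken from GH-15354 or from pandas.

```python
import numpy as np


def stepped_bounds(num_values, window, step):
    """Hypothetical helper: windows [k*step, k*step + window), clipped to the data."""
    start = np.arange(0, num_values, step, dtype=np.int64)
    end = np.minimum(start + window, num_values)
    return start, end


start, end = stepped_bounds(num_values=10, window=2, step=4)

# Elements lying strictly between one window's end and the next window's start.
skipped = [
    list(range(int(end[i - 1]), int(start[i])))
    for i in range(1, len(start))
]
print(start, end, skipped)
```

With a window of 2 and a step of 4, every window starts after the previous one has ended, so each gap contains elements that an aggregation should never read.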

Expected Behavior

The expected output for both sum and mean is:

0    1.0
1    1.0
2    3.0
3    3.0
dtype: float64
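One way to cross-check the expected values independently of the incremental code path is to aggregate each window directly by slicing. This is a minimal sketch that reuses the data from the reproducible example; the names `ref_sum` and `ref_mean` are introduced here for illustration only.

```python
import numpy as np

f64max = 1.7976931348623158E+308
values = np.array([1, f64max, f64max, 3], dtype=np.float64)
start = np.array([0, 0, 3, 3], dtype=np.int64)
end = start + 1

# Aggregate each window on its own; elements outside [s, e) are never read.
ref_sum = np.array([values[s:e].sum() for s, e in zip(start, end)])
ref_mean = np.array([values[s:e].mean() for s, e in zip(start, end)])
print(ref_sum)
print(ref_mean)
```

Because every window here holds a single element, the reference sum and mean coincide, and neither involves the two f64max values.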

Installed Versions

This issue is confirmed on the main branch.

Labels: Bug, Window (rolling, ewma, expanding)