BUG: Use of PeriodIndex with rolling transform aggregating function into cumulative aggregating? #34225

Closed
@yohplala

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.


Code Sample, a copy-pastable example

import pandas as pd

# Nine consecutive 30-minute periods, from 08:00 to 12:00.
start = '2020-01-01 08:00'
end = '2020-01-01 12:00'
intervals = pd.period_range(start=start, end=end, freq='30T')
values = list(range(len(intervals)))
ser = pd.Series(values, index=intervals)

# One-hour time-based window; closed='left' excludes the current row.
offset = pd.tseries.frequencies.to_offset('1h')

test_min = ser.rolling(window=offset, closed='left').min()
test_sum = ser.rolling(window=offset, closed='left').sum()
test_max = ser.rolling(window=offset, closed='left').max()

Output

# Problem 1
test_min
2020-01-01 08:00    NaN
2020-01-01 08:30    0.0
2020-01-01 09:00    0.0
2020-01-01 09:30    0.0
2020-01-01 10:00    0.0
2020-01-01 10:30    0.0
2020-01-01 11:00    0.0
2020-01-01 11:30    0.0
2020-01-01 12:00    0.0
Freq: 30T, dtype: float64
# Problem 1
test_sum
2020-01-01 08:00    NaN
2020-01-01 08:30    0.0
2020-01-01 09:00    1.0
2020-01-01 09:30    3.0
2020-01-01 10:00    6.0
2020-01-01 10:30   10.0
2020-01-01 11:00   15.0
2020-01-01 11:30   21.0
2020-01-01 12:00   28.0
Freq: 30T, dtype: float64
# Problem 2
test_max
2020-01-01 08:00    NaN
2020-01-01 08:30    0.0
2020-01-01 09:00    1.0
2020-01-01 09:30    2.0
2020-01-01 10:00    3.0
2020-01-01 10:30    4.0
2020-01-01 11:00    5.0
2020-01-01 11:30    6.0
2020-01-01 12:00    7.0
Freq: 30T, dtype: float64

Problem descriptions

There are two problems:

  • Problem 1:
    • With a 1-hour window, the rolling min should not stay at 0 for every row of this input (see the expected output).
    • The sum example clarifies what happens with min: rolling appears to compute a cumulative sum and a cumulative min rather than windowed ones.
  • Problem 2: the 0 in the second row of test_max shows that rolling with closed='left' does not handle a PeriodIndex correctly. It does not seem to account for the fact that each period is itself a bin, whose own closed side has to be considered.
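The cumulative reading in problem 1 can be checked against the reported numbers without running the rolling call itself. The sketch below rebuilds the series (using the alias '30min' instead of '30T', an assumption made so it also runs on newer pandas releases) and compares the reported output with cumulative aggregates shifted by one row. The `reported`, `cumulative` and `matches` names are illustrative only, and the check says nothing about which pandas versions show the behaviour.

```python
import numpy as np
import pandas as pd

# Rebuild the series from the report. '30min' is used in place of '30T'
# because newer pandas releases deprecate the 'T' alias.
intervals = pd.period_range("2020-01-01 08:00", "2020-01-01 12:00", freq="30min")
ser = pd.Series(range(len(intervals)), index=intervals)

# Values copied from the "Output" section of the report.
reported = {
    "min": [np.nan, 0, 0, 0, 0, 0, 0, 0, 0],
    "sum": [np.nan, 0, 1, 3, 6, 10, 15, 21, 28],
    "max": [np.nan, 0, 1, 2, 3, 4, 5, 6, 7],
}

# Cumulative aggregates shifted by one row, so each row only sees the rows
# strictly before it (the effect closed='left' has on the right edge).
cumulative = {
    "min": ser.cummin().shift(1),
    "sum": ser.cumsum().shift(1),
    "max": ser.cummax().shift(1),
}

# Compare each shifted cumulative aggregate with the reported column.
matches = {
    name: bool(np.allclose(cumulative[name].to_numpy(), reported[name], equal_nan=True))
    for name in reported
}
print(matches)
```

All three comparisons come out True, which is consistent with the window never dropping old rows: the reported min, sum and max are exactly the running aggregates over every earlier row.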

Expected Output

# Problem 1
test_min
2020-01-01 08:00    NaN
2020-01-01 08:30    0.0
2020-01-01 09:00    1.0
2020-01-01 09:30    2.0
2020-01-01 10:00    3.0
2020-01-01 10:30    4.0
2020-01-01 11:00    5.0
2020-01-01 11:30    6.0
2020-01-01 12:00    7.0
Freq: 30T, dtype: float64
# Problem 1
test_sum
2020-01-01 08:00    NaN
2020-01-01 08:30    1.0
2020-01-01 09:00    3.0
2020-01-01 09:30    5.0
2020-01-01 10:00    7.0
2020-01-01 10:30    9.0
2020-01-01 11:00   11.0
2020-01-01 11:30   13.0
2020-01-01 12:00   15.0
Freq: 30T, dtype: float64
# Problem 2
test_max
2020-01-01 08:00    NaN
2020-01-01 08:30    1.0
2020-01-01 09:00    2.0
2020-01-01 09:30    3.0
2020-01-01 10:00    4.0
2020-01-01 10:30    5.0
2020-01-01 11:00    6.0
2020-01-01 11:30    7.0
2020-01-01 12:00    8.0
Freq: 30T, dtype: float64
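For reference, the expected values above coincide with a fixed window of two rows (the current period and the one before it). The sketch below checks that, and for contrast runs the same '1h', closed='left' window on a timestamp index obtained with to_timestamp(). Both the two-row window and the timestamp conversion are assumptions made for illustration, not calls proposed in the report; '30min' again stands in for '30T'.

```python
import numpy as np
import pandas as pd

intervals = pd.period_range("2020-01-01 08:00", "2020-01-01 12:00", freq="30min")
ser = pd.Series(range(len(intervals)), index=intervals)

# Values copied from the "Expected Output" section of the report.
expected = {
    "min": [np.nan, 0, 1, 2, 3, 4, 5, 6, 7],
    "sum": [np.nan, 1, 3, 5, 7, 9, 11, 13, 15],
    "max": [np.nan, 1, 2, 3, 4, 5, 6, 7, 8],
}

# Integer window of two rows; the default min_periods equals the window
# size, which is why the first row is NaN.
two_rows = ser.rolling(window=2)
got = {"min": two_rows.min(), "sum": two_rows.sum(), "max": two_rows.max()}

matches_expected = all(
    np.allclose(got[name].to_numpy(), expected[name], equal_nan=True)
    for name in expected
)
print(matches_expected)

# Contrast: on timestamps each row is a point in time, so a 1-hour window
# closed on the left covers the two previous rows and excludes the current one.
point_sum = ser.to_timestamp().rolling("1h", closed="left").sum()
print(point_sum.tolist())
```

The two-row window reproduces the expected min, sum and max. The timestamp-based sum is NaN, 0, 1, 3, 5, 7, 9, 11, 13, which is a genuinely windowed result but lags the expected output by one row, because timestamps carry no notion of a period being a bin.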

Output of pd.show_versions()

<details>

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.6.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.3.0-51-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : fr_FR.UTF-8
LOCALE           : fr_FR.UTF-8
pandas           : 1.0.3
numpy            : 1.16.3
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 46.2.0.post20200511
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : 3.0.3
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : 0.3.3
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.0.3
numexpr          : 2.7.1
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.16.0
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : 0.48.0

</details>

Labels: Bug, Window (rolling, ewma, expanding)