
BUG: Rolling.count modifies 'min_periods' inplace since 1.2.0 #39554

Closed
@dchigarev

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy as np
import pandas
from pandas.testing import assert_frame_equal

data = {
    "col1": [np.nan, 1, 2]
}

pd_df = pandas.DataFrame(data)
pd_rolled = pd_df.rolling(window=2, min_periods=None)

res1 = pd_rolled.sum()
pd_rolled.count()  # emits a FutureWarning and sets pd_rolled.min_periods to 0
res2 = pd_rolled.sum()

assert_frame_equal(res1, res2) # AssertionError
Output
AssertionError: DataFrame.iloc[:, 0] (column name="col1") are different

DataFrame.iloc[:, 0] (column name="col1") values are different (66.66667 %)
[index]: [0, 1, 2]
[left]:  [nan, nan, 3.0]
[right]: [0.0, 1.0, 3.0]

Problem description

Two sequential calls of .sum on the same rolling object produce different results if .count is called between them while min_periods=None. When min_periods is None, Rolling.sum treats min_periods as equal to the window size. Rolling.count currently behaves differently and treats None as 0. #36649 added a FutureWarning announcing that this behavior is deprecated and also refactored the .count implementation. However, right after issuing the warning, the new .count overwrites the min_periods attribute of the rolling object itself, so every later call to .sum (or other aggregations) on that object uses min_periods=0 and returns incorrect results.

def count(self):
    if self.min_periods is None:
        warnings.warn(
            (
                "min_periods=None will default to the size of window "
                "consistent with other methods in a future version. "
                "Specify min_periods=0 instead."
            ),
            FutureWarning,
        )
        self.min_periods = 0
    return super().count()
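The state leak can be reproduced without pandas. The sketch below is a simplified, hypothetical model (`ToyRolling` and its helper `_aggregate` are illustrative names, not pandas internals) in which `sum` treats `min_periods=None` as the window size and `count` copies the pattern above by overwriting `self.min_periods`. With the same input as the code sample, it produces the same two results seen in the assertion failure.

```python
import math
import warnings


class ToyRolling:
    """Hypothetical simplified Rolling object; not pandas internals."""

    def __init__(self, values, window, min_periods=None):
        self.values = values
        self.window = window
        self.min_periods = min_periods

    def _aggregate(self, func, min_periods):
        # Apply func to the non-NaN values of each trailing window, or emit NaN
        # when fewer than min_periods valid observations are available.
        out = []
        for i in range(len(self.values)):
            win = self.values[max(0, i - self.window + 1): i + 1]
            valid = [v for v in win if not math.isnan(v)]
            out.append(float(func(valid)) if len(valid) >= min_periods else math.nan)
        return out

    def sum(self):
        # min_periods=None means "require a full window".
        required = self.window if self.min_periods is None else self.min_periods
        return self._aggregate(sum, required)

    def count(self):
        # Mirrors the 1.2.x pattern: warn, then overwrite the attribute on self.
        if self.min_periods is None:
            warnings.warn("min_periods=None will default to the window size", FutureWarning)
            self.min_periods = 0  # side effect that outlives this call
        return self._aggregate(len, self.min_periods)


rolled = ToyRolling([math.nan, 1.0, 2.0], window=2)
res1 = rolled.sum()
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    rolled.count()
res2 = rolled.sum()
print(res1)  # [nan, nan, 3.0]
print(res2)  # [0.0, 1.0, 3.0]
```

After `count()` returns, `rolled.min_periods` is `0`, so the second `sum()` no longer requires a full window.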

Expected Output

Rolling.count should not modify the min_periods attribute of the rolling object. If it has to change it internally, it should restore the original value of min_periods after computing the count.
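Both options can be sketched on the same kind of hypothetical toy model (the class names are illustrative, and this is not the patch that pandas actually merged). `NonMutatingRolling` resolves the default into a local variable and never touches `self.min_periods`; `RestoringRolling` overrides the attribute temporarily and restores it in a `finally` block.

```python
import math
import warnings


class _ToyRollingBase:
    """Hypothetical simplified Rolling object; not pandas internals."""

    def __init__(self, values, window, min_periods=None):
        self.values = values
        self.window = window
        self.min_periods = min_periods

    def _aggregate(self, func, min_periods):
        # Apply func to the non-NaN values of each trailing window, or NaN
        # when fewer than min_periods valid observations are available.
        out = []
        for i in range(len(self.values)):
            win = self.values[max(0, i - self.window + 1): i + 1]
            valid = [v for v in win if not math.isnan(v)]
            out.append(float(func(valid)) if len(valid) >= min_periods else math.nan)
        return out

    def sum(self):
        # min_periods=None means "require a full window".
        required = self.window if self.min_periods is None else self.min_periods
        return self._aggregate(sum, required)

    def _warn(self):
        warnings.warn("min_periods=None will default to the window size", FutureWarning)


class NonMutatingRolling(_ToyRollingBase):
    def count(self):
        # Option 1: resolve the default into a local; self is never modified.
        min_periods = self.min_periods
        if min_periods is None:
            self._warn()
            min_periods = 0
        return self._aggregate(len, min_periods)


class RestoringRolling(_ToyRollingBase):
    def count(self):
        # Option 2: override the attribute temporarily and always restore it.
        if self.min_periods is None:
            self._warn()
            original = self.min_periods
            self.min_periods = 0
            try:
                return self._aggregate(len, self.min_periods)
            finally:
                # Runs even if the aggregation raises.
                self.min_periods = original
        return self._aggregate(len, self.min_periods)


results = {}
for cls in (NonMutatingRolling, RestoringRolling):
    rolled = cls([math.nan, 1.0, 2.0], window=2)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        counts = rolled.count()
    # (counts, min_periods after count, sum after count)
    results[cls.__name__] = (counts, rolled.min_periods, rolled.sum())
    print(cls.__name__, results[cls.__name__])
```

The local-variable version is generally the safer choice: the save-and-restore version still mutates the object for the duration of the call, so code that reads `min_periods` concurrently (for example from another thread) could observe the temporary value.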

Output of pd.show_versions()


INSTALLED VERSIONS
------------------
commit           : 9d598a5e1eee26df95b3910e3f2934890d062caa
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-50-generic
Version          : #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.2.1
numpy            : 1.19.0
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 47.3.1.post20200622
Cython           : None
pytest           : 6.0.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : 0.4.1
xlsxwriter       : None
lxml.etree       : 4.5.1
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : None
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : None
fsspec           : 0.7.4
fastparquet      : None
gcsfs            : None
matplotlib       : 3.2.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : 0.13.2
pyarrow          : 1.0.1
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.1
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : None
xarray           : 0.15.1
xlrd             : 1.2.0
xlwt             : None
numba            : None

Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member), Window (rolling, ewma, expanding)
