Change in behavior for rolling_var when win > len(arr) for 0.14: now raises error #7297

Closed
@kdiether

In 0.13 I could pass a window length greater than the length of the Series passed to rolling_var (or, of course, rolling_std). In 0.14 that raises an error. Behavior is unchanged from 0.13 for other rolling functions:

import pandas as pd
from StringIO import StringIO  # Python 2, matching the traceback below

data = """
x
0.1
0.5
0.3
0.2
0.7
"""

df = pd.read_csv(StringIO(data),header=True)

>>> pd.rolling_mean(df['x'],window=6,min_periods=2)

0      NaN
1    0.300
2    0.300
3    0.275
4    0.360
dtype: float64

>>> pd.rolling_skew(df['x'],window=6,min_periods=2)

0             NaN
1             NaN
2    3.903128e-15
3    7.528372e-01
4    6.013638e-01
dtype: float64

>>> pd.rolling_skew(df['x'],window=6,min_periods=6)

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
dtype: float64

Those work, but not rolling_var:

>>> pd.rolling_var(df['x'],window=6,min_periods=2)

Traceback (most recent call last):
  File "./foo.py", line 187, in <module>
    print pd.rolling_var(df['x'],window=6,min_periods=2)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 594, in f
    center=center, how=how, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 346, in _rolling_moment
    result = calc(values)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 340, in <lambda>
    **kwds)
  File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 592, in call_cython
    return func(arg, window, minp, **kwds)
  File "algos.pyx", line 1177, in pandas.algos.roll_var (pandas/algos.c:28449)
IndexError: Out of bounds on buffer access (axis 0)

If this is the new desired default behavior for the rolling functions, I can work around it. I do like the behavior of rolling_skew and rolling_mean better. It was nice default behavior for me when I was doing rolling standard deviations for reasonably large financial data panels.

It looks to me like the issue is that the initial loop of the 0.14 rolling variance implementation (roll_var in algos.pyx) is the following:

for i from 0 <= i < win:

So it loops up to win even when win > N (the length of the input), which reads past the end of the buffer.

It looks to me like the other rolling functions implement their algorithms so that the first loop runs over the following range instead:

for i from 0 <= i < minp - 1:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.10-200.fc20.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.0
nose: 1.3.1
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.6.0.dev-b52bc09
IPython: 2.0.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.3
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
Non

Karl D.

Labels: Bug, Numeric Operations, Testing