Description
In 0.13 I could pass a window length greater than the length of the Series passed to rolling_var (or, of course, rolling_std). In 0.14 that raises an error. Behavior is unchanged from 0.13 for the other rolling functions:
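import pandas as pd
from StringIO import StringIO  # Python 2.7; on Python 3 use io.StringIO

# A single column 'x' with five observations
data = """x
0.1
0.5
0.3
0.2
0.7
"""
df = pd.read_csv(StringIO(data))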
>>> pd.rolling_mean(df['x'],window=6,min_periods=2)
0 NaN
1 0.300
2 0.300
3 0.275
4 0.360
dtype: float64
>>> pd.rolling_skew(df['x'],window=6,min_periods=2)
0 NaN
1 NaN
2 3.903128e-15
3 7.528372e-01
4 6.013638e-01
dtype: float64
>>> pd.rolling_skew(df['x'],window=6,min_periods=6)
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
Those work, but rolling_var does not:
>>> pd.rolling_var(df['x'],window=6,min_periods=2)
Traceback (most recent call last):
File "./foo.py", line 187, in <module>
print pd.rolling_var(df['x'],window=6,min_periods=2)
File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 594, in f
center=center, how=how, **kwargs)
File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 346, in _rolling_moment
result = calc(values)
File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 340, in <lambda>
**kwds)
File "/usr/lib64/python2.7/site-packages/pandas/stats/moments.py", line 592, in call_cython
return func(arg, window, minp, **kwds)
File "algos.pyx", line 1177, in pandas.algos.roll_var (pandas/algos.c:28449)
IndexError: Out of bounds on buffer access (axis 0)
If this is the intended new default behavior for the rolling functions, I can work around it. I do prefer the behavior of rolling_skew and rolling_mean, though. It was a convenient default when I was computing rolling standard deviations over reasonably large panels of financial data.
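If clamping the window is the workaround, a minimal sketch might look like the following. The helper name call_with_clamped_window is hypothetical, and it assumes the 0.14 calling convention rolling_func(arg, window, min_periods=..., ...). The idea is that once window >= len(arg), every output position already covers all earlier observations, so passing len(arg) instead should not change the result.

```python
def call_with_clamped_window(rolling_func, arg, window, **kwargs):
    """Call a legacy pd.rolling_* function with window clamped to len(arg).

    Hypothetical helper, not part of pandas. Assumes the pandas 0.14
    calling convention rolling_func(arg, window, min_periods=..., ...).
    When window >= len(arg), each output already sees all preceding
    observations, so clamping should leave the result unchanged while
    keeping roll_var's initial loop inside the input buffer.
    """
    # Never pass a window longer than the data (and never shorter than 1).
    clamped = max(1, min(window, len(arg)))
    # Caveat: if min_periods > len(arg), min_periods now exceeds the window,
    # which pandas may reject; every output would be NaN in that case anyway.
    return rolling_func(arg, clamped, **kwargs)
```

With the data above, the intended call would be call_with_clamped_window(pd.rolling_var, df['x'], 6, min_periods=2).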
The issue appears to be that the 0.14 rolling-variance algorithm (roll_var in algos.pyx) starts with this initial loop:
for i from 0 <= i < win:
so it iterates up to win even when win > N, which indexes past the end of the input buffer.
The other rolling functions, by contrast, appear to structure their algorithms so that the first loop counts only over:
for i from 0 <= i < minp - 1:
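To illustrate the point, here is a minimal pure-Python sketch of a rolling sample variance whose warm-up loop runs over min(win, N) instead of win. It is not pandas' Cython roll_var: the name rolling_var_sketch is hypothetical, it assumes there are no missing values, and it uses Welford-style add/remove updates, which may differ from what algos.pyx actually does.

```python
import math


def rolling_var_sketch(values, win, minp, ddof=1):
    """Rolling sample variance with the warm-up loop clamped to len(values).

    Hypothetical illustration only, not pandas' roll_var. Assumes win >= 1
    and no NaNs. Keeps a running mean and sum of squared deviations (ssqdm)
    using Welford-style add and remove updates.
    """
    n = len(values)
    out = [float("nan")] * n
    mean = 0.0
    ssqdm = 0.0
    nobs = 0

    # Warm-up: add the first min(win, n) observations. Looping to `win`
    # unconditionally would index past the end of `values` when win > n.
    for i in range(min(win, n)):
        nobs += 1
        delta = values[i] - mean
        mean += delta / nobs
        ssqdm += delta * (values[i] - mean)
        if nobs >= minp and nobs > ddof:
            out[i] = ssqdm / (nobs - ddof)

    # Steady state: add values[i] and drop values[i - win].
    # This loop does nothing when win >= n.
    for i in range(win, n):
        nobs += 1
        delta = values[i] - mean
        mean += delta / nobs
        ssqdm += delta * (values[i] - mean)

        old = values[i - win]
        nobs -= 1
        delta = old - mean
        mean -= delta / nobs
        ssqdm -= delta * (old - mean)

        if nobs >= minp and nobs > ddof:
            out[i] = ssqdm / (nobs - ddof)
    return out
```

On the five values above with win=6 and minp=2 it returns NaN, 0.08, 0.04, 0.0291667 and 0.058 (the expanding sample variances with ddof=1) instead of reading past the end of the data.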
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.10-200.fc20.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.14.0
nose: 1.3.1
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.6.0.dev-b52bc09
IPython: 2.0.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.3
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
Non
Karl D.