Description
With a DatetimeIndex I can specify the rolling window using an offset alias, but to skip the first (incomplete) window I have to calculate the number of periods in the window myself. Rolling therefore uses different units for the window (a time offset) and for min_periods (a number of observations). I would like to be able to skip all periods in the first n windows.
Possible implementations could be:
min_periods='7D'
min_periods=window
skip_windows=n
skip_window=True
Skipping a single window (n=1) is probably good enough for most use cases.
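To make the skip_windows=n idea concrete, here is a rough sketch of the behaviour I have in mind, written as a user-side helper. The name rolling_mean_skip_windows and its signature are made up for illustration and are not part of pandas. It masks every row whose timestamp lies less than n window lengths after the first timestamp, which also works for an irregular index.

```python
import numpy as np
import pandas as pd


def rolling_mean_skip_windows(obj, window, n=1):
    """Hypothetical helper (not a pandas API): rolling mean that is NaN
    for every row inside the first n windows of the index."""
    result = obj.rolling(window).mean()
    # First timestamp that lies at least n full windows after the start
    cutoff = obj.index[0] + n * pd.Timedelta(window)
    # Mask every row before the cutoff; works for Series and DataFrame
    result.loc[result.index < cutoff] = np.nan
    return result


idx = pd.date_range("2019-03-01", periods=10000, freq="5min")
s = pd.Series(np.sin(np.arange(len(idx)) * 0.01), index=idx)
out = rolling_mean_skip_windows(s, "7D")
```

Note that this masks by elapsed time, not by observation count, so on a regular index it starts one row later than min_periods set to the number of periods per window.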
Code Sample, a copy-pastable example if possible
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Generate example dataframe: a sine wave sampled every 5 minutes
idx = pd.date_range("2019-03-01", periods=10000, freq='5min')
df = pd.DataFrame(np.sin(np.arange(0,100,0.01)), index=idx)
# Plot data
plt.plot(df)
# Plot rolling mean with vertical offset for visual separation
plt.plot(df.rolling('7D').mean() + 0.2)
# Plot rolling mean that skips the first (incomplete) window:
# convert the 7-day window into a number of 5-minute periods
periods = pd.to_timedelta('7D')//df.index.freq
plt.plot(df.rolling('7D', min_periods=periods).mean())
plt.show()
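For reference, the conversion above can be checked without plotting. This is only a small sketch on the same example data (built with np.arange(len(idx)) * 0.01 so the length always matches the index); it prints the number of periods per window and the first timestamp at which the rolling mean is defined.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2019-03-01", periods=10000, freq="5min")
df = pd.DataFrame(np.sin(np.arange(len(idx)) * 0.01), index=idx)

# 7 days / 5 minutes = 2016 observations per full window
periods = pd.Timedelta("7D") // pd.Timedelta("5min")

rolled = df.rolling("7D", min_periods=periods).mean()
first_valid = rolled.first_valid_index()
print(periods, first_valid)
```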
Problem description
min_periods accepts only integer values. With a min_periods value smaller than the number of periods in the window, the first results are not representative because they are computed from too few observations. The documentation is also very confusing with respect to time series, since the "offset" it mentions apparently does not refer to an offset alias: "For a window that is specified by an offset, min_periods will default to 1. Otherwise, min_periods will default to the size of the window."
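To see what the quoted defaults mean in practice, here is a small check on a toy daily series (the data and variable names are only for illustration). It compares a window given as an offset string with a window given as an integer, and then sets min_periods by hand for the offset window.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2019-03-01", periods=10, freq="D")
s = pd.Series(np.arange(10.0), index=idx)

by_offset = s.rolling("3D").mean()  # offset window: min_periods defaults to 1
by_count = s.rolling(3).mean()      # integer window: min_periods defaults to 3
by_hand = s.rolling("3D", min_periods=3).mean()  # count supplied manually

# Number of leading NaN rows in each result
print(by_offset.isna().sum(), by_count.isna().sum(), by_hand.isna().sum())
```

The offset window produces a value from the very first row, which is the incomplete-window behaviour described above; getting the integer-window behaviour requires supplying the count manually.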
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.1
pytest: 3.5.1
pip: 18.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None