Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
import pandas as pd
import numpy as np
from random import seed, randint
# Data
kp = pd.period_range(start='2020-01-01 00:00', end='2020-01-01 00:25', freq='5T')
sp = pd.period_range(start='2020-01-01 00:00', end='2020-01-01 00:25', freq='1h')
seed(1)
values = [randint(0,10) for p in kp]
dft = pd.DataFrame({'Values' : values}, index=kp)
dft.loc[kp[-2]] = np.nan
# Trouble 1: `cummin`, `cummax`, `cumsum` not available through `agg`?
resampler = dft.resample(sp.freqstr)
progress = resampler.agg('cummin')
This raises the following error (same with `cummax` and `cumsum`):
AttributeError: 'cummin' is not a valid function for 'PeriodIndexResampler' object.
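A possible workaround, sketched here under assumptions rather than taken from the pandas documentation, is to bypass `agg` and group the rows by their hourly bucket before calling the group-wise `cummin`. The values are hard-coded for illustration (chosen to be consistent with the output shown in this report), and `buckets` and `workaround` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Illustrative data: six 5-minute periods with one missing value (assumed values).
idx = pd.period_range(start="2020-01-01 00:00", end="2020-01-01 00:25", freq="5min")
df = pd.DataFrame({"Values": [2.0, 9.0, 1.0, 4.0, np.nan, 7.0]}, index=idx)

# Label every row with the start of the hour it falls in.
buckets = df.index.to_timestamp().floor("h")

# Cumulative minimum within each hourly bucket; one output row per input row.
workaround = df.groupby(buckets).cummin()
print(workaround)
```

This does not explain why `agg('cummin')` is rejected; it only shows that the per-bucket cumulative result can be reached through `groupby`.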
# Trouble 2: `cummin`, `cummax`, `cumsum` appear to work when passed as `{('Values','cummin')}`
# (strictly a set containing a tuple, not a dict), but the `skipna` parameter has no effect:
# the result is the same whether `skipna` is `True` or `False`.
resampler = dft.resample(sp.freqstr)
progress = resampler.agg({('Values','cummin')},skipna=False)
Output obtained for `progress`:
                 Values
                 Values
2020-01-01 00:00    2.0
2020-01-01 00:05    2.0
2020-01-01 00:10    1.0
2020-01-01 00:15    1.0
2020-01-01 00:20    NaN
2020-01-01 00:25    1.0
Problem description
It appears `cumsum`, `cummin` and `cummax` cannot be used directly with `agg`. The documentation appears to state otherwise:
"Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply."
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.resample.Resampler.aggregate.html
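For what it is worth, `cummin` does seem to satisfy the quoted condition on a plain DataFrame. The sketch below uses hard-coded illustrative values and does not involve the resampler; it only checks that `cummin` works both when called on a DataFrame and when passed to `DataFrame.apply`.

```python
import numpy as np
import pandas as pd

# Illustrative values (assumed), with one missing entry.
df = pd.DataFrame({"Values": [2.0, 9.0, 1.0, 4.0, np.nan, 7.0]})

direct = df.cummin()                    # called on the DataFrame itself
via_apply = df.apply(pd.Series.cummin)  # passed to DataFrame.apply, column by column

# Both routes should give the same frame.
print(direct.equals(via_apply))
```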
Expected Output
For trouble 1, `progress` (with `cummin`):
                  Values
2020-01-01 00:00     2.0
2020-01-01 00:05     2.0
2020-01-01 00:10     1.0
2020-01-01 00:15     1.0
2020-01-01 00:20     NaN
2020-01-01 00:25     1.0
For trouble 2, `progress` (with `cummin` and `skipna=False`):
                  Values
2020-01-01 00:00     2.0
2020-01-01 00:05     2.0
2020-01-01 00:10     1.0
2020-01-01 00:15     1.0
2020-01-01 00:20     NaN
2020-01-01 00:25     NaN
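Both expected outputs match what `Series.cummin` does on its own for `skipna=True` and `skipna=False`. A minimal sketch, using hard-coded values assumed to be consistent with the outputs in this report:

```python
import numpy as np
import pandas as pd

# Illustrative values (assumed), with the second-to-last entry missing.
s = pd.Series([2.0, 9.0, 1.0, 4.0, np.nan, 7.0], name="Values")

skip = s.cummin(skipna=True)    # the NaN row stays NaN, later rows keep accumulating
keep = s.cummin(skipna=False)   # once a NaN is met, every later row is NaN

print(pd.DataFrame({"skipna=True": skip, "skipna=False": keep}))
```

This is the behaviour I would expect `skipna` to have when forwarded through the resampler.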
As a side question, would it be possible to add a parameter such as `fill_value=0` to obtain the following output?
`progress` (with `cummin` and `fill_value=0`):
                  Values
2020-01-01 00:00     2.0
2020-01-01 00:05     2.0
2020-01-01 00:10     1.0
2020-01-01 00:15     1.0
2020-01-01 00:20     1.0
2020-01-01 00:25     1.0
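The output above looks like a forward fill of the `skipna=True` result rather than a fill with the literal value 0. Assuming that is the intent, it can be approximated today by chaining `ffill()` after `cummin()`. The values are again hard-coded for illustration, and `filled` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Illustrative values (assumed), with the second-to-last entry missing.
s = pd.Series([2.0, 9.0, 1.0, 4.0, np.nan, 7.0], name="Values")

# Cumulative minimum, then carry the last valid result over the NaN row.
filled = s.cummin().ffill()
print(filled)
```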
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-51-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8
pandas : 1.0.3
numpy : 1.16.3
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : None
pytest : None
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : 0.3.3
gcsfs : None
lxml.etree : None
matplotlib : 3.0.3
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0