BUG: downsampling with last doesn't accept "min_count" keyword #37768

Closed
@DiSchi123

I verified this on pandas 1.1.3. I am on miniconda, where 1.1.4 is not available yet. There are probably other ways to install it, but if I read the documentation correctly, this feature should have been available since 0.22.

import numpy as np
import pandas as pd

index = pd.date_range(start='2020', freq='M', periods=6)
data = np.ones(6)
data[4:6] = np.nan
datetime = pd.Series(data, index)
period = datetime.to_period()
datetime
2020-01-31    1.0
2020-02-29    1.0
2020-03-31    1.0
2020-04-30    1.0
2020-05-31    NaN
2020-06-30    NaN
Freq: M, dtype: float64
datetime.resample('Q').sum(min_count=2)
2020-03-31    3.0
2020-06-30    NaN
Freq: Q-DEC, dtype: float64
datetime.resample('Q').last(min_count=3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-bd5dfd934676> in <module>
----> 1 datetime.resample('Q').last(min_count=3)

~\miniconda3\lib\site-packages\pandas\core\resample.py in g(self, _method, *args, **kwargs)
    933 
    934     def g(self, _method=method, *args, **kwargs):
--> 935         nv.validate_resampler_func(_method, args, kwargs)
    936         return self._downsample(_method)
    937 

~\miniconda3\lib\site-packages\pandas\compat\numpy\function.py in validate_resampler_func(method, args, kwargs)
    395             )
    396         else:
--> 397             raise TypeError("too many arguments passed in")
    398 
    399 

TypeError: too many arguments passed in
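
To make the traceback easier to follow, here is a simplified sketch of the control flow it suggests: the generated method `g` passes any extra arguments to a validator, which raises before downsampling starts. This is not the pandas source. `make_downsampler` and the placeholder return value are hypothetical, and the assumption that every extra argument is rejected is inferred only from the traceback above.

```python
def validate_resampler_func(method, args, kwargs):
    """Simplified stand-in (assumption): reject any extra arguments."""
    if len(args) + len(kwargs) > 0:
        raise TypeError("too many arguments passed in")


def make_downsampler(method):
    """Hypothetical factory mirroring the shape of `g` in the traceback."""
    def g(values, *args, **kwargs):
        # Validation runs first, so min_count never reaches the aggregation.
        validate_resampler_func(method, args, kwargs)
        # Placeholder for the real downsampling step.
        return f"downsampled with {method}"
    return g


last = make_downsampler("last")
print(last([1.0, 1.0]))  # no extra arguments: accepted

try:
    last([1.0, 1.0], min_count=3)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

If this reading is right, the keyword is rejected by the argument check rather than by the aggregation itself.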

Problem description

According to the documentation, .last() accepts the keyword "min_count", just like .sum(), for example, where it works fine (see above).

So I should not see the error above. "min_count" is also useful for .last() if you have NaNs in your data and want to avoid returning a record that is not truly the last record in the segment.

Doc:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.resample.Resampler.last.html#pandas.core.resample.Resampler.last

Expected Output

2020-03-31    1.0
2020-06-30    NaN
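
As a possible workaround on affected versions, the expected output can be approximated by combining the last non-null value per quarter with the per-quarter count of non-null values. This is only a sketch: the helper name `last_with_min_count` is made up for illustration, and it groups by `index.to_period("Q")` instead of calling `resample('Q')`, so the result is indexed by quarter periods rather than quarter-end timestamps.

```python
import numpy as np
import pandas as pd

# Same data as in the report, with the month-end index written out explicitly.
index = pd.to_datetime(
    ["2020-01-31", "2020-02-29", "2020-03-31",
     "2020-04-30", "2020-05-31", "2020-06-30"]
)
data = np.ones(6)
data[4:6] = np.nan
series = pd.Series(data, index=index)


def last_with_min_count(s, min_count):
    """Hypothetical helper: last non-null value per quarter, masked to NaN
    when the quarter has fewer than `min_count` non-null values."""
    grouped = s.groupby(s.index.to_period("Q"))
    last = grouped.last()    # last non-null value in each quarter
    count = grouped.count()  # number of non-null values in each quarter
    return last.where(count >= min_count)


result = last_with_min_count(series, min_count=3)
print(result)
```

The first quarter has three valid values and keeps its last value; the second quarter has only one and is masked to NaN.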

Output of pd.show_versions()

See above: verified with pandas 1.1.3. I started writing this issue on my laptop with an older pandas version but finished it with 1.1.3. The error message and the pd.show_versions() output are up to date.

INSTALLED VERSIONS

commit : db08276
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 8.1
Version : 6.3.9600
machine : AMD64
processor : Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.3
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : None
pytest : None
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.49.1
