Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
s = pd.Series([1,2,3,pd.NA,5], dtype='Int64')
# Raises DataError: No numeric types to aggregate
s.rolling(3).max()
# count works
s.rolling(3).count()
Issue Description
When running a rolling window calculation on an Int64Dtype series that contains missing values (pd.NA), many of the available calculations raise the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/window/rolling.py in _prep_values(self, values)
322 try:
--> 323 values = ensure_float64(values)
324 except (ValueError, TypeError) as err:
pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.ensure_float64()
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __array__(self, dtype)
334 """
--> 335 return self.to_numpy(dtype=dtype)
336
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/arrays/masked.py in to_numpy(self, dtype, copy, na_value)
291 ):
--> 292 raise ValueError(
293 f"cannot convert to '{dtype}'-dtype NumPy array "
ValueError: cannot convert to 'float64'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/window/rolling.py in _apply_series(self, homogeneous_func, name)
403 try:
--> 404 values = self._prep_values(obj._values)
405 except (TypeError, NotImplementedError) as err:
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/window/rolling.py in _prep_values(self, values)
324 except (ValueError, TypeError) as err:
--> 325 raise TypeError(f"cannot handle this type -> {values.dtype}") from err
326
TypeError: cannot handle this type -> Int64
The above exception was the direct cause of the following exception:
DataError Traceback (most recent call last)
<ipython-input-11-7b94d5aa8fee> in <module>
1 import pandas as pd
2 s = pd.Series([1,2,3,None,5], dtype='Int64')
----> 3 s.rolling(3).max()
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/window/rolling.py in max(self, engine, engine_kwargs, *args, **kwargs)
1764 ):
1765 nv.validate_rolling_func("max", args, kwargs)
-> 1766 return super().max(*args, engine=engine, engine_kwargs=engine_kwargs, **kwargs)
1767
1768 @doc(
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/window/rolling.py in max(self, engine, engine_kwargs, *args, **kwargs)
1261 )
1262 window_func = window_aggregations.roll_max
-> 1263 return self._apply(window_func, name="max", **kwargs)
1264
1265 def min(
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/window/rolling.py in _apply(self, func, name, numba_cache_key, **kwargs)
543
544 if self.method == "single":
--> 545 return self._apply_blockwise(homogeneous_func, name)
546 else:
547 return self._apply_tablewise(homogeneous_func, name)
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/window/rolling.py in _apply_blockwise(self, homogeneous_func, name)
417 """
418 if self._selected_obj.ndim == 1:
--> 419 return self._apply_series(homogeneous_func, name)
420
421 obj = self._create_data(self._selected_obj)
~/.pyenv/versions/3.8.2/envs/test-env/lib/python3.8/site-packages/pandas/core/window/rolling.py in _apply_series(self, homogeneous_func, name)
404 values = self._prep_values(obj._values)
405 except (TypeError, NotImplementedError) as err:
--> 406 raise DataError("No numeric types to aggregate") from err
407
408 result = homogeneous_func(values)
DataError: No numeric types to aggregate
I've only shown this behavior for pandas.core.window.rolling.Rolling.max, but it also occurs for many of the other pandas.core.window.rolling.Rolling methods. One Rolling method that does not raise this error is count, which seems to handle null values differently from the other methods, so that may be related.
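To check which methods are affected on a given pandas version, a small probe like the one below can help. It is only a sketch: the list of method names is an arbitrary selection made for illustration, not an exhaustive or confirmed list of failing methods, and the outcome will depend on the installed pandas version.

```python
import pandas as pd

s = pd.Series([1, 2, 3, pd.NA, 5], dtype="Int64")

# Illustrative selection of Rolling methods (an assumption, not a complete list).
method_names = ["count", "sum", "mean", "min", "max", "std"]

outcome = {}
for name in method_names:
    try:
        getattr(s.rolling(3), name)()
        outcome[name] = "ok"
    except Exception as err:
        # Record the exception type instead of stopping at the first failure.
        outcome[name] = type(err).__name__

print(outcome)
```

On the version described in this report, I would expect count to show "ok" and max to show DataError; the other entries are not verified here.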
My current workaround is to convert Int64 columns to float64 before calling series.rolling.
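A minimal sketch of that workaround, assuming the data fits in float64 without loss of precision. The variable names are illustrative, and the cast back to Int64 at the end is an optional extra step that is not part of the workaround as described above.

```python
import pandas as pd

s = pd.Series([1, 2, 3, pd.NA, 5], dtype="Int64")

# Cast to float64 first: pd.NA becomes NaN, which the rolling code can handle.
as_float = s.astype("float64")
result = as_float.rolling(3).max()

# Optional: cast back to the nullable integer dtype (NaN becomes pd.NA).
# This only works when every non-missing value is a whole number.
result_int = result.astype("Int64")

print(result)
print(result_int)
```

Note that very large integers (beyond 2**53) cannot be represented exactly in float64, so this approach is not safe for every Int64 column.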
Expected Behavior
I would expect the behavior to match that of a rolling window calculation on a 'float64' dtype series containing NaNs.
# Using 'float64' works
s = pd.Series([1,2,3,None,5], dtype='float64')
s.rolling(3).max()
In this case, the result is as follows:
0 NaN
1 NaN
2 3.0
3 NaN
4 NaN
dtype: float64
Installed Versions
INSTALLED VERSIONS
commit : 945c9ed
python : 3.8.2.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.4
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.2
setuptools : 41.2.0
Cython : 0.29.17
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.06.0
fastparquet : 0.5.0
gcsfs : None
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.53.0