Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Simple series with floats, dtype float64:
>>> import pandas as pd
>>> df = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])
>>> df
0 0.0
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
diff() introduces a NaN in the first position:
>>> df.diff()
0 NaN
1 1.0
2 1.0
3 1.0
4 1.0
dtype: float64
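For context: with the default periods=1, diff() subtracts each element's predecessor, so it is equivalent to s - s.shift(1). The first element has no predecessor and therefore becomes missing. A minimal check of that equivalence for the float64 case (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])

# diff() with the default periods=1 subtracts the previous element
via_diff = s.diff()
via_shift = s - s.shift(1)

# Both are NaN at position 0, where there is no previous element;
# Series.equals treats NaNs in the same positions as equal
print(via_diff.tolist())
print(via_diff.equals(via_shift))
```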
This works as expected:
>>> df.diff().rolling(2).sum()
0 NaN
1 NaN
2 2.0
3 2.0
4 2.0
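The float64 result above follows from how rolling() treats NaN: it counts as a missing observation, and by default min_periods equals the window size, so any window containing the leading NaN yields NaN. Lowering min_periods (a parameter not used in the original report) makes this visible. This NaN-as-missing behaviour is what one would expect pd.NA to get as well:

```python
import pandas as pd

d = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0]).diff()

# Default: min_periods == window (2), so windows touching the NaN are NaN
strict = d.rolling(2).sum()

# min_periods=1: a window needs only one non-missing value to produce a sum
relaxed = d.rolling(2, min_periods=1).sum()

print(strict.tolist())
print(relaxed.tolist())
```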
We can cast it to the nullable Float64 dtype:
>>> df.astype('Float64')
0 0.0
1 1.0
2 2.0
3 3.0
4 4.0
dtype: Float64
diff() still works, but the first element is now pd.NA instead of NaN:
>>> df.astype('Float64').diff()
0 <NA>
1 1.0
2 1.0
3 1.0
4 1.0
dtype: Float64
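Unlike float64, a Float64 Series keeps its values alongside a separate boolean mask, and pd.NA is represented by that mask rather than by a NaN in the data. Converting such an array to a plain NumPy float array therefore requires stating what NA should become; in pandas 1.3.2 the rolling code path performs this conversion without an na_value, which is where the first exception in the traceback comes from. A small sketch of inspecting the mask and doing the conversion explicitly with na_value=np.nan:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0]).astype("Float64").diff()

# The nullable array: values plus a mask marking missing entries
print(type(s.array).__name__)
print(s.isna().tolist())

# Explicitly map pd.NA to np.nan when converting to a NumPy float64 array
arr = s.to_numpy(dtype="float64", na_value=np.nan)
print(arr.dtype, arr)
```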
Now, when we call rolling(), everything comes tumbling down:
>>> df.astype('Float64').diff().rolling(2).sum()
Traceback (most recent call last):
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/window/rolling.py", line 321, in _prep_values
values = ensure_float64(values)
File "pandas/_libs/algos_common_helper.pxi", line 45, in pandas._libs.algos.ensure_float64
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/arrays/masked.py", line 335, in __array__
return self.to_numpy(dtype=dtype)
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/arrays/masked.py", line 292, in to_numpy
raise ValueError(
ValueError: cannot convert to 'float64'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/window/rolling.py", line 402, in _apply_series
values = self._prep_values(obj._values)
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/window/rolling.py", line 323, in _prep_values
raise TypeError(f"cannot handle this type -> {values.dtype}") from err
TypeError: cannot handle this type -> Float64
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/window/rolling.py", line 1723, in sum
return super().sum(*args, engine=engine, engine_kwargs=engine_kwargs, **kwargs)
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/window/rolling.py", line 1233, in sum
return self._apply(window_func, name="sum", **kwargs)
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/window/rolling.py", line 539, in _apply
return self._apply_blockwise(homogeneous_func, name)
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/window/rolling.py", line 417, in _apply_blockwise
return self._apply_series(homogeneous_func, name)
File "/home/janlugt/repositories/streaming_anomaly_detection/.venv/lib/python3.8/site-packages/pandas/core/window/rolling.py", line 404, in _apply_series
raise DataError("No numeric types to aggregate") from err
pandas.core.base.DataError: No numeric types to aggregate
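Until rolling() handles nullable dtypes directly, one possible workaround (a sketch, not an official pandas recommendation) is to do the NA-to-NaN conversion by hand, run the rolling aggregation on float64, and cast the result back to Float64. On the way back, NaN is interpreted as missing and becomes <NA> again, which reproduces the expected output given in this report:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0]).astype("Float64").diff()

# 1. Convert to float64, mapping pd.NA to np.nan explicitly
as_float = pd.Series(
    s.to_numpy(dtype="float64", na_value=np.nan), index=s.index, name=s.name
)

# 2. Run the rolling aggregation on the plain float64 data
rolled = as_float.rolling(2).sum()

# 3. Cast back to the nullable dtype; NaN becomes <NA>
result = rolled.astype("Float64")
print(result)
```

The round trip costs two copies of the data, which is negligible for a series like this but worth keeping in mind for very large frames.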
Problem description
I would expect rolling() to produce the same values whether it is called on a float64 or a Float64 Series. The promise of pd.NA is that it can be used consistently across data types, making dtype-specific missing-value markers such as np.nan or NaT unnecessary; that is clearly not the case here.
Expected Output
>>> df.astype('Float64').diff().rolling(2).sum()
0 <NA>
1 <NA>
2 2.0
3 2.0
4 2.0
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 5f648bf
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.16.3-microsoft-standard-WSL2
Version : #1 SMP Fri Apr 2 22:23:49 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.2
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.2.4
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None