
BUG: series.to_numpy does not work well with pd.Float64Dtype #40630

Closed
@jaspersival

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd
from scipy.stats import norm

series = pd.Series([2.0, -2.5, pd.NA], dtype=pd.Float64Dtype())
norm.pdf(series)

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\XXX\AppData\Local\pypoetry\Cache\virtualenvs\XXX-w19Rd76b-py3.9\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1837, in pdf
    cond1 = self._support_mask(x, *args) & (scale > 0)
  File "C:\Users\XXX\AppData\Local\pypoetry\Cache\virtualenvs\XXX-w19Rd76b-py3.9\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 964, in _support_mask
    return (a <= x) & (x <= b)
  File "pandas\_libs\missing.pyx", line 360, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
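The last frame points at NAType.__bool__. The sketch below shows the pd.NA semantics behind it, using only pandas: a comparison with pd.NA returns pd.NA rather than True/False, and coercing that result to a Python bool raises. The assumption here (not confirmed in this report) is that NumPy's comparison on the object-dtype array produced from the Float64 series forces each element result to a bool, which is where the error surfaces inside scipy's `(a <= x) & (x <= b)`.

```python
import pandas as pd

# Comparisons that involve pd.NA propagate NA instead of returning True/False.
result = pd.NA <= 2.0
print(result is pd.NA)  # True

# Asking for the truth value of NA raises, matching the last frame of the
# traceback (NAType.__bool__). Assumption: this is what happens when NumPy
# coerces an object-dtype comparison result to bool.
msg = None
try:
    bool(pd.NA)
except TypeError as exc:
    msg = str(exc)
    print(msg)  # boolean value of NA is ambiguous
```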

Problem description

My issue is that pandas does not correctly convert a pd.Series with dtype=pd.Float64Dtype() to a NumPy array when I pass it to scipy.stats.norm functions. As a result, missing values (pd.NA) cannot be handled, and for now the series has to be cast to a plain float dtype before it works, which is not ideal. It probably has to do with the fact that series.to_numpy is called, and that method does not work well with pd.Float64Dtype.
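A minimal sketch of the explicit conversion described above as the current workaround: ask Series.to_numpy for a float64 array and map pd.NA to np.nan through its na_value parameter, rather than relying on the default conversion.

```python
import numpy as np
import pandas as pd

series = pd.Series([2.0, -2.5, pd.NA], dtype=pd.Float64Dtype())

# Request a plain float64 array and replace pd.NA with np.nan explicitly.
arr = series.to_numpy(dtype="float64", na_value=np.nan)
print(arr.dtype)  # float64
```

The resulting array can then be passed to scipy.stats.norm.pdf, which should produce the array shown under Expected Output, with nan in the position of the missing value.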

See also [https://github.com/scipy/scipy/issues/13729]

Expected Output

array([0.05399097, 0.0175283 , nan])
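As a sanity check on these numbers, independent of scipy: norm.pdf with its default loc=0 and scale=1 is the standard normal density exp(-x**2 / 2) / sqrt(2*pi), which can be evaluated directly with NumPy. `standard_normal_pdf` is a hypothetical helper for illustration only, not part of scipy or pandas. A NaN input propagates to NaN, matching the third entry.

```python
import numpy as np

def standard_normal_pdf(x):
    """Hypothetical helper: standard normal density exp(-x**2 / 2) / sqrt(2*pi)."""
    x = np.asarray(x, dtype="float64")
    # NaN inputs propagate through exp(), so missing values stay NaN.
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

values = standard_normal_pdf([2.0, -2.5, np.nan])
print(values)
```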

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2c8480
python : 3.9.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.2.3
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 52.0.0
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : 1.4.2
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

Labels: Bug; Compat (pandas objects compatibility with NumPy or Python functions); NA - MaskedArrays (related to pd.NA and nullable extension arrays)