
BUG: TypeError when mixing extension array dtypes in mask #50448

Closed
@bollard

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# classic, numpy example
ser = pd.Series([0.0, 1.0, 2.0, 3.0])
other = pd.Series([True, False, True, False])
ser.mask(~ser.isna(), other).dtype  # bool

# extension array example
ser = pd.Series([0.0, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())
ser.mask(~ser.isna(), other).dtype # TypeError: object cannot be converted to FloatingDtype

Issue Description

Hi,

When mask is called so that other completely replaces self, the classic numpy example works as expected (the result has dtype bool). In the extension array example, however, a TypeError is raised.

A more complicated case is one in which other does not completely replace self, for example:

ser = pd.Series([pd.NA, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())
ser.mask(~ser.isna(), other)

In the numpy case, the result here is coerced to object, but in the extension array case the same TypeError is raised. I can see how this could be a little trickier, but the result could also be coerced to object or (in this example) keep the pd.NA and be returned with dtype pd.BooleanDtype().
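Until the behaviour is settled, one possible workaround is to route the operation through object dtype and cast back afterwards. The sketch below is only an illustration: mask_via_object is a hypothetical helper, not part of pandas, and the final cast to other's dtype is an assumption that holds only when every value kept from ser is missing (as in the example above).

```python
import pandas as pd


def mask_via_object(ser, cond, other):
    """Hypothetical helper (not a pandas API).

    Performs the mask on object-dtype copies so that no conversion
    between extension dtypes is attempted, then casts the result to
    the dtype of `other`. The cast is an assumption: it only works
    when every value kept from `ser` is missing or valid for that dtype.
    """
    result = ser.astype(object).mask(cond, other.astype(object))
    return result.astype(other.dtype)


ser = pd.Series([pd.NA, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())

# Replace every non-missing entry of `ser` with the matching entry of `other`.
result = mask_via_object(ser, ~ser.isna(), other)
print(result)
```

The missing first element stays pd.NA and the remaining entries come from other, which is the second of the two outcomes suggested above.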

Expected Behavior

ser = pd.Series([0.0, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())
# Expected result: pd.Series([True, False, True, False], dtype=pd.BooleanDtype()),
# not a TypeError.
ser.mask(~ser.isna(), other)
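To check how a given pandas installation behaves, a small script along these lines may help. It is a sketch only: mask_outcome is a hypothetical helper, and it deliberately does not assume which outcome occurs, since that depends on the pandas version.

```python
import pandas as pd


def mask_outcome(ser, other):
    """Hypothetical helper (not a pandas API).

    Returns the dtype name of ser.mask(~ser.isna(), other), or the
    name of the exception class if the call raises.
    """
    try:
        return str(ser.mask(~ser.isna(), other).dtype)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


values = [0.0, 1.0, 2.0, 3.0]
flags = [True, False, True, False]

# numpy-backed Series
numpy_outcome = mask_outcome(pd.Series(values), pd.Series(flags))

# nullable extension array Series
masked_outcome = mask_outcome(
    pd.Series(values, dtype=pd.Float64Dtype()),
    pd.Series(flags, dtype=pd.BooleanDtype()),
)

print("pandas", pd.__version__)
print("numpy-backed:", numpy_outcome)
print("extension arrays:", masked_outcome)
```

On an affected version the last line would report TypeError, while the expected behaviour described above corresponds to a dtype name being printed instead.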

Installed Versions

INSTALLED VERSIONS

commit : 8dab54d
python : 3.10.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : English_United Kingdom.1252

pandas : 1.5.2
numpy : 1.23.4
pytz : 2022.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.2.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.6.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.3
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : 1.4.39
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Metadata

Labels

  • Dtype Conversions: Unexpected or buggy dtype conversions

  • Indexing: Related to indexing on series/frames, not to indexes themselves

  • NA - MaskedArrays: Related to pd.NA and nullable extension arrays

  • Needs Tests: Unit test(s) needed to prevent regressions

  • good first issue
