Description
Code Sample
import numpy as np
import pandas as pd

# object dtype column containing pd.NA: fails
pd.DataFrame({'x': [1, 1, 2, 2], 'y': [1, 2, 3, pd.NA]}).groupby('x').first()
# *** TypeError: boolean value of NA is ambiguous

# float column containing np.nan: works
pd.DataFrame({'x': [1, 1, 2, 2], 'y': [1, 2, 3, np.nan]}).groupby('x').first()
#      y
# x
# 1  1.0
# 2  3.0

# same data converted to Int64 first: works
pd.DataFrame({'x': [1, 1, 2, 2], 'y': [1, 2, 3, pd.NA]}).astype('Int64').groupby('x').first()
#    y
# x
# 1  1
# 2  3
Problem description
Applying the GroupBy.first aggregation to an object dtype column that contains pd.NA causes the method to fail with an exception: TypeError: boolean value of NA is ambiguous. The method works fine when the missing value is np.nan, and it also works as expected when the column is first converted to an Int64 dtype column.
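The exception text is the same one pd.NA raises whenever it is coerced to a boolean. The snippet below is only a sketch of that behaviour in isolation; it does not show where GroupBy.first triggers it internally. That the object-dtype path truth-tests a comparison involving pd.NA is an assumption, not something confirmed in this report.

```python
import pandas as pd

# Comparisons with pd.NA propagate NA instead of returning True/False.
comparison = (pd.NA == 1)

# Coercing NA to a boolean is what raises the error seen in the traceback.
try:
    bool(pd.NA)
except TypeError as exc:
    message = str(exc)
else:
    message = None

print(comparison)
print(message)
```

By contrast, np.nan can be truth-tested and compared without raising, which is consistent with the np.nan variant of the sample succeeding.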
Expected Output
   y
x
1  1
2  3
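As a stopgap, the Int64 conversion from the code sample can be applied to the affected column alone, so the rest of the frame keeps its dtypes. This is a minimal sketch of a workaround, not a fix for the underlying behaviour; the names df and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 1, 2, 2], 'y': [1, 2, 3, pd.NA]})

# Convert only the object column holding pd.NA to the nullable Int64 dtype,
# then aggregate. first() skips the missing value in group 2.
result = df.assign(y=df['y'].astype('Int64')).groupby('x').first()

print(result)
```

Converting only 'y' is a small variation on the report's astype('Int64') call over the whole frame, and it assumes every non-missing value in the column is an integer.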
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-76-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : pt_PT.UTF-8
pandas : 1.0.1
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.0
pip : 20.0.2
setuptools : 42.0.1
Cython : None
pytest : 5.2.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.2.3
pyxlsb : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.10
tables : 3.6.1
tabulate : 0.8.6
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.46.0