BUG: GroupBy.first fails with pd.NA on Series with object dtype #32123

Closed
@jprafael

Code Sample

import numpy as np
import pandas as pd

pd.DataFrame({'x': [1, 1, 2, 2], 'y': [1, 2, 3, pd.NA]}).groupby('x').first()
# *** TypeError: boolean value of NA is ambiguous

pd.DataFrame({'x': [1, 1, 2, 2], 'y': [1, 2, 3, np.nan]}).groupby('x').first()
#     y
# x     
# 1  1.0
# 2  3.0

pd.DataFrame({'x': [1, 1, 2, 2], 'y': [1, 2, 3, pd.NA]}).astype('Int64').groupby('x').first()
#    y
# x   
# 1  1
# 2  3

Problem description

Applying the GroupBy.first aggregation to an object-dtype column that contains pd.NA fails with TypeError: boolean value of NA is ambiguous. The same call works when the missing value is np.nan, and it also works as expected when the column is first converted to the Int64 dtype.
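Until the object-dtype path is fixed, one possible workaround is to cast the value columns to a nullable dtype before grouping, as the third call in the code sample does. The sketch below wraps that in a helper; `first_per_group` and its `nullable_dtype` parameter are hypothetical names used for illustration and are not part of pandas. It assumes every non-key column can be cast to the chosen dtype.

```python
import pandas as pd


def first_per_group(df, key, nullable_dtype="Int64"):
    """Hypothetical helper: cast non-key columns to a nullable dtype,
    then take the first non-missing value per group."""
    value_cols = [col for col in df.columns if col != key]
    # Assumption: every value column is castable to `nullable_dtype`.
    converted = df.astype({col: nullable_dtype for col in value_cols})
    return converted.groupby(key).first()


df = pd.DataFrame({"x": [1, 1, 2, 2], "y": [1, 2, 3, pd.NA]})
result = first_per_group(df, "x")
print(result)
```

The cast avoids the object-dtype code path entirely, so it only sidesteps the exception; it does not change how GroupBy.first treats pd.NA in object columns.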

Expected Output

   y
x   
1  1
2  3

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.9.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-76-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : pt_PT.UTF-8

pandas           : 1.0.1
numpy            : 1.17.4
pytz             : 2019.3
dateutil         : 2.8.0
pip              : 20.0.2
setuptools       : 42.0.1
Cython           : None
pytest           : 5.2.3
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : 2.8.3 (dt dec pq3 ext lo64)
jinja2           : 2.10.3
IPython          : 7.10.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.1
numexpr          : 2.7.0
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.2.3
pyxlsb           : None
s3fs             : None
scipy            : 1.3.2
sqlalchemy       : 1.3.10
tables           : 3.6.1
tabulate         : 0.8.6
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : 0.46.0

Labels: Bug, Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)