Why is pd.BooleanDtype() cast to float64 by groupby/last? #33071

Closed
@ghuname

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'a': ['x', 'x', 'y', 'y'], 'b': ['x', 'x', 'y', 'y'], 'c': [False, False, True, False]})
>>> df['d'] = df.c.astype(pd.BooleanDtype())
>>>
>>> df.dtypes
a     object
b     object
c       bool
d    boolean
dtype: object
>>>
>>> df.groupby(['a', 'b']).c.last()
a  b
x  x    False
y  y    False
Name: c, dtype: bool
>>>
>>> df.groupby(['a', 'b']).d.last()
a  b
x  x    0.0
y  y    0.0
Name: d, dtype: float64

Problem description

df.groupby(['a', 'b']).c.last() returns a bool Series (False for both groups), but df.groupby(['a', 'b']).d.last() returns a float64 Series (0.0 for both groups).
Why do the two results differ?

Expected Output

I expect both calls to return False for each group, rather than 0.0 for column d.
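
As a stopgap, the result can be cast back to the nullable boolean dtype after the aggregation. The snippet below is only a sketch of that idea, not a fix confirmed in this issue: it rebuilds the frame from the code sample above, and the names `last_d` and `restored` are introduced here for illustration. The dtype printed for `last_d` depends on the installed pandas version (float64 was observed on 1.0.3, as reported above).

```python
import pandas as pd

# Same frame as in the report above.
df = pd.DataFrame({
    "a": ["x", "x", "y", "y"],
    "b": ["x", "x", "y", "y"],
    "c": [False, False, True, False],
})
df["d"] = df["c"].astype(pd.BooleanDtype())

# Aggregate the nullable boolean column.
last_d = df.groupby(["a", "b"])["d"].last()

# Reported as float64 on pandas 1.0.3; other versions may differ.
print(last_d.dtype)

# Illustrative workaround: cast the aggregated result back to the
# nullable boolean dtype. This is a no-op if the dtype was preserved.
restored = last_d.astype("boolean")
print(restored)
```

The cast only makes sense while the aggregated values are 0.0, 1.0 or missing, which is the case for `last()` on a boolean column.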

Output of pd.show_versions()

python : 3.7.4.final.0
pandas : 1.0.3

Labels: Dtype Conversions (unexpected or buggy dtype conversions), ExtensionArray (extending pandas with custom dtypes or arrays), Groupby
