BUG: index of group not returned correctly in groupby.apply #22541

Closed
@h-vetinari

Returning the index of a group is admittedly a somewhat unusual use of apply (since the information is already available in groups), but it's clearly legal and shouldn't return wrong results.

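import numpy as np
import pandas as pd

N = 10
# random ids in {10, 11, 12}; the frame printed below is one sample draw
df = pd.DataFrame(np.random.randint(0, int(N/3), (N,)) + 10, columns=['id'])
df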
#    id
# 0  11
# 1  11
# 2  11
# 3  10
# 4  11
# 5  12
# 6  12
# 7  12
# 8  10
# 9  12

The issue is that the result of the last group gets wrongly broadcast to all groups:

df.groupby('id', as_index=True).apply(lambda gr: gr.index)
# id
# 10    Int64Index([5, 6, 7, 9], dtype='int64')
# 11    Int64Index([5, 6, 7, 9], dtype='int64')
# 12    Int64Index([5, 6, 7, 9], dtype='int64')
# dtype: object

Interestingly, adding any operation I've tried to the returned index makes the behaviour correct again:

df.groupby('id', as_index=True).apply(lambda gr: gr.index + 1 - 1)
# id
# 10          Int64Index([3, 8], dtype='int64')
# 11    Int64Index([0, 1, 2, 4], dtype='int64')
# 12    Int64Index([5, 6, 7, 9], dtype='int64')
# dtype: object
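
For comparison, here is a minimal sketch of how the expected per-group indices can be obtained without going through apply, via the `groups` attribute mentioned at the top and via plain iteration over the GroupBy object. The `id` values are hard-coded to match the sample frame printed above (an assumption made so the result does not depend on the random draw), and the names `expected` and `by_iteration` are only illustrative.

```python
import pandas as pd

# Same 'id' values as the sample frame printed in the report, hard-coded
# so the result does not depend on the random draw.
df = pd.DataFrame({'id': [11, 11, 11, 10, 11, 12, 12, 12, 10, 12]})

# GroupBy.groups maps each group key to the index labels of its rows.
expected = {
    int(key): idx.tolist()
    for key, idx in df.groupby('id').groups.items()
}

# Iterating over the GroupBy object yields (key, sub-frame) pairs, which
# gives the same information without calling apply at all.
by_iteration = {
    int(key): gr.index.tolist()
    for key, gr in df.groupby('id')
}

print(expected)
print(by_iteration)
```

Both dictionaries should hold the same labels as the correct output above (the `gr.index + 1 - 1` case), which makes them a convenient reference when checking what `apply(lambda gr: gr.index)` returns.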

Labels: Apply, Bug, Groupby