Closed
Description
Returning the index of a group is admittedly a somewhat unusual use of apply (since the information is available in groups), but it's clearly legal and shouldn't be wrong.
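For comparison, here is a minimal sketch of the `groups` route mentioned above. The frame is hard-coded (an assumption, made only so the result is reproducible) rather than generated randomly, and the name `index_by_group` is purely illustrative.

```python
import pandas as pd

# Hard-coded ids (assumption: fixed values so the result is reproducible).
df = pd.DataFrame({'id': [11, 11, 11, 10, 11, 12, 12, 12, 10, 12]})

# GroupBy.groups maps each group key to the index labels of that group's rows.
groups = df.groupby('id').groups

# Convert to plain Python ints and lists for easy inspection.
index_by_group = {int(key): idx.tolist() for key, idx in groups.items()}

for key in sorted(index_by_group):
    print(key, index_by_group[key])
# 10 [3, 8]
# 11 [0, 1, 2, 4]
# 12 [5, 6, 7, 9]
```

This does not go through apply at all, so it can serve as a reference when judging what apply returns.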
import numpy as np
import pandas as pd

N = 10
# ids drawn at random from {10, 11, 12}
df = pd.DataFrame(np.random.randint(0, N // 3, (N,)) + 10, columns=['id'])
df
# id
# 0 11
# 1 11
# 2 11
# 3 10
# 4 11
# 5 12
# 6 12
# 7 12
# 8 10
# 9 12
The issue is that the result of the last group gets wrongly broadcast to all groups:
df.groupby('id', as_index=True).apply(lambda gr: gr.index)
# id
# 10 Int64Index([5, 6, 7, 9], dtype='int64')
# 11 Int64Index([5, 6, 7, 9], dtype='int64')
# 12 Int64Index([5, 6, 7, 9], dtype='int64')
# dtype: object
Interestingly, adding any operation I've tried to the returned index makes the behaviour correct again:
df.groupby('id', as_index=True).apply(lambda gr: gr.index + 1 - 1)
# id
# 10 Int64Index([3, 8], dtype='int64')
# 11 Int64Index([0, 1, 2, 4], dtype='int64')
# 12 Int64Index([5, 6, 7, 9], dtype='int64')
# dtype: object
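To tell the two outcomes apart programmatically, the following is a rough sketch of a sanity check. The helper `check_group_indexes` is hypothetical (not part of pandas), and the frame is hard-coded as an assumption so the check is reproducible. It compares each index reported by apply against a plain boolean-mask lookup on the original frame.

```python
import pandas as pd

# Hard-coded ids (assumption: fixed values so the check is reproducible).
df = pd.DataFrame({'id': [11, 11, 11, 10, 11, 12, 12, 12, 10, 12]})


def check_group_indexes(frame, result, key='id'):
    """Return the group keys whose reported index differs from a mask lookup.

    `result` can be anything with .items() yielding (group key, index),
    for example the Series returned by groupby(...).apply(...) or a dict.
    """
    wrong = []
    for group_key, reported in result.items():
        # Reference answer: labels of the rows whose key column equals the group key.
        expected = frame.index[frame[key] == group_key]
        if list(reported) != list(expected):
            wrong.append(group_key)
    return wrong


result = df.groupby('id').apply(lambda gr: gr.index)
mismatches = check_group_indexes(df, result)
print(mismatches)
```

On a pandas version without the problem the list should be empty; with the broadcast behaviour shown above, every group except the last one would be reported.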