
API: groupby aggregation with apply does not drop groupby-column #22542

Closed
@h-vetinari

Description


The docs for groupby say (http://pandas.pydata.org/pandas-docs/stable/groupby.html):

Note:
Aggregation functions will not return the groups that you are aggregating over if they are named columns, when as_index=True, the default. The grouped columns will be the indices of the returned object.
Passing as_index=False will return the groups that you are aggregating over, if they are named columns.
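A minimal sketch of the documented behaviour, using a small hand-built frame (the data is hypothetical, not taken from this issue). With the default `as_index=True` the key ends up in the index. With `as_index=False` it stays an ordinary column.

```python
import pandas as pd

# Hypothetical data: one key column 'id' and one value column 'x'
df = pd.DataFrame({'id': [10, 11, 10, 11], 'x': [1, 2, 3, 4]})

# as_index=True (the default): 'id' becomes the index and is not a column
res_idx = df.groupby('id').sum()
print(res_idx)

# as_index=False: 'id' is kept as a regular column next to the aggregated values
res_col = df.groupby('id', as_index=False).sum()
print(res_col)
```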

The surrounding section implies that this note is about the built-in aggregations and the aggregate functionality. However, I very often find myself running complicated functions on the groups themselves, so apply is my bread and butter (and this is part of a larger issue: groupby.apply behaves inconsistently in several cases).

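import numpy as np
import pandas as pd

N = 10
df = pd.DataFrame(index=range(N), columns=['id', 'x', 'y', 'z'])
df.loc[:, ['x', 'y', 'z']] = np.arange(N*3).reshape(N, 3)
# random group keys in {10, 11, 12}; the exact assignment varies between runs
df['id'] = np.random.randint(0, N // 3, (N,)) + 10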
df
#    id   x   y   z
# 0  12   0   1   2
# 1  12   3   4   5
# 2  11   6   7   8
# 3  10   9  10  11
# 4  12  12  13  14
# 5  12  15  16  17
# 6  12  18  19  20
# 7  11  21  22  23
# 8  10  24  25  26
# 9  10  27  28  29

For a built-in like sum, the groupby column is dropped, as described:

df.groupby('id').sum()
#      x   y   z
# id            
# 10  60  63  66
# 11  27  29  31
# 12  48  53  58
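For comparison, `.agg` with a user-defined callable seems to behave like the built-ins here: as far as I can tell, the callable is applied column by column to the non-grouping columns only, so the key does not reappear. A sketch on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical data: one key column and two value columns
df = pd.DataFrame({'id': [10, 11, 10, 11], 'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8]})

# .agg with a callable: the lambda receives one value column (a Series) per group
res_agg = df.groupby('id').agg(lambda s: s.sum())

# Built-in sum for reference
res_sum = df.groupby('id').sum()

# Both drop 'id' from the columns; check_dtype=False in case the callable path changes dtypes
pd.testing.assert_frame_equal(res_agg, res_sum, check_dtype=False)
```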

But when the same function is used via apply, the result differs: the groupby column is not dropped, and the dtype changes too (the integer sums come back as floats).

df.groupby('id', as_index=True).apply(lambda gr: gr.sum())
#       id     x     y     z
# id                        
# 10  30.0  60.0  63.0  66.0
# 11  22.0  27.0  29.0  31.0
# 12  60.0  48.0  53.0  58.0
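A workaround sketch (not a fix for the inconsistency itself): select the value columns before calling apply, so the groups passed to the function never contain the key. The frame is hypothetical. Recent pandas versions (2.2 and later) also added an `include_groups` argument to `DataFrameGroupBy.apply` aimed at this; check the docs for the version you run.

```python
import pandas as pd

# Hypothetical data with one group key column and two value columns
df = pd.DataFrame({'id': [10, 11, 10, 11], 'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8]})

# Selecting ['x', 'y'] first means each group handed to the lambda has no 'id' column,
# so gr.sum() only sums the value columns
via_apply = df.groupby('id')[['x', 'y']].apply(lambda gr: gr.sum())

# Built-in aggregation for reference: 'id' is dropped and used as the index
via_sum = df.groupby('id').sum()

# Same columns and values; check_dtype=False guards against dtype differences across versions
pd.testing.assert_frame_equal(via_apply, via_sum, check_dtype=False)
```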

Ideally, I'd like to make the behaviour of groupby.apply more consistent in a number of cases, and this is one of them.
