Closed
Description
Discovered while fixing an issue in #28541.
A large number of tests relating to DataFrameGroupBy.apply depend on a bug in apply where the grouped column is not removed from the returned DataFrame.
Here are some examples:
There are around 20 cases of this.
In general, they expect the result of something like this:
import pandas as pd
df = pd.DataFrame({"key": [1, 1, 1, 2, 2, 2, 3, 3, 3], "value": range(9)})
df.groupby('key').apply(lambda x: x.sum())
to include the "key" column in the result, which should not be the case:
     key  value
key
1      3      3
2      6     12
3      9     21
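For comparison, here is a minimal sketch of a result without the grouping column. It assumes that explicitly selecting the non-grouping column before calling apply is an acceptable stand-in for the fixed behavior; it is not the fix itself, and the name `expected` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"key": [1, 1, 1, 2, 2, 2, 3, 3, 3], "value": range(9)})

# Assumption: selecting only the non-grouping column approximates what
# apply would return once the grouped column is excluded from each group.
expected = df.groupby("key")[["value"]].apply(lambda x: x.sum())

# "key" now appears only as the index, not as a column.
print(expected)
```

A test written against this shape would check only the "value" column, with "key" present solely as the index name.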
The underlying issue with apply is a straightforward fix: all that needs to be done is to change this return to use the _group_selection_context. The PR will, however, have to include updates to many tests.