BUG: groupby fast_apply vs python apply handles same-indexed result differently

From https://github.com/pandas-dev/pandas/issues/39146#issuecomment-799397428 (discovered while investigating a benchmark difference). It seems that in groupby/ops.py, `fast_apply` (using libreduction) and the generic Python apply give different results when the applied function returns output with the same index as its input.

Using a small example dataframe and a function to be applied which simply copies the input:

```python
import numpy as np
import pandas as pd

N = 10
df = pd.DataFrame(
    {
        "key": np.random.randint(0, 3, size=N),
        "value1": np.random.randn(N),
        "value2": ["foo", "bar"] * (N // 2),
    }
)

def df_copy_function(g):
    # ensure that the group name is available (see GH #15062)
    g.name
    return g.copy()
```

By default (taking the fast apply path), the group keys are prepended to the index of the result:

```
In [3]: df.groupby("key").apply(df_copy_function)
Out[3]: 
       key    value1 value2
key                        
0   8    0 -0.149534    foo
    9    0 -0.391135    bar
1   1    1 -0.581107    bar
    2    1 -0.338278    foo
    3    1  0.768924    bar
    6    1 -0.778718    foo
2   0    2  0.196477    foo
    4    2 -0.364822    foo
    5    2 -0.976079    bar
    7    2 -2.671668    bar
```

But if I force the fast apply path *not* to be taken (in this case by making one column an extension dtype), we get a different result, with the original index and no group keys:

```
In [4]: df['value2'] = df["value2"].astype("string")

In [5]: df.groupby("key").apply(df_copy_function)
Out[5]: 
   key    value1 value2
0    2  0.196477    foo
1    1 -0.581107    bar
2    1 -0.338278    foo
3    1  0.768924    bar
4    2 -0.364822    foo
5    2 -0.976079    bar
6    1 -0.778718    foo
7    2 -2.671668    bar
8    0 -0.149534    foo
9    0 -0.391135    bar
```

This might be another manifestation of https://github.com/pandas-dev/pandas/pull/34998 and the issues linked from that PR.
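To check the discrepancy programmatically rather than by eye, something like the following sketch could be used. It compares the number of index levels of the two results. The helper name `index_nlevels_by_dtype` is hypothetical (not part of pandas), and it uses a seeded random generator so the frame is reproducible. If the outputs above reproduce, the two numbers differ (2 with the object column, 1 with the `"string"` column). On a version where both paths behave consistently they should match.

```python
import numpy as np
import pandas as pd


def index_nlevels_by_dtype(df, key="key", column="value2"):
    """Hypothetical helper: apply a copying function per group twice,
    once with `column` as-is and once cast to the "string" extension
    dtype, and return the index depth of each result."""

    def copy_group(g):
        # Same-indexed output: the result keeps the group's own index.
        return g.copy()

    as_is = df.groupby(key).apply(copy_group)
    as_string = (
        df.assign(**{column: df[column].astype("string")})
        .groupby(key)
        .apply(copy_group)
    )
    return as_is.index.nlevels, as_string.index.nlevels


rng = np.random.default_rng(0)
N = 10
df = pd.DataFrame(
    {
        "key": rng.integers(0, 3, size=N),
        "value1": rng.standard_normal(N),
        "value2": ["foo", "bar"] * (N // 2),
    }
)

levels = index_nlevels_by_dtype(df)
print(levels)  # the two entries differ if the inconsistency is present
```

Comparing `nlevels` only detects whether the group keys were prepended; it says nothing about which of the two behaviours is the intended one.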
