
Groupby/apply behaves differently when grouping column contains tuples #19588

Closed
@jhrmnn

Description


Code Sample, a copy-pastable example if possible

This is adapted from an example in the pandas docs, with the values of column 'a' replaced by one-element tuples:

import pandas as pd

df = pd.DataFrame({
    'a': [(0,), (0,), (0,), (0,), (1,), (1,), (1,), (1,), (2,), (2,), (2,), (2,)],
    'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    'd': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
})

def compute_metrics(x):
    result = {'b_sum': x['b'].sum(), 'c_mean': x['c'].mean()}
    return pd.Series(result, name='metrics')

df.groupby('a').apply(compute_metrics)

Problem description

Without the modification (plain integers in column 'a'), the return value is a DataFrame built from the Series returned by apply, with one row per group. With the modification (tuples in column 'a'), it is a Series whose elements are the individual Series objects.
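To compare the two cases side by side, the sketch below rebuilds the frame once with integer keys (as in the docs) and once with tuple keys, then prints the type of each result. It assumes a recent pandas version. build_frame is a hypothetical helper, and the column selection [['b', 'c']] is an addition not in the report: it keeps the grouping column out of apply (avoiding the pandas 2.x deprecation warning) without changing the columns compute_metrics reads. Only the type is printed, because the tuple case may no longer match the report on newer releases.

```python
import pandas as pd


def build_frame(keys):
    """Hypothetical helper: the docs example with column 'a' taken from `keys`."""
    return pd.DataFrame({
        'a': keys,
        'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
        'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
        'd': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
    })


def compute_metrics(x):
    # Same function as in the report: one Series per group
    result = {'b_sum': x['b'].sum(), 'c_mean': x['c'].mean()}
    return pd.Series(result, name='metrics')


int_keys = [k for k in range(3) for _ in range(4)]        # 0, 0, 0, 0, 1, ...
tuple_keys = [(k,) for k in range(3) for _ in range(4)]   # (0,), (0,), ...

# Select only the columns compute_metrics needs, so 'a' is not passed to apply
int_result = build_frame(int_keys).groupby('a')[['b', 'c']].apply(compute_metrics)
tuple_result = build_frame(tuple_keys).groupby('a')[['b', 'c']].apply(compute_metrics)

print(type(int_result).__name__)    # integer keys, as in the docs example
print(type(tuple_result).__name__)  # tuple keys, the case reported in this issue
```

According to the report, on the pandas version it was filed against the first line would name DataFrame and the second Series.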

Expected Output

The same behavior with and without the modification: a DataFrame with one row per group in both cases.

Background

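The divergence in behavior is caused by code in pandas/core/index.py introduced in #10703, which was a response to #10697. When every element of the input data is a tuple, the block starting with if all(isinstance(e, tuple) for e in data): makes the Index constructor return a MultiIndex instead of a flat Index. Simply commenting out that if block solves the issue.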
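The tuple special case can be observed directly on the Index constructor. This sketch assumes a recent pandas version, where the behavior is controlled by the tupleize_cols parameter (default True); it illustrates the constructor only and is not the exact code path groupby takes internally.

```python
import pandas as pd

# A flat list of one-element tuples, like the group keys of column 'a'
keys = [(0,), (1,), (2,)]

idx = pd.Index(keys)                         # all-tuple data becomes a MultiIndex
flat = pd.Index(keys, tupleize_cols=False)   # opt out: an object Index of tuples

print(type(idx).__name__, idx.nlevels)       # MultiIndex 1
print(type(flat).__name__, flat.nlevels)     # Index 1
```

Even one-element tuples produce a MultiIndex with a single level, which is why the keys in column 'a' are treated differently from plain integers.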
