BUG: Segmentation fault with groupby/transform #46566

Closed
@ian-r-rose

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas

df = pandas.DataFrame(
    {
        "A": [1, 2, 3, 4, 5] * 4,
        "B": [1, 2, 3, 4, 5] * 4,
        "C": [1, 2, 3, 4, 5] * 4,
    }
)

df.groupby(["A", "B"]).transform(lambda x: x)

Issue Description

👋 Since about February 27, the snippet above has been causing a segmentation fault on pandas main. As far as I can tell, the crash originates in get_group_index_sorter() in pandas.core.sorting.
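For context on what a group-index sorter computes: it returns a stable permutation that orders rows by their integer group code. The sketch below is a hypothetical pure-Python counting sort (group_sort_indexer is an illustrative name, not pandas' implementation, and it ignores the -1 codes pandas uses for missing keys). As a general point, not a diagnosis of this bug, it shows why mistakes in this kind of routine tend to surface as segfaults: the Python version raises on an out-of-range code, while compiled code without that bounds check would write past the end of its counts buffer.

```python
def group_sort_indexer(codes: list[int], ngroups: int) -> list[int]:
    """Stable counting sort of group codes; returns the row permutation (hypothetical helper)."""
    counts = [0] * ngroups
    for c in codes:
        if not 0 <= c < ngroups:
            # Python raises here; an unchecked C/Cython loop would index out of bounds instead.
            raise IndexError(f"group code {c} outside [0, {ngroups})")
        counts[c] += 1

    # Starting output offset for each group (exclusive prefix sum of counts).
    starts = [0] * ngroups
    total = 0
    for g in range(ngroups):
        starts[g] = total
        total += counts[g]

    # Place row indices group by group; iterating rows in order keeps the sort stable.
    sorter = [0] * len(codes)
    for i, c in enumerate(codes):
        sorter[starts[c]] = i
        starts[c] += 1
    return sorter


# Rows 1 and 3 are in group 0, row 2 in group 1, rows 0 and 4 in group 2.
print(group_sort_indexer([2, 0, 1, 0, 2], 3))  # [1, 3, 2, 0, 4]
```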

Based on the timing and git history, it may be related to #45953, though I've been unable to identify the source of the problem thus far.

A few observations:

  1. The length of the series seems to matter: if I shorten the sample df to length 15, things work fine.
  2. It seems to matter whether I group by more than one column (grouping by "A" alone works fine).
  3. The segfault only happens with transform; apply works.
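The observations above can be checked side by side with a small harness. The sketch below runs each variant in a separate interpreter so that a segfault kills only the child, and passes -X faulthandler so a crashing child dumps its Python-level traceback to stderr. It assumes a POSIX system (where a child killed by a signal reports a negative return code) and that the pandas build under test is installed for the same interpreter; build_snippet, describe_returncode and run_variant are hypothetical helper names.

```python
import signal
import subprocess
import sys
import textwrap


def build_snippet(n_rows: int, keys: list[str], method: str) -> str:
    """Return source for one variant of the reproducer (n_rows should be a multiple of 5)."""
    base = [1, 2, 3, 4, 5]
    reps = n_rows // len(base)
    return textwrap.dedent(f"""
        import pandas
        df = pandas.DataFrame({{col: {base} * {reps} for col in "ABC"}})
        df.groupby({keys!r}).{method}(lambda x: x)
    """)


def describe_returncode(rc: int) -> str:
    """Translate a subprocess return code into a short status string (POSIX semantics)."""
    if rc < 0:
        # On POSIX, a negative code means the child was killed by signal -rc.
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = f"signal {-rc}"
        return f"killed by {name}"
    return "ok" if rc == 0 else f"exited with status {rc}"


def run_variant(n_rows: int, keys: list[str], method: str, timeout: float = 120.0) -> str:
    """Run one variant in a child interpreter with faulthandler enabled."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", build_snippet(n_rows, keys, method)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    status = describe_returncode(proc.returncode)
    if proc.returncode != 0:
        # faulthandler (or an ordinary exception) leaves its traceback on stderr.
        status += "\n" + proc.stderr.strip()
    return status


if __name__ == "__main__":
    # The original reproducer plus one change per observation in the report.
    variants = [
        (20, ["A", "B"], "transform"),  # original snippet
        (15, ["A", "B"], "transform"),  # shorter frame
        (20, ["A"], "transform"),       # single grouping key
        (20, ["A", "B"], "apply"),      # apply instead of transform
    ]
    for n_rows, keys, method in variants:
        print(f"n_rows={n_rows} keys={keys} method={method}: {run_variant(n_rows, keys, method)}")
```

Isolating each run in a subprocess is the design choice that matters here: a segfault cannot be caught with try/except, so running the variants in-process would stop at the first crash.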

Expected Behavior

No segfault should occur.

Installed Versions

This shows up on pandas main.

Based on the nightly builds here, it seems like the first affected version was this one.

Metadata

Labels

Apply, Bug, Groupby, Regression
