PERF: DataFrame groupby with fast transform #12737

Closed
@jreback

From Stack Overflow:

import pandas as pd
import numpy as np

df = pd.DataFrame({'group': np.repeat(np.arange(1000), 10),
                   'B': np.nan,
                   'C': np.nan})

df.loc[4::10, 'B':'C'] = 5 # the 5th row (position 4) of each group is non-null

df.groupby('group').transform('first')

This ends up iterating over the groups. The last change I can find to this code is: here. My recollection is that the iteration was ONLY supposed to be hit in a special case, and that the general case is simply a repeat based on the indices.

It now seems to be hit in all cases, which makes transform super SLOW again.
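
For illustration, the "repeat based on the indices" idea can be sketched as follows: aggregate once per group, then repeat each group's row for every original row. This is only a rough sketch and not the pandas internals; `broadcast_first` is a hypothetical helper name, and it assumes the group key has no missing values.

```python
import numpy as np
import pandas as pd


def broadcast_first(df, key):
    """Hypothetical helper: per-group 'first', repeated for every original row.

    Assumes `key` contains no missing values.
    """
    # One row per group, ordered by sorted key (groupby sorts keys by default).
    agg = df.groupby(key).first()
    # Integer code of each row's group, numbered in the same sorted order.
    codes, _ = pd.factorize(df[key], sort=True)
    # Repeat each group's aggregated row once per original row.
    out = agg.take(codes)
    out.index = df.index
    return out


df = pd.DataFrame({'group': np.repeat(np.arange(1000), 10),
                   'B': np.nan,
                   'C': np.nan})
df.loc[4::10, 'B':'C'] = 5  # position 4 within each group of 10 is non-null

fast = broadcast_first(df, 'group')
slow = df.groupby('group').transform('first')
```

On this frame, `fast` and `slow` should hold the same values; the sketch just avoids any Python-level loop over the groups.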
