GroupBy transform() is surprisingly slow #2121

Closed
@bluefir

I came across a strange slowness in the GroupBy transform() function. I put together a simple helper to avoid using apply(), because apply() can be REALLY slow:

import pandas as pd
from pandas.core.groupby import DataFrameGroupBy


def apply_by_group(grouped, f):
    """
    Applies a function to each DataFrame in a DataFrameGroupBy object, concatenates the results
    and returns the resulting DataFrame.

    Parameters
    ----------
    grouped: DataFrameGroupBy
        The grouped DataFrame whose per-group DataFrames are passed to f.
    f: callable
        Function to apply to each DataFrame.

    Returns
    -------
    DataFrame that results from applying the function to each DataFrame in the DataFrameGroupBy object and
    concatenating the results.

    """
    assert isinstance(grouped, DataFrameGroupBy)
    assert callable(f)

    # Iterate over (key, sub-frame) pairs; the group key itself is not needed.
    data_frames = [f(data_frame) for _, data_frame in grouped]
    return pd.concat(data_frames)
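
As a quick sanity check that the loop-and-concatenate route and transform() really are equivalent, here is a minimal sketch on toy data. The frame, the `security_id` and `date` level names, and the `price` column are all hypothetical stand-ins for the real data, and the helper is restated in compact form so the snippet runs on its own. It uses `ffill()` on the assumption that forward fill is what was intended; current pandas rejects `fillna()` with no arguments.

```python
import numpy as np
import pandas as pd


def apply_by_group(grouped, f):
    # Compact restatement of the helper: apply f per group, then concatenate.
    return pd.concat([f(frame) for _, frame in grouped])


# Hypothetical toy data: two securities, three dates each, with gaps.
index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2012-01-01", periods=3)],
    names=["security_id", "date"],
)
data = pd.DataFrame({"price": [1.0, np.nan, 3.0, np.nan, 5.0, np.nan]}, index=index)

grouped = data.groupby(level="security_id")
via_helper = apply_by_group(grouped, lambda x: x.ffill())
via_transform = grouped.transform(lambda x: x.ffill())

# Both routes should yield the same forward-filled frame.
pd.testing.assert_frame_equal(via_helper, via_transform)
```

Note that the fill never crosses a group boundary: the leading NaN for security "B" stays NaN in both results.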

Now I observe the following timings for two equivalent ways of doing the same thing:

%timeit data.groupby(level=field_security_id).transform(lambda x: x.fillna())

1 loops, best of 3: 24.3 s per loop

%timeit apply_by_group(data.groupby(level=field_security_id), lambda x: x.fillna())

1 loops, best of 3: 2.72 s per loop

That was unexpected. Am I doing something wrong in using transform()?
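
For anyone who wants to reproduce the comparison, a rough benchmark sketch follows. The synthetic data, its size, and the `security_id`, `date`, and `value` names are assumptions made for illustration, not the original dataset, so absolute timings will differ from the numbers above and will vary with pandas version and hardware. The sketch also times the built-in `GroupBy.ffill()`, which may be worth trying when forward fill is all that is needed, since it avoids calling a Python function once per group.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 200 securities x 50 dates, roughly 20% missing.
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [range(200), pd.date_range("2012-01-01", periods=50)],
    names=["security_id", "date"],
)
values = rng.standard_normal(len(index))
values[rng.random(len(index)) < 0.2] = np.nan
data = pd.DataFrame({"value": values}, index=index)

grouped = data.groupby(level="security_id")

# Three ways of forward-filling within each group.
candidates = {
    "transform(lambda)": lambda: grouped.transform(lambda x: x.ffill()),
    "loop + concat": lambda: pd.concat([g.ffill() for _, g in grouped]),
    "built-in ffill": lambda: grouped.ffill(),
}

# Keep one result per candidate so they can be compared afterwards.
results = {name: fn() for name, fn in candidates.items()}

# Best of three single runs, similar in spirit to %timeit's "best of 3".
timings = {
    name: min(timeit.repeat(fn, number=1, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name:>18}: {seconds:.4f} s")
```

Checking that all three results agree before comparing speeds guards against benchmarking operations that are not actually equivalent.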

Labels

Performance: Memory or execution speed performance