I came across some surprising slowness in the GroupBy `transform()` function. To avoid using `apply()`, which can be REALLY slow, I had put together a simple function:
```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy


def apply_by_group(grouped, f):
    """
    Applies a function to each DataFrame in a DataFrameGroupBy object, concatenates the results
    and returns the resulting DataFrame.

    Parameters
    ----------
    grouped: DataFrameGroupBy
        The grouped DataFrame whose groups f is applied to.
    f: callable
        Function to apply to each DataFrame.

    Returns
    -------
    DataFrame that results from applying the function to each DataFrame in the DataFrameGroupBy object and
    concatenating the results.
    """
    assert isinstance(grouped, DataFrameGroupBy)
    assert callable(f)
    data_frames = []
    # Iterating over a DataFrameGroupBy yields (key, sub-DataFrame) pairs.
    for key, data_frame in grouped:
        data_frames.append(f(data_frame))
    return pd.concat(data_frames)
```
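To make the loop-and-concat pattern behind `apply_by_group` concrete, here is a minimal sketch on a tiny hypothetical frame. The level name `security_id` stands in for whatever `field_security_id` holds, and `ffill()` is used on the assumption that forward fill is what the bare `fillna()` call did in the pandas version used here (current pandas raises a `ValueError` when `fillna()` gets neither a value nor a method). Note that the concatenated result comes back in group-key order, not in the original row order.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a two-level index whose first level, "security_id",
# stands in for the level named by field_security_id in the issue.
index = pd.MultiIndex.from_product(
    [["B", "A"], [1, 2, 3]], names=["security_id", "date"]
)
data = pd.DataFrame(
    {"price": [1.0, np.nan, 3.0, np.nan, 5.0, np.nan]}, index=index
)

# Iterating over a DataFrameGroupBy yields (key, sub-DataFrame) pairs;
# each sub-DataFrame is forward-filled independently of the others.
pieces = []
for key, frame in data.groupby(level="security_id"):
    pieces.append(frame.ffill())

# groupby sorts keys by default, so group "A" comes before group "B" here,
# even though "B" rows come first in the original frame.
result = pd.concat(pieces)
print(result)
```

Unlike `transform()`, which aligns its output back to the original index, this pattern returns rows grouped by key, so `result.reindex(data.index)` is needed if the original order matters.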
Now I observe the following timings for two equivalent ways of computing the same result:
```
%timeit data.groupby(level=field_security_id).transform(lambda x: x.fillna())
1 loops, best of 3: 24.3 s per loop
%timeit apply_by_group(data.groupby(level=field_security_id), lambda x: x.fillna())
1 loops, best of 3: 2.72 s per loop
```
That was unexpected. Am I doing something wrong when using `transform()`?
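A minimal sketch for reproducing the comparison on synthetic data, since the original `data` is not shown. The frame size, number of groups, column names and the `security_id`/`date` level names are all hypothetical, `ffill()` again stands in for the bare `fillna()`, and absolute timings will depend on the pandas version and machine. The sketch also checks the "equivalent" claim: the loop result is reindexed to the original row order before being compared with the `transform()` output.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 200 securities x 50 dates, roughly 30% NaNs.
rng = np.random.default_rng(0)
n_ids, n_dates = 200, 50
index = pd.MultiIndex.from_product(
    [np.arange(n_ids), np.arange(n_dates)], names=["security_id", "date"]
)
values = rng.standard_normal((len(index), 3))
values[rng.random(values.shape) < 0.3] = np.nan
data = pd.DataFrame(values, index=index, columns=["a", "b", "c"])

grouped = data.groupby(level="security_id")


def via_transform():
    # Calls the lambda once per group; output is aligned to data.index.
    return grouped.transform(lambda x: x.ffill())


def via_loop():
    # Same per-group work, concatenated in group-key order.
    return pd.concat([frame.ffill() for _, frame in grouped])


# The two approaches should agree once the loop result is put back
# into the original row order.
pd.testing.assert_frame_equal(via_transform(), via_loop().reindex(data.index))

for fn in (via_transform, via_loop):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds:.3f} s per loop")
```

Growing `n_ids` is the most direct way to see whether the gap scales with the number of groups, which is where the per-group overhead would show up.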