GroupBy transform() throws unexpected exception when sorting each DataFrame #2171

Closed
@bluefir

I have the following DataFrame:

data

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 410322 entries, (20111230, '00036110') to (20121019, 'Y8564W10')
Data columns:
market_cap 410117 non-null values
average_volume 410322 non-null values
return_std_daily 410322 non-null values
return_std_monthly 410322 non-null values
dtypes: float64(4)

data.index.names

['date', 'security_id']

When I try to sort each date group by average_volume using transform(), I get an exception:

data.groupby(level='date').transform(lambda x: x.sort_index(by='average_volume'))

Traceback (most recent call last):
File "C:\Python27\lib\site-packages\IPython\core\interactiveshell.py", line 2721, in run_code
exec code_obj in self.user_global_ns, self.user_ns
File "", line 1, in
data.groupby(level='date').transform(lambda x: x.sort_index(by='average_volume'))
File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 1745, in transform
return self._transform_item_by_item(obj, wrapper)
File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 1777, in _transform_item_by_item
raise TypeError('Transform function invalid for data types')
TypeError: Transform function invalid for data types

At the same time, the following helper, which applies the same function to each group and concatenates the results, works:

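# Imports the helper relies on (GroupBy classes taken from pandas.core.groupby).
import pandas as pd
from pandas.core.groupby import SeriesGroupBy, DataFrameGroupBy

def apply_by_group(grouped, f):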
    """
    Applies a function to each Series or DataFrame in a GroupBy object, concatenates the results
    and returns the resulting Series or DataFrame.

    Parameters
    ----------
    grouped: SeriesGroupBy or DataFrameGroupBy
    f: callable
        Function to apply to each Series or DataFrame in the grouped object.

    Returns
    -------
    Series or DataFrame that results from applying the function to each Series or DataFrame in the
    GroupBy object and concatenating the results.

    """
    assert isinstance(grouped, (SeriesGroupBy, DataFrameGroupBy))
    assert hasattr(f, '__call__')

    groups = []
    for key, group in grouped:
        groups.append(f(group))
    return pd.concat(groups)

apply_by_group(data.groupby(level='date'), lambda x: x.sort_index(by='average_volume'))

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 410322 entries, (20111230, '31340030') to (20121019, '03783310')
Data columns:
market_cap 410117 non-null values
average_volume 410322 non-null values
return_std_daily 410322 non-null values
return_std_monthly 410322 non-null values
dtypes: float64(4)

Is this strange or am I doing something wrong with transform()?
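
For anyone who wants to try the per-group sort without the original data, here is a minimal sketch on a small made-up frame. It assumes a newer pandas in which `sort_values` replaces `sort_index(by=...)` and in which `sort_values` accepts index level names. The `toy` frame and the `sort_within_groups` name are hypothetical and only for illustration. It does not reproduce the TypeError; it only shows the result the helper above is aiming for. As far as I understand, transform() is meant for functions whose result stays aligned with each group's original index, which reordering the rows does not obviously satisfy.

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame.
index = pd.MultiIndex.from_product(
    [[20111230, 20120131], ["A", "B", "C"]],
    names=["date", "security_id"],
)
toy = pd.DataFrame(
    {
        "market_cap": [10.0, 30.0, 20.0, 5.0, 15.0, 25.0],
        "average_volume": [3.0, 1.0, 2.0, 9.0, 7.0, 8.0],
    },
    index=index,
)


def sort_within_groups(frame, level, column):
    """Sort rows by `column` inside each group of index level `level`."""
    # Iterate over the groups, sort each one, and glue them back together.
    pieces = [group.sort_values(column) for _, group in frame.groupby(level=level)]
    return pd.concat(pieces)


result = sort_within_groups(toy, "date", "average_volume")

# Alternative without groupby: sort by the index level first, then the column.
# Assumes a pandas version where sort_values accepts index level names.
alternative = toy.sort_values(["date", "average_volume"])
```

Both variants keep the rows of each date together and order them by average_volume inside the date, so which one to use is mostly a question of readability.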
