Description
I have the following DataFrame:
data
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 410322 entries, (20111230, '00036110') to (20121019, 'Y8564W10')
Data columns:
market_cap 410117 non-null values
average_volume 410322 non-null values
return_std_daily 410322 non-null values
return_std_monthly 410322 non-null values
dtypes: float64(4)
data.index.names
['date', 'security_id']
I tried to sort each date group by average_volume using transform():
data.groupby(level='date').transform(lambda x: x.sort_index(by='average_volume'))
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\IPython\core\interactiveshell.py", line 2721, in run_code
exec code_obj in self.user_global_ns, self.user_ns
File "", line 1, in
data.groupby(level='date').transform(lambda x: x.sort_index(by='average_volume'))
File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 1745, in transform
return self._transform_item_by_item(obj, wrapper)
File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 1777, in _transform_item_by_item
raise TypeError('Transform function invalid for data types')
TypeError: Transform function invalid for data types
At the same time, the following works:
import pandas as pd
from pandas.core.groupby import SeriesGroupBy, DataFrameGroupBy

def apply_by_group(grouped, f):
    """
    Applies a function to each Series or DataFrame in a GroupBy object, concatenates the results
    and returns the resulting Series or DataFrame.

    Parameters
    ----------
    grouped: SeriesGroupBy or DataFrameGroupBy
    f: callable
        Function to apply to each Series or DataFrame in the grouped object.

    Returns
    -------
    Series or DataFrame that results from applying the function to each Series or DataFrame in the
    GroupBy object and concatenating the results.
    """
    assert isinstance(grouped, (SeriesGroupBy, DataFrameGroupBy))
    assert hasattr(f, '__call__')
    # Apply f to every group in turn, then stitch the pieces back together.
    groups = []
    for key, group in grouped:
        groups.append(f(group))
    return pd.concat(groups)
apply_by_group(data.groupby(level='date'), lambda x: x.sort_index(by='average_volume'))
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 410322 entries, (20111230, '31340030') to (20121019, '03783310')
Data columns:
market_cap 410117 non-null values
average_volume 410322 non-null values
return_std_daily 410322 non-null values
return_std_monthly 410322 non-null values
dtypes: float64(4)
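For anyone trying to reproduce this without my data, here is a minimal sketch of the same group-wise sort on a toy frame. It rests on a few assumptions: the toy data and the helper name sort_within_groups are made up for illustration, and it uses sort_values() rather than sort_index(by=...), which I understand newer pandas versions no longer accept.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 410322-row frame.
index = pd.MultiIndex.from_tuples(
    [(20111230, 'A'), (20111230, 'B'), (20111230, 'C'),
     (20121019, 'A'), (20121019, 'B')],
    names=['date', 'security_id'],
)
toy = pd.DataFrame({'average_volume': [3.0, 1.0, 2.0, 5.0, 4.0]}, index=index)


def sort_within_groups(frame, level, column):
    """Sort each group of `frame` (grouped on index `level`) by `column`.

    Same idea as apply_by_group above: process each group, then concatenate.
    """
    pieces = [group.sort_values(column)
              for _, group in frame.groupby(level=level)]
    return pd.concat(pieces)


result = sort_within_groups(toy, 'date', 'average_volume')

# Loop-free alternative: move the index into columns, sort on both keys,
# then restore the MultiIndex.
alt = (toy.reset_index()
          .sort_values(['date', 'average_volume'])
          .set_index(['date', 'security_id']))

print(result)
```

Both result and alt keep the dates in order and sort the rows inside each date by average_volume, which is the output I expected from transform().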
Is this a bug, or am I using transform() incorrectly?