Description
See #6068
Use case
Facilitate DataFrame group/apply transformations when using a function that returns a Series. Right now, if we perform the following:
```python
import pandas

df = pandas.DataFrame({
    'a': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    'd': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
})

def count_values(df):
    # Summarise each group as a Series named 'metrics'
    return pandas.Series({'count': df['b'].sum(), 'mean': df['c'].mean()}, name='metrics')

result = df.groupby('a').apply(count_values)
print(result.stack().reset_index())
```
We get the following output:
```
   a level_1    0
0  0   count  2.0
1  0    mean  0.5
2  1   count  2.0
3  1    mean  0.5
4  2   count  2.0
5  2    mean  0.5

[6 rows x 3 columns]
```
Ideally, the Series name would be preserved and propagated through these operations, giving the following output:
```
   a metrics    0
0  0   count  2.0
1  0    mean  0.5
2  1   count  2.0
3  1    mean  0.5
4  2   count  2.0
5  2    mean  0.5

[6 rows x 3 columns]
```
Currently, the only way to achieve this is:

```python
result = df.groupby('a').apply(count_values)
# Name the column axis so that stack() gives the new index level a name
result.columns.name = 'metrics'
print(result.stack().reset_index())
```
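The workaround succeeds because `stack()` turns the column axis into a new index level, and `reset_index()` names the resulting column after that level, falling back to `level_N` when the level is unnamed. A minimal sketch of that mechanism, assuming the groupby/apply result looks like the output above; the DataFrame is built by hand (values copied from that output) so the example does not depend on how a particular pandas version names the columns returned by `apply`:

```python
import pandas

# Hand-built stand-in for the groupby/apply result: one row per value of 'a',
# one column per label of the Series returned by count_values.
result = pandas.DataFrame(
    {'count': [2.0, 2.0, 2.0], 'mean': [0.5, 0.5, 0.5]},
    index=pandas.Index([0, 1, 2], name='a'),
)

# The column axis has no name, so the stacked level is unnamed and
# reset_index() labels it 'level_1'.
unnamed = result.stack().reset_index()

# Naming the column axis names the stacked level, and hence the new column.
result.columns.name = 'metrics'
named = result.stack().reset_index()

print(list(unnamed.columns))  # ['a', 'level_1', 0]
print(list(named.columns))    # ['a', 'metrics', 0]
```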
However, this has two drawbacks: 1) it adds an extra line of code, and 2) the name of the Series created inside the applied function may not be known in the calling scope, so we cannot reliably set `result.columns.name`.
The other workaround is to name the Series' index:
```python
def count_values(df):
    series = pandas.Series({'count': df['b'].sum(), 'mean': df['c'].mean()})
    # The index name becomes the column-axis name of the apply result
    series.index.name = 'metrics'
    return series
```
One approach is for the group/apply operation to check whether `series.index.name` is set on each returned Series. If it is not set, it would set `index.name` to the Series' `name`, ensuring that the name propagates.
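A minimal sketch of that per-Series check, written as a hypothetical standalone helper `propagate_series_name` (not an existing pandas function). It iterates over `df.groupby('a')` by hand instead of hooking into pandas internals, so it only illustrates the check itself:

```python
import pandas


def propagate_series_name(series):
    """Hypothetical helper: if the index is unnamed, name it after the Series.

    An index that already has a name is left alone.
    """
    if series.index.name is None and series.name is not None:
        out = series.copy()
        # Index.rename returns a new Index, so the caller's Series is not mutated
        out.index = series.index.rename(series.name)
        return out
    return series


def count_values(df):
    return pandas.Series(
        {'count': df['b'].sum(), 'mean': df['c'].mean()}, name='metrics'
    )


df = pandas.DataFrame({
    'a': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Apply the check to each group's result, as the group/apply step would.
per_group = {
    key: propagate_series_name(count_values(group))
    for key, group in df.groupby('a')
}

for key, series in per_group.items():
    print(key, series.index.name)
```

Copying the Series and replacing its index, rather than assigning `series.index.name` in place, keeps the object returned by the user's function unchanged.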