df.groupby.agg() removes name of column MultiIndex at level 0 #4013

Closed
@floux

When different functions are applied to columns with a MultiIndex by passing a mapping to groupby.agg(), the name of the top column level (level 0) gets lost.

I believe this is a bug, because the column labels themselves are unchanged (although the total number of columns might be smaller if not all columns are in the mapping).

In the example here I am using groupby.agg(), even though, technically speaking, I want a transformation. However, groupby.agg() seems to be the only apply-like method that accepts a mapping of different functions per column. What would be the recommended way?

In [2]: df = pd.DataFrame({
   ...:         'exp' : ['A']*6 + ['B']*6,
   ...:         'obj' : [1,1,1,2,2,2]*2,
   ...:         'rep' : [1,2,3] * 4,
   ...:         'var1' : range(12),
   ...:         'var2' : range(12,24),
   ...:         'var3' : range(24,36),
   ...:         })

In [3]: df = df.set_index(['exp', 'obj', 'rep'])

In [4]: df = df.sort_index()

In [5]: df.columns.name = 'vars'

In [6]: print('before unstack: ', df.columns.names)
('before unstack: ', ['vars'])

In [7]: df = df.unstack('rep')

In [8]: print('after unstack: ', df.columns.names)
('after unstack: ', ['vars', 'rep'])

In [9]: funcs = {
   ...:                 'var1' : lambda x: x - x.median(),
   ...:                 'var2' : lambda y: y - y.mean(),
   ...:                 'var3' : lambda y: y - y.sum(),
   ...: }

In [10]: df1 = df.groupby(level=0).agg(funcs)

In [11]: print('after groupby.agg: ', df1.columns.names)
('after groupby.agg: ', [None, 'rep'])
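
One possible workaround, as a rough sketch rather than a recommended API: apply groupby.transform() to each top-level column block separately and glue the pieces back together with pd.concat(), passing the level name explicitly. The names `pieces` and `result` are made up for illustration, and the behaviour is assumed for a reasonably recent pandas; I have not checked it against the version used in the session above.

```python
import pandas as pd

# Same data as in the report above.
df = pd.DataFrame({
    'exp': ['A'] * 6 + ['B'] * 6,
    'obj': [1, 1, 1, 2, 2, 2] * 2,
    'rep': [1, 2, 3] * 4,
    'var1': range(12),
    'var2': range(12, 24),
    'var3': range(24, 36),
}).set_index(['exp', 'obj', 'rep']).sort_index()
df.columns.name = 'vars'
df = df.unstack('rep')

funcs = {
    'var1': lambda x: x - x.median(),
    'var2': lambda y: y - y.mean(),
    'var3': lambda y: y - y.sum(),
}

# Transform each top-level column block on its own (hypothetical helper dict).
# df[name] drops the 'vars' level and keeps the 'rep' level.
pieces = {
    name: df[name].groupby(level='exp').transform(func)
    for name, func in funcs.items()
}

# The dict keys become the outer column level; names=['vars'] labels it.
result = pd.concat(pieces, axis=1, names=['vars'])

print('after transform + concat: ', list(result.columns.names))
```

Because this uses transform() instead of agg(), the result keeps the original row index, which is closer to what I actually want here. If the only goal is to repair the lost name, assigning `df1.columns.names = df.columns.names` after the agg() call should also do.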
