To reproduce:
import pandas as pd
import numpy as np

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['1st', '2nd'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index2[:6])

df.stack('1st').\
    groupby(level=['1st', 'first'])['one'].\
    nlargest(3)
Output:
1st  first  first  second  1st
bar  bar    bar    two     bar    1.985984
                   one     bar   -0.175834
     baz    baz    two     bar    1.089335
                   one     bar   -1.447996
     foo    foo    one     bar    1.154403
                   two     bar   -0.112992
baz  bar    bar    one     baz    0.528254
                   two     baz   -2.052733
     baz    baz    one     baz   -0.447332
                   two     baz   -0.818858
     foo    foo    one     baz   -0.939259
                   two     baz   -1.282821
foo  bar    bar    two     foo    2.075440
                   one     foo   -0.423804
     baz    baz    one     foo    0.250846
                   two     foo    0.177904
     foo    foo    two     foo   -0.468794
                   one     foo   -0.627487
Name: one, dtype: float64
The expected result can be obtained by resetting the index:
df.stack('1st').\
    groupby(level=['1st', 'first'])['one'].\
    nlargest(3).\
    reset_index(level=[0, 1], drop=True)
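The workaround above hard-codes the positions `[0, 1]` of the prepended levels. As a rough sketch only, the same idea can be written so that it drops whichever index levels have a name that reappears further to the right. The helper `drop_repeated_levels` is hypothetical (it is not part of pandas), and the seeded random data is an assumption made for reproducibility:

```python
import numpy as np
import pandas as pd


def drop_repeated_levels(obj):
    """Hypothetical helper: drop index levels whose name reappears further right."""
    names = list(obj.index.names)
    # Positions of named levels that are duplicated by a later level.
    repeated = [
        i for i, name in enumerate(names)
        if name is not None and name in names[i + 1:]
    ]
    if not repeated:
        return obj
    # Positional levels are used because duplicated names are ambiguous.
    return obj.reset_index(level=repeated, drop=True)


# Same shape as the frame in the report: 6 rows x 6 columns, both MultiIndexed.
tuples = [(a, b) for a in ["bar", "baz", "foo"] for b in ["one", "two"]]
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
columns = pd.MultiIndex.from_tuples(tuples, names=["1st", "2nd"])
values = np.random.default_rng(0).standard_normal((6, 6))
df = pd.DataFrame(values, index=index, columns=columns)

top = df.stack("1st").groupby(level=["1st", "first"])["one"].nlargest(3)
cleaned = drop_repeated_levels(top)
print(list(cleaned.index.names))
```

If the installed pandas version does not prepend the group keys, the helper finds no repeated names and returns its input unchanged.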
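This is a problem because the resulting index contains redundant levels: the grouping keys '1st' and 'first' are prepended even though both already exist in the original index. The behaviour is also inconsistent with the case where all index levels are used as grouping keys: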
df.stack('1st').\
    groupby(level=['1st', 'first', 'second'])['one'].\
    nlargest(3)
Output:
first  second  1st
bar    one     bar   -0.175834
               baz    0.528254
               foo   -0.423804
       two     bar    1.985984
               baz   -2.052733
               foo    2.075440
baz    one     bar   -1.447996
               baz   -0.447332
               foo    0.250846
       two     bar    1.089335
               baz   -0.818858
               foo    0.177904
foo    one     bar    1.154403
               baz   -0.939259
               foo   -0.627487
       two     bar   -0.112992
               baz   -1.282821
               foo   -0.468794
Name: one, dtype: float64
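To check whether the inconsistency reproduces on a given installation, one option is to compare the index level names produced by the two groupings. This is a minimal sketch; the function `level_names` and the seeded data are illustrative assumptions, and what it prints may differ between pandas versions:

```python
import numpy as np
import pandas as pd

tuples = [(a, b) for a in ["bar", "baz", "foo"] for b in ["one", "two"]]
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
columns = pd.MultiIndex.from_tuples(tuples, names=["1st", "2nd"])
values = np.random.default_rng(0).standard_normal((6, 6))
df = pd.DataFrame(values, index=index, columns=columns)

stacked = df.stack("1st")


def level_names(keys):
    """Return the index level names of nlargest(3) when grouping by `keys`."""
    result = stacked.groupby(level=keys)["one"].nlargest(3)
    return list(result.index.names)


partial_names = level_names(["1st", "first"])
full_names = level_names(["1st", "first", "second"])

# On 0.20.3 the report shows five levels for the partial grouping and
# three levels for the full grouping.
print(partial_names)
print(full_names)
```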
Pandas version 0.20.3