BUG: DataFrame.unstack() does not properly sort list of levels #9514

Open
@seth-p


In 0.15.2 (and I believe this remains the case), the docstring for DataFrame.unstack() states: "The level involved will automatically get sorted." This is not necessarily the case when level is a list of levels.

In [40]: df = pd.DataFrame(np.arange(8).reshape((4, 2)),
                           index=pd.MultiIndex.from_tuples([(100, 'A', 'y'), (100, 'A', 'x'),
                                                            (100, 'B', 'x'), (200, 'B', 'y')],
                                                           names=['Nums', 'Upper', 'Lower']))

In [41]: df
Out[41]:
                  0  1
Nums Upper Lower
100  A     y      0  1
           x      2  3
     B     x      4  5
200  B     y      6  7

In [42]: df.unstack([1, 2])
Out[42]:
        0               1
Upper   A       B       A       B
Lower   y   x   x   y   y   x   x   y
Nums
100     0   2   4 NaN   1   3   5 NaN
200   NaN NaN NaN   6 NaN NaN NaN   7

Note that the pivoted (Upper, Lower) tuples are ordered as [(A, y), (A, x), (B, x), (B, y)], which is not sorted.
I would expect the result to be the same as the following, where the tuples appear in sorted order:

In [43]: df.T.stack([1, 2]).T
Out[43]:
        0               1
Upper   A       B       A       B
Lower   x   y   x   y   x   y   x   y
Nums
100     2   0   4 NaN   3   1   5 NaN
200   NaN NaN NaN   6 NaN NaN NaN   7
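
Until this is resolved, one possible workaround is to sort the column index explicitly after unstacking instead of relying on unstack() to do it. The following is a minimal sketch using the same frame as above; it assumes that DataFrame.sort_index(axis=1) orders the column MultiIndex by value, and the variable names unstacked and result are only illustrative. It does not address the underlying behavior of unstack().

```python
import numpy as np
import pandas as pd

# Same frame as in In [40] above.
df = pd.DataFrame(
    np.arange(8).reshape((4, 2)),
    index=pd.MultiIndex.from_tuples(
        [(100, 'A', 'y'), (100, 'A', 'x'), (100, 'B', 'x'), (200, 'B', 'y')],
        names=['Nums', 'Upper', 'Lower'],
    ),
)

# Unstack by a list of levels, then sort the resulting column MultiIndex
# explicitly rather than relying on unstack() to sort it.
unstacked = df.unstack([1, 2])
result = unstacked.sort_index(axis=1)

# Check that the column tuples are now in sorted order.
cols = list(result.columns)
print(cols == sorted(cols))
print(result)
```

The explicit sort is harmless on versions where unstack() already returns sorted columns.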

In fact, there seems to be a problem even when level is a list containing just a single level. Compare the following, where only df.unstack([2]) returns unsorted columns (y before x):

In [47]: df.unstack(2)
Out[47]:
             0       1
Lower        x   y   x   y
Nums Upper
100  A       2   0   3   1
     B       4 NaN   5 NaN
200  B     NaN   6 NaN   7

In [48]: df.unstack([2])
Out[48]:
             0       1
Lower        y   x   y   x
Nums Upper
100  A       0   2   1   3
     B     NaN   4 NaN   5
200  B       6 NaN   7 NaN

In [50]: df.T.stack(2).T
Out[50]:
             0       1
Lower        x   y   x   y
Nums Upper
100  A       2   0   3   1
     B       4 NaN   5 NaN
200  B     NaN   6 NaN   7

In [51]: df.T.stack([2]).T
Out[51]:
             0       1
Lower        x   y   x   y
Nums Upper
100  A       2   0   3   1
     B       4 NaN   5 NaN
200  B     NaN   6 NaN   7
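
To make the comparison above easy to repeat on other pandas versions, here is a small sketch that checks whether unstack(2) and unstack([2]) agree on column order, and whether they hold the same data once both axes are sorted. The names scalar, listed, same_order and same_data are only illustrative, and the result of the column-order check may differ by version.

```python
import numpy as np
import pandas as pd

# Same frame as in In [40] above.
df = pd.DataFrame(
    np.arange(8).reshape((4, 2)),
    index=pd.MultiIndex.from_tuples(
        [(100, 'A', 'y'), (100, 'A', 'x'), (100, 'B', 'x'), (200, 'B', 'y')],
        names=['Nums', 'Upper', 'Lower'],
    ),
)

scalar = df.unstack(2)    # level given as a scalar
listed = df.unstack([2])  # level given as a one-element list

# True if both calls return their columns in the same order.
same_order = list(scalar.columns) == list(listed.columns)

# Sort both axes of each result so that only the data is compared,
# not the order in which rows or columns happen to be returned.
scalar_sorted = scalar.sort_index(axis=0).sort_index(axis=1)
listed_sorted = listed.sort_index(axis=0).sort_index(axis=1)
same_data = scalar_sorted.equals(listed_sorted)

print("same column order:", same_order)
print("same data after sorting:", same_data)
```

If same_data is true while same_order is false, the two calls differ only in how the unstacked level is ordered, which is the discrepancy reported here.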


Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
