Confusion around sortedness with MultiIndex #10651

Closed
@tangobravo

I find the docs around MultiIndex slicers quite confusing. They imply that the MultiIndex needs to be lexsorted and introduce the sortlevel() function, but then add a caveat that sortlevel() doesn't actually ensure sortedness.

There are more details of my explorations and questions on StackOverflow:
http://stackoverflow.com/questions/31427466/ensuring-lexicographical-sort-in-pandas-multiindex

I'd like one of the following: a simple one-liner to ensure lexsortedness, more reassurance that the usual ways of creating a MultiIndex will lexsort the labels in each level, or more elaboration in the docs about exactly what goes wrong with indexing and slicing if the labels are not lexsorted.
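To make the first request concrete, here is a minimal sketch of what such a one-liner might look like. It assumes a recent pandas release, where the MultiIndex constructor takes `codes` (older versions call this argument `labels`) and where `sort_index()` orders rows by the actual index values. The names `df` and `sorted_df` are hypothetical; the index is rebuilt by hand so that its first level is stored out of order.

```python
import pandas as pd

# Hand-built index whose first level is stored as ['b', 'd', 'a'],
# i.e. the level values themselves are not in sorted order.
index = pd.MultiIndex(
    levels=[["b", "d", "a"], [1, 2, 3]],
    codes=[[0, 0, 1, 2], [0, 2, 0, 1]],
    names=["col1", "col2"],
)
df = pd.DataFrame({"data": ["three", "one", "two", "four"]}, index=index)

# Assumption: in recent pandas this compares index values, not integer codes.
was_sorted = df.index.is_monotonic_increasing

# Candidate one-liner: sort on all index levels by value.
sorted_df = df.sort_index()
now_sorted = sorted_df.index.is_monotonic_increasing
```

If that assumption holds, `was_sorted` is false, `now_sorted` is true, and the `a` row comes first in `sorted_df`.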

Does my example show a bug in is_lexsorted() too? I would expect sorted2.index.is_lexsorted() to be False here, as 'col1' is not lexsorted.

In [8]:
sorted2 = df3.sortlevel()
sorted2

Out[8]: 
            data
col1 col2       
b    1     three
     3       one
d    1       two
a    2      four

In [9]: sorted2.index.is_lexsorted()
Out[9]: True

In [10]: sorted2.index
Out[10]: 
MultiIndex(levels=[[u'b', u'd', u'a'], [1, 2, 3]],
           labels=[[0, 0, 1, 2], [0, 2, 0, 1]],
           names=[u'col1', u'col2'])
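One possible reading of this output, offered as an assumption rather than confirmed behaviour: is_lexsorted() may be looking only at the integer labels, not at the level values they point to. The plain-Python sketch below uses the levels and labels printed in Out[10] and checks both orderings; the variable names are illustrative only and no pandas internals are called.

```python
# Levels and labels copied from the MultiIndex repr in Out[10].
levels = [["b", "d", "a"], [1, 2, 3]]
labels = [[0, 0, 1, 2], [0, 2, 0, 1]]

# One tuple of integer labels per row: (0, 0), (0, 2), (1, 0), (2, 1).
code_rows = list(zip(*labels))

# Map each label back to the value it stands for in its level.
value_rows = [
    tuple(level[code] for level, code in zip(levels, row))
    for row in code_rows
]

# The integer labels are in lexicographic order...
codes_sorted = code_rows == sorted(code_rows)
# ...but the values they represent are not, because 'a' is stored last.
values_sorted = value_rows == sorted(value_rows)
```

Here `codes_sorted` is true while `values_sorted` is false, which would be consistent with is_lexsorted() returning True even though 'col1' is visibly out of order.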
