Partial Selection on MultiIndex: Control what index depth is returned. #3057

Closed
@dragoljub

Partial selection with .xs() or .ix[] on a subset of index levels returns a DataFrame with the fixed (selected) levels dropped, which is a very nice feature. However, when you select with a tuple that covers all levels, the returned DataFrame keeps every index level. It would be nice to have an option to control which index levels are returned when sub-selecting on a subset of levels, so that the result stays consistent when you reach the lowest index level.

Perhaps sub-selection could accept a "drop_fixed_index" parameter.
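A rough sketch of what such an option might look like, written as a standalone helper rather than a change to pandas itself. The function name `select_levels`, the small test frame, and the choice to fall back to a plain integer index for a full-depth key are all assumptions made for illustration; only the `drop_fixed_index` name comes from the proposal above.

```python
import numpy as np
import pandas as pd


def select_levels(df, key, drop_fixed_index=True):
    """Select rows whose leading index levels equal `key`.

    Hypothetical helper, not a pandas API. `drop_fixed_index` mirrors the
    option proposed in this issue.
    """
    key = key if isinstance(key, tuple) else (key,)
    # Build a boolean mask, one leading level at a time.
    mask = np.ones(len(df), dtype=bool)
    for level, value in enumerate(key):
        mask &= np.asarray(df.index.get_level_values(level) == value)
    result = df[mask]
    if not drop_fixed_index:
        return result  # every level is kept, whatever the key length
    if len(key) < df.index.nlevels:
        # Partial key: drop only the levels that were fixed by the key.
        return result.droplevel(list(range(len(key))))
    # Full-depth key: no level is left to keep, so use a plain integer index.
    return result.reset_index(drop=True)


# Small 3-level frame with duplicate index entries (sizes are arbitrary).
rng = np.random.RandomState(0)
small = pd.DataFrame(rng.randint(2, size=(200, 5)), columns=list("abcde"))
small = small.set_index(["a", "b", "c"]).sort_index()

partial_dropped = select_levels(small, (0, 1))
partial_kept = select_levels(small, (0, 1), drop_fixed_index=False)
full_kept = select_levels(small, (0, 1, 1), drop_fixed_index=False)
full_dropped = select_levels(small, (0, 1, 1))
```

With this helper the index depth of the result depends only on the key length and the flag, so a full-depth key behaves the same way as a partial one. The boolean mask is simple but not optimized for sorted indexes.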

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: print(pd.__version__)
0.11.0.dev-80945b6

In [4]: # Generate Test DataFrame
   ...: NUM_ROWS = 100000
   ...:

In [5]: NUM_COLS = 10

In [6]: col_names = ['A'+num for num in map(str,np.arange(NUM_COLS).tolist())]

In [7]: index_cols = col_names[:5]

In [8]: # Give the DataFrame a 5-level hierarchical index and sort the index.
   ...: # The dtype does not matter; str and np.int64 give the same results.
   ...: df = pd.DataFrame(np.random.randint(5, size=(NUM_ROWS,NUM_COLS)), dtype=np.int64, columns=col_names)
   ...:

In [9]: df = df.set_index(index_cols).sort_index()

...

In [79]: df
Out[79]: <class 'pandas.core.frame.DataFrame'>
MultiIndex: 100000 entries, (0, 0, 0, 0, 0) to (4, 4, 4, 4, 4)
Data columns:
A5    100000  non-null values
A6    100000  non-null values
A7    100000  non-null values
A8    100000  non-null values
A9    100000  non-null values
dtypes: int64(5)

In [80]: df.ix[(0)]
Out[80]: <class 'pandas.core.frame.DataFrame'>
MultiIndex: 20011 entries, (0, 0, 0, 0) to (4, 4, 4, 4)
Data columns:
A5    20011  non-null values
A6    20011  non-null values
A7    20011  non-null values
A8    20011  non-null values
A9    20011  non-null values
dtypes: int64(5)

In [81]: df.ix[(0,1)]
Out[81]: <class 'pandas.core.frame.DataFrame'>
MultiIndex: 4007 entries, (0, 0, 0) to (4, 4, 4)
Data columns:
A5    4007  non-null values
A6    4007  non-null values
A7    4007  non-null values
A8    4007  non-null values
A9    4007  non-null values
dtypes: int64(5)

In [82]: df.ix[(0,1,2)]
Out[82]: <class 'pandas.core.frame.DataFrame'>
MultiIndex: 817 entries, (0, 0) to (4, 4)
Data columns:
A5    817  non-null values
A6    817  non-null values
A7    817  non-null values
A8    817  non-null values
A9    817  non-null values
dtypes: int64(5)

In [83]: df.ix[(0,1,2,3)]
Out[83]: <class 'pandas.core.frame.DataFrame'>
Int64Index: 162 entries, 0 to 4
Data columns:
A5    162  non-null values
A6    162  non-null values
A7    162  non-null values
A8    162  non-null values
A9    162  non-null values
dtypes: int64(5)

In [84]: df.ix[(0,1,2,3,4)]
Out[84]:                 A5  A6  A7  A8  A9
A0 A1 A2 A3 A4                    
0  1  2  3  4    1   2   2   4   2
            4    1   4   4   1   0
            4    2   1   4   1   3
            4    2   4   2   1   1
            4    1   1   2   1   4
            4    0   0   2   1   1
            4    2   0   0   3   1
            4    2   2   3   3   1
            4    3   0   3   4   1
            4    1   1   0   0   1
            4    2   1   0   2   4
            4    3   4   1   2   3
            4    0   4   3   1   0
            4    4   1   4   1   2
            4    1   3   4   3   3
            4    0   1   1   3   1
            4    2   2   2   0   3
            4    0   0   1   4   0
            4    1   0   1   4   2
            4    1   4   2   2   0
            4    4   2   0   3   1
            4    2   1   2   3   2
            4    4   2   0   1   4
            4    1   4   1   1   4
            4    1   0   1   2   4
            4    2   3   0   1   3
            4    2   1   3   3   3
            4    1   2   0   4   2
            4    3   0   4   4   0
            4    4   4   2   3   0
            4    0   0   1   3   2
            4    4   0   0   0   3
            4    2   0   3   4   2
            4    3   3   3   0   2
            4    4   2   2   0   1
            4    2   1   3   4   0

In [86]: df.index.lexsort_depth
Out[86]: 5
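For comparison, later pandas releases expose a `drop_level` argument on `.xs()`, which covers the partial-key half of this request. The snippet below is a minimal sketch on a smaller frame than the one in the session above; the frame size, seed, and variable names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Small 3-level frame with duplicate index entries (sizes are arbitrary).
rng = np.random.RandomState(0)
frame = pd.DataFrame(rng.randint(3, size=(300, 4)),
                     columns=["A0", "A1", "A2", "A5"])
frame = frame.set_index(["A0", "A1", "A2"]).sort_index()

# Default: the levels fixed by the key (A0, A1) are dropped from the result.
dropped = frame.xs((0, 1))

# drop_level=False keeps all three index levels on the selected rows.
kept = frame.xs((0, 1), drop_level=False)

print(dropped.index.nlevels, kept.index.nlevels)
```

Both calls select the same rows; only the depth of the returned index differs.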

Labels: Enhancement, Indexing, MultiIndex
