Adding support for indexing a MultiIndex with a DataFrame and/or bi-dimensional np.array #15438

Open
@toobaz

(From #15425 )

Currently, (non-Multi)Indexes can be indexed with Series indexers. This also applies to MultiIndexes, in which case the selection is made on the first level. It therefore seems natural for MultiIndexes to also accept DataFrame indexers.

Moreover, once #15434 is fixed, we will have a bi-dimensional object (MultiIndex) which can be indexed with np.arrays... but only one-dimensional ones! This is also strange.

The feature itself is certainly useful. As a simple real-world example, I am currently working with a subjects DataFrame to which I must add two columns from design, another DataFrame, depending on the group and time columns of subjects, which are also levels of the MultiIndex of design. I would like to just do:

subjects[design.columns] = design.loc[subjects[["group", "time"]]]

Now, I know this could be solved by .join-ing the two DataFrames... but this is conceptually more complicated (I don't even know whether I can join one DataFrame on columns and the other on index levels... but this is OT), to the point that I'm instead doing:

to_mi = lambda df : df.set_index(list(df.columns)).index
subjects[design.columns] = design.loc[to_mi(subjects[["group", "time"]])]
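For readers who want to try the workaround, here is a minimal, self-contained sketch. The data, the column names dose and duration, and the subject labels are invented for illustration; only to_mi, group, and time come from the issue. The assignment goes through .to_numpy() so that the values are written positionally rather than aligned on the (different) indexes. The last lines show what appears to be a working answer to the aside about joining, assuming a reasonably recent pandas in which DataFrame.join accepts a list of caller columns via on= when the other frame has a MultiIndex.

```python
import pandas as pd

# Hypothetical data: `design` is indexed by (group, time) ...
design = pd.DataFrame(
    {"dose": [10, 20, 30, 40], "duration": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product(
        [["a", "b"], [1, 2]], names=["group", "time"]
    ),
)
# ... while `subjects` carries group and time as ordinary columns.
subjects = pd.DataFrame(
    {"group": ["b", "a", "b"], "time": [1, 2, 1]},
    index=["s1", "s2", "s3"],
)


def to_mi(df):
    """Turn the columns of `df` into a MultiIndex (helper from the issue)."""
    return df.set_index(list(df.columns)).index


# Alternative route: join the caller's columns against the other frame's
# MultiIndex (computed first, before `subjects` gains the new columns).
joined = subjects.join(design, on=["group", "time"])

# Workaround from the issue: build a MultiIndex of keys, then look it up.
keys = to_mi(subjects[["group", "time"]])
looked_up = design.loc[keys]

# Assign column by column as plain arrays to avoid index alignment.
for col in design.columns:
    subjects[col] = looked_up[col].to_numpy()

print(subjects)
```

Both routes should produce the same dose and duration values per subject; the proposal in this issue is about making the first route work without the to_mi step.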

@jorisvandenbossche suggests this feature would add complexity to indexing, "eg, should the column names align on the level names?". I'm personally fine with both answers:

  • Yes: then we just use something like to_mi above (transforming the DataFrame into a MultiIndex, and then using it to actually index)
  • No: then it's really simple (we just transform the DataFrame into tuples - I had actually already done this in Mi indexing #15425 before rolling back)

"Yes" is probably the cleanest answer (possibly together with allowing indexing with bi-dimensional np.arrays, to obtain the equivalent of the "No" answer). In any case, once we decide, I can take care of this.
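The difference between the two answers can be sketched as follows. This is only an illustration of the two possible semantics, not an implementation of either: keys_aligned and keys_positional are hypothetical helper names, the data is invented, and pd.MultiIndex.from_frame is used as a present-day equivalent of to_mi. The indexer's columns are deliberately in the opposite order to the levels of design, which is the case where the two answers diverge.

```python
import pandas as pd

# Hypothetical target, indexed by (group, time).
design = pd.DataFrame(
    {"dose": [10, 20, 30, 40]},
    index=pd.MultiIndex.from_product(
        [["a", "b"], [1, 2]], names=["group", "time"]
    ),
)

# Indexer whose columns are in the opposite order to design's levels.
indexer = pd.DataFrame({"time": [1, 2], "group": ["b", "a"]})


def keys_aligned(indexer, target_index):
    """'Yes': match column names to level names, then build the keys."""
    ordered = indexer[list(target_index.names)]
    return pd.MultiIndex.from_frame(ordered)


def keys_positional(indexer):
    """'No': ignore names; each row becomes a tuple in column order."""
    return list(indexer.itertuples(index=False, name=None))


aligned = keys_aligned(indexer, design.index)
positional = keys_positional(indexer)

print(list(aligned))     # keys reordered to (group, time)
print(positional)        # keys left in (time, group) order
print(design.loc[aligned])
```

Under "Yes", the column order of the indexer does not matter because names drive the matching; under "No", the same indexer yields (time, group) tuples, which do not exist in design, so the caller would have to order the columns correctly.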
