Skip to content

ENH: Support for semi-join on index (subsetting a multi-index dataframe with a subset of the multi-index) #58873

Open
@gwerbin

Description

@gwerbin

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I think there are some related issues on this topic, but none that clearly describes the particular use case I have in mind.

Consider data frame P with multi-index columns a, b, c and data frame Q with multi-index columns a, b.

Given some subset of rows from Q, I want to be able to use its .index to subset rows from P, without a lot of intermediate processing or boilerplate code.

In relational terms, this is a semi-join operation between P and the subset of rows from Q.

Feature Description

Setup:

P = pd.DataFrame(..., index=pd.MultiIndex(..., names=["a", "b", "c"]))
Q = pd.DataFrame(..., index=pd.MultiIndex(..., names=["a", "b"]))

assert not P.index.duplicated.any()
assert not Q.index.duplicated.any()

Q_sub = Q.head()

Desired functionality

Any of these, or something similarly convenient:

P_sub = P.loc[Q_sub.index]

P_sub = P.xs(Q_sub.index)

P_sub = P.xs(Q_sub.index, level=Q_sub.index.names)

P_sub = P.join_semi(Q_sub, how="semi")

Alternative Solutions

Something like this, I think?

P_sub = P.loc[pd.IndexSlice[Q_sub.index.get_level_values("a"), Q_sub.index.get_level_values("b"), :]]

In the case when Q_sub is exactly 1 row, we can also use .xs:

P_sub = P.xs(Q_sub.index[0], level=Q_sub.index.names)

Additional Context

Maybe related to #4036 and #3057

Metadata

Metadata

Assignees

No one assigned

    Labels

    Closing CandidateMay be closeable, needs more eyeballsEnhancementReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions