ENH: Handle for current DataFrame in pd.query expression #43437

Open
@julibeg

Is your feature request related to a problem?

I use DataFrame.query and pd.eval a lot. However, I sometimes want to filter an unnamed DataFrame (for example, an intermediate result in a method chain) with query, and it would be very handy if a handle for that DataFrame were available inside the query expression.

Describe the solution you'd like

For the sake of an example, consider this DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(
...     data=np.arange(20).reshape(5, 4),
...     columns=pd.MultiIndex.from_product([['A', 'B'], ['x', 'y']]))
>>> df
    A       B    
    x   y   x   y
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19

To select the y columns and keep only the rows in which any of their values is above 10, I would like to use something like

df.loc[:, (slice(None), 'y')].query('(PLACEHOLDER > 10).any(axis=1)')

The placeholder could be a symbol like # or simply nothing (i.e. one would write df.loc[:, (slice(None), 'y')].query('(> 10).any(axis=1)')), although I guess the latter would be harder to implement.

API breaking implications

I don't think that this would break anything as it is just the addition of a feature.

Describe alternatives you've considered

I know that what I wanted to do in my example can be achieved in multiple other ways. However, most of them are neither as concise nor as flexible as simply being able to access the current DataFrame in the query expression.
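For reference, here is a minimal sketch of two such alternatives using existing pandas features: a callable passed to .loc, and .pipe. Both receive the intermediate DataFrame as an argument, so no name is needed. The variable names (via_loc, via_pipe) are only illustrative, and axis=1 is passed by keyword on the assumption that a recent pandas version is used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.arange(20).reshape(5, 4),
    columns=pd.MultiIndex.from_product([["A", "B"], ["x", "y"]]),
)

# Alternative 1: a callable passed to .loc receives the intermediate frame.
via_loc = (
    df.loc[:, (slice(None), "y")]
      .loc[lambda d: (d > 10).any(axis=1)]
)

# Alternative 2: .pipe hands the intermediate frame to an arbitrary function.
via_pipe = (
    df.loc[:, (slice(None), "y")]
      .pipe(lambda d: d[(d > 10).any(axis=1)])
)

print(via_loc)
```

Both work, but each needs a lambda wrapper around the condition, which is the verbosity a handle inside the query expression would avoid.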
