ENH/API: Change query/eval local variable API #5987

Closed
@cpcloud

Currently, query and eval let you reference local variables with the @ prefix. The behavior is a bit confusing: you're not allowed to have a local variable and a column with the same name, yet a bare name that matches no column will still be resolved as a local if one exists.

Current API:

Fails with a NameError:

from numpy.random import randn
from pandas import DataFrame

a = 1
df = DataFrame({'a': randn(10), 'b': randn(10)})
df.query('a > b')

But this works:

df.query('@a > b')

And so does this, which is confusing because the bare name a silently resolves to the local:

a = 1
df = DataFrame({'b': randn(10), 'c': randn(10)})
df.query('a < b < c')

As suggested by @y-p and @jreback, the following API is less confusing IMO.

From now on, every local variable needs an explicit @ reference, and if a column and a local share a name, the bare name refers to the column. A bare name therefore always means a column, and you get an error if that column doesn't exist. Likewise, an @-prefixed name always means a local, and you get an error if that local doesn't exist. As a bonus (🐺 in 🐑's clothing), this lets you use a local and a column with the same name in the same expression.

Examples:

a = 1
df = DataFrame({'a': randn(10), 'b': randn(10)})

# uses the column 'a'
df.query('a > b')

# uses the local
df.query('@a > b')

# fails because I didn't reference the local and there's no 'c' column
c = 1
df.query('a > c')

# local and a column name
df.query('b < @a < a')
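
The lookup rule proposed here can be sketched in plain Python. This is only an illustration of the intended semantics, not pandas' actual implementation: the resolve function and its columns and local_scope arguments are hypothetical names made up for this sketch, and plain dicts stand in for the DataFrame and the caller's local scope.

```python
# Hypothetical sketch of the proposed name-resolution rule.
# Not pandas code: `resolve`, `columns`, and `local_scope` are illustrative names.

def resolve(name, columns, local_scope):
    """Resolve one identifier from a query string under the proposed rule.

    `columns` maps column names to values; `local_scope` maps local
    variable names to values. Both are plain dicts in this sketch.
    """
    if name.startswith("@"):
        # An @-prefixed name refers only to a local variable.
        key = name[1:]
        if key not in local_scope:
            raise NameError(f"local variable {key!r} is not defined")
        return local_scope[key]
    # A bare name refers only to a column; locals are never consulted.
    if name not in columns:
        raise NameError(f"column {name!r} is not defined")
    return columns[name]


columns = {"a": [0.5, 2.0], "b": [1.0, 1.5]}
local_scope = {"a": 1, "c": 1}

# Bare 'a' picks the column even though a local 'a' exists.
assert resolve("a", columns, local_scope) == [0.5, 2.0]
# '@a' picks the local, so both can appear in one expression.
assert resolve("@a", columns, local_scope) == 1

# Bare 'c' fails: there is a local 'c', but no column with that name.
try:
    resolve("c", columns, local_scope)
except NameError as exc:
    print(exc)
```

The point of splitting the two lookups is that neither path ever falls back to the other, so a given name in a query string has exactly one possible meaning.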
