Logical plan generation inconsistency #568

Open
@emanueledomingo

Hi Everyone,

I'm not sure this is the right place for this. It's more of a question than an actual bug.

Describe the bug
I tried to generate the logical plan of a query through Substrait instead of passing the query's text to the .sql function. Converting the Substrait plan back into a logical plan fails, while the .sql function handles the same query without any problem.

What is the reason behind this behavior?

To Reproduce

import datafusion
from datafusion.substrait import substrait as ss
import pyarrow as pa
import pyarrow.dataset as pda
from faker import Faker

print(f"DF: {datafusion.__version__}\nPA: {pa.__version__}")  # DF: 32.0.0 PA: 14.0.2

fake = Faker()

N_ROWS = 1_000

dummy_table = pa.Table.from_pydict(
    {
        "id": range(N_ROWS),
        "name": (fake.name() for _ in range(N_ROWS)),
        "country_code": (fake.country_code() for _ in range(N_ROWS)),
    }
)

q = """
SELECT
    "t1".*
    , "t2".*
FROM "table" "t1"
INNER JOIN "table" "t2"
    ON "t1"."id" = CASE WHEN "t2"."id" < 10 THEN "t2"."id" ELSE 10 END
"""

ctx = datafusion.SessionContext()
ctx.register_dataset(name="table", dataset=pda.dataset(dummy_table))

df = ctx.sql(q)
default_plan = df.logical_plan()

plan = ss.serde.serialize_to_plan(q, ctx)
logical_plan = ss.consumer.from_substrait_plan(ctx, plan)  # <- Exception here
df = ctx.create_dataframe_from_logical_plan(plan=logical_plan)
ss_plan = df.logical_plan()

The exception is:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[6], line 2
      1 plan = ss.serde.serialize_to_plan(q2, ctx)
----> 2 logical_plan = ss.consumer.from_substrait_plan(ctx, plan)
      3 df = ctx.create_dataframe_from_logical_plan(plan=logical_plan)
      4 ss_plan = df.logical_plan()

Exception: DataFusion error: Plan("invalid join condition expression")

Expected behavior

assert ss_plan == default_plan
# True
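
For reference, here is a minimal sketch of the rows the join condition should produce. It uses Python's built-in sqlite3 rather than DataFusion, and the table contents and the helper name run_join are made up for illustration. It only shows that the CASE expression in the ON clause is ordinary SQL: each "t2" row should match the "t1" row whose id is min(t2.id, 10).

```python
import sqlite3


def run_join(n_rows: int = 15):
    """Run the join from the report against a small in-memory SQLite table.

    Hypothetical helper: SQLite stands in for DataFusion only to show the
    expected result of the join condition.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "table" (id INTEGER, name TEXT)')
    conn.executemany(
        'INSERT INTO "table" VALUES (?, ?)',
        [(i, f"name_{i}") for i in range(n_rows)],
    )
    rows = conn.execute(
        """
        SELECT "t1"."id", "t2"."id"
        FROM "table" "t1"
        INNER JOIN "table" "t2"
            ON "t1"."id" = CASE WHEN "t2"."id" < 10 THEN "t2"."id" ELSE 10 END
        ORDER BY "t2"."id"
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Each tuple is (t1.id, t2.id).
    for t1_id, t2_id in run_join():
        print(t1_id, t2_id)
```

Every "t2" row with an id below 10 pairs with the "t1" row that has the same id, and every "t2" row with an id of 10 or more pairs with the "t1" row whose id is 10.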
