Skip to content

support multi-statement .execute() calls #162

Open
@dataders

Description

@dataders

ask

related to: dbt-labs/dbt-core#39 dbt-labs/dbt-core#46 dbt-labs/dbt-core#82

allow for dbt (and other consumers/users) to make use of this driver in a simple query flow manner. two specific places that introduce friction are:

  1. strings with multiple, ;-delimited SQL statements cannot be sent to `cursor.execute()
  2. by default the driver today interprets every SQL operation to be a prepared statement even when it isn't.

I love @Brooke-white 's summary in #39 (comment). I've been looking for this terminology all week!

yes I agree it'd be much cleaner (and simpler) to support this in a single call to execute.
I believe psycopg supports this due to the approach taken to executing the statement. There are two approaches a driver can take to execute a statement -- simple query cycle and extended query cycle. Simple query cycle supports executing multiple statements, while extended query protocol does not. redshift_connector uses extended query protocol, which is the root cause for our lack of support. If you're interested, the above links give an interesting overview of the two approaches and their trade offs.
When I have extra time I am working to add support for simple query cycle (i.e. multiple statement execution), but there's no timeline on when this would be available as we are prioritizing other tasks at this time. Ideally, I'd like to provide support for both extended and simple query cycles so users have the option to fine tune how statements are executed :)

background

dbt is a framework that effectively takes SELECT statements, and wraps them within boilerplate DDL to create a DAG of tables and views. Only a somewhat auxillary feature, seeds, make use of prepared statements with an INSERT INTO ... VALUES statement.

In some ways, dbt is a stand in for the parsing and preparation that a database might do in an extended query flow pattern. Using both both complicates things greatly.

recently, in dbt-labs/dbt-redshift#251, dbt-redshift migrated away from psycopg2 to redshift_connector in order to enable

  1. IAM auth to redshift serverless, and
  2. the redshift team to more proactively accommodate and fix issues upstream.

however this change has introduced a number of regressions for our users. While we have both:

example

cursor = redshift_connector.connect(**vals).cursor()
cursor.execute("select 1 as id; select 2 as id")
records_rs = cursor.fetchall()

Error message (full traceback)

ProgrammingError: {'S': 'ERROR', 'C': '42601', 'M':
    'cannot insert multiple commands into a prepared statement',
'F': '../src/pg/src/backend/tcop/postgres.c','L': '3916', 'R': 'exec_parse_message'}

counter-counter-argument

@Brooke-white I saw you mention somewhere that the spec for .execute() in the PEP 249 – Python Database API Specification v2.0
lays out that the input may only be a single operation and that .executemany() should be used for multi-operation/-statement queries. However, if I'm reading this right, the purpose of .executemany() is to execute the the operation over a list of parameter sets. The first argument is still a singular operation.

sidebar

I'm very unclear on exact meanings of the following terms here: query, statement, operation

further complications

all of this confusion is making it even harder to determine how transactions are handled by default between the simple and extended query flows

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions