Description
ask
related to: dbt-labs/dbt-core#39 dbt-labs/dbt-core#46 dbt-labs/dbt-core#82
allow dbt (and other consumers/users) to use this driver in a simple query flow manner. Two specific places introduce friction:
- strings containing multiple `;`-delimited SQL statements cannot be sent to `cursor.execute()`
- by default, the driver today interprets every SQL operation as a prepared statement, even when it isn't
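In the meantime, batches like this can only be run by splitting them client-side and calling `execute()` once per statement. The helper below is a sketch of that workaround, not part of `redshift_connector`; note the naive `;` split will break on semicolons inside string literals or procedure bodies, so a real implementation would need a proper tokenizer.

```python
def split_statements(sql: str) -> list[str]:
    """Naively split a SQL batch on semicolons.

    Illustrative only: this breaks on semicolons that appear inside
    string literals, quoted identifiers, or stored-procedure bodies.
    """
    return [s.strip() for s in sql.split(";") if s.strip()]


def execute_batch(cursor, sql: str) -> None:
    # Hypothetical helper: feed each statement through the driver's
    # extended-query (prepared statement) path one at a time.
    for statement in split_statements(sql):
        cursor.execute(statement)
```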
I love @Brooke-white's summary in #39 (comment). I've been looking for this terminology all week!
Yes, I agree it'd be much cleaner (and simpler) to support this in a single call to `execute`.
I believe `psycopg` supports this due to the approach taken to executing the statement. There are two approaches a driver can take to execute a statement -- the simple query cycle and the extended query cycle. The simple query cycle supports executing multiple statements, while the extended query protocol does not. `redshift_connector` uses the extended query protocol, which is the root cause of our lack of support. If you're interested, the above links give an interesting overview of the two approaches and their trade-offs.
As time allows, I am working to add support for the simple query cycle (i.e., multiple statement execution), but there's no timeline on when this would be available, as we are prioritizing other tasks right now. Ideally, I'd like to provide support for both extended and simple query cycles so users have the option to fine-tune how statements are executed :)
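For reference, the difference between the two cycles comes down to the wire messages each one sends. The message names below are from the PostgreSQL frontend/backend protocol specification; the rest is a simplification for illustration:

```python
# Simple query cycle: a single Query message carries the full (possibly
# multi-statement) SQL text, which the server parses and runs directly.
SIMPLE_QUERY_FLOW = ["Query"]

# Extended query cycle: the statement is prepared first. The server
# rejects text containing more than one command at the Parse step --
# hence the "cannot insert multiple commands into a prepared statement"
# error from exec_parse_message.
EXTENDED_QUERY_FLOW = ["Parse", "Bind", "Describe", "Execute", "Sync"]
```

The prepared-statement (`Parse`) step is what buys parameter binding and plan reuse, and also what forbids multi-statement strings.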
background
dbt is a framework that effectively takes `SELECT` statements and wraps them in boilerplate DDL to create a DAG of tables and views. Only one somewhat auxiliary feature, seeds, makes use of prepared statements, via an `INSERT INTO ... VALUES` statement.
In some ways, dbt is a stand-in for the parsing and preparation that a database might do in an extended query flow pattern. Using both complicates things greatly.
recently, in dbt-labs/dbt-redshift#251, dbt-redshift migrated away from `psycopg2` to `redshift_connector` in order to enable:
- IAM auth to Redshift Serverless, and
- the Redshift team to more proactively accommodate and fix issues upstream.
however, this change has introduced a number of regressions for our users. We have both:
- Restore transaction semantics used by dbt-redshift prior to 1.5 dbt-labs/dbt-redshift#475, a workaround that forces `autocommit=True` for all connections, and
- [CT-2624] [Spike] Scrutinize and clarify our approach and interface for transactions dbt-labs/dbt-adapters#347, a backlog item to revisit query execution and transactions more generally across all our supported data platforms
example
```python
cursor = redshift_connector.connect(**vals).cursor()
cursor.execute("select 1 as id; select 2 as id")
records_rs = cursor.fetchall()
```
Error message (full traceback)
```
ProgrammingError: {'S': 'ERROR', 'C': '42601',
'M': 'cannot insert multiple commands into a prepared statement',
'F': '../src/pg/src/backend/tcop/postgres.c', 'L': '3916', 'R': 'exec_parse_message'}
```
counter-counter-argument
@Brooke-white I saw you mention somewhere that the spec for `.execute()` in PEP 249 – Python Database API Specification v2.0 lays out that the input may only be a single operation, and that `.executemany()` should be used for multi-operation/-statement queries. However, if I'm reading it right, the purpose of `.executemany()` is to execute the operation over a list of parameter sets. The first argument is still a singular operation.
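This reading matches the behavior of other PEP 249 drivers. The stdlib `sqlite3` module (used here purely as a convenient DB-API 2.0 stand-in, not as a claim about `redshift_connector` internals) likewise rejects multi-statement `execute()` calls, and its `executemany()` takes one operation plus a sequence of parameter sets:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Like redshift_connector, sqlite3's .execute() rejects a string that
# contains more than one statement. (Older Pythons raise sqlite3.Warning,
# newer ones sqlite3.ProgrammingError.)
multi_statement_failed = False
try:
    cur.execute("select 1 as id; select 2 as id")
except (sqlite3.ProgrammingError, sqlite3.Warning):
    multi_statement_failed = True

# .executemany() is NOT multi-statement execution: it runs ONE
# operation once per parameter set.
cur.execute("create table t (id integer)")
cur.executemany("insert into t (id) values (?)", [(1,), (2,), (3,)])
row_count = cur.execute("select count(*) from t").fetchone()[0]
```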
sidebar
I'm very unclear on the exact meanings of the following terms here: `query`, `statement`, `operation`
further complications
all of this confusion is making it even harder to determine how transactions are handled by default in the simple versus extended query flows
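As one concrete data point (again using stdlib `sqlite3` only as a PEP 249 stand-in, not as a claim about `redshift_connector`'s defaults), DB-API drivers genuinely differ on whether a statement runs in autocommit mode or inside an implicit transaction, which is exactly the behavior that needs pinning down per query flow:

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: every statement
# commits immediately, so rollback() has nothing to undo.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("create table t (id integer)")
auto.execute("insert into t values (1)")
auto.rollback()  # no-op: the insert was already committed
autocommit_rows = auto.execute("select count(*) from t").fetchone()[0]

# The default mode opens an implicit transaction around DML, so an
# uncommitted insert is undone by rollback().
implicit = sqlite3.connect(":memory:")
implicit.execute("create table t (id integer)")
implicit.execute("insert into t values (1)")
implicit.rollback()
implicit_rows = implicit.execute("select count(*) from t").fetchone()[0]
```

The same SQL yields different surviving state depending on a connection-level default — the kind of divergence the `autocommit=True` workaround in dbt-labs/dbt-redshift#475 papers over.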