Skip to content

QST: to_sql is unnecessarily blocked because of dependency on information_schema tables #52601

Open
@match-gabeflores

Description

@match-gabeflores

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/64610269/sqlalchemy-hangs-during-insert-while-querying-information-schema-tables

Question about pandas

I'd like to bring attention to this issue where to_sql is being blocked by an unrelated process because pandas requires checking information_schema metadata tables.

My example is MSSQL (SQL Server) specific but I think it could apply to other DB systems.

steps to reproduce:

  • run python code first (automatically creates table)
  • run create table (new unrelated table) in an open transaction
    • key point is that this is unrelated to the table you are inserting in pandas
  • re-run python code to insert 10 records. this gets blocked bc it can't query information_schema

Other scenarios that lock information_schema cause this blocking in pandas (rebuilding clustered columnstore index on large table)

Notes

In my opinion, we should be able to insert into an existing table no matter if there are other processes going on.

Root cause

**SQL code to create unrelated table **


begin tran 
-- in same database 
create table dbo.MyOtherTableUnrelatedProcess
(
	a int
)
--rollback

code i used for testing the python piece

import pandas as pd
import sqlalchemy as sa
import urllib

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Henry', 'Isabel', 'Jack']
})

host = 'myhost'
schema = 'mydb'

params = urllib.parse.quote_plus("DRIVER={ODBC Driver 17 for SQL Server};"
                                 "SERVER=" + host + ";"
                                "DATABASE=" + schema + ";"
                               "trusted_connection=yes")

engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params), fast_executemany=True)

print('writing')
# note: MyMainTable is blocked because of the open transaction on unrelated process. to_sql can't querying information_schema
df.to_sql('MyMainTable', con=engine, if_exists='append', index=False)  
print('finished')

checking the server, this is the hung query called by pandas/sqlalchemy

SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] 
FROM [INFORMATION_SCHEMA].[TABLES] 
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(@P1 AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(@P2 AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]

Pandas is blocked in loading into the table
image

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugIO SQLto_sql, read_sql, read_sql_query

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions