
SQL: read_sql functions should not inspect all tables of database #7396

Closed
@jorisvandenbossche

Description

Related: @balancap's comment in #6416 (comment), and issue #7380, where you get a warning about a type that is not understood (probably coming from another table).

Current situation: when a PandasSQLAlchemy object is created (https://github.com/pydata/pandas/blob/v0.14.0/pandas/io/sql.py#L777), a full MetaData object is created and reflected (meaning: all tables in the database are inspected and their schemas are stored as sqlalchemy Table objects). This happens every time read_sql/read_sql_query/read_sql_table is called.
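To illustrate the current behavior, here is a minimal sketch of what full reflection does, using an in-memory SQLite database with two illustrative tables (the table names and columns are made up for the example, not taken from the issue):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Text

# Set up a toy database with two tables.
engine = create_engine("sqlite:///:memory:")
setup = MetaData()
Table("weather", setup, Column("id", Integer, primary_key=True))
Table("stations", setup,
      Column("id", Integer, primary_key=True),
      Column("name", Text))
setup.create_all(engine)

# Full reflection, as done on PandasSQLAlchemy creation: every table in
# the database is inspected, even if we only care about "weather".
meta = MetaData()
meta.reflect(bind=engine)
print(sorted(meta.tables))  # ['stations', 'weather']
```

With many tables or a remote server, this reflection step is pure overhead for a query that touches a single table.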

Consequences:

  • this can be costly for a very large database or a distant server (ENH: SQL through SQLAlchemy - performance #6416 (comment)).
  • this can trigger warnings that have nothing to do with your current query or the table you want to read, e.g. when one of the types in another table is not known to sqlalchemy (such as a postgis geometry column).

Possible solution:

  • I think the read_sql functions should never inspect all tables, only the specified table (and read_sql_query not even that table, since with a query this information is not used; it is only needed by read_sql_table)
  • This can probably be achieved by using the `only` keyword in meta.reflect(engine, only=...)
  • For the OO API interface, we can discuss what the default should be (inspect all tables or not)
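The `only` approach sketched above can be shown with the same kind of toy database (in-memory SQLite, illustrative table names not taken from the issue):

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer

# Set up a toy database with two tables.
engine = create_engine("sqlite:///:memory:")
setup = MetaData()
Table("weather", setup, Column("id", Integer, primary_key=True))
Table("stations", setup, Column("id", Integer, primary_key=True))
setup.create_all(engine)

# Reflect only the requested table; other tables (which might contain
# types unknown to sqlalchemy) are never inspected.
meta = MetaData()
meta.reflect(bind=engine, only=["weather"])
print(sorted(meta.tables))  # ['weather']
```

For read_sql_table, `only` would be set to the requested table name; for read_sql_query, reflection could be skipped entirely.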

@mangecoeur @danielballan @hayd

Metadata

Assignees

No one assigned

Labels

IO SQL (to_sql, read_sql, read_sql_query)
