Use Turbodbc/Arrow for read_sql_table

As far as I know, the `read_sql` functions currently pull data from databases through SQLAlchemy, which can be somewhat slow. The Turbodbc library appears to offer an ODBC-compatible alternative that may run faster. We might consider adding an `engine=` keyword to the `read_sql` functions so that they can use alternative libraries like this one. With Turbodbc, I would hope that we could pull data from the database into Arrow memory and convert it to Pandas from there more efficiently.

The Turbodbc documentation already shows how to do this with its own API, but there might be some value in integrating it into the Pandas API directly. As a Dask developer, I would like to use Turbodbc but would prefer that Pandas do the actual wrapping.

I spoke with @jreback about this in person.  @MathMagique @xhochy @wesm may also be interested.  My apologies if this has already been discussed elsewhere (I was surprised that I couldn't find anything).

http://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html#apache-arrow-support
https://arrow.apache.org/blog/2017/06/16/turbodbc-arrow/


Use Turbodbc/Arrow for read_sql_table #17790
