Description
Currently, I believe the read_sql functions pull data from databases via SQLAlchemy, which can be somewhat slow. The Turbodbc library appears to offer an ODBC-compatible alternative that may operate at higher speed. We might consider adding an engine= keyword to the read_sql functions so they can use alternate libraries like this. In that case I would hope we could pull from the database into Arrow memory, and from there to pandas, more efficiently.
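A minimal sketch of what such an engine= dispatch might look like. This is purely illustrative: the helper names and the dispatch mechanism are hypothetical, not an existing or proposed pandas implementation.

```python
def _read_sql_sqlalchemy(sql, con):
    # Placeholder for the existing SQLAlchemy-backed path.
    raise NotImplementedError("existing sqlalchemy path")


def _read_sql_turbodbc(sql, con):
    # Placeholder for a hypothetical turbodbc/Arrow-backed path.
    raise NotImplementedError("hypothetical turbodbc path")


def read_sql(sql, con, engine="sqlalchemy"):
    """Hypothetical engine-dispatching read_sql (sketch only)."""
    readers = {
        "sqlalchemy": _read_sql_sqlalchemy,
        "turbodbc": _read_sql_turbodbc,
    }
    if engine not in readers:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {sorted(readers)}"
        )
    return readers[engine](sql, con)
```

This mirrors the engine= pattern pandas already uses elsewhere (e.g. read_csv's engine keyword), which keeps the user-facing API stable while allowing alternate backends.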
The turbodbc documentation already shows how to do this with its API. There might be some value in integrating this into the pandas API directly. From my perspective as a Dask developer, I would like to use Turbodbc but would prefer that pandas did the actual wrapping.
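For reference, the pattern from turbodbc's advanced-usage docs looks roughly like this; the DSN and query are placeholders, and the wrapper function name is my own:

```python
def read_sql_turbodbc(query, dsn):
    """Sketch: fetch a query result into pandas via turbodbc's Arrow support.

    Follows the pattern in turbodbc's advanced-usage docs; dsn and query
    are placeholders. Requires turbodbc built with Arrow support, plus pyarrow.
    """
    import turbodbc  # imported lazily so the sketch loads without turbodbc installed

    connection = turbodbc.connect(dsn=dsn)
    cursor = connection.cursor()
    cursor.execute(query)
    table = cursor.fetchallarrow()  # returns a pyarrow.Table
    return table.to_pandas()
```

The appeal is that the database-to-Arrow step avoids Python-object overhead, and Arrow-to-pandas conversion is comparatively cheap.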
I spoke with @jreback about this in person. @MathMagique @xhochy @wesm may also be interested. My apologies if this has already been discussed elsewhere (I was surprised that I couldn't find anything).
http://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html#apache-arrow-support
https://arrow.apache.org/blog/2017/06/16/turbodbc-arrow/