Description
Problem
In pd.read_sql and pd.read_sql_table, when the chunksize parameter is set, pandas builds each chunk's DataFrame with dtypes inferred from the data in that chunk alone. This is a problem when an INTEGER column contains null values in some chunks but not in others: the same column comes back as int64 in some chunks and as float64 in others. A similar problem occurs with strings.
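A minimal sketch reproducing the drift with an in-memory SQLite database (the table and column names are made up for illustration):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, score INTEGER)")
# The first two rows contain no NULLs, the next two do, so with
# chunksize=2 the two chunks infer different dtypes for the same column.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10), (2, 20), (3, None), (4, 40)],
)

for chunk in pd.read_sql("SELECT * FROM events", conn, chunksize=2):
    print(chunk.dtypes["score"])  # int64 for the first chunk, float64 for the second
```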
In ETL processes, or simply when dumping large queries to disk in HDF5 format, the user currently bears the burden of explicitly handling the type conversions of potentially many columns.
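Continuing the toy SQLite setup above, the manual workaround looks roughly like this (assuming PyTables is installed for HDFStore, and assuming the user already knows the correct target dtypes):

```python
import pandas as pd

forced_dtypes = {"score": "float64"}  # dtypes the user must know in advance

with pd.HDFStore("dump.h5") as store:
    for chunk in pd.read_sql("SELECT * FROM events", conn, chunksize=2):
        # Without the explicit cast, appending chunks with mismatched
        # dtypes to the same HDF5 table fails; casting every chunk
        # keeps the column dtypes stable.
        store.append("events", chunk.astype(forced_dtypes))
```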
Solution?
Instead of guessing the type from a subset of the data, it should be possible to obtain the type information from the database and map it to the appropriate dtypes.
When querying a full table, column information can be obtained from SQLAlchemy by inspecting the table's metadata, but I was unsuccessful in finding a way to do the same for a general query.
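For reference, this is roughly what the full-table case looks like via SQLAlchemy reflection (a sketch against a toy in-memory database; there is no equivalent for an arbitrary query):

```python
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine("sqlite://")  # toy in-memory database
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE events (id INTEGER, score INTEGER NOT NULL)"))

# Reflection exposes each column's SQL type and nullability.
events = Table("events", MetaData(), autoload_with=engine)
for col in events.columns:
    print(col.name, col.type, "nullable=%s" % col.nullable)
```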
Although I am unaware of all the possible type problems that can arise, the DB-API specification does include a null_ok field in cursor.description that indicates whether each result column is nullable (though drivers are allowed to leave it as None).
Pandas could (optionally) use this information to always interpret nullable numeric columns as floats and string columns as object.
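A sketch of what that mapping could look like. The helper below is hypothetical, and it assumes a DB-API driver that actually populates null_ok (per PEP 249 it is the seventh item of each description tuple, but drivers such as sqlite3 and psycopg2 leave it as None); determining which columns are numeric is also driver-specific, so the sketch takes that set as an input:

```python
import numpy as np

def nullable_safe_dtypes(cursor, numeric_columns):
    """Hypothetical helper: map every nullable numeric result column to
    float64 so NULLs never flip the dtype between chunks; map everything
    else to object."""
    dtypes = {}
    for desc in cursor.description:
        name, null_ok = desc[0], desc[6]
        if name in numeric_columns:
            # Treat "unknown" (None) the same as nullable: float64 is
            # the safe choice because it can represent NULL as NaN.
            dtypes[name] = np.int64 if null_ok is False else np.float64
        else:
            dtypes[name] = object
    return dtypes
```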