ENH: Map pandas integer to optimal SQLAlchemy integer type (GH35076) #38548

Merged: 9 commits, Dec 24, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -44,6 +44,7 @@ Other enhancements
 - Improve error message when ``usecols`` and ``names`` do not match for :func:`read_csv` and ``engine="c"`` (:issue:`29042`)
 - Improved consistency of error message when passing an invalid ``win_type`` argument in :class:`Window` (:issue:`15969`)
 - :func:`pandas.read_sql_query` now accepts a ``dtype`` argument to cast the columnar data from the SQL database based on user input (:issue:`10285`)
+- Improved integer type mapping from pandas to SQLAlchemy when using :meth:`DataFrame.to_sql` (:issue:`35076`)

 .. ---------------------------------------------------------------------------

8 changes: 7 additions & 1 deletion pandas/io/sql.py
@@ -1124,6 +1124,7 @@ def _sqlalchemy_type(self, col):
             DateTime,
             Float,
             Integer,
+            SmallInteger,
             Text,
             Time,
         )
@@ -1154,8 +1155,13 @@ def _sqlalchemy_type(self, col):
             else:
                 return Float(precision=53)
         elif col_type == "integer":
-            if col.dtype == "int32":
+            # GH35076 Map pandas integer to optimal SQLAlchemy integer type
+            if col.dtype.name.lower() in ("int8", "uint8", "int16"):
+                return SmallInteger
+            elif col.dtype.name.lower() in ("uint16", "int32"):
                 return Integer
+            elif col.dtype.name.lower() == "uint64":
+                raise ValueError("Unsigned 64 bit integer datatype is not supported")
             else:
                 return BigInteger
         elif col_type == "boolean":
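
Note on the mapping above: the branches appear to pick the smallest signed SQL integer type whose range covers the full range of the pandas dtype, and to reject ``uint64`` because its maximum exceeds a signed 64-bit BIGINT. The sketch below re-derives that table from bit widths alone. It is an illustration, not the pandas implementation: the helper names ``dtype_range`` and ``smallest_sql_integer`` and the ``SQL_INTEGER_RANGES`` table are hypothetical, and it assumes SMALLINT, INTEGER and BIGINT are 16-, 32- and 64-bit signed types, as they are in common SQL databases.

```python
from typing import Tuple

# Assumed signed ranges for the three SQL integer types (16, 32 and 64 bits).
SQL_INTEGER_RANGES = [
    ("SMALLINT", -(2 ** 15), 2 ** 15 - 1),
    ("INTEGER", -(2 ** 31), 2 ** 31 - 1),
    ("BIGINT", -(2 ** 63), 2 ** 63 - 1),
]


def dtype_range(dtype_name: str) -> Tuple[int, int]:
    """Return (min, max) for a dtype name such as "int8", "UInt16" or "Int64".

    Hypothetical helper: names are lower-cased first, mirroring the
    ``col.dtype.name.lower()`` call in the diff, so nullable extension
    dtype names ("Int8") and NumPy names ("int8") are treated alike.
    """
    name = dtype_name.lower()
    unsigned = name.startswith("uint")
    bits = int(name[4:] if unsigned else name[3:])
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_sql_integer(dtype_name: str) -> str:
    """Return the smallest SQL integer type that can hold every dtype value."""
    low, high = dtype_range(dtype_name)
    for sql_name, sql_low, sql_high in SQL_INTEGER_RANGES:
        if sql_low <= low and high <= sql_high:
            return sql_name
    # Only uint64 reaches this point: 2**64 - 1 does not fit in a signed BIGINT.
    raise ValueError("Unsigned 64 bit integer datatype is not supported")


if __name__ == "__main__":
    for name in ("int8", "uint8", "int16", "uint16", "int32", "uint32", "int64"):
        print(name, "->", smallest_sql_integer(name))
```

Under these assumptions the derived results agree with the branches in the diff: unsigned types need the next wider signed type (``uint16`` to INTEGER, ``uint32`` to BIGINT), except ``uint8``, which already fits in SMALLINT.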
39 changes: 39 additions & 0 deletions pandas/tests/io/test_sql.py
@@ -1160,6 +1160,45 @@ def test_sqlalchemy_type_mapping(self):
         # GH 9086: TIMESTAMP is the suggested type for datetimes with timezones
         assert isinstance(table.table.c["time"].type, sqltypes.TIMESTAMP)

+    @pytest.mark.parametrize(
+        "integer, expected",
+        [
+            ("int8", "SMALLINT"),
+            ("Int8", "SMALLINT"),
+            ("uint8", "SMALLINT"),
+            ("UInt8", "SMALLINT"),
+            ("int16", "SMALLINT"),
+            ("Int16", "SMALLINT"),
+            ("uint16", "INTEGER"),
+            ("UInt16", "INTEGER"),
+            ("int32", "INTEGER"),
+            ("Int32", "INTEGER"),
+            ("uint32", "BIGINT"),
+            ("UInt32", "BIGINT"),
+            ("int64", "BIGINT"),
+            ("Int64", "BIGINT"),
+            (int, "BIGINT" if np.dtype(int).name == "int64" else "INTEGER"),
+        ],
+    )
+    def test_sqlalchemy_integer_mapping(self, integer, expected):
+        # GH35076 Map pandas integer to optimal SQLAlchemy integer type
+        df = DataFrame([0, 1], columns=["a"], dtype=integer)
+        db = sql.SQLDatabase(self.conn)
+        table = sql.SQLTable("test_type", db, frame=df)
+
+        result = str(table.table.c.a.type)
+        assert result == expected
+
+    @pytest.mark.parametrize("integer", ["uint64", "UInt64"])
+    def test_sqlalchemy_integer_overload_mapping(self, integer):
+        # GH35076 Map pandas integer to optimal SQLAlchemy integer type
+        df = DataFrame([0, 1], columns=["a"], dtype=integer)
+        db = sql.SQLDatabase(self.conn)
+        with pytest.raises(
+            ValueError, match="Unsigned 64 bit integer datatype is not supported"
+        ):
+            sql.SQLTable("test_type", db, frame=df)
+
     def test_database_uri_string(self):

         # Test read_sql and .to_sql method with a database URI (GH10654)