BUG: fix writing of Categorical with to_sql (GH8624) #8682

Merged
1 change: 1 addition & 0 deletions doc/source/categorical.rst
@@ -573,6 +573,7 @@ relevant columns back to `category` and assign the right categories and categories ordering
df2.dtypes
df2["cats"]

The same holds for writing to a SQL database with ``to_sql``.
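As an illustration of the round trip described here, a minimal sketch using an in-memory SQLite database (the table name ``cats_table`` is arbitrary):

```python
import sqlite3

import pandas as pd

# writing: the categorical column is stored as its dense (string) values
df = pd.DataFrame({"cats": pd.Categorical(["a", "b", "a"])})
conn = sqlite3.connect(":memory:")
df.to_sql("cats_table", conn, index=False)

# reading back yields a plain object column; restore the dtype manually
df2 = pd.read_sql_query("SELECT * FROM cats_table", conn)
df2["cats"] = df2["cats"].astype("category")
```

The explicit ``astype('category')`` on the way back mirrors the advice given for the other I/O formats above.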

Missing Data
------------
8 changes: 8 additions & 0 deletions doc/source/io.rst
@@ -3337,6 +3337,14 @@ With some databases, writing large DataFrames can result in errors due to packet
flavors, columns with type ``timedelta64`` will be written as integer
values as nanoseconds to the database and a warning will be raised.

.. note::

   Columns of ``category`` dtype will be converted to the dense representation
   as you would get with ``np.asarray(categorical)`` (e.g. for string categories
   this gives an array of strings).
   Because of this, reading the database table back in does **not** generate
   a categorical.
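The dense conversion this note refers to can be seen directly with NumPy, without any database involved (a small sketch):

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["apple", "banana", "apple"])

# np.asarray drops the categorical machinery and materializes the labels,
# giving a plain object array of strings
dense = np.asarray(cat)
```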


Reading Tables
~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.15.1.txt
@@ -131,7 +131,7 @@ Bug Fixes
- Bug in ``Categorical`` not created properly with ``Series.to_frame()`` (:issue:`8626`)
- Bug in coercing in astype of a ``Categorical`` of a passed ``pd.Categorical`` (this now raises ``TypeError`` correctly), (:issue:`8626`)
- Bug in ``cut``/``qcut`` when using ``Series`` and ``retbins=True`` (:issue:`8589`)

- Bug in writing Categorical columns to an SQL database with ``to_sql`` (:issue:`8624`).



2 changes: 1 addition & 1 deletion pandas/io/sql.py
@@ -670,7 +670,7 @@ def insert_data(self):
                # datetime.datetime
                d = b.values.astype('M8[us]').astype(object)
            else:
-                d = np.array(b.values, dtype=object)
+                d = np.array(b.get_values(), dtype=object)

            # replace NaN with None
            if b._can_hold_na:
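The distinction the patched line relies on can be sketched with public API calls (``get_values`` itself is a block-internal method; the column values here are hypothetical):

```python
import numpy as np
import pandas as pd

s = pd.Series(["low", "high", "low"], dtype="category")

# what should reach the database: the dense labels
dense = np.asarray(s, dtype=object)

# what must NOT be written: the underlying integer codes
# (categories default to lexicographic order, so 'high' -> 0, 'low' -> 1)
codes = list(s.cat.codes)
```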
14 changes: 14 additions & 0 deletions pandas/io/tests/test_sql.py
@@ -678,6 +678,20 @@ def test_chunksize_read(self):

tm.assert_frame_equal(res1, res3)

    def test_categorical(self):
        # GH8624
        # test that categorical gets written correctly as dense column
        df = DataFrame(
            {'person_id': [1, 2, 3],
             'person_name': ['John P. Doe', 'Jane Dove', 'John P. Doe']})
        df2 = df.copy()
        df2['person_name'] = df2['person_name'].astype('category')

        df2.to_sql('test_categorical', self.conn, index=False)
        res = sql.read_sql_query('SELECT * FROM test_categorical', self.conn)

        tm.assert_frame_equal(res, df)


class TestSQLApi(_TestSQLApi):
"""