Skip to content

pd.read_sql timestamptz converted to object dtype #30207

Open
@ThibTrip

Description

@ThibTrip

Code Sample, a copy-pastable example if possible

from pandas import Timestamp, NaT
import pandas as pd
from sqlalchemy import create_engine

# create test data
data = {'create_date': [Timestamp('2019-11-22 10:59:44+0000', tz='UTC'),
                        Timestamp('2019-11-21 15:27:41+0000', tz='UTC'),
                        Timestamp('2019-11-21 15:25:42+0000', tz='UTC'),
                        Timestamp('2019-11-19 14:35:52+0000', tz='UTC'),
                        Timestamp('2019-11-19 13:54:44+0000', tz='UTC'),
                        Timestamp('2019-11-14 15:12:00+0000', tz='UTC'),
                        Timestamp('2019-08-07 13:37:04+0000', tz='UTC'),
                        Timestamp('2019-12-04 14:47:42+0000', tz='UTC')],
       'change_date': [Timestamp('2019-11-22 10:59:44+0000', tz='UTC'),
                       Timestamp('2019-11-21 15:27:42+0000', tz='UTC'),
                       Timestamp('2019-11-21 15:25:43+0000', tz='UTC'),
                       Timestamp('2019-11-19 14:35:53+0000', tz='UTC'),
                       Timestamp('2019-11-19 13:54:45+0000', tz='UTC'),
                       Timestamp('2019-11-14 15:13:33+0000', tz='UTC'),
                       Timestamp('2019-08-09 13:01:13+0000', tz='UTC'),
                       Timestamp('2019-12-04 14:54:57+0000', tz='UTC')],
        'unsubscribe_date': [NaT,
                             Timestamp('2019-12-09 12:58:01+0000', tz='UTC'),
                             Timestamp('2019-12-09 12:58:28+0000', tz='UTC'),
                             NaT,
                             Timestamp('2019-12-09 12:58:24+0000', tz='UTC'),
                             Timestamp('2019-12-09 12:58:19+0000', tz='UTC'),
                             NaT,
                             NaT]}

# create DataFrame from test data 
df_test = pd.DataFrame(data)
print('test DataFrame\n', df_test, '\n')
print(df_test.dtypes, '\n')


# connect to a postgres database (set CONNECTION_STRING)
engine = create_engine('CONNECTION_STRING')


# save DataFrame to postgres
df_test.to_sql(name = 'timezone_test',
               con = engine, 
               index = False,
               if_exists = 'fail')

# make sure the data type in postgres is "timestamp with time zone"
df_db_dtypes = pd.read_sql("""SELECT column_name, data_type FROM information_schema.columns 
                              WHERE table_name = 'timezone_test'
                              AND table_schema = 'public';""", 
                           con = engine, 
                           index_col = 'column_name')
print('datatypes of test DataFrame in postgres\n', df_db_dtypes, '\n')

# read DataFrame from postgres
df_db = pd.read_sql('SELECT * FROM timezone_test', con = engine)
print('test DataFrame as read from postgres\n', df_db, '\n')
# data types should all be datetime64[ns, UTC]
print(df_db.dtypes, '\n')


# attempt to append the same data in the postgres table
try:
    df_db.to_sql(name = 'timezone_test', 
                 con = engine,
                 index = False,
                 if_exists = 'append')
except Exception as e:
    # we should get the error "Tz-aware datetime.datetime cannot be converted to datetime64\
    # unless utc=True"
    print('could not append the same data:',e,'\n')


    
# I think the problem is that pandas recognizes the data types in postgres are "timestamptz"\
# and tries to convert from "object" to datetime64 UTC dtype before saving but fails
try:
    pd.to_datetime(df_db['create_date']) # it would work with argument utc = True
except Exception as e:
    print('could not convert column "create_date" to datetime64[ns, UTC]:', e)

Output

test DataFrame
                 create_date               change_date          unsubscribe_date
0 2019-11-22 10:59:44+00:00 2019-11-22 10:59:44+00:00                       NaT
1 2019-11-21 15:27:41+00:00 2019-11-21 15:27:42+00:00 2019-12-09 12:58:01+00:00
2 2019-11-21 15:25:42+00:00 2019-11-21 15:25:43+00:00 2019-12-09 12:58:28+00:00
3 2019-11-19 14:35:52+00:00 2019-11-19 14:35:53+00:00                       NaT
4 2019-11-19 13:54:44+00:00 2019-11-19 13:54:45+00:00 2019-12-09 12:58:24+00:00
5 2019-11-14 15:12:00+00:00 2019-11-14 15:13:33+00:00 2019-12-09 12:58:19+00:00
6 2019-08-07 13:37:04+00:00 2019-08-09 13:01:13+00:00                       NaT
7 2019-12-04 14:47:42+00:00 2019-12-04 14:54:57+00:00                       NaT 

create_date         datetime64[ns, UTC]
change_date         datetime64[ns, UTC]
unsubscribe_date    datetime64[ns, UTC]
dtype: object 

datatypes of test DataFrame in postgres
                                  data_type
column_name                               
create_date       timestamp with time zone
change_date       timestamp with time zone
unsubscribe_date  timestamp with time zone 

test DataFrame as read from postgres
                  create_date                change_date          unsubscribe_date
0  2019-11-22 11:59:44+01:00  2019-11-22 11:59:44+01:00                       NaT
1  2019-11-21 16:27:41+01:00  2019-11-21 16:27:42+01:00 2019-12-09 12:58:01+00:00
2  2019-11-21 16:25:42+01:00  2019-11-21 16:25:43+01:00 2019-12-09 12:58:28+00:00
3  2019-11-19 15:35:52+01:00  2019-11-19 15:35:53+01:00                       NaT
4  2019-11-19 14:54:44+01:00  2019-11-19 14:54:45+01:00 2019-12-09 12:58:24+00:00
5  2019-11-14 16:12:00+01:00  2019-11-14 16:13:33+01:00 2019-12-09 12:58:19+00:00
6  2019-08-07 15:37:04+02:00  2019-08-09 15:01:13+02:00                       NaT
7  2019-12-04 15:47:42+01:00  2019-12-04 15:54:57+01:00                       NaT 

create_date                      object
change_date                      object
unsubscribe_date    datetime64[ns, UTC]
dtype: object 

could not append the same data: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True 

could not convert column "create_date" to datetime64[ns, UTC]: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

Problem description

There are 2 problems here:

  1. pandas reads 2 columns of the test DataFrame I save in posgres as "object" and not "datetime64[ns, UTC]" although in postgres the data types are all "timestamp with time zone"

  2. when I attempt to append the postgres table to itself via pandas (pd.DataFrame.read_sql then pd.DataFrame.to_sql) it fails when trying to convert the 2 columns "object" to datetime. So perhaps there is an issue with pd.to_datetime or I am missing something here.

I would make another issue for the second problem but I cannot reproduce it other than with this workflow.

Expected Output

  • When reading the data from the postgres table I expect those datatypes:
df_db = pd.read_sql('SELECT * FROM timezone_test', con = engine)
print(df_db.dtypes)
create_date         datetime64[ns, UTC]
change_date         datetime64[ns, UTC]
unsubscribe_date    datetime64[ns, UTC]
dtype: object 
  • Also I expect pd.DataFrame.to_sql to not throw any exception even in the case where "create_date" and "change_date" are of "object" dtypes.
df_db.to_sql(name = 'timezone_test', 
             con = engine,
             index = False,
             if_exists = 'append')

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.3
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.14.0
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

Metadata

Metadata

Assignees

Labels

BugIO SQLto_sql, read_sql, read_sql_queryTimezonesTimezone data dtype

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions