Description
Code Sample, a copy-pastable example if possible
from pandas import Timestamp, NaT
import pandas as pd
from sqlalchemy import create_engine
# create test data
data = {'create_date': [Timestamp('2019-11-22 10:59:44+0000', tz='UTC'),
Timestamp('2019-11-21 15:27:41+0000', tz='UTC'),
Timestamp('2019-11-21 15:25:42+0000', tz='UTC'),
Timestamp('2019-11-19 14:35:52+0000', tz='UTC'),
Timestamp('2019-11-19 13:54:44+0000', tz='UTC'),
Timestamp('2019-11-14 15:12:00+0000', tz='UTC'),
Timestamp('2019-08-07 13:37:04+0000', tz='UTC'),
Timestamp('2019-12-04 14:47:42+0000', tz='UTC')],
'change_date': [Timestamp('2019-11-22 10:59:44+0000', tz='UTC'),
Timestamp('2019-11-21 15:27:42+0000', tz='UTC'),
Timestamp('2019-11-21 15:25:43+0000', tz='UTC'),
Timestamp('2019-11-19 14:35:53+0000', tz='UTC'),
Timestamp('2019-11-19 13:54:45+0000', tz='UTC'),
Timestamp('2019-11-14 15:13:33+0000', tz='UTC'),
Timestamp('2019-08-09 13:01:13+0000', tz='UTC'),
Timestamp('2019-12-04 14:54:57+0000', tz='UTC')],
'unsubscribe_date': [NaT,
Timestamp('2019-12-09 12:58:01+0000', tz='UTC'),
Timestamp('2019-12-09 12:58:28+0000', tz='UTC'),
NaT,
Timestamp('2019-12-09 12:58:24+0000', tz='UTC'),
Timestamp('2019-12-09 12:58:19+0000', tz='UTC'),
NaT,
NaT]}
# create DataFrame from test data
df_test = pd.DataFrame(data)
print('test DataFrame\n', df_test, '\n')
print(df_test.dtypes, '\n')
# connect to a postgres database (set CONNECTION_STRING)
engine = create_engine('CONNECTION_STRING')
# save DataFrame to postgres
df_test.to_sql(name = 'timezone_test',
con = engine,
index = False,
if_exists = 'fail')
# make sure the data type in postgres is "timestamp with time zone"
df_db_dtypes = pd.read_sql("""SELECT column_name, data_type FROM information_schema.columns
WHERE table_name = 'timezone_test'
AND table_schema = 'public';""",
con = engine,
index_col = 'column_name')
print('datatypes of test DataFrame in postgres\n', df_db_dtypes, '\n')
# read DataFrame from postgres
df_db = pd.read_sql('SELECT * FROM timezone_test', con = engine)
print('test DataFrame as read from postgres\n', df_db, '\n')
# data types should all be datetime64[ns, UTC]
print(df_db.dtypes, '\n')
# attempt to append the same data in the postgres table
try:
df_db.to_sql(name = 'timezone_test',
con = engine,
index = False,
if_exists = 'append')
except Exception as e:
# we should get the error "Tz-aware datetime.datetime cannot be converted to datetime64\
# unless utc=True"
print('could not append the same data:',e,'\n')
# I think the problem is that pandas recognizes the data types in postgres are "timestamptz"\
# and tries to convert from "object" to datetime64 UTC dtype before saving but fails
try:
pd.to_datetime(df_db['create_date']) # it would work with argument utc = True
except Exception as e:
print('could not convert column "create_date" to datetime64[ns, UTC]:', e)
Output
test DataFrame
create_date change_date unsubscribe_date
0 2019-11-22 10:59:44+00:00 2019-11-22 10:59:44+00:00 NaT
1 2019-11-21 15:27:41+00:00 2019-11-21 15:27:42+00:00 2019-12-09 12:58:01+00:00
2 2019-11-21 15:25:42+00:00 2019-11-21 15:25:43+00:00 2019-12-09 12:58:28+00:00
3 2019-11-19 14:35:52+00:00 2019-11-19 14:35:53+00:00 NaT
4 2019-11-19 13:54:44+00:00 2019-11-19 13:54:45+00:00 2019-12-09 12:58:24+00:00
5 2019-11-14 15:12:00+00:00 2019-11-14 15:13:33+00:00 2019-12-09 12:58:19+00:00
6 2019-08-07 13:37:04+00:00 2019-08-09 13:01:13+00:00 NaT
7 2019-12-04 14:47:42+00:00 2019-12-04 14:54:57+00:00 NaT
create_date datetime64[ns, UTC]
change_date datetime64[ns, UTC]
unsubscribe_date datetime64[ns, UTC]
dtype: object
datatypes of test DataFrame in postgres
data_type
column_name
create_date timestamp with time zone
change_date timestamp with time zone
unsubscribe_date timestamp with time zone
test DataFrame as read from postgres
create_date change_date unsubscribe_date
0 2019-11-22 11:59:44+01:00 2019-11-22 11:59:44+01:00 NaT
1 2019-11-21 16:27:41+01:00 2019-11-21 16:27:42+01:00 2019-12-09 12:58:01+00:00
2 2019-11-21 16:25:42+01:00 2019-11-21 16:25:43+01:00 2019-12-09 12:58:28+00:00
3 2019-11-19 15:35:52+01:00 2019-11-19 15:35:53+01:00 NaT
4 2019-11-19 14:54:44+01:00 2019-11-19 14:54:45+01:00 2019-12-09 12:58:24+00:00
5 2019-11-14 16:12:00+01:00 2019-11-14 16:13:33+01:00 2019-12-09 12:58:19+00:00
6 2019-08-07 15:37:04+02:00 2019-08-09 15:01:13+02:00 NaT
7 2019-12-04 15:47:42+01:00 2019-12-04 15:54:57+01:00 NaT
create_date object
change_date object
unsubscribe_date datetime64[ns, UTC]
dtype: object
could not append the same data: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
could not convert column "create_date" to datetime64[ns, UTC]: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Problem description
There are 2 problems here:
-
pandas reads 2 columns of the test DataFrame I save in posgres as "object" and not "datetime64[ns, UTC]" although in postgres the data types are all "timestamp with time zone"
-
when I attempt to append the postgres table to itself via pandas (pd.DataFrame.read_sql then pd.DataFrame.to_sql) it fails when trying to convert the 2 columns "object" to datetime. So perhaps there is an issue with pd.to_datetime or I am missing something here.
I would make another issue for the second problem but I cannot reproduce it other than with this workflow.
Expected Output
- When reading the data from the postgres table I expect those datatypes:
df_db = pd.read_sql('SELECT * FROM timezone_test', con = engine)
print(df_db.dtypes)
create_date datetime64[ns, UTC]
change_date datetime64[ns, UTC]
unsubscribe_date datetime64[ns, UTC]
dtype: object
- Also I expect pd.DataFrame.to_sql to not throw any exception even in the case where "create_date" and "change_date" are of "object" dtypes.
df_db.to_sql(name = 'timezone_test',
con = engine,
index = False,
if_exists = 'append')
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.3
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.14.0
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8