Skip to content

Inconsistent parsing for timestamp with timezone with read_sql_query #15119

Closed
@thrasibule

Description

@thrasibule

Code Sample, a copy-pastable example if possible

I'm creating a test table in postgresql as follows:

CREATE TABLE test(date timestamptz);

INSERT INTO test VALUES('2014-10-16 07:50:49-04'),('2014-11-06 07:42:06-05')

Notice that the dates are across the dst date, so that they have different offsets to utc.

df1 = pd.read_sql_query("SELECT date FROM test", engine,
                        index_col='date')

df2 = pd.read_sql_query("SELECT date FROM test WHERE date<='2014-10-17'",
                        engine,
                        index_col='date')

df2bis = pd.read_sql_query("SELECT date FROM test WHERE date<='2014-10-17'",
                           engine,
                           index_col='date',
                           parse_dates=['date'])
df3 = pd.read_sql_query("SELECT date FROM test", engine,
                        index_col='date',
                        parse_dates=['date'])

Problem description

In [159]: df1.index
Out[159]: 
Index([2014-10-16 07:50:49-04:00, 2014-11-06 07:42:06-05:00], dtype='object', name='date')

In [160]: df2.index
Out[160]: 
DatetimeIndex(['2014-10-16 11:50:49+00:00'], dtype='datetime64[ns, UTC]', name='date', freq=None)

In [162]: df2bis.index
Out[162]: 
DatetimeIndex(['2014-10-16 11:50:49+00:00'], dtype='datetime64[ns, UTC]', name='date', freq=None)

In [163]: df3.index
Out[163]: 
DatetimeIndex(['2014-10-16 11:50:49', '2014-11-06 12:42:06'], dtype='datetime64[ns]', name='date', freq=None)

Expected Output

The current behavior if I understand correctly is that if there is only one timezone offset: timestamps are parsed automatically to timezone aware timestamps.
If dates are across different offsets, they are not parsed automatically. We can force parsing using parse_dates, but then they are not timezone aware. I can make the conversion using tz_localize, but it's not very consistent. I would expect to have everything parsed automatically to timezone aware timestamps.

This is related to issue #7364. If there is a consensus on what should be done, I can try to make the required changes to make it more regular.

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.0.final.0 python-bits: 64 OS: Linux OS-release: 4.8.13-1-ARCH machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.utf8 LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 33.0.0
Cython: 0.25.2
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0rc2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.5
sqlalchemy: 1.1.4
pymysql: 0.7.9.None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    IO SQLto_sql, read_sql, read_sql_queryTimezonesTimezone data dtype

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions