Description
Code Sample, a copy-pastable example if possible
I'm creating a test table in postgresql as follows:
CREATE TABLE test(date timestamptz);
INSERT INTO test VALUES('2014-10-16 07:50:49-04'),('2014-11-06 07:42:06-05')
Notice that the dates are across the dst date, so that they have different offsets to utc.
df1 = pd.read_sql_query("SELECT date FROM test", engine,
index_col='date')
df2 = pd.read_sql_query("SELECT date FROM test WHERE date<='2014-10-17'",
engine,
index_col='date')
df2bis = pd.read_sql_query("SELECT date FROM test WHERE date<='2014-10-17'",
engine,
index_col='date',
parse_dates=['date'])
df3 = pd.read_sql_query("SELECT date FROM test", engine,
index_col='date',
parse_dates=['date'])
Problem description
In [159]: df1.index
Out[159]:
Index([2014-10-16 07:50:49-04:00, 2014-11-06 07:42:06-05:00], dtype='object', name='date')
In [160]: df2.index
Out[160]:
DatetimeIndex(['2014-10-16 11:50:49+00:00'], dtype='datetime64[ns, UTC]', name='date', freq=None)
In [162]: df2bis.index
Out[162]:
DatetimeIndex(['2014-10-16 11:50:49+00:00'], dtype='datetime64[ns, UTC]', name='date', freq=None)
In [163]: df3.index
Out[163]:
DatetimeIndex(['2014-10-16 11:50:49', '2014-11-06 12:42:06'], dtype='datetime64[ns]', name='date', freq=None)
Expected Output
The current behavior if I understand correctly is that if there is only one timezone offset: timestamps are parsed automatically to timezone aware timestamps.
If dates are across different offsets, they are not parsed automatically. We can force parsing using parse_dates, but then they are not timezone aware. I can make the conversion using tz_localize, but it's not very consistent. I would expect to have everything parsed automatically to timezone aware timestamps.
This is related to issue #7364. If there is a consensus on what should be done, I can try to make the required changes to make it more regular.
Output of pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 33.0.0
Cython: 0.25.2
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0rc2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.5
sqlalchemy: 1.1.4
pymysql: 0.7.9.None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: None
pandas_datareader: None