Description
When aggregating a column with dtype datetime64[ns, timezone], pandas does not handle properly tz information. It looks like as if pandas was dropping the timezone info (ie doing its calculation on the UTC values) and then localizing the date to the original timezone,i.e. doing something like
column_with_UTC_dates_in_naive_format.tz_localize(timezone)
instead of
column_with_UTC_dates_in_naive_format.tz_localize("UTC").tz_convert(timezone)
This buggy behavior is even more striking when running the sample with
idx = pandas.date_range("2016-03-27", "2016-03-29", freq="H", closed="left", tz="Europe/Brussels")
(cover the DST change) as the naive UTC date "2016-03-27 03:00" cannot be localized to the timezone as it is not recognized. In this case, the dtype of the column is not anymore datetime64[ns, timezone] but just int64.
Code Sample, a copy-pastable example if possible
# create a DataFrame with a "time" column filled with localized datetime64 on hourly basis
idx = pandas.date_range("2016-01-01", "2016-01-03", freq="H", closed="left",tz="Europe/Brussels")
df = pandas.Series(idx, index=idx, name="time").to_frame()
# calculate some min and max of "time" column per hour of the day
df_agg = df.groupby(idx.hour).aggregate(["min","max"])
df_agg.info()
print df_agg
Expected Output
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 48 entries, 2016-01-01 00:00:00+01:00 to 2016-01-02 23:00:00+01:00
Freq: H
Data columns (total 1 columns):
time 48 non-null datetime64[ns, Europe/Brussels]
dtypes: datetime64[ns, Europe/Brussels](1)
memory usage: 768.0 bytes
<class 'pandas.core.frame.DataFrame'>
Int64Index: 24 entries, 0 to 23
Data columns (total 2 columns):
(time, min) 24 non-null datetime64[ns, Europe/Brussels]
(time, max) 24 non-null datetime64[ns, Europe/Brussels]
dtypes: datetime64[ns, Europe/Brussels](2)
memory usage: 576.0 bytes
time
min max
0 2015-12-31 23:00:00+01:00 2016-01-01 23:00:00+01:00
1 2016-01-01 00:00:00+01:00 2016-01-02 00:00:00+01:00
2 2016-01-01 01:00:00+01:00 2016-01-02 01:00:00+01:00
3 2016-01-01 02:00:00+01:00 2016-01-02 02:00:00+01:00
4 2016-01-01 03:00:00+01:00 2016-01-02 03:00:00+01:00
5 2016-01-01 04:00:00+01:00 2016-01-02 04:00:00+01:00
6 2016-01-01 05:00:00+01:00 2016-01-02 05:00:00+01:00
7 2016-01-01 06:00:00+01:00 2016-01-02 06:00:00+01:00
8 2016-01-01 07:00:00+01:00 2016-01-02 07:00:00+01:00
9 2016-01-01 08:00:00+01:00 2016-01-02 08:00:00+01:00
10 2016-01-01 09:00:00+01:00 2016-01-02 09:00:00+01:00
11 2016-01-01 10:00:00+01:00 2016-01-02 10:00:00+01:00
12 2016-01-01 11:00:00+01:00 2016-01-02 11:00:00+01:00
13 2016-01-01 12:00:00+01:00 2016-01-02 12:00:00+01:00
14 2016-01-01 13:00:00+01:00 2016-01-02 13:00:00+01:00
15 2016-01-01 14:00:00+01:00 2016-01-02 14:00:00+01:00
16 2016-01-01 15:00:00+01:00 2016-01-02 15:00:00+01:00
17 2016-01-01 16:00:00+01:00 2016-01-02 16:00:00+01:00
18 2016-01-01 17:00:00+01:00 2016-01-02 17:00:00+01:00
19 2016-01-01 18:00:00+01:00 2016-01-02 18:00:00+01:00
20 2016-01-01 19:00:00+01:00 2016-01-02 19:00:00+01:00
21 2016-01-01 20:00:00+01:00 2016-01-02 20:00:00+01:00
22 2016-01-01 21:00:00+01:00 2016-01-02 21:00:00+01:00
23 2016-01-01 22:00:00+01:00 2016-01-02 22:00:00+01:00
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.6.7
Cython: None
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.0
dateutil: 2.5.0
pytz: 2016.1
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
None