Description
Code Sample, a copy-pastable example if possible
# table to pivot
df = pd.DataFrame([{'uid': u'aa',
'ts': pd.Timestamp('2016-08-12 13:00:00-0700', tz='US/Pacific')},
{'uid': u'aa',
'ts': pd.Timestamp('2016-08-12 08:00:00-0700', tz='US/Pacific')},
{'uid': u'aa',
'ts': pd.Timestamp('2016-08-12 14:00:00-0700', tz='US/Pacific')},
{'uid': u'aa',
'ts': pd.Timestamp('2016-08-25 11:00:00-0700', tz='US/Pacific')},
{'uid': u'aa',
'ts': pd.Timestamp('2016-08-25 13:00:00-0700', tz='US/Pacific')}])
# not sure why I need this, but it doesn't work without it....
df = df.set_index('ts').reset_index()
# Series of day timestamps we want to pivot over
mins = df.ts.map(lambda x: x.replace(hour=0, minute=0,
second=0, microsecond=0))
# pivot
pd.pivot_table(df.set_index('ts').reset_index(),
values='ts', index=['uid'], columns=[mins], aggfunc=np.min)
# prints in UTC time and discards timezones
# do the aggregation portion of the pivot
df.groupby(mins).agg(np.min) # works as expected where timezones are maintained
Problem description
pivot_table
strips timezone information from the aggregated values.
I traced back in the code and it seems that the culprit is unstack
(
Line 138 in 4814823
Specifically, either of these numpy operations will strip the tz:
Lines 75 to 76 in 86233e1
Expected Output
Would expect aggregated values to maintain timezone like they do with groupby.
Output of pd.show_versions()
pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 32.1.0.post20161217
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None