Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import datetime
import pandas as pd

times = pd.Series([pd.Timestamp(datetime.datetime.now(), tz='US/Central') for n in range(5)])
times.dtype  # datetime64[ns, US/Central] == pd.DatetimeTZDtype(tz='US/Central')
mode_result = times.mode()
mode_result.dtype  # dtype('<M8[ns]') == np.dtype('datetime64[ns]'); the timezone is gone
Problem description
Series.mode() returns timezone-naive Timestamps: the values appear to be converted to UTC (I think), and the original timezone and offset are lost. Series.median() does preserve timezones, so I assume this is just an unhandled edge case. I'm working around this using scipy.stats.mode, which does preserve timezones.
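A pandas-only stopgap might look like the sketch below. It rests on an assumption I have not verified: that the tz-naive values returned by mode() are UTC wall times. The helper name `mode_preserving_tz` is made up for illustration and is not part of pandas. On versions where mode() already keeps the timezone, the helper returns the result unchanged.

```python
import pandas as pd


def mode_preserving_tz(s: pd.Series) -> pd.Series:
    """Return s.mode(), restoring the timezone if mode() dropped it.

    Hypothetical helper. Assumes that any tz-naive result holds
    UTC wall times (unverified assumption from the report above).
    """
    result = s.mode()
    src_is_tz_aware = isinstance(s.dtype, pd.DatetimeTZDtype)
    res_is_tz_aware = isinstance(result.dtype, pd.DatetimeTZDtype)
    if src_is_tz_aware and not res_is_tz_aware:
        # Reinterpret the naive values as UTC, then convert back
        # to the timezone of the input Series.
        result = result.dt.tz_localize("UTC").dt.tz_convert(s.dtype.tz)
    return result


ts = pd.Timestamp("2021-06-01 12:00", tz="US/Central")
times = pd.Series([ts, ts, ts + pd.Timedelta(hours=1)])
print(mode_preserving_tz(times).dtype)  # should match times.dtype
```

If the UTC assumption is wrong, the restored values would be shifted by the zone offset, so it is worth comparing one value against the input before relying on this.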
Expected Output
times.mode().dtype == pd.DatetimeTZDtype(tz='US/Central')
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.8.7.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 20.3.4
setuptools : 49.2.1
Cython : 0.29.14
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.24.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : 1.3.23
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None