BUG: tz-aware DatetimeIndex + array(timedelta) gives incorrect result #17558

Closed
@azjps

Code Sample, a copy-pastable example if possible

In [1]: import numpy as np, pandas as pd

In [2]: dt = pd.DatetimeIndex([pd.Timestamp("2017/01/01")], tz='US/Eastern')  # dtype=datetime64[ns, US/Eastern]

In [3]: td_np = np.array([np.timedelta64(1, 'ns')])  # dtype="timedelta64[ns]"

In [4]: (dt + td_np).values  # Bad, "applies" TZ twice to get 10:00 instead of 05:00!
Out[4]: array(['2017-01-01T10:00:00.000000001'], dtype='datetime64[ns]')

In [5]: (dt + td_np[0]).values
Out[5]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')

In [6]: (dt + pd.Series(td_np)).values
Out[6]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')

In [7]: (dt.tz_convert(None) + td_np).values
Out[7]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')

In [8]: (dt.values + td_np)
Out[8]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')

In [9]: (dt + pd.TimedeltaIndex(td_np))
TypeError: data type not understood

Problem description

(The above was run on pandas 0.20.3.) When adding a tz-aware DatetimeIndex and a numpy array of timedeltas, the result incorrectly has the timezone offset applied twice. There are several related issues on the issue tracker about sums of a tz-aware DatetimeIndex and timedeltas, for example #14022. However, as far as I've read, those issues appear to have been mostly resolved as of ~0.19; this case just slipped through the cracks. I haven't found any other variants of this bug, except possibly that tz-aware DatetimeIndex + TimedeltaIndex raises a TypeError (see In [9]).

(Just for context, I am not relying on this behavior. I happened to come across it while updating some tests to use pandas 0.20.3, but it could lead to someone silently getting incorrect results.)

I think all that needs to be fixed here is to add another case in pd.DatetimeIndex[OpsMixin].__add__ to handle any array-like with dtype=timedelta64[..]. (Maybe there's a more general upstream solution?) If someone confirms I can submit a patch.
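Until such a case exists, the same result can be obtained by doing the arithmetic on the tz-naive UTC values and re-attaching the timezone afterwards, which is what In [7] suggests. The following is only a user-side sketch under that assumption, not the proposed pandas patch; the helper name `add_timedelta_array` is hypothetical.

```python
import numpy as np
import pandas as pd


def add_timedelta_array(dti, tds):
    """Add an array-like of timedelta64 values to a DatetimeIndex.

    Hypothetical workaround helper (not part of pandas): for a tz-aware
    index, the addition is done on the tz-naive UTC representation and
    the original timezone is restored afterwards.
    """
    tds = np.asarray(tds)
    if tds.dtype.kind != "m":  # "m" is NumPy's kind code for timedelta64
        raise TypeError("expected an array with a timedelta64 dtype")
    if dti.tz is None:
        # tz-naive indexes are not affected by the bug described above
        return dti + tds
    utc_naive = dti.tz_convert(None)  # drop tz, keep UTC wall times
    result = utc_naive + tds          # plain tz-naive arithmetic
    return result.tz_localize("UTC").tz_convert(dti.tz)


# Same inputs as the code sample in this issue
dt = pd.DatetimeIndex(["2017-01-01"], tz="US/Eastern")
td_np = np.array([1], dtype="timedelta64[ns]")
print(add_timedelta_array(dt, td_np).values)
```

The tz-naive branch is kept separate because `tz_convert` raises on an index without a timezone.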

Expected Output

In [4]: (dt + td_np).values
Out[4]: array(['2017-01-01T05:00:00.000000001'], dtype='datetime64[ns]')
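The expected behavior can also be written as a small regression check that compares the pandas result with plain NumPy arithmetic on the underlying UTC values (the In [8] case). This is a sketch only; the function name `check_tz_aware_add` is hypothetical, and the check is expected to fail on versions that have this bug, such as 0.20.3.

```python
import numpy as np
import pandas as pd


def check_tz_aware_add(tz="US/Eastern"):
    """Return (ok, got, expected) for tz-aware DatetimeIndex + timedelta array."""
    dt = pd.DatetimeIndex(["2017-01-01"], tz=tz)
    td_np = np.array([1], dtype="timedelta64[ns]")
    # Reference result: NumPy arithmetic on the UTC datetime64 values
    expected = dt.values + td_np
    # Result under test: pandas arithmetic, viewed as UTC datetime64 values
    got = (dt + td_np).values
    return np.array_equal(got, expected), got, expected


ok, got, expected = check_tz_aware_add()
print("match:", ok)
```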

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.75-el6.x86_64.lime.1
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: 2.9.1
pip: 7.1.0
setuptools: 19.4
Cython: 0.24
numpy: 1.13.1
scipy: 0.17.0
xarray: None
IPython: 5.1.0
sphinx: 1.4.5
patsy: 0.4.1
dateutil: 2.2
pytz: 2015.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.3
openpyxl: 1.8.6
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: 4.4.0
html5lib: None
sqlalchemy: 1.0.12
pymysql: 0.6.3.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
