ENH: Add additional options to nonexistent in tz_localize #24493

Merged
merged 10 commits into from
Jan 3, 2019

14 changes: 9 additions & 5 deletions doc/source/timeseries.rst
@@ -2351,9 +2351,11 @@ A DST transition may also shift the local time ahead by 1 hour creating nonexist
local times. The behavior of localizing a timeseries with nonexistent times
can be controlled by the ``nonexistent`` argument. The following options are available:

* ``raise``: Raises a ``pytz.NonExistentTimeError`` (the default behavior)
* ``NaT``: Replaces nonexistent times with ``NaT``
* ``shift``: Shifts nonexistent times forward to the closest real time
* ``'raise'``: Raises a ``pytz.NonExistentTimeError`` (the default behavior)
* ``'NaT'``: Replaces nonexistent times with ``NaT``
* ``'shift_forward'``: Shifts nonexistent times forward to the closest real time
* ``'shift_backward'``: Shifts nonexistent times backward to the closest real time
* timedelta object: Shifts nonexistent times by the timedelta duration

.. ipython:: python

@@ -2367,12 +2369,14 @@ Localization of nonexistent times will raise an error by default.
In [2]: dti.tz_localize('Europe/Warsaw')
NonExistentTimeError: 2015-03-29 02:30:00

Transform nonexistent times to ``NaT`` or the closest real time forward in time.
Transform nonexistent times to ``NaT`` or shift the times.

.. ipython:: python

dti
dti.tz_localize('Europe/Warsaw', nonexistent='shift')
dti.tz_localize('Europe/Warsaw', nonexistent='shift_forward')
dti.tz_localize('Europe/Warsaw', nonexistent='shift_backward')
dti.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta(1, unit='H'))
dti.tz_localize('Europe/Warsaw', nonexistent='NaT')
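
The options listed above can be mimicked with a small stdlib-only sketch. This is not pandas' implementation: the helper `resolve_nonexistent` and the hard-coded gap boundaries (the hour skipped in Europe/Warsaw on 2015-03-29, from 02:00 up to but not including 03:00) are assumptions made for illustration. `datetime` only has microsecond resolution, so the backward shift here lands one microsecond before the gap, whereas pandas works in nanoseconds.

```python
from datetime import datetime, timedelta

# Assumed for illustration: the wall-clock hour skipped in Europe/Warsaw
# on 2015-03-29 (clocks jump from 02:00 straight to 03:00).
GAP_START = datetime(2015, 3, 29, 2, 0)
GAP_END = datetime(2015, 3, 29, 3, 0)


def resolve_nonexistent(wall, nonexistent="raise"):
    """Return an existing wall time for ``wall``, or None to stand in for NaT."""
    if not (GAP_START <= wall < GAP_END):
        return wall  # the time exists, nothing to do
    if nonexistent == "raise":
        raise ValueError("nonexistent time: {}".format(wall))
    if nonexistent == "NaT":
        return None
    if nonexistent == "shift_forward":
        return GAP_END  # first instant after the gap
    if nonexistent == "shift_backward":
        # Last representable instant before the gap (microseconds here).
        return GAP_START - timedelta(microseconds=1)
    if isinstance(nonexistent, timedelta):
        shifted = wall + nonexistent
        if GAP_START <= shifted < GAP_END:
            raise ValueError("timedelta lands on another nonexistent time")
        return shifted
    raise ValueError("unknown option: {!r}".format(nonexistent))


wall = datetime(2015, 3, 29, 2, 30)
forward = resolve_nonexistent(wall, "shift_forward")
backward = resolve_nonexistent(wall, "shift_backward")
by_delta = resolve_nonexistent(wall, timedelta(hours=1))
as_nat = resolve_nonexistent(wall, "NaT")
```

With these inputs, `forward` is 03:00, `backward` is 01:59:59.999999, `by_delta` is 03:30, and `as_nat` is `None`.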


2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.0.rst
@@ -407,7 +407,7 @@ Other Enhancements
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
- :func:`read_fwf` now accepts keyword ``infer_nrows`` (:issue:`15138`).
- :func:`~DataFrame.to_parquet` now supports writing a ``DataFrame`` as a directory of parquet files partitioned by a subset of the columns when ``engine = 'pyarrow'`` (:issue:`23283`)
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`)
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`, :issue:`24466`)
- :meth:`Index.difference` now has an optional ``sort`` parameter to specify whether the results should be sorted if possible (:issue:`17839`)
- :meth:`read_excel()` now accepts ``usecols`` as a list of column names or callable (:issue:`18273`)
- :meth:`MultiIndex.to_flat_index` has been added to flatten multiple levels into a single-level :class:`Index` object.
64 changes: 47 additions & 17 deletions pandas/_libs/tslibs/conversion.pyx
@@ -13,7 +13,8 @@ from dateutil.tz import tzutc
from datetime import time as datetime_time
from cpython.datetime cimport (datetime, tzinfo,
PyDateTime_Check, PyDate_Check,
PyDateTime_CheckExact, PyDateTime_IMPORT)
PyDateTime_CheckExact, PyDateTime_IMPORT,
PyDelta_Check)
PyDateTime_IMPORT

from pandas._libs.tslibs.ccalendar import DAY_SECONDS, HOUR_SECONDS
@@ -28,7 +29,8 @@ from pandas._libs.tslibs.np_datetime import OutOfBoundsDatetime
from pandas._libs.tslibs.util cimport (
is_string_object, is_datetime64_object, is_integer_object, is_float_object)

from pandas._libs.tslibs.timedeltas cimport cast_from_unit
from pandas._libs.tslibs.timedeltas cimport (cast_from_unit,
delta_to_nanoseconds)
from pandas._libs.tslibs.timezones cimport (
is_utc, is_tzlocal, is_fixed_offset, get_utcoffset, get_dst_info,
get_timezone, maybe_get_tz, tz_compare)
@@ -868,7 +870,8 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
- bool if True, treat all vals as DST. If False, treat them as non-DST
- 'NaT' will return NaT where there are ambiguous times

nonexistent : {None, "NaT", "shift", "raise"}
nonexistent : {None, "NaT", "shift_forward", "shift_backward", "raise",
timedelta-like}
How to handle non-existent times when converting wall times to UTC

.. versionadded:: 0.24.0
@@ -884,12 +887,14 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
Py_ssize_t delta_idx_offset, delta_idx, pos_left, pos_right
int64_t *tdata
int64_t v, left, right, val, v_left, v_right, new_local, remaining_mins
int64_t HOURS_NS = HOUR_SECONDS * 1000000000
int64_t first_delta
int64_t HOURS_NS = HOUR_SECONDS * 1000000000, shift_delta = 0
ndarray[int64_t] trans, result, result_a, result_b, dst_hours, delta
ndarray trans_idx, grp, a_idx, b_idx, one_diff
npy_datetimestruct dts
bint infer_dst = False, is_dst = False, fill = False
bint shift = False, fill_nonexist = False
bint shift_forward = False, shift_backward = False
bint fill_nonexist = False
list trans_grp
str stamp

@@ -928,11 +933,16 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,

if nonexistent == 'NaT':
fill_nonexist = True
elif nonexistent == 'shift':
shift = True
else:
assert nonexistent in ('raise', None), ("nonexistent must be one of"
" {'NaT', 'raise', 'shift'}")
elif nonexistent == 'shift_forward':
shift_forward = True
elif nonexistent == 'shift_backward':
shift_backward = True
elif PyDelta_Check(nonexistent):
Contributor

is there a practical use of this?

Member Author

Per the original enhancement request #24466 (comment), there was no way to shift backwards, let alone by a specified amount. I can see how it would be valuable to shift nonexistent times forward or backwards by a specified amount instead of to the closest time. (e.g. I want times on the half hours, so I'd like to be able to shift times to 1:30 or 3:30 and not just 1:59 or 3:00)

Contributor

is this duplicating shift_backwards though? this seems unnecessarily complex for doing anything more than a snap, e.g. to a valid time (forward or backwards). shifting is an independent operation.

Member Author

IMO it didn't add too much more complexity to the code, but here was @sdementen use case:

I do some calculation on a local timestamp without tz, that I shift by 2 hours backward (to say "take it two hours before") and then I localize and get a NonExistentTimeError (e.g. Timestamp("2018-03-25T04:33:00") - DateOffset(hours=2)). I would like to get as a result of the tz_localize('CET'), the time "2018-03-25T01:33:00+0100" or "2018-03-25T03:33:00+0200" (and not "2018-03-25T01:59:59.99999+0100" or "2018-03-25T03:00:00+0200")

Member Author

I can see how, in principle, it's very similar to shift_forward/shift_backward followed by adding an offset, but it can be easier to determine how you want to shift from a nonexistent time than from one that has already been snapped to 1:59 or 3:00.

Member Author

@sdementen care to chime in on thoughts?

shift_delta = delta_to_nanoseconds(nonexistent)
elif nonexistent not in ('raise', None):
msg = ("nonexistent must be one of {'NaT', 'raise', 'shift_forward', "
"'shift_backward'} or a timedelta object")
raise ValueError(msg)

trans, deltas, _ = get_dst_info(tz)

@@ -1041,15 +1051,35 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
result[i] = right
else:
# Handle nonexistent times
if shift:
# Shift the nonexistent time forward to the closest existing
# time
if shift_forward or shift_backward or shift_delta != 0:
# Shift the nonexistent time to the closest existing time
remaining_mins = val % HOURS_NS
new_local = val + (HOURS_NS - remaining_mins)
if shift_delta != 0:
# Validate that we don't relocalize on another nonexistent
# time
if -1 < shift_delta + remaining_mins < HOURS_NS:
raise ValueError(
"The provided timedelta will relocalize on a "
"nonexistent time: {}".format(nonexistent)
)
new_local = val + shift_delta
elif shift_forward:
new_local = val + (HOURS_NS - remaining_mins)
else:
# Subtract 1 since the beginning hour is _inclusive_ of
# nonexistent times
new_local = val - remaining_mins - 1
delta_idx = trans.searchsorted(new_local, side='right')
# Need to subtract 1 from the delta_idx if the UTC offset of
# the target tz is greater than 0
delta_idx_offset = int(deltas[0] > 0)
# Shift the delta_idx by 1 if the UTC offset of
# the target tz is greater than 0 and we're moving forward
# or vice versa
first_delta = deltas[0]
if (shift_forward or shift_delta > 0) and first_delta > 0:
delta_idx_offset = 1
elif (shift_backward or shift_delta < 0) and first_delta < 0:
delta_idx_offset = 1
else:
delta_idx_offset = 0
delta_idx = delta_idx - delta_idx_offset
result[i] = new_local - deltas[delta_idx]
elif fill_nonexist:
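
The shifting arithmetic in the hunk above operates on wall-clock times stored as integer nanoseconds. A pure-Python sketch of the same arithmetic follows. `HOURS_NS` matches the source, while the function name `shift_local` and its keyword arguments are hypothetical. As far as the hunk shows, the arithmetic treats the gap as a single hour that starts on an hour boundary, and this sketch makes the same assumption. The lookup of the UTC offset (`delta_idx`) is left out.

```python
HOURS_NS = 3600 * 1000000000  # one hour in nanoseconds, as in the source


def shift_local(val, shift_forward=False, shift_backward=False, shift_delta=0):
    """Move a nonexistent wall time ``val`` (in ns) out of its gap hour."""
    remaining = val % HOURS_NS  # ns elapsed since the start of the gap hour
    if shift_delta != 0:
        # A delta that stays inside the same hour would land on another
        # nonexistent time, so it is rejected.
        if -1 < shift_delta + remaining < HOURS_NS:
            raise ValueError("timedelta relocalizes on a nonexistent time")
        return val + shift_delta
    if shift_forward:
        return val + (HOURS_NS - remaining)  # first instant of the next hour
    if shift_backward:
        return val - remaining - 1  # last nanosecond before the gap hour
    raise ValueError("no shift requested")


# 02:30 expressed as ns past local midnight, for a gap covering 02:00-03:00
val = 2 * HOURS_NS + HOURS_NS // 2
forward = shift_local(val, shift_forward=True)
backward = shift_local(val, shift_backward=True)
plus_one_hour = shift_local(val, shift_delta=HOURS_NS)
```

Here `forward` equals `3 * HOURS_NS` (03:00), `backward` equals `2 * HOURS_NS - 1` (one nanosecond before 02:00), and `plus_one_hour` is 03:30. A ten-minute delta would raise, because 02:40 is still inside the gap.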
40 changes: 28 additions & 12 deletions pandas/_libs/tslibs/nattype.pyx
@@ -481,13 +481,17 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta,
default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

@@ -515,13 +519,17 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta,
default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

@@ -545,13 +553,17 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta,
default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

@@ -605,13 +617,17 @@ class NaTType(_NaT):
- 'NaT' will return NaT for an ambiguous time
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

nonexistent : 'shift', 'NaT', default 'raise'
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta,
default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

50 changes: 35 additions & 15 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -9,7 +9,7 @@ cimport numpy as cnp
from numpy cimport int64_t, int32_t, int8_t
cnp.import_array()

from datetime import time as datetime_time
from datetime import time as datetime_time, timedelta
from cpython.datetime cimport (datetime,
PyDateTime_Check, PyDelta_Check, PyTZInfo_Check,
PyDateTime_IMPORT)
@@ -789,13 +789,17 @@ class Timestamp(_Timestamp):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta,
default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

@@ -827,13 +831,17 @@ class Timestamp(_Timestamp):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta,
default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

@@ -859,13 +867,17 @@ class Timestamp(_Timestamp):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta,
default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

@@ -1060,13 +1072,17 @@ class Timestamp(_Timestamp):
- 'NaT' will return NaT for an ambiguous time
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

nonexistent : 'shift', 'NaT', default 'raise'
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta,
default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

@@ -1106,9 +1122,13 @@ class Timestamp(_Timestamp):
raise ValueError("The errors argument must be either 'coerce' "
"or 'raise'.")

if nonexistent not in ('raise', 'NaT', 'shift'):
nonexistent_options = ('raise', 'NaT', 'shift_forward',
'shift_backward')
if nonexistent not in nonexistent_options and not isinstance(
nonexistent, timedelta):
raise ValueError("The nonexistent argument must be one of 'raise',"
" 'NaT' or 'shift'")
" 'NaT', 'shift_forward', 'shift_backward' or"
" a timedelta object")

if self.tzinfo is None:
# tz naive, localize
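
The argument check added in this hunk can be exercised on its own. Below is a minimal stdlib-only sketch; the function name `check_nonexistent` and the module-level tuple are hypothetical, and the condition mirrors the one in the diff. Because `pd.Timedelta` subclasses `datetime.timedelta`, an `isinstance` check against the stdlib type is expected to accept both.

```python
from datetime import timedelta

NONEXISTENT_OPTIONS = ("raise", "NaT", "shift_forward", "shift_backward")


def check_nonexistent(nonexistent):
    """Raise ValueError unless the argument is a known string or a timedelta."""
    if nonexistent not in NONEXISTENT_OPTIONS and not isinstance(
            nonexistent, timedelta):
        raise ValueError("The nonexistent argument must be one of 'raise', "
                         "'NaT', 'shift_forward', 'shift_backward' or "
                         "a timedelta object")
    return nonexistent


accepted_string = check_nonexistent("shift_backward")
accepted_delta = check_nonexistent(timedelta(hours=1))
```

Note that the pre-PR spelling `'shift'` is no longer in the accepted tuple, so a check written this way rejects it.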
10 changes: 7 additions & 3 deletions pandas/core/arrays/datetimelike.py
@@ -233,13 +233,17 @@ class TimelikeOps(object):

.. versionadded:: 0.24.0

nonexistent : 'shift', 'NaT', default 'raise'
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta,
default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
Member

IIRC cython doesn't play nicely with docstring templating; might be worth opening an issue over there to request it

- 'raise' will raise a NonExistentTimeError if there are
nonexistent times
