
nonexistent=shift in tz_localize not precise #24466

Closed
@sdementen

Problem description

As of today (2018-12-28), I read on http://pandas-docs.github.io/pandas-docs-travis/timeseries.html#nonexistent-times-when-localizing that

A DST transition may also shift the local time ahead by 1 hour creating nonexistent local times. The behavior of localizing a timeseries with nonexistent times can be controlled by the nonexistent argument. The following options are available:

raise: Raises a pytz.NonExistentTimeError (the default behavior)
NaT: Replaces nonexistent times with NaT
shift: Shifts nonexistent times forward to the closest real time
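Here is a minimal sketch of the documented `raise` and `NaT` options. It assumes a pandas version that has the `nonexistent` argument (0.24 or later). The exception is caught broadly because its concrete class may differ across pandas versions and timezone backends. In the docs quoted above it is `pytz.NonExistentTimeError`.

```python
import pandas as pd

# Wall-clock time inside the hour that CET skips on 2018-03-25.
naive = pd.Timestamp("2018-03-25T02:33:00")

# nonexistent="NaT": the impossible local time is replaced by NaT.
as_nat = naive.tz_localize("CET", nonexistent="NaT")
print(as_nat)  # NaT

# nonexistent="raise" (the default): localization fails with an error.
# The exception class is an assumption that may vary by version,
# so this sketch catches Exception and records its class name.
raised = None
try:
    naive.tz_localize("CET")
except Exception as exc:
    raised = type(exc).__name__
print(raised)
```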

This is a great new feature (i.e. having leeway to manage NonExistentTimeError explicitly)!
To be sure I understand the problem of NonExistentTimeError correctly: is it correct to state that it appears if and only if the time falls within a DST change (jumping one hour ahead)? E.g. with tz=CET, DST started on 2018-03-25 and the hour [02:00, 03:00[ did not exist, so localizing any time in that hour raises a NonExistentTimeError:

Timestamp("2018-03-25T02:33:00").tz_localize("CET")
# pytz.exceptions.NonExistentTimeError: 2018-03-25 02:33:00
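One way to check the [02:00, 03:00[ claim empirically is to localize every minute of 2018-03-25 with `nonexistent="NaT"` and see which local times come back as NaT. The sketch below assumes a pandas version with the `nonexistent` argument. Minute granularity is an arbitrary choice made for illustration.

```python
import pandas as pd

# Every naive wall-clock minute of 2018-03-25 (24 * 60 = 1440 values).
minutes = pd.date_range("2018-03-25", periods=24 * 60, freq="min")

# Localize, turning nonexistent local times into NaT instead of raising.
localized = minutes.tz_localize("CET", nonexistent="NaT")

# The naive minutes that could not be localized.
missing = minutes[localized.isna()]
print(missing.min(), missing.max(), len(missing))
# 2018-03-25 02:00:00 2018-03-25 02:59:00 60
```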

Or are there other cases?

If so, I see the following behaviors that would be useful besides 'shift':

  • 'shift_backward' / 'shift_forward' => Shift nonexistent times backward/forward by one hour. Example use case for shift_backward: I do some calculation on a local timestamp without tz, shifting it 2 hours backward (to say "take it two hours before"), and then I localize it and get a NonExistentTimeError (e.g. Timestamp("2018-03-25T04:33:00") - DateOffset(hours=2)). I would like tz_localize('CET') to return "2018-03-25T01:33:00+0100" or "2018-03-25T03:33:00+0200" (and not "2018-03-25T01:59:59.99999+0100" or "2018-03-25T03:00:00+0200" as with 'snap_*').
  • 'snap_backward' / 'snap_forward' => Shift nonexistent times backward/forward to the closest real time (i.e. make the direction of the shift explicit). Example use case for snap_backward: if I want the value of something known at some instant T and T does not exist, I would prefer the value at some instant T* < T ('backward') to avoid "forward-looking information". However, 'snap_backward' is ill-defined for DST: the closest earlier time to 2018-03-25T02:33:00 in CET is 2018-03-25 01:59:59.999999999..., whose limit is 2018-03-25 02:00:00, which is itself nonexistent. I guess this is why the current 'shift' only offers the forward version, correct?
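For reference, the 'snap_*' behavior described in these bullets corresponds to what later pandas releases (0.24 and newer) expose as `nonexistent="shift_forward"` and `nonexistent="shift_backward"`. The sketch below assumes such a version. `shift_backward` avoids the ill-defined limit by returning the last representable instant before the gap. That instant's exact sub-second value may depend on the timestamp's resolution, so the sketch only checks that it lands just before 02:00 at +01:00.

```python
import pandas as pd

naive = pd.Timestamp("2018-03-25T02:33:00")  # falls in the skipped CET hour

# Snap forward: the first existing instant after the gap.
forward = naive.tz_localize("CET", nonexistent="shift_forward")
print(forward)  # 2018-03-25 03:00:00+02:00

# Snap backward: the last representable instant before the gap, still at +01:00.
backward = naive.tz_localize("CET", nonexistent="shift_backward")
print(backward.utcoffset(), backward.tz_localize(None) < pd.Timestamp("2018-03-25T02:00:00"))
```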

What I essentially miss in my day-to-day work is the 'shift_backward' ability.
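Here is a minimal user-side sketch of the requested 'shift_backward'/'shift_forward' behavior (shift by one whole hour), built only on `tz_localize(..., nonexistent="NaT")`. The helper name `localize_shift_hour` is hypothetical. It also assumes the DST gap is exactly one hour wide, as it is for CET.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset


def localize_shift_hour(naive, tz, direction="backward"):
    """Hypothetical helper: localize `naive` in `tz`; if that wall-clock time
    does not exist, move it by one whole hour and localize again.

    Assumes the DST gap is exactly one hour wide (true for CET)."""
    if direction not in ("backward", "forward"):
        raise ValueError("direction must be 'backward' or 'forward'")
    localized = naive.tz_localize(tz, nonexistent="NaT")
    if not pd.isna(localized):
        return localized  # the time exists: nothing to shift
    step = pd.Timedelta(hours=1)
    shifted = naive - step if direction == "backward" else naive + step
    return shifted.tz_localize(tz)


# The use case from the issue: 04:33 minus two hours lands in the gap.
naive = pd.Timestamp("2018-03-25T04:33:00") - DateOffset(hours=2)
print(localize_shift_hour(naive, "CET", "backward"))  # 2018-03-25 01:33:00+01:00
print(localize_shift_hour(naive, "CET", "forward"))   # 2018-03-25 03:33:00+02:00
```

Unlike the snap behavior, this keeps the minutes of the wall-clock time (xx:33), which is what the DateOffset use case expects.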
