Skip to content

ENH: Add additional options to nonexistent in tz_localize #24493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 3, 2019

Conversation

mroeschke
Copy link
Member

Per @sdementen suggestion, adding additional options for nonexistent that would prove more useful. Let me know if this is what you had in mind.

shift is now shift_forward: shift nonexistent times forward to the closest real time
shift_backward: shift nonexistent times backwards to the closest real time
timedelta object: shifts nonexistent times by the timedelta duration

@mroeschke mroeschke added Enhancement Timezones Timezone data dtype labels Dec 30, 2018
@mroeschke mroeschke added this to the 0.24.0 milestone Dec 30, 2018
shift_forward = True
elif nonexistent == 'shift_backward':
shift_backward = True
elif isinstance(nonexistent, timedelta):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyDelta_Check(nonexistent)

- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC cython doesn't play nicely with docstring templating; might be worth opening an issue over there to request it

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. can you add an Example or 2 somewhere in a doc-string.

@codecov
Copy link

codecov bot commented Dec 31, 2018

Codecov Report

Merging #24493 into master will decrease coverage by 49.25%.
The diff coverage is 80%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #24493       +/-   ##
===========================================
- Coverage   92.31%   43.06%   -49.26%     
===========================================
  Files         166      166               
  Lines       52412    52415        +3     
===========================================
- Hits        48382    22570    -25812     
- Misses       4030    29845    +25815
Flag Coverage Δ
#multiple ?
#single 43.06% <80%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/resample.py 23.26% <ø> (-73.96%) ⬇️
pandas/core/arrays/datetimelike.py 44.42% <100%> (-51.08%) ⬇️
pandas/core/arrays/datetimes.py 65.68% <100%> (-32.06%) ⬇️
pandas/core/generic.py 39.84% <50%> (-56.78%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-98.65%) ⬇️
... and 128 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1ebfd8a...b2a3c72. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 31, 2018

Codecov Report

Merging #24493 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24493      +/-   ##
==========================================
+ Coverage   92.33%   92.33%   +<.01%     
==========================================
  Files         166      166              
  Lines       52544    52547       +3     
==========================================
+ Hits        48517    48520       +3     
  Misses       4027     4027
Flag Coverage Δ
#multiple 90.76% <100%> (ø) ⬆️
#single 43.05% <66.66%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/datetimelike.py 96.76% <ø> (ø) ⬆️
pandas/core/resample.py 97.22% <ø> (ø) ⬆️
pandas/core/arrays/datetimes.py 98.01% <100%> (ø) ⬆️
pandas/core/generic.py 96.62% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 43b35fc...5fd9110. Read the comment docs.

shift_forward = True
elif nonexistent == 'shift_backward':
shift_backward = True
elif PyDelta_Check(nonexistent):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a practical use of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the original enhancement request #24466 (comment), there was no way to shift backwards, moreover by a specified amount. I can see how it would be valuable to shift nonexistent times forward or backwards by a specified amount instead of the closest time. (e.g. I want times on the half hours, so be able to shift times to 1:30 or 3:30 and not just 1:59 or 3:00)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this duplicating shift_backwards though? this seems unecessarily complex to do anything more that doing a snap, eg. to a valid time (forward or backwards). shifting is an independent operation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it didn't add too much more complexity to the code, but here was @sdementen use case:

I do some calculation on a local timestamp without tz, that I shift by 2 hours backward (to say "take it two hours before") and then I localize and get a NonExistentTimeError (e.g. Timestamp("2018-03-25T04:33:00") - DateOffset(hours=2)). I would like to get as a result of the tz_localize('CET'), the time "2018-03-25T01:33:00+0100" or "2018-03-25T03:33:00+0200" (and not "2018-03-25T01:59:59.99999+0100" or "2018-03-25T03:00:00+0200")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how in principle how it's very similar to shift_forward/ shift_backwards then adding an offset, but it can be easier determining how you want to shift from a nonexistent time compared to a nonexistent time that has been snapped to 1:59 or 3:00.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sdementen care to chime in on thoughts?

@jreback
Copy link
Contributor

jreback commented Jan 3, 2019

thanks @mroeschke

@jreback
Copy link
Contributor

jreback commented Jan 3, 2019

can you merge master

@jreback jreback merged commit 268150f into pandas-dev:master Jan 3, 2019
@jreback
Copy link
Contributor

jreback commented Jan 3, 2019

thanks @mroeschke

@mroeschke mroeschke deleted the nonexistent_options branch January 3, 2019 03:49
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nonexistent=shift in tz_localize not precise
3 participants