-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: Period.to_timestamp not matching PeriodArray.to_timestamp #34449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -632,6 +632,13 @@ def _ex(p): | |||
result = p.to_timestamp("5S", how="start") | |||
assert result == expected | |||
|
|||
def test_to_timestamp_business_end(self): | |||
per = pd.Period("1990-01-05", "B") # Friday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you test how='B', and is this result correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have tests for the other case, and it is the correct result.
i got this test by going into the PeriodArray.to_timestamp code and computing the same thing element-wise, then asserting that they matched. this is one of the cases that failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct, 1/6 is a sunday, by-definition not a business day
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are missing my point, should this not be pd.Timestamp('1990-01-8') - pd.Timedelta(nanoseconds=1)
, IOW 1 nano less than the next period which starts on 1990-01-8
e.g. (not including this PR)
In [1]: p = pd.Period("1990-01-05", "B")
In [2]: p
Out[2]: Period('1990-01-05', 'B')
In [3]: p.start_time
Out[3]: Timestamp('1990-01-05 00:00:00')
In [4]: p.end_time
Out[4]: Timestamp('1990-01-07 23:59:59.999999999')
In [6]: pd.Timestamp('1990-01-6') - pd.Timedelta(nanoseconds=1)
Out[6]: Timestamp('1990-01-05 23:59:59.999999999')
In [8]: pd.Timestamp('1990-01-8') - pd.Timedelta(nanoseconds=1)
Out[8]: Timestamp('1990-01-07 23:59:59.999999999')
I believe we should be making [4] == [8]
[6] (what you are testing) does not seem right as it doesn't include the weekend. I believe we cover the complete space with the end-times, IOW you can line the periods up and there is NO space in-between.
pandas/_libs/tslibs/period.pyx
Outdated
how = validate_end_alias(how) | ||
|
||
end = how == 'E' | ||
if end: | ||
if freq == "B": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there no other freqs for which you need to do a similar adjustment? (like the other business-related freqs?)
This is the only one that doesn’t “cover the space” so I think this is it
…On Fri, May 29, 2020 at 10:57 AM Joris Van den Bossche < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/_libs/tslibs/period.pyx
<#34449 (comment)>:
> how = validate_end_alias(how)
end = how == 'E'
if end:
+ if freq == "B":
Are there no other freqs for which you need to do a similar adjustment?
(like the other business-related freqs?)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#34449 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB5UM6GDPGRXZ7A4LUILL63RT7ZO7ANCNFSM4NNUBIWA>
.
|
Maybe it is not fully related to the changes in this PR, but just since I am trying to understand it now:
Shouldn't the above also give end of "1990-01-05" ? (I am not an avid Period user, to just questions about something that doesn't feel consistent for a novice) |
yes it should, good catch. will update |
Is BusinessDay ('B') the only "business" offset that can be used with Periods? Like |
…f-to_timestamp
yes.
the relationship between DateOffsets and Periods is mostly coincidental, and this has caused confusion in the past. im leaning towards wearning Period off of offsets entirely. |
@@ -632,6 +632,13 @@ def _ex(p): | |||
result = p.to_timestamp("5S", how="start") | |||
assert result == expected | |||
|
|||
def test_to_timestamp_business_end(self): | |||
per = pd.Period("1990-01-05", "B") # Friday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct, 1/6 is a sunday, by-definition not a business day
So unless they skipped Saturday that week... |
@@ -632,6 +632,13 @@ def _ex(p): | |||
result = p.to_timestamp("5S", how="start") | |||
assert result == expected | |||
|
|||
def test_to_timestamp_business_end(self): | |||
per = pd.Period("1990-01-05", "B") # Friday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are missing my point, should this not be pd.Timestamp('1990-01-8') - pd.Timedelta(nanoseconds=1)
, IOW 1 nano less than the next period which starts on 1990-01-8
e.g. (not including this PR)
In [1]: p = pd.Period("1990-01-05", "B")
In [2]: p
Out[2]: Period('1990-01-05', 'B')
In [3]: p.start_time
Out[3]: Timestamp('1990-01-05 00:00:00')
In [4]: p.end_time
Out[4]: Timestamp('1990-01-07 23:59:59.999999999')
In [6]: pd.Timestamp('1990-01-6') - pd.Timedelta(nanoseconds=1)
Out[6]: Timestamp('1990-01-05 23:59:59.999999999')
In [8]: pd.Timestamp('1990-01-8') - pd.Timedelta(nanoseconds=1)
Out[8]: Timestamp('1990-01-07 23:59:59.999999999')
I believe we should be making [4] == [8]
[6] (what you are testing) does not seem right as it doesn't include the weekend. I believe we cover the complete space with the end-times, IOW you can line the periods up and there is NO space in-between.
For non-B periods that is how it works, but we're not currently consistent about it for B. So we have to decide what it means. I dont think that Saturday or Sunday are part of the "Business Day" Friday.
The commit that changed end_time was added based on Joris pointing out the inconsistency and me agreeing with him. But if that is a sticking point, I can revert that so we can at least get the Period/PeriodArray methods consistent with each other (after which they get to share code) and we can hammer out the end_time thing separately |
ok maybe that is the issue, its inconcistent! yeah I tend to think that the periods should be contiguous. But we can discuss that later. happy to take this. I think this warrants an issue if you can put detail about when this is inconsistent (either internally e.g vs PA, though i think you are fixing), or against others offsets then that would be good. |
I would say that this is the exact feature of "business day" that it is not contiguous? (otherwise you would use the normal "day" frequency? |
yes Timestamps are non-contiguous, but that is not what we are talking about here. A Period is a span, so you need a specific start & end point. Generally you want to include sat/sunday here for example. Or at least IMHO is the most common operation. Sure I suppose others may want a different behavior. |
But why would you want to include saturday/sunday for a business day ? Then you can use the normal daily frequency? (now, I never used business days in practice, so I don't really know how it is typically used) |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
With this change, I think we can move to share some methods between Period and PeriodArray.