Skip to content

BUG: Period.to_timestamp not matching PeriodArray.to_timestamp #34449

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 1, 2020

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented May 29, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

With this change, I think we can move to share some methods between Period and PeriodArray.

@jreback jreback added Period Period data type Datetime Datetime data dtype labels May 29, 2020
@@ -632,6 +632,13 @@ def _ex(p):
result = p.to_timestamp("5S", how="start")
assert result == expected

def test_to_timestamp_business_end(self):
per = pd.Period("1990-01-05", "B") # Friday
Copy link
Contributor

@jreback jreback May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test how='B', and is this result correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have tests for the other case, and it is the correct result.

i got this test by going into the PeriodArray.to_timestamp code and computing the same thing element-wise, then asserting that they matched. this is one of the cases that failed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct, 1/6 is a sunday, by-definition not a business day

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are missing my point, should this not be pd.Timestamp('1990-01-8') - pd.Timedelta(nanoseconds=1), IOW 1 nano less than the next period which starts on 1990-01-8

e.g. (not including this PR)

In [1]: p = pd.Period("1990-01-05", "B")                                                                                                                      

In [2]: p                                                                                                                                                     
Out[2]: Period('1990-01-05', 'B')

In [3]: p.start_time                                                                                                                                          
Out[3]: Timestamp('1990-01-05 00:00:00')

In [4]: p.end_time                                                                                                                                            
Out[4]: Timestamp('1990-01-07 23:59:59.999999999')

In [6]: pd.Timestamp('1990-01-6') - pd.Timedelta(nanoseconds=1)                                                                                               
Out[6]: Timestamp('1990-01-05 23:59:59.999999999')

In [8]: pd.Timestamp('1990-01-8') - pd.Timedelta(nanoseconds=1)                                                                                               
Out[8]: Timestamp('1990-01-07 23:59:59.999999999')

I believe we should be making [4] == [8]

[6] (what you are testing) does not seem right as it doesn't include the weekend. I believe we cover the complete space with the end-times, IOW you can line the periods up and there is NO space in-between.

how = validate_end_alias(how)

end = how == 'E'
if end:
if freq == "B":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there no other freqs for which you need to do a similar adjustment? (like the other business-related freqs?)

@jbrockmendel
Copy link
Member Author

jbrockmendel commented May 29, 2020 via email

@jorisvandenbossche
Copy link
Member

Maybe it is not fully related to the changes in this PR, but just since I am trying to understand it now:

In [7]: per = pd.Period("1990-01-05", "B")  

In [8]: per.end_time   
Out[8]: Timestamp('1990-01-07 23:59:59.999999999')

Shouldn't the above also give end of "1990-01-05" ?
So in terms of the code change of your PR, shouldn't the adjustment (also) depend on self.freq instead of the passed freq?

(I am not an avid Period user, to just questions about something that doesn't feel consistent for a novice)

@jbrockmendel
Copy link
Member Author

Shouldn't the above also give end of "1990-01-05" ?

yes it should, good catch. will update

@jorisvandenbossche
Copy link
Member

Is BusinessDay ('B') the only "business" offset that can be used with Periods?

Like pd.period_range("2012", freq=pd.offsets.BusinessDay(), periods=10) works, but pd.period_range("2012", freq=pd.offsets.BusinessMonthEnd(), periods=10) fails

@jbrockmendel
Copy link
Member Author

Is BusinessDay ('B') the only "business" offset that can be used with Periods?

yes.

but pd.period_range("2012", freq=pd.offsets.BusinessMonthEnd(), periods=10) fails

the relationship between DateOffsets and Periods is mostly coincidental, and this has caused confusion in the past. im leaning towards wearning Period off of offsets entirely.

@jreback jreback added this to the 1.1 milestone May 31, 2020
@@ -632,6 +632,13 @@ def _ex(p):
result = p.to_timestamp("5S", how="start")
assert result == expected

def test_to_timestamp_business_end(self):
per = pd.Period("1990-01-05", "B") # Friday
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct, 1/6 is a sunday, by-definition not a business day

@jbrockmendel
Copy link
Member Author

I don't think this is correct, 1/6 is a sunday, by-definition not a business day

>>> ts = pd.Timestamp("1990-01-05")
>>> print(ts.day_name())
'Friday'

So unless they skipped Saturday that week...

@@ -632,6 +632,13 @@ def _ex(p):
result = p.to_timestamp("5S", how="start")
assert result == expected

def test_to_timestamp_business_end(self):
per = pd.Period("1990-01-05", "B") # Friday
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are missing my point, should this not be pd.Timestamp('1990-01-8') - pd.Timedelta(nanoseconds=1), IOW 1 nano less than the next period which starts on 1990-01-8

e.g. (not including this PR)

In [1]: p = pd.Period("1990-01-05", "B")                                                                                                                      

In [2]: p                                                                                                                                                     
Out[2]: Period('1990-01-05', 'B')

In [3]: p.start_time                                                                                                                                          
Out[3]: Timestamp('1990-01-05 00:00:00')

In [4]: p.end_time                                                                                                                                            
Out[4]: Timestamp('1990-01-07 23:59:59.999999999')

In [6]: pd.Timestamp('1990-01-6') - pd.Timedelta(nanoseconds=1)                                                                                               
Out[6]: Timestamp('1990-01-05 23:59:59.999999999')

In [8]: pd.Timestamp('1990-01-8') - pd.Timedelta(nanoseconds=1)                                                                                               
Out[8]: Timestamp('1990-01-07 23:59:59.999999999')

I believe we should be making [4] == [8]

[6] (what you are testing) does not seem right as it doesn't include the weekend. I believe we cover the complete space with the end-times, IOW you can line the periods up and there is NO space in-between.

@jbrockmendel
Copy link
Member Author

I believe we cover the complete space with the end-times, IOW you can line the periods up and there is NO space in-between.

For non-B periods that is how it works, but we're not currently consistent about it for B. So we have to decide what it means. I dont think that Saturday or Sunday are part of the "Business Day" Friday.

In [2]: per = pd.Period("1990-01-05", "B")
In [3]: pa = pd.array([per])

In [4]: per.end_time
Out[4]: Timestamp('1990-01-07 23:59:59.999999999')
In [5]: pa.end_time[0]
Out[5]: Timestamp('1990-01-07 23:59:59.999999999')

In [6]: per.to_timestamp(how="E")                                                                                                                                                                   
Out[6]: Timestamp('1990-01-07 23:59:59.999999999')
In [7]: pa.to_timestamp(how="E")[0]
Out[7]: Timestamp('1990-01-07 23:59:59.999999999')

In [10]: per.to_timestamp("B", how="E")
Out[10]: Timestamp('1990-01-07 23:59:59.999999999')
In [12]: pa.to_timestamp("B", how="E")[0]         # <-- outlier
Out[12]: Timestamp('1990-01-05 23:59:59.999999999')

The commit that changed end_time was added based on Joris pointing out the inconsistency and me agreeing with him. But if that is a sticking point, I can revert that so we can at least get the Period/PeriodArray methods consistent with each other (after which they get to share code) and we can hammer out the end_time thing separately

@jreback
Copy link
Contributor

jreback commented Jun 1, 2020

@jbrockmendel

ok maybe that is the issue, its inconcistent! yeah I tend to think that the periods should be contiguous. But we can discuss that later. happy to take this. I think this warrants an issue if you can put detail about when this is inconsistent (either internally e.g vs PA, though i think you are fixing), or against others offsets then that would be good.

@jreback jreback merged commit 2428cdd into pandas-dev:master Jun 1, 2020
@jorisvandenbossche
Copy link
Member

I tend to think that the periods should be contiguous

I would say that this is the exact feature of "business day" that it is not contiguous? (otherwise you would use the normal "day" frequency?

@jreback
Copy link
Contributor

jreback commented Jun 1, 2020

I tend to think that the periods should be contiguous

I would say that this is the exact feature of "business day" that it is not contiguous? (otherwise you would use the normal "day" frequency?

yes Timestamps are non-contiguous, but that is not what we are talking about here. A Period is a span, so you need a specific start & end point. Generally you want to include sat/sunday here for example. Or at least IMHO is the most common operation. Sure I suppose others may want a different behavior.

@jorisvandenbossche
Copy link
Member

Generally you want to include sat/sunday here for example

But why would you want to include saturday/sunday for a business day ? Then you can use the normal daily frequency?

(now, I never used business days in practice, so I don't really know how it is typically used)

@jbrockmendel jbrockmendel deleted the ref-to_timestamp branch June 1, 2020 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants