-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
!I fix for BUG: resample with tz-aware: Values falls after last bin #15549 #18337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
68a02aa
47b99ac
04ce929
a43bb19
44fc3f2
2a1d24c
d28ff98
7c770c0
effa650
aa8c0af
2db6576
2744ad3
fb4d405
93842d0
fc795c6
514fe24
777a12a
037c9dc
d5ac67e
61e84d6
e049112
e252b20
1888355
f4ed7c0
9b18f4b
5b035f0
0ec3bd0
b523915
a1b59c4
8cd36c9
aec765f
2e77f72
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1130,9 +1130,6 @@ def _get_time_bins(self, ax): | |
closed=self.closed, | ||
base=self.base) | ||
tz = ax.tz | ||
values_present = isinstance(getattr(self, 'obj', None), | ||
(pd.DataFrame, pd.Series)) | ||
|
||
# GH #12037 | ||
# use first/last directly instead of call replace() on them | ||
# because replace() will swallow the nanosecond part | ||
|
@@ -1142,8 +1139,15 @@ def _get_time_bins(self, ax): | |
start=first, | ||
end=last, | ||
tz=tz, | ||
name=ax.name, | ||
values_present=values_present) | ||
name=ax.name) | ||
|
||
# GH 15549 | ||
values_present = isinstance(getattr(self, 'obj', None), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. still not clear why you need this tests for obj is Series/DataFrame. that 's all it can be. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also need to check that len(binner) >= 2 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, I was not sure about that the obj always exists There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. there was actually a mistake, it supposed to go start from binner[-1] not binner[-2], I fixed it anyway |
||
(pd.DataFrame, pd.Series)) | ||
if values_present and binner[-1] < last: | ||
extra_date_range = pd.date_range(binner[-2], last + self.freq, | ||
freq=self.freq) | ||
binner = labels = binner.append(extra_date_range[1:]) | ||
|
||
# a little hack | ||
trimmed = False | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what exactly are you trying to do here? what is the purpose of
values_present
, IOW what case are you trying to handle?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the case is described here #15549
unfortunately, I get
values fall after last bin
error when I do resampling of tz-aware series with summer-winter time changeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in case if we are resampling series I believe an extra bin should be generated to handle this
but in case of generation of the datetimeindex with date_range for example we should not get the extra bin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
purpose of values_present is to be able to say which case is that, that we are creating datetimeindex for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hope that makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the case. My point is that you can detect what you need and simply generate different bins at that point, rather than intrude to the implementation of
date_range
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the soln that you posted in the issue looks promising. did you try that?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, last time I made a PR with it you said that it is not right to resample by UTC a tz aware series
and after giving it a thought I think you are right
it's not a good idea to resample everything by UTC, also I did that change internally for our company and I got issues with it.
like when I resample with B freq, I get a data point on sunday, but not on friday
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that might be a bigger change, ok (and maybe not what we want semantically)