Skip to content

!I fix for BUG: resample with tz-aware: Values falls after last bin #15549 #18337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Nov 21, 2017
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
68a02aa
!B [pandas-dev/pandas#15549] resample with tz-aware: Values falls aft…
ahcub Mar 2, 2017
47b99ac
restore mistakenly removed line
ahcub Mar 2, 2017
04ce929
remove redundant whitespaces
ahcub Mar 2, 2017
a43bb19
remove redundant whitespaces
ahcub Mar 2, 2017
44fc3f2
!U add tz to DatetimeIndex initialization
ahcub Mar 3, 2017
2a1d24c
!U add test for a bug: resample with tz-aware: Values falls after las…
ahcub Mar 3, 2017
d28ff98
!U do not convert TZ to UTC if it not set
ahcub Mar 3, 2017
7c770c0
!U revert changes with timezone
ahcub Nov 17, 2017
effa650
!U revert spaces
ahcub Nov 17, 2017
aa8c0af
!U change way the end timestamp is defined
ahcub Nov 17, 2017
2db6576
Merge branch 'master' of https://github.com/pandas-dev/pandas into pa…
ahcub Nov 17, 2017
2744ad3
Merge branch 'pandas-dev-master'
ahcub Nov 17, 2017
fb4d405
!U add conditional change in the formula for defining the end timestamp
ahcub Nov 17, 2017
93842d0
!U make the values_present argument optional
ahcub Nov 18, 2017
fc795c6
!U add line feed
ahcub Nov 19, 2017
514fe24
!U fix formatting
ahcub Nov 19, 2017
777a12a
!U add a line feed
ahcub Nov 19, 2017
037c9dc
!U fix formatting
ahcub Nov 19, 2017
d5ac67e
!U change the fix to only handle the problem on core/resample.py level
ahcub Nov 19, 2017
61e84d6
!U fix formatting
ahcub Nov 19, 2017
e049112
!U change the index from which we get the dates to appent
ahcub Nov 19, 2017
e252b20
!U remove redundant check and fix index
ahcub Nov 19, 2017
1888355
!U add check for binner len
ahcub Nov 19, 2017
f4ed7c0
!U add missing arguments to the extra dates generation
ahcub Nov 20, 2017
9b18f4b
!U add comments
ahcub Nov 20, 2017
5b035f0
!U change test name and add expected df to compare result to
ahcub Nov 20, 2017
0ec3bd0
!R reformat the code
ahcub Nov 20, 2017
b523915
!U add whatsnew bug fix entry
ahcub Nov 20, 2017
a1b59c4
!U reformat the code
ahcub Nov 20, 2017
8cd36c9
Merge branch 'master' into master
ahcub Nov 20, 2017
aec765f
!U reformat the code according to PR comments
ahcub Nov 21, 2017
2e77f72
!U fix formatting
ahcub Nov 21, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 13 additions & 8 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -366,9 +366,11 @@ def __new__(cls, data=None,
pass

if data is None:
values_present = kwargs.pop('values_present', False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what the heck is all of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, what do you mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a change to fix a problem with resampling for tz-aware series in one of the edge cases
I hit this problem often unfortunately and from my understanding of the design I created a solution that doesn't break other functionality but fixes the problem I have
sorry if the solution doesn't look elegant, if you think there is other one that we can use, please let me know

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure what you are doing. Your changes should be restriced to pandas/core/resample.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you mean that I should not change anything in pandas/core/resample.py then unfortunately I don't know how to avoid that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the opposite. There's where the change should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can implement a solution with changes in pandas/core/resample.py only I think
let me try

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think this is a specific change that resample needs, not generically for date_ranges.

return cls._generate(start, end, periods, name, freq,
tz=tz, normalize=normalize, closed=closed,
ambiguous=ambiguous)
ambiguous=ambiguous,
values_present=values_present)

if not isinstance(data, (np.ndarray, Index, ABCSeries)):
if is_scalar(data):
Expand Down Expand Up @@ -463,7 +465,8 @@ def __new__(cls, data=None,

@classmethod
def _generate(cls, start, end, periods, name, offset,
tz=None, normalize=False, ambiguous='raise', closed=None):
tz=None, normalize=False, ambiguous='raise', closed=None,
values_present=False):
if com._count_not_none(start, end, periods) != 2:
raise ValueError('Of the three parameters: start, end, and '
'periods, exactly two must be specified')
Expand Down Expand Up @@ -552,7 +555,8 @@ def _generate(cls, start, end, periods, name, offset,
index = cls._cached_range(start, end, periods=periods,
offset=offset, name=name)
else:
index = _generate_regular_range(start, end, periods, offset)
index = _generate_regular_range(start, end, periods, offset,
values_present)

else:

Expand Down Expand Up @@ -2016,15 +2020,16 @@ def to_julian_date(self):
DatetimeIndex._add_datetimelike_methods()


def _generate_regular_range(start, end, periods, offset):
def _generate_regular_range(start, end, periods, offset, values_present=False):
if isinstance(offset, Tick):
stride = offset.nanos
if periods is None:
b = Timestamp(start).value
# cannot just use e = Timestamp(end) + 1 because arange breaks when
# stride is too large, see GH10887
e = (b + (Timestamp(end).value - b) // stride * stride +
stride // 2 + 1)
if values_present:
e = (Timestamp(end).value + stride // 2 + 1)
else:
e = (b + (Timestamp(end).value - b) // stride * stride +
stride // 2 + 1)
# end.tz == start.tz by this point due to _generate implementation
tz = start.tz
elif start is not None:
Expand Down
6 changes: 5 additions & 1 deletion pandas/core/resample.py
Original file line number Diff line number Diff line change
Expand Up @@ -1130,6 +1130,9 @@ def _get_time_bins(self, ax):
closed=self.closed,
base=self.base)
tz = ax.tz
values_present = isinstance(getattr(self, 'obj', None),
(pd.DataFrame, pd.Series))

# GH #12037
# use first/last directly instead of call replace() on them
# because replace() will swallow the nanosecond part
Expand All @@ -1139,7 +1142,8 @@ def _get_time_bins(self, ax):
start=first,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly are you trying to do here? what is the purpose of values_present, IOW what case are you trying to handle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the case is described here #15549
unfortunately, I get values fall after last bin error when I do resampling of tz-aware series with summer-winter time change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in case if we are resampling series I believe an extra bin should be generated to handle this
but in case of generation of the datetimeindex with date_range for example we should not get the extra bin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

purpose of values_present is to be able to say which case is that, that we are creating datetimeindex for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hope that makes sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the case. My point is that you can detect what you need and simply generate different bins at that point, rather than intrude to the implementation of date_range.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the soln that you posted in the issue looks promising. did you try that?

Copy link
Contributor Author

@ahcub ahcub Nov 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, last time I made a PR with it you said that it is not right to resample by UTC a tz aware series
and after giving it a thought I think you are right
it's not a good idea to resample everything by UTC, also I did that change internally for our company and I got issues with it.
like when I resample with B freq, I get a data point on sunday, but not on friday

Copy link
Contributor

@jreback jreback Nov 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that might be a bigger change, ok (and maybe not what we want semantically)

end=last,
tz=tz,
name=ax.name)
name=ax.name,
values_present=values_present)

# a little hack
trimmed = False
Expand Down
6 changes: 6 additions & 0 deletions pandas/tests/test_resample.py
Original file line number Diff line number Diff line change
Expand Up @@ -2719,6 +2719,12 @@ def test_resample_weekly_bug_1726(self):
# it works!
df.resample('W-MON', closed='left', label='left').first()

def test_resample_tz_aware_bug_15549(self):
index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the issue number to a comment (and out of the title)

tz='UTC').tz_convert('America/Chicago')
df = pd.DataFrame([1, 2], index=index)
df.resample('12h', closed='right', label='right').last().ffill()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an expected and compare using assert_frame_equal


def test_resample_bms_2752(self):
# GH2753
foo = Series(index=pd.bdate_range('20000101', '20000201'))
Expand Down