-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: pd.DateOffset handle milliseconds #50020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: pd.DateOffset handle milliseconds #50020
Conversation
IMO after this fix, I'm testing some implementations in a separate branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR. looks like a merge conflict
pandas/_libs/tslibs/offsets.pyx
Outdated
@@ -314,6 +314,9 @@ cdef _determine_offset(kwds): | |||
|
|||
use_relativedelta = False | |||
if len(kwds_no_nanos) > 0: | |||
if "milliseconds" in kwds_no_nanos: | |||
return timedelta(**kwds_no_nanos), use_relativedelta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you assign to offset
and use_relativedelta
and just let the function return naturally? I think flows better with the other branching
@@ -739,6 +739,10 @@ def test_eq(self): | |||
|
|||
assert DateOffset(milliseconds=3) != DateOffset(milliseconds=7) | |||
|
|||
def test_milliseconds_combination(self): | |||
# 10525 | |||
DateOffset(days=1, milliseconds=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an assertion that the object has been properly constructed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm. @jbrockmendel or @MarcoGorelli care to look?
pandas/_libs/tslibs/offsets.pyx
Outdated
@@ -314,7 +314,9 @@ cdef _determine_offset(kwds): | |||
|
|||
use_relativedelta = False | |||
if len(kwds_no_nanos) > 0: | |||
if any(k in _kwds_use_relativedelta for k in kwds_no_nanos): | |||
if "milliseconds" in kwds_no_nanos: | |||
offset = timedelta(**kwds_no_nanos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would still fail if a user passed DateOffset(milliseconds=1, month=1)
i.e. any non-timedelta keyword constructor was passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it fails. What should be done in cases like this? Both timedelta and relativedelta will throw an error.
TBH I don't know how to fix this bug without trying to rework the entire function. Does it make sense to broaden up the scope of this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be okay to broaden the scope if all the existing tests pass. Generally I would imagine there should be checking that kwds
(besides nanoseconds) should either all be valid for timedelta
or relativetimedelta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I had in mind.
All tests in tests/arithmetic/test_datetime64.py
and tests/tseries/offsets/test_offsets.py
pass.
DateOffset(milliseconds=1, month=1)
fails because I don't know what should happen in this case (see last return stmt in first link)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I just open another PR with the rework linked above and close this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be fine to just update this PR with your proposed refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we do something like
if 'milliseconds' in kwds:
if "microseconds in kwds:
raise ...
kwds['microseconds'] = kwds.pop('milliseconds') * 1000
(and something similar in the if "millisecond" in kwds_no_nanos
case below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this code instead of the exception block (full code here) :
if "millisecond" in kwds_no_nanos:
micro = kwds_no_nanos.pop("millisecond") * 1000
kwds_no_nanos["microsecond"] = kwds_no_nanos.get("microsecond", 0) + micro
if "milliseconds" in kwds_no_nanos:
micro = kwds_no_nanos.pop("milliseconds") * 1000
kwds_no_nanos["microseconds"] = kwds_no_nanos.get("microseconds", 0) + micro
but it fails 2 tests in pandas/tests/tseries/offsets/test_offsets.py, both of which are meant to intentionally fail if we pass millisecond
. So we aren't breaking any real functionality, I guess.
TestDateOffset.test_constructor[millisecond]
[XPASS(strict)] Constructing DateOffset object with `millisecond` is not yet supported.
test_valid_relativedelta_kwargs[millisecond]
[XPASS(strict)] Constructing DateOffset object with `millisecond` is not yet supported.
However, I think the requirement is to have all the existing tests pass.
There is still the case where we receive minutes
(arg valid for timedelta
) + another arg valid for relativedelta
only.
In general, the last return needs to be discussed: What do we do in "all other cases"?
pandas/_libs/tslibs/offsets.pyx
Outdated
# This also handles "milliseconds" (plur): see GH 49897 | ||
return timedelta(**kwds_no_nanos), False | ||
|
||
# TODO what to do in this case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would imagine this should raise since there are mixed relativedelta and timedelta based kwds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id have to double-check but i think pytimedelta keywords are also valid relativedelta keywords
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All except milliseconds
.
And for some reason, minutes
is not present in _kwds_use_relativedelta
, despite being a valid relativedelta arg.
It breaks tests if I add it
With our current configuration, we end up in this last return stmt, when:
minutes
+ another relativedelta argmilliseconds
+ another relativedelta arg- any invalid arg for either relativedelta or timedelta
I think it makes sense to raise, e.g: Invalid keyword configuration
.
Edit: fixed the above
I made it:
It was tough to maintain compatibility with older tests. If new issues about this or related modules come up, feel free to tag me. |
thanks for working on this, I'll take a look soon looks like CI didn't run for some reason, could you resolve the merge conflicts and push please? that should restart it |
@pytest.mark.parametrize( | ||
"offset_kwargs, expected_arg", | ||
[ | ||
({"seconds": 1, "milliseconds": 1}, "2022-01-01 00:00:01.001"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be a test to check when both milliseconds
and microseconds
were passed, seeing as you sum them together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(CI failures are unrelated BTW, if you fetch and merge upstream/main they should be fixed)
pandas/_libs/tslibs/offsets.pyx
Outdated
"seconds", "microseconds", "milliseconds", "minutes", "hours", | ||
} | ||
|
||
# check timedelta before relativedelta: maintain compatibility with old tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"maintain compatibility with old tests" is usually a given, not sure it needs a comment 😄
doc/source/whatsnew/v2.0.0.rst
Outdated
@@ -794,6 +794,7 @@ Datetimelike | |||
- Bug in :func:`to_datetime` was raising ``ValueError`` when parsing empty string and non-ISO8601 format was passed. Now, empty strings will be parsed as :class:`NaT`, for compatibility with how is done for ISO8601 formats (:issue:`50251`) | |||
- Bug in :class:`Timestamp` was showing ``UserWarning``, which was not actionable by users, when parsing non-ISO8601 delimited date strings (:issue:`50232`) | |||
- Bug in :func:`to_datetime` was showing misleading ``ValueError`` when parsing dates with format containing ISO week directive and ISO weekday directive (:issue:`50308`) | |||
- Bug in :class:`DateOffset` when constructing with milliseconds in combination with another argument (:issue:`49897`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we be more specific? like Bug in :class:
DateOffset` was throwing ``TypeError```..., or whatever it was
…iseconds # Conflicts: # doc/source/whatsnew/v2.0.0.rst
…andas into bug-dateoffset-milliseconds # Conflicts: # doc/source/whatsnew/v2.0.0.rst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment, but the rest looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Sorry, just noticed a couple more things, but other than that, really solid work here, nice 💪 Much simpler to understand like this
raise ValueError( | ||
f"Invalid argument/s or bad combination of arguments: {list(kwds.keys())}" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a test that hits this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, ever since minutes was added to kwds_use_relativedelta
, and milliseconds is converted to microseconds, the only case where you raise is for invalid args
So it would be something like:
def test_offset_invalid_arguments(self):
with pytest.raises(ValueError, match="^Invalid argument/s or bad combination of arguments"):
DateOffset(picoseconds=1) # or any other invalid arg
Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah looks good - if we're adding code, let's make sure it's covered
pandas/_libs/tslibs/offsets.pyx
Outdated
raise NotImplementedError( | ||
"Using DateOffset to replace `millisecond` component in " | ||
"datetime object is not supported. Use " | ||
"`microsecond=timestamp.microsecond % 1000 + ms * 1000` " | ||
"instead." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
granted, there wasn't a test which hit this to begin with, but there should probably be one? could we add one please (either here or in a separate PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found these 2, not sure if they qualify as valid test
pandas/pandas/tests/tseries/offsets/test_offsets.py
Lines 930 to 944 in 8eb83d0
@pytest.mark.parametrize("kwd", sorted(liboffsets._relativedelta_kwds)) | |
def test_valid_relativedelta_kwargs(kwd, request): | |
if kwd == "millisecond": | |
request.node.add_marker( | |
pytest.mark.xfail( | |
raises=NotImplementedError, | |
reason="Constructing DateOffset object with `millisecond` is not " | |
"yet supported.", | |
) | |
) | |
# Check that all the arguments specified in liboffsets._relativedelta_kwds | |
# are in fact valid relativedelta keyword args | |
DateOffset(**{kwd: 1}) | |
pandas/pandas/tests/tseries/offsets/test_offsets.py
Lines 616 to 629 in 8eb83d0
@pytest.mark.parametrize("kwd", sorted(liboffsets._relativedelta_kwds)) | |
def test_constructor(self, kwd, request): | |
if kwd == "millisecond": | |
request.node.add_marker( | |
pytest.mark.xfail( | |
raises=NotImplementedError, | |
reason="Constructing DateOffset object with `millisecond` is not " | |
"yet supported.", | |
) | |
) | |
offset = DateOffset(**{kwd: 2}) | |
assert offset.kwds == {kwd: 2} | |
assert getattr(offset, kwd) == 2 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, thanks! all good then
pandas/_libs/tslibs/offsets.pyx
Outdated
|
||
kwds_no_nanos = {k: v for k, v in kwds.items() if k not in nanos} | ||
|
||
if "millisecond" in kwds_no_nanos: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could happen earlier, right? and be simplified to if "millisecond" in kwds
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice, thanks @markopacak
Thanks @markopacak |
pd.DateOffset
only handlesmilliseconds
keyword argument if other keyword args are shorter thanday/days
#49897doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.