Skip to content

BUG: pd.DateOffset handle milliseconds #50020

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Dec 22, 2022

Conversation

markopacak
Copy link
Contributor

@markopacak markopacak commented Dec 2, 2022

@markopacak markopacak marked this pull request as ready for review December 3, 2022 12:49
@markopacak
Copy link
Contributor Author

IMO after this fix, _determine_offset needs some rework. The function has been continuously patched over-time and has become unreadable.

I'm testing some implementations in a separate branch

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR. looks like a merge conflict

@@ -314,6 +314,9 @@ cdef _determine_offset(kwds):

use_relativedelta = False
if len(kwds_no_nanos) > 0:
if "milliseconds" in kwds_no_nanos:
return timedelta(**kwds_no_nanos), use_relativedelta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you assign to offset and use_relativedelta and just let the function return naturally? I think flows better with the other branching

@@ -739,6 +739,10 @@ def test_eq(self):

assert DateOffset(milliseconds=3) != DateOffset(milliseconds=7)

def test_milliseconds_combination(self):
# 10525
DateOffset(days=1, milliseconds=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an assertion that the object has been properly constructed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@mroeschke mroeschke added the Frequency DateOffsets label Dec 8, 2022
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. @jbrockmendel or @MarcoGorelli care to look?

@@ -314,7 +314,9 @@ cdef _determine_offset(kwds):

use_relativedelta = False
if len(kwds_no_nanos) > 0:
if any(k in _kwds_use_relativedelta for k in kwds_no_nanos):
if "milliseconds" in kwds_no_nanos:
offset = timedelta(**kwds_no_nanos)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would still fail if a user passed DateOffset(milliseconds=1, month=1) i.e. any non-timedelta keyword constructor was passed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it fails. What should be done in cases like this? Both timedelta and relativedelta will throw an error.

TBH I don't know how to fix this bug without trying to rework the entire function. Does it make sense to broaden up the scope of this issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be okay to broaden the scope if all the existing tests pass. Generally I would imagine there should be checking that kwds (besides nanoseconds) should either all be valid for timedelta or relativetimedelta

Copy link
Contributor Author

@markopacak markopacak Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I had in mind.

All tests in tests/arithmetic/test_datetime64.py and tests/tseries/offsets/test_offsets.py pass.

Added one parametrized test.

DateOffset(milliseconds=1, month=1) fails because I don't know what should happen in this case (see last return stmt in first link)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just open another PR with the rework linked above and close this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine to just update this PR with your proposed refactor

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we do something like

if 'milliseconds' in kwds:
     if "microseconds in kwds:
           raise ...
     kwds['microseconds'] = kwds.pop('milliseconds') * 1000

(and something similar in the if "millisecond" in kwds_no_nanos case below)

Copy link
Contributor Author

@markopacak markopacak Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this code instead of the exception block (full code here) :

if "millisecond" in kwds_no_nanos:
    micro = kwds_no_nanos.pop("millisecond") * 1000
    kwds_no_nanos["microsecond"] = kwds_no_nanos.get("microsecond", 0) + micro

if "milliseconds" in kwds_no_nanos:
    micro = kwds_no_nanos.pop("milliseconds") * 1000
    kwds_no_nanos["microseconds"] = kwds_no_nanos.get("microseconds", 0) + micro

but it fails 2 tests in pandas/tests/tseries/offsets/test_offsets.py, both of which are meant to intentionally fail if we pass millisecond. So we aren't breaking any real functionality, I guess.

TestDateOffset.test_constructor[millisecond] 
[XPASS(strict)] Constructing DateOffset object with `millisecond` is not yet supported.

test_valid_relativedelta_kwargs[millisecond] 
[XPASS(strict)] Constructing DateOffset object with `millisecond` is not yet supported.

However, I think the requirement is to have all the existing tests pass.

There is still the case where we receive minutes (arg valid for timedelta) + another arg valid for relativedelta only.
In general, the last return needs to be discussed: What do we do in "all other cases"?

@MarcoGorelli MarcoGorelli self-requested a review December 8, 2022 19:03
# This also handles "milliseconds" (plur): see GH 49897
return timedelta(**kwds_no_nanos), False

# TODO what to do in this case?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would imagine this should raise since there are mixed relativedelta and timedelta based kwds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id have to double-check but i think pytimedelta keywords are also valid relativedelta keywords

Copy link
Contributor Author

@markopacak markopacak Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All except milliseconds.
And for some reason, minutes is not present in _kwds_use_relativedelta, despite being a valid relativedelta arg.
It breaks tests if I add it

With our current configuration, we end up in this last return stmt, when:

  • minutes + another relativedelta arg
  • milliseconds + another relativedelta arg
  • any invalid arg for either relativedelta or timedelta

I think it makes sense to raise, e.g: Invalid keyword configuration.

Edit: fixed the above

@markopacak
Copy link
Contributor Author

markopacak commented Dec 19, 2022

I made it:

  • milliseconds now work in combination with sub-/above-daily args (see tests)
  • code is cleaned
  • no existing tests are broken

It was tough to maintain compatibility with older tests. If new issues about this or related modules come up, feel free to tag me.

@MarcoGorelli
Copy link
Member

thanks for working on this, I'll take a look soon

looks like CI didn't run for some reason, could you resolve the merge conflicts and push please? that should restart it

@pytest.mark.parametrize(
"offset_kwargs, expected_arg",
[
({"seconds": 1, "milliseconds": 1}, "2022-01-01 00:00:01.001"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be a test to check when both milliseconds and microseconds were passed, seeing as you sum them together?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(CI failures are unrelated BTW, if you fetch and merge upstream/main they should be fixed)

"seconds", "microseconds", "milliseconds", "minutes", "hours",
}

# check timedelta before relativedelta: maintain compatibility with old tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"maintain compatibility with old tests" is usually a given, not sure it needs a comment 😄

@@ -794,6 +794,7 @@ Datetimelike
- Bug in :func:`to_datetime` was raising ``ValueError`` when parsing empty string and non-ISO8601 format was passed. Now, empty strings will be parsed as :class:`NaT`, for compatibility with how is done for ISO8601 formats (:issue:`50251`)
- Bug in :class:`Timestamp` was showing ``UserWarning``, which was not actionable by users, when parsing non-ISO8601 delimited date strings (:issue:`50232`)
- Bug in :func:`to_datetime` was showing misleading ``ValueError`` when parsing dates with format containing ISO week directive and ISO weekday directive (:issue:`50308`)
- Bug in :class:`DateOffset` when constructing with milliseconds in combination with another argument (:issue:`49897`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we be more specific? like Bug in :class:DateOffset` was throwing ``TypeError```..., or whatever it was

…andas into bug-dateoffset-milliseconds

# Conflicts:
#	doc/source/whatsnew/v2.0.0.rst
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, but the rest looks good!

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Sorry, just noticed a couple more things, but other than that, really solid work here, nice 💪 Much simpler to understand like this

Comment on lines +346 to 349
raise ValueError(
f"Invalid argument/s or bad combination of arguments: {list(kwds.keys())}"
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test that hits this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, ever since minutes was added to kwds_use_relativedelta, and milliseconds is converted to microseconds, the only case where you raise is for invalid args

So it would be something like:

def test_offset_invalid_arguments(self):
    with pytest.raises(ValueError, match="^Invalid argument/s or bad combination of arguments"):
        DateOffset(picoseconds=1)  # or any other invalid arg

Does it make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah looks good - if we're adding code, let's make sure it's covered

Comment on lines 314 to 319
raise NotImplementedError(
"Using DateOffset to replace `millisecond` component in "
"datetime object is not supported. Use "
"`microsecond=timestamp.microsecond % 1000 + ms * 1000` "
"instead."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

granted, there wasn't a test which hit this to begin with, but there should probably be one? could we add one please (either here or in a separate PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found these 2, not sure if they qualify as valid test

@pytest.mark.parametrize("kwd", sorted(liboffsets._relativedelta_kwds))
def test_valid_relativedelta_kwargs(kwd, request):
if kwd == "millisecond":
request.node.add_marker(
pytest.mark.xfail(
raises=NotImplementedError,
reason="Constructing DateOffset object with `millisecond` is not "
"yet supported.",
)
)
# Check that all the arguments specified in liboffsets._relativedelta_kwds
# are in fact valid relativedelta keyword args
DateOffset(**{kwd: 1})

@pytest.mark.parametrize("kwd", sorted(liboffsets._relativedelta_kwds))
def test_constructor(self, kwd, request):
if kwd == "millisecond":
request.node.add_marker(
pytest.mark.xfail(
raises=NotImplementedError,
reason="Constructing DateOffset object with `millisecond` is not "
"yet supported.",
)
)
offset = DateOffset(**{kwd: 2})
assert offset.kwds == {kwd: 2}
assert getattr(offset, kwd) == 2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, thanks! all good then


kwds_no_nanos = {k: v for k, v in kwds.items() if k not in nanos}

if "millisecond" in kwds_no_nanos:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could happen earlier, right? and be simplified to if "millisecond" in kwds?

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice, thanks @markopacak

@mroeschke mroeschke merged commit 7bc699f into pandas-dev:main Dec 22, 2022
@mroeschke
Copy link
Member

Thanks @markopacak

@markopacak markopacak deleted the bug-dateoffset-milliseconds branch January 22, 2023 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.DateOffset only handles milliseconds keyword argument if other keyword args are shorter than day/days
5 participants