pd.Timestamp constructor ignores missing arguments #31930

Open
@pganssle

As part of the discussions in #31563, I came across some strange semantics in pd.Timestamp: it is apparently legal to over-specify a pd.Timestamp by passing both a datetime (or another Timestamp) and the by-component construction values, in which case the irrelevant arguments are silently ignored:

>>> from datetime import datetime
>>> import pandas as pd
>>> pd.Timestamp(datetime(2020, 12, 31),
                 year=1, month=1, day=1,
                 hour=23, minute=59, second=59, microsecond=999999)
Timestamp('2020-12-31 00:00:00')

The signature for the function is:

pd.Timestamp(
    ts_input=<object object at 0x7fd988a10760>,
    freq=None,
    tz=None,
    unit=None,
    year=None,
    month=None,
    day=None,
    hour=None,
    minute=None,
    second=None,
    microsecond=None,
    nanosecond=None,
    tzinfo=None,
)

There's actually a decent amount of redundant information in there, because pd.Timestamp is attempting to have its own constructor in addition to being constructable like a datetime. Properly, there are two overloaded constructors here (note that I'm not sure whether nanosecond belongs to both or just one):

pd.Timestamp(ts_input, freq, tz, unit[, nanosecond?])
pd.Timestamp(year, month, day
             [, hour, minute, second, microsecond, nanosecond, tzinfo])

I think that ideally the correct behavior would be to raise an error if you mix and match arguments from the two, which is at least already done when both tz and tzinfo are specified:

>>> pd.Timestamp(datetime.now(), tz=timezone.utc, tzinfo=timezone.utc)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-d80c9ce6a89d> in <module>
----> 1 pd.Timestamp(datetime.now(), tz=timezone.utc, tzinfo=timezone.utc)

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

ValueError: Can provide at most one of tz, tzinfo

Though confusingly this also fails if you specify tzinfo at all in the "by-component" constructor. I have filed a separate bug for that at #31929.

Recommendation

I think that the behavior of pandas.Timestamp should probably be brought at least mostly in line with the concept of two overloaded constructors (possibly with tz and tzinfo being mutually exclusive aliases for one another). Any other combination, particularly one where the values passed are ignored, should raise an exception.
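
To make the recommendation concrete, here is a minimal sketch of such a check in plain Python. It is not how pandas implements Timestamp.__new__: the names check_constructor_args, _NO_INPUT, INPUT_ONLY and COMPONENT_ONLY are hypothetical, it only looks at keyword arguments (not positional by-component calls), and it assumes nanosecond, tz and tzinfo are acceptable in both forms, which is still an open question above.

```python
_NO_INPUT = object()  # hypothetical sentinel meaning "ts_input was not passed"

# Keyword groups taken from the two overloads described in this issue.
# Assumption: nanosecond, tz and tzinfo are allowed in either form,
# so they appear in neither group.
INPUT_ONLY = ("freq", "unit")
COMPONENT_ONLY = ("year", "month", "day", "hour", "minute", "second", "microsecond")


def check_constructor_args(ts_input=_NO_INPUT, **kwargs):
    """Return the overload a call matches, or raise ValueError if it mixes both."""
    # Only arguments that were actually set (not None) count as "given".
    given = {name for name, value in kwargs.items() if value is not None}

    # tz and tzinfo are treated as mutually exclusive aliases.
    if {"tz", "tzinfo"} <= given:
        raise ValueError("Can provide at most one of tz, tzinfo")

    components = sorted(given.intersection(COMPONENT_ONLY))
    input_only = sorted(given.intersection(INPUT_ONLY))

    if ts_input is not _NO_INPUT:
        if components:
            # These are the values that are silently ignored today.
            raise ValueError(
                f"Cannot combine ts_input with by-component arguments: {components}"
            )
        return "ts_input"

    if input_only:
        raise ValueError(
            f"Cannot combine by-component construction with: {input_only}"
        )
    return "by-component"
```

With this sketch, check_constructor_args(datetime(2020, 12, 31), year=1) raises a ValueError instead of dropping year, while a call that stays within one form returns the name of that form.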

This may be a breaking change, since it will start raising exceptions in code that didn't raise exceptions before (though I am not sure I can think of any situation where silently ignoring the values is a desirable condition), so it may be a good idea to have a deprecation period where a warning rather than an exception is raised.
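
A deprecation period along these lines could look roughly like the sketch below, which warns instead of raising when by-component values would be ignored. The function name warn_on_ignored_components and the tuple COMPONENT_NAMES are hypothetical, and the choice of FutureWarning is an assumption rather than a decision made in this issue.

```python
import warnings

# By-component keywords that are ignored today when ts_input is also passed.
COMPONENT_NAMES = ("year", "month", "day", "hour", "minute", "second", "microsecond")


def warn_on_ignored_components(ts_input_given, **kwargs):
    """Warn (rather than raise) about component arguments that would be ignored.

    Returns the sorted list of ignored argument names, empty if there are none.
    """
    if not ts_input_given:
        return []

    ignored = sorted(
        name
        for name, value in kwargs.items()
        if name in COMPONENT_NAMES and value is not None
    )
    if ignored:
        warnings.warn(
            f"Passing {ignored} together with ts_input is deprecated: these "
            "values are currently ignored and will raise in a future version.",
            FutureWarning,  # assumption: the warning category is not settled here
            stacklevel=2,
        )
    return ignored
```

Once the deprecation period ends, the warnings.warn call would be replaced by raising an exception, so code that passes only one form of arguments is unaffected at every stage.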

Labels: Bug, Datetime, Error Reporting