Datetime functionality #260

Closed
@MarcoGorelli

I like how pandas/polars have a .dt namespace for datetime functionality, and I'd suggest having that too. What do we put in it?

Looking at skrub, here's the datetime functionality I think we should add:

  • year
  • month
  • day
  • hour
  • minute
  • second
  • millisecond
  • microsecond
  • nanosecond
  • iso_weekday (monday = 1, sunday = 7)
  • timestamp (number of seconds since 1970-01-01 UTC)
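For reference, here is a minimal sketch of what each of these components means for a single value, using only Python's standard library `datetime` (not pandas, polars, or any proposed API). The variable names are illustrative assumptions. The stdlib only has microsecond resolution, so `nanosecond` is left out, and `millisecond` is derived from `microsecond`.

```python
from datetime import datetime, timezone

# One timezone-aware instant: 2021-03-04 05:06:07.123456 UTC (a Thursday).
ts = datetime(2021, 3, 4, 5, 6, 7, 123456, tzinfo=timezone.utc)

components = {
    "year": ts.year,
    "month": ts.month,
    "day": ts.day,
    "hour": ts.hour,
    "minute": ts.minute,
    "second": ts.second,
    # No stdlib attribute for this; derived here as whole milliseconds
    # since the last second.
    "millisecond": ts.microsecond // 1000,
    # Microseconds since the last second.
    "microsecond": ts.microsecond,
    # ISO weekday: Monday = 1, Sunday = 7.
    "iso_weekday": ts.isoweekday(),
    # Seconds since 1970-01-01 UTC, as a float with a fractional part.
    "timestamp": ts.timestamp(),
}

for name, value in components.items():
    print(f"{name}: {value}")
```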

These should all be fairly trivial. skrub also uses floor, but I think we should leave that one out initially, as it's really non-trivial in the timezone-aware case when there's DST. I'm pretty sure there's a way around it for what they're doing anyway.
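To illustrate why floor gets awkward with DST, here is a rough sketch using only the standard library. It uses two fixed offsets as stand-ins for Europe/London on either side of the 2023-03-26 switch to summer time (an assumption made to avoid depending on a timezone database), and `naive_floor_to_day` is a hypothetical helper, not anything from skrub, pandas, or polars.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for Europe/London around the 2023-03-26
# change to summer time: GMT (UTC+0) before, BST (UTC+1) after.
GMT = timezone(timedelta(hours=0))
BST = timezone(timedelta(hours=1))

# Local midnight at the start and at the end of 2023-03-26.
local_day_start = datetime(2023, 3, 26, 0, 0, tzinfo=GMT)
local_day_end = datetime(2023, 3, 27, 0, 0, tzinfo=BST)

# The local calendar day is only 23 hours long, so "1 day" is not a
# fixed number of seconds.
day_length = local_day_end - local_day_start


def naive_floor_to_day(dt):
    """Hypothetical helper: truncate the epoch timestamp to a multiple of 86400 s."""
    seconds = int(dt.timestamp())
    return datetime.fromtimestamp(seconds - seconds % 86400, tz=dt.tzinfo)


# 00:30 local time on the 27th.
ts = datetime(2023, 3, 27, 0, 30, tzinfo=BST)
floored = naive_floor_to_day(ts)

print(day_length)
print(floored.isoformat())
```

The fixed-width floor lands on 01:00 local time on the 26th rather than on local midnight of the 27th, which is the kind of mismatch a timezone-aware floor would need to handle.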

There's some inconsistency in the definitions here:

  • pandas: microsecond returns the number of microseconds since the last second, but nanosecond returns the number of nanoseconds since the last microsecond. There is no millisecond.
  • polars (and chrono): nanosecond returns the number of nanoseconds since the last second. Similarly, millisecond and microsecond both count from the last second.
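The two conventions above can be spelled out with plain integer arithmetic. This is a sketch of the definitions as described in this issue, not a call into pandas or polars; the function names are made up for illustration.

```python
def pandas_style(ns_of_second):
    """Sub-second components under the convention described for pandas."""
    return {
        # Microseconds since the last second.
        "microsecond": ns_of_second // 1_000,
        # Nanoseconds since the last microsecond.
        "nanosecond": ns_of_second % 1_000,
    }


def polars_style(ns_of_second):
    """Sub-second components under the convention described for polars/chrono."""
    return {
        # Each unit counts from the last whole second.
        "millisecond": ns_of_second // 1_000_000,
        "microsecond": ns_of_second // 1_000,
        "nanosecond": ns_of_second,
    }


# 0.123456789 s past the last whole second.
ns = 123_456_789
print(pandas_style(ns))
print(polars_style(ns))
```

Both conventions give the same `microsecond` for this input, while `nanosecond` differs.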

Of the sub-second components, I'd suggest we only include microsecond as part of the Standard, since that's the one they all agree on.

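Also, I'd suggest making all of these functions rather than properties. I think pandas is really misleading here, since property access suggests the value is free to read when it actually triggers a computation: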

In [71]: ts = pd.date_range('1900-01-01', '2100-01-01', freq='1min')

In [72]: %time ts.nanosecond  # definitely not "free"
CPU times: user 2.1 s, sys: 26 ms, total: 2.13 s
Wall time: 2.13 s
Out[72]:
Index([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       ...
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
      dtype='int32', length=105190561)
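As a rough sketch of the "functions rather than properties" idea, the toy classes below expose a cheap `.dt` namespace whose accessors are all methods, so the call syntax signals that work is being done. `Column` and `DtNamespace` are hypothetical names invented for this illustration and are not part of pandas, polars, or any existing Standard.

```python
from datetime import datetime


class DtNamespace:
    """Hypothetical datetime namespace: every accessor is a method."""

    def __init__(self, values):
        self._values = values

    def year(self):
        # Walks every element, so it is exposed as a call, not an attribute.
        return [v.year for v in self._values]

    def microsecond(self):
        return [v.microsecond for v in self._values]


class Column:
    """Hypothetical column type holding datetime values."""

    def __init__(self, values):
        self._values = list(values)

    @property
    def dt(self):
        # Cheap: only wraps the existing data, no per-element work.
        return DtNamespace(self._values)


col = Column([datetime(2020, 1, 1, 0, 0, 0, 250), datetime(2021, 6, 15)])
years = col.dt.year()
micros = col.dt.microsecond()
print(years, micros)
```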

Labels: API design, timeseries (related to dates / datetimes / times / durations)