Closed
Description
I like how pandas/polars have a `.dt` namespace for datetime functionality, I'd suggest having that too. What do we put in it?
Looking at skrub, here's some datetime functionality we should add.
- year
- month
- day
- hour
- minute
- second
- millisecond
- microsecond
- nanosecond
- iso_weekday (monday = 1, sunday = 7)
- timestamp (number of seconds since 1970-01-01 UTC)
These should all be fairly trivial. They also use `floor`, but I think we should keep that one out initially, as it's really non-trivial in the timezone-aware case when there's DST. Pretty sure there's a way around it for what they're doing anyway.
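To illustrate why most of the components listed above are trivial, here is a minimal sketch using only Python's standard `datetime` module. The helper name `datetime_components` is hypothetical and not a proposed API. It assumes a timezone-aware input, and it leaves out `millisecond` and `nanosecond` because `datetime` only has microsecond resolution.

```python
from datetime import datetime, timedelta, timezone

# Reference point for the `timestamp` component.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_components(ts: datetime) -> dict:
    """Hypothetical helper: extract the simple components from an aware datetime."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "microsecond": ts.microsecond,
        # ISO convention: Monday = 1, Sunday = 7.
        "iso_weekday": ts.isoweekday(),
        # Whole seconds since 1970-01-01 UTC, computed with exact integer division.
        "timestamp": (ts - EPOCH) // timedelta(seconds=1),
    }


parts = datetime_components(
    datetime(2024, 1, 7, 12, 30, 45, 123456, tzinfo=timezone.utc)
)
print(parts["iso_weekday"], parts["timestamp"])
```

2024-01-07 is a Sunday, so `iso_weekday` comes out as 7 here.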
There's some inconsistency in the definitions here:
- pandas: `microsecond` returns the number of microseconds since the last second, but `nanosecond` returns the number of nanoseconds since the last microsecond. There is no `millisecond`.
- polars (and chrono): `nanosecond` returns the number of nanoseconds since the last second. Similarly for `millisecond` and `microsecond`.

I'd suggest we only include `microsecond` as part of the Standard, which they all agree on.
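The difference between the two conventions can be sketched with plain integer arithmetic. The function names `pandas_style` and `polars_style` are made up for illustration; they only mimic the definitions described above and do not call either library.

```python
NS_PER_US = 1_000        # nanoseconds per microsecond
NS_PER_MS = 1_000_000    # nanoseconds per millisecond


def pandas_style(ns_since_second: int) -> dict:
    """Mimic the pandas definitions (illustrative only)."""
    return {
        # Counted from the last whole second.
        "microsecond": ns_since_second // NS_PER_US,
        # Counted from the last whole microsecond.
        "nanosecond": ns_since_second % NS_PER_US,
    }


def polars_style(ns_since_second: int) -> dict:
    """Mimic the polars/chrono definitions (illustrative only)."""
    # Every unit is counted from the last whole second.
    return {
        "millisecond": ns_since_second // NS_PER_MS,
        "microsecond": ns_since_second // NS_PER_US,
        "nanosecond": ns_since_second,
    }


frac = 123_456_789  # fractional part of a timestamp ending in .123456789
print(pandas_style(frac))
print(polars_style(frac))
```

For this input both conventions give 123456 for `microsecond`, while `nanosecond` is 789 under the pandas definition and 123456789 under the polars one.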
Also, I'd suggest making all these functions rather than properties. I think pandas is really misleading here:
```
In [71]: ts = pd.date_range('1900-01-01', '2100-01-01', freq='1min')

In [72]: %time ts.nanosecond  # definitely not "free"
CPU times: user 2.1 s, sys: 26 ms, total: 2.13 s
Wall time: 2.13 s
Out[72]:
Index([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       ...
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
      dtype='int32', length=105190561)
```
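As a rough sketch of the "functions rather than properties" idea, a namespace could expose each component as a method, so the call syntax makes it visible that work proportional to the data is being done. The class `DatetimeNamespace` and its method names are purely hypothetical, and a plain list of `datetime` objects stands in for a real column.

```python
from datetime import datetime


class DatetimeNamespace:
    """Hypothetical `.dt`-style namespace over a list of datetimes."""

    def __init__(self, values: list[datetime]) -> None:
        self._values = values

    def year(self) -> list[int]:
        # A method call signals that every element is visited here.
        return [v.year for v in self._values]

    def microsecond(self) -> list[int]:
        # Microseconds since the last whole second.
        return [v.microsecond for v in self._values]


dt = DatetimeNamespace([datetime(1900, 1, 1), datetime(2100, 1, 1, 0, 0, 0, 250)])
years = dt.year()
micros = dt.microsecond()
print(years, micros)
```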