The constructor(s) for DatetimeArray are a bit messy right now, so let's step back a bit to lay out what we want out of them.
What do we want out of our init? I'd like the following constraints:
- `data` is never copied unless explicitly requested with `copy=True`.
- The values in `data` are never coerced. This means no lists (copy), and no ndarrays of values that can be coerced to datetime64[ns] (no object-dtype strings, Timestamps, etc.). We do allow unboxing data from a Series / Index / DatetimeArray, and we do allow viewing i8 data as M8[ns].
- The signature matches across all DTA classes: `values, dtype, freq, copy`
- It's fast. There are two wrinkles here:
  a.) I didn't (and many users probably don't) appreciate the performance impact of passing `freq=` to DTI / DTA (ballpark: 5x slower for creating). Everything else is relatively cheap to check; the most expensive thing is probably timezone normalization, which I think is unavoidable.
  b.) Frequency inference. Right now it's disallowed. Should we allow it? Is this expensive?
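
The constraints above can be sketched roughly as follows. This is only an illustration using NumPy: `StrictDatetimeArray` is a hypothetical stand-in rather than the pandas class, it unboxes only its own type (not Series / Index), and it ignores time zones and `freq` validation entirely.

```python
import numpy as np


class StrictDatetimeArray:
    """Hypothetical stand-in for DatetimeArray; not the pandas implementation."""

    def __init__(self, values, dtype=None, freq=None, copy=False):
        # Assumption: this sketch only knows about naive datetime64[ns].
        if dtype is not None and np.dtype(dtype) != np.dtype("M8[ns]"):
            raise ValueError("this sketch only supports datetime64[ns]")

        # Unboxing is allowed: pull the underlying ndarray out of a wrapper.
        if isinstance(values, StrictDatetimeArray):
            if freq is None:
                freq = values.freq
            values = values._data

        # Lists and other sequences would force a copy, so reject them.
        if not isinstance(values, np.ndarray):
            raise TypeError(f"expected an ndarray, got {type(values).__name__}")

        if values.dtype == np.dtype("i8"):
            # Reinterpret the int64 buffer as datetime64[ns]; no data is copied.
            values = values.view("M8[ns]")
        elif values.dtype != np.dtype("M8[ns]"):
            # Object-dtype strings, Timestamps, etc. would need coercion.
            raise ValueError(f"cannot wrap dtype {values.dtype} without coercion")

        if copy:
            values = values.copy()

        self._data = values
        self.freq = freq  # stored as given; validating it is out of scope here
```

With this shape, `StrictDatetimeArray(np.arange(3, dtype="i8"))` shares memory with its input, while a list or an object-dtype array raises instead of being silently converted.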
If possible, I'd prefer to avoid defining `DatetimeArray.__new__`, for two main reasons:
- Maintainability: defining `__new__` complicates pickle, which makes for relatively difficult debugging sessions in the future
- Aesthetics: Python already has a way for initializing classes (`__init__`), so all else equal I'd prefer to use that instead of `__new__` + `_simple_new`
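
To illustrate the pickle point, here is a small standard-library sketch. `ViaNew` and `ViaInit` are hypothetical toy classes, not pandas code. With the default protocols, unpickling calls `cls.__new__(cls)` with no extra arguments unless the class supplies them, so a `__new__` with required parameters fails on load.

```python
import pickle


class ViaNew:
    """Hypothetical array-like that builds itself in __new__."""

    def __new__(cls, values, freq=None):
        obj = object.__new__(cls)
        obj.values = values
        obj.freq = freq
        return obj


class ViaInit:
    """Hypothetical array-like that uses a plain __init__."""

    def __init__(self, values, freq=None):
        self.values = values
        self.freq = freq


def roundtrip(obj):
    # Serialize and immediately deserialize with the default protocol.
    return pickle.loads(pickle.dumps(obj))


# __init__ is not called on load; the instance __dict__ is simply restored.
restored = roundtrip(ViaInit([1, 2, 3], freq="D"))
print(restored.values, restored.freq)

try:
    roundtrip(ViaNew([1, 2, 3], freq="D"))
except TypeError as err:
    # Loading calls ViaNew.__new__(ViaNew) without the required `values`.
    print("unpickling failed:", err)
```

Keeping `__new__` would mean also maintaining `__reduce__` or `__getnewargs__` to make the round trip work, which is the extra machinery the maintainability argument is about.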
Some concretish TODOs:
- Investigate validation-checking code between `DatetimeArray.__init__` and `sequence_to_dt64ns` (checking user-provided freq / dtype / tz vs. those properties on DatetimeArray `values`)
- Implement `freq` validation (blocked by "Bad freq invalidation in DatetimeIndex.where" #24555 and maybe "Refactor DatetimeArray._generate_range" #24562)
- Standardize `DatetimeArray._simple_new` and the `__init__`. Right now `_simple_new` takes `_simple_new(cls, values, freq=None, tz=None)`. Changing that `tz` to `dtype` should let us share more code between TDA/DTA/PeriodArray.