DatetimeArray constructors #24567

Closed
@TomAugspurger

Description

The constructor(s) for DatetimeArray are a bit messy right now, so let's step back a bit to lay out what we want out of them.

What do we want out of our init? I'd like the following constraints:

  1. data is never copied unless explicitly requested with copy=True. The values in data are never coerced. This means no lists (copy), and no ndarrays of values that can be coerced to datetime64[ns] (no object-dtype strings, Timestamps, etc.). We do allow unboxing data from a Series / Index / DatetimeArray, and we do allow viewing i8 data as M8[ns].
  2. The signature matches across all DTA classes: values, dtype, freq, copy
  3. It's fast. There are two wrinkles here:
    a.) I didn't (and many users probably don't) appreciate the performance impact of passing freq= to DTI / DTA (ballpark: 5x slower to create). Everything else is relatively cheap to check; the most expensive thing is probably timezone normalization, which I think is unavoidable.
    b.) Frequency inference. Right now it's disallowed. Should we allow it? Is this expensive?
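
To make the constraints above concrete, here is a minimal sketch using only NumPy. `SketchDatetimeArray` and its `_validate_freq` helper are hypothetical stand-ins, not the pandas implementation; tz-aware dtypes are left out, and `freq` is assumed to be a fixed `np.timedelta64` step.

```python
import numpy as np

NS_DTYPE = np.dtype("M8[ns]")
I8_DTYPE = np.dtype("i8")


class SketchDatetimeArray:
    """Hypothetical stand-in for DatetimeArray (illustration only)."""

    def __init__(self, values, dtype=NS_DTYPE, freq=None, copy=False):
        # Unbox containers that wrap an ndarray (stand-in for
        # Series / Index / DatetimeArray); unboxing does not copy.
        values = getattr(values, "_data", values)

        # No lists and no coercion: only ndarrays are accepted.
        if not isinstance(values, np.ndarray):
            raise TypeError("values must be an ndarray or wrap one")

        # i8 data may be viewed as M8[ns]; a view shares memory.
        if values.dtype == I8_DTYPE:
            values = values.view(NS_DTYPE)
        if values.dtype != NS_DTYPE:
            raise TypeError(f"cannot use dtype {values.dtype} without coercion")
        if np.dtype(dtype) != NS_DTYPE:
            raise TypeError("this sketch only handles tz-naive M8[ns]")

        # Copy only when explicitly requested.
        if copy:
            values = values.copy()

        # Assumed freq check: a full pass over the data, which is why
        # passing freq= costs more than the dtype checks above.
        if freq is not None:
            self._validate_freq(values, freq)

        self._data = values
        self._dtype = NS_DTYPE
        self._freq = freq

    @staticmethod
    def _validate_freq(values, freq):
        if not np.all(np.diff(values) == freq):
            raise ValueError("values do not conform to the passed freq")
```

With this shape, `SketchDatetimeArray(i8_values)` shares memory with `i8_values`, while a list or an object-dtype ndarray raises `TypeError` instead of being silently converted.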

If possible, I'd prefer to avoid defining DatetimeArray.__new__, for two main reasons

  1. Maintainability: defining __new__ complicates pickle, which makes for relatively difficult debugging sessions in the future
  2. Aesthetics: Python already has a way to initialize instances (__init__), so all else equal I'd prefer to use that instead of __new__ + _simple_new
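
As a small illustration of the pickle point, the sketch below compares two hypothetical classes (`WithNew` and `WithInit`, neither from pandas). By default, unpickling recreates an instance by calling `cls.__new__(cls)` with no extra arguments, so a `__new__` with required parameters fails unless the class also defines something like `__getnewargs__` or a custom `__reduce__`.

```python
import pickle


class WithNew:
    """Hypothetical class that takes its data through __new__."""

    def __new__(cls, values):
        obj = super().__new__(cls)
        obj.values = values
        return obj


class WithInit:
    """Hypothetical class that takes its data through __init__."""

    def __init__(self, values):
        self.values = values


def round_trips(obj):
    """Return True if obj survives pickle.dumps followed by pickle.loads."""
    payload = pickle.dumps(obj)
    try:
        restored = pickle.loads(payload)
    except TypeError:
        # Raised when unpickling calls cls.__new__(cls) and __new__
        # requires arguments that pickle was never told about.
        return False
    return restored.values == obj.values
```

Here `round_trips(WithInit([1, 2]))` succeeds because `__init__` is not called on unpickling; the instance `__dict__` is restored directly. `round_trips(WithNew([1, 2]))` fails until extra pickling support is added, which is the kind of maintenance cost described above.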

Some concretish TODOs:

  1. Investigate validation-checking code between DatetimeArray.__init__ and sequence_to_dt64ns (checking user-provided freq / dtype / tz vs. those properties on DatetimeArray values)
  2. Implement freq validation (blocked by "Bad freq invalidation in DatetimeIndex.where" #24555 and maybe "Refactor DatetimeArray._generate_range" #24562)
  3. Standardize DatetimeArray._simple_new and the __init__. Right now the signature is _simple_new(cls, values, freq=None, tz=None). Changing that tz to dtype should let us share more code between TDA/DTA/PeriodArray.
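
A rough sketch of what a shared `_simple_new` could look like once `tz` is replaced by `dtype`. The mixin and subclass names are hypothetical, and real pandas dtypes (tz-aware or period dtypes) are replaced by plain NumPy dtypes here.

```python
import numpy as np


class _SketchSimpleNewMixin:
    """Hypothetical mixin: one _simple_new for all datetime-like arrays."""

    @classmethod
    def _simple_new(cls, values, freq=None, dtype=None):
        # Bypass __init__ entirely: no validation, no copying.
        obj = object.__new__(cls)
        obj._data = values
        obj._freq = freq
        # dtype carries what tz used to carry; default to the data's dtype.
        obj._dtype = values.dtype if dtype is None else dtype
        return obj


class SketchDTA(_SketchSimpleNewMixin):
    """Stand-in for DatetimeArray."""


class SketchTDA(_SketchSimpleNewMixin):
    """Stand-in for TimedeltaArray."""
```

Because nothing in the signature is datetime-specific any more, `SketchDTA._simple_new(m8_values)` and `SketchTDA._simple_new(td_values)` go through the same code path.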

    Labels

    API Design, Closing Candidate (May be closeable, needs more eyeballs), Constructors (Series/DataFrame/Index/pd.array Constructors), Datetime (Datetime data dtype), Enhancement, Needs Discussion (Requires discussion from core team before further action), Period (Period data type), Refactor (Internal refactoring of code), Timedelta (Timedelta data type)
