PERF: making DatetimeIndex.date more performant #18058

Closed
@jreback

We can substantially speed up DatetimeIndex.date with a small code change; the idea comes from a Stack Overflow answer.

In [44]: rng = pd.date_range('2000-04-03', periods=200000, freq='2H')

In [45]: %timeit rng.date
480 ms ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [46]: %timeit rng.normalize().to_pydatetime()
94.7 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [47]: rng.normalize().to_pydatetime()
Out[47]: 
array([datetime.datetime(2000, 4, 3, 0, 0),
       datetime.datetime(2000, 4, 3, 0, 0),
       datetime.datetime(2000, 4, 3, 0, 0), ...,
       datetime.datetime(2045, 11, 19, 0, 0),
       datetime.datetime(2045, 11, 19, 0, 0),
       datetime.datetime(2045, 11, 19, 0, 0)], dtype=object)

In [48]: rng.date
Out[48]: 
array([datetime.date(2000, 4, 3), datetime.date(2000, 4, 3),
       datetime.date(2000, 4, 3), ..., datetime.date(2045, 11, 19),
       datetime.date(2045, 11, 19), datetime.date(2045, 11, 19)], dtype=object)

So [47] and [48] are almost the same; the only difference is that [47] holds datetime objects and [48] holds date objects.
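The equivalence of the two results can be checked directly. This is a minimal sketch, assuming a reasonably recent pandas; the range is shortened, and the frequency is passed as a Timedelta rather than the '2H' alias because the alias spelling differs between pandas versions. The list comprehension is only there for the comparison and is not the fast path being proposed.

```python
import datetime

import pandas as pd

# Shorter range than in the timings above; a Timedelta is used for freq
# instead of the '2H' string alias (an assumption made for portability).
rng = pd.date_range('2000-04-03', periods=1000, freq=pd.Timedelta(hours=2))

# Midnight datetime.datetime objects, one per element of the index.
midnights = rng.normalize().to_pydatetime()

# Dropping the all-zero time component should give what .date returns.
dates_via_normalize = [dt.date() for dt in midnights]
dates_direct = list(rng.date)

same = dates_via_normalize == dates_direct
print(same)
```

If the two paths agree, the only work left for a fast `.date` is to build date objects instead of datetime objects in the conversion routine.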

If we allowed ints_to_pydatetime to create date objects (it just needs a simple function pointer), around https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslib.pyx#L140, then this would work. In other words:

@property
def date(self):
    return self.normalize().to_pydate()

where .to_pydate() is basically .to_pydatetime() with an additional argument, say kind='date', which ints_to_pydatetime would handle by creating date rather than datetime objects.

This bypasses the iteration, which creates many Python objects.
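The kind dispatch described above can be illustrated in pure Python. This is only a sketch of the idea, not the Cython code in tslib.pyx: the function `ints_to_py`, the helper names, and the `_CONSTRUCTORS` table are all hypothetical, and a dictionary lookup stands in for the function pointer.

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def _make_datetime(dt):
    # Keep the full datetime.
    return dt


def _make_date(dt):
    # Drop the time component and return a plain date.
    return dt.date()


# Hypothetical stand-in for the "function pointer": the constructor is
# picked once, outside the conversion loop.
_CONSTRUCTORS = {'datetime': _make_datetime, 'date': _make_date}


def ints_to_py(values_ns, kind='datetime'):
    """Convert integer nanoseconds since the epoch to Python objects.

    Illustrative only; sub-microsecond precision is truncated because
    datetime.datetime cannot represent it.
    """
    try:
        make = _CONSTRUCTORS[kind]
    except KeyError:
        raise ValueError("kind must be 'datetime' or 'date'") from None
    return [make(EPOCH + timedelta(microseconds=ns // 1000))
            for ns in values_ns]


# Three timestamps two hours apart, starting at 2000-04-03 00:00.
ns_per_hour = 3_600 * 10**9
start_ns = int((datetime(2000, 4, 3) - EPOCH).total_seconds()) * 10**9
values = [start_ns + i * 2 * ns_per_hour for i in range(3)]

as_datetimes = ints_to_py(values)
as_dates = ints_to_py(values, kind='date')
```

Selecting the constructor once keeps the per-element work identical for both kinds, which is the point of the proposal: the same loop serves to_pydatetime() and the proposed to_pydate().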

Labels: Datetime (Datetime data dtype), Performance (Memory or execution speed performance)
