Closed
Description
We can substantially speed up DatetimeIndex.date with a small tweak in the code (from SO):
In [44]: rng = pd.date_range('2000-04-03', periods=200000, freq='2H')
In [45]: %timeit rng.date
480 ms ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [46]: %timeit rng.normalize().to_pydatetime()
94.7 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [47]: rng.normalize().to_pydatetime()
Out[47]:
array([datetime.datetime(2000, 4, 3, 0, 0),
       datetime.datetime(2000, 4, 3, 0, 0),
       datetime.datetime(2000, 4, 3, 0, 0), ...,
       datetime.datetime(2045, 11, 19, 0, 0),
       datetime.datetime(2045, 11, 19, 0, 0),
       datetime.datetime(2045, 11, 19, 0, 0)], dtype=object)
In [48]: rng.date
Out[48]:
array([datetime.date(2000, 4, 3), datetime.date(2000, 4, 3),
       datetime.date(2000, 4, 3), ..., datetime.date(2045, 11, 19),
       datetime.date(2045, 11, 19), datetime.date(2045, 11, 19)], dtype=object)
So [47] and [48] are almost the same; the only difference is that [47] holds datetime objects and [48] holds date objects.
If we allowed ints_to_pydatetime to create date objects (it just needs a simple function pointer) around https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslib.pyx#L140, then this would work. In other words:
@property
def date(self):
    return self.normalize().to_pydate()
where .to_pydate() is basically .to_pydatetime() with an additional argument, say kind='date', which ints_to_pydatetime would handle (creating date rather than datetime objects).
This bypasses the element-by-element iteration that creates many Python objects.
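To make the "function pointer" idea concrete, here is a rough pure-Python sketch of the dispatch. It is not the Cython code in tslib.pyx: the names ints_to_py, _build_datetime and _build_date are hypothetical, the input is assumed to be naive nanoseconds since the Unix epoch, and the field decomposition is done with the standard library as a stand-in for the C-level struct. The point is only that the constructor is chosen once, outside the loop, based on kind.

```python
from datetime import date, datetime, timedelta

_EPOCH = datetime(1970, 1, 1)


def _build_datetime(y, m, d, hh, mm, ss, us):
    # Default behaviour: keep every field.
    return datetime(y, m, d, hh, mm, ss, us)


def _build_date(y, m, d, hh, mm, ss, us):
    # kind='date': ignore the time-of-day fields entirely.
    return date(y, m, d)


def ints_to_py(values_ns, kind="datetime"):
    """Hypothetical stand-in for the conversion loop (illustration only).

    values_ns: iterable of ints, nanoseconds since 1970-01-01 (naive).
    kind: 'datetime' or 'date'; selects the constructor once, up front.
    """
    if kind == "datetime":
        build = _build_datetime
    elif kind == "date":
        build = _build_date
    else:
        raise ValueError(f"unsupported kind: {kind!r}")

    out = []
    for ns in values_ns:
        # Stand-in for decomposing the integer into calendar fields;
        # sub-microsecond precision is dropped, as datetime cannot hold it.
        f = _EPOCH + timedelta(microseconds=ns // 1000)
        out.append(build(f.year, f.month, f.day,
                         f.hour, f.minute, f.second, f.microsecond))
    return out
```

With something along these lines, the same loop serves both to_pydatetime() and the proposed to_pydate(), and the only per-kind difference is which constructor is called for each element.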