Accessing .values on a MultiIndex with a DatetimeIndex level is about four times slower than on a comparable index with only numeric levels:
import pandas as pd
lev1 = range(10000)
lev2 = range(100)
mi = pd.MultiIndex.from_product([lev1, lev2])
%time mi.values
CPU times: user 571 ms, sys: 41 ms, total: 612 ms
Wall time: 612 ms
With a DatetimeIndex as the second level:
lev1 = range(10000)
lev2 = pd.date_range('1/1/2014', periods=100)
mi = pd.MultiIndex.from_product([lev1, lev2])
%time mi.values
CPU times: user 2.51 s, sys: 68 ms, total: 2.58 s
Wall time: 2.58 s
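The overhead comes from boxing the datetime level values (converting each one to a Timestamp object) when generating the tuples for the values property. It can be minimized by boxing each distinct level value once rather than once for each occurrence of that value in the tuples. In the example above, that means boxing 100 values instead of 1,000,000.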
I can send in a PR shortly.
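The box-once idea described above can be sketched roughly as follows. This is only an illustration, not the code from the eventual PR: the helper name tuples_boxing_once is hypothetical, it assumes a recent pandas where the integer positions are exposed as MultiIndex.codes (older versions called them labels), and it assumes the index has no missing values.

```python
import numpy as np
import pandas as pd


def tuples_boxing_once(mi):
    """Build the tuples of a MultiIndex, boxing each distinct level value once.

    Hypothetical helper for illustration; assumes no missing values
    (that is, no code equal to -1).
    """
    columns = []
    for level, codes in zip(mi.levels, mi.codes):
        codes = np.asarray(codes)
        if (codes < 0).any():
            raise ValueError("missing values are not handled in this sketch")
        # Box the distinct values of this level once. For a DatetimeIndex
        # level this creates one Timestamp per distinct date.
        boxed = level.astype(object).to_numpy()
        # Reuse the already-boxed objects for every occurrence via the codes.
        columns.append(boxed.take(codes))
    # Combine the per-level columns into one tuple per row.
    return list(zip(*columns))


# Small example: 3 integers x 2 dates gives 6 tuples, but only 2 Timestamps
# are created for the date level.
mi = pd.MultiIndex.from_product(
    [range(3), pd.date_range("2014-01-01", periods=2)]
)
tuples = tuples_boxing_once(mi)
```

The number of boxing operations then scales with the number of distinct level values rather than with the length of the index.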