PERF: Slowness in multi-level indexes with datetime levels #8543

Closed
@miketkelly

A MultiIndex with a DatetimeIndex level is slower than a similar index with numeric levels:

import pandas as pd

lev1 = range(10000)
lev2 = range(100)
mi = pd.MultiIndex.from_product([lev1, lev2])
%time mi.values

CPU times: user 571 ms, sys: 41 ms, total: 612 ms
Wall time: 612 ms

lev1 = range(10000)
lev2 = pd.date_range('1/1/2014', periods=100)
mi = pd.MultiIndex.from_product([lev1, lev2])
%time mi.values

CPU times: user 2.51 s, sys: 68 ms, total: 2.58 s
Wall time: 2.58 s

The overhead comes from boxing the level values when generating the tuples for the values property: each datetime entry is boxed again for every row in which it appears. This can be reduced by boxing each distinct level value once, rather than once per occurrence of that value in the tuples.
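To make the idea concrete, here is a minimal sketch of boxing once per distinct value. It is not the eventual PR, only an illustration. The helper name `tuples_boxing_once` is hypothetical, the `codes` attribute assumes a recent pandas (older releases called it `labels`), and the sketch assumes the index has no missing entries (a code of -1 would be mishandled by `take`).

```python
import numpy as np
import pandas as pd


def tuples_boxing_once(mi):
    """Build the tuples of a MultiIndex, boxing each level value once.

    Hypothetical helper for illustration; assumes no missing values.
    """
    columns = []
    for level, codes in zip(mi.levels, mi.codes):
        # Box the distinct values of this level a single time
        # (for a DatetimeIndex level this yields pd.Timestamp objects).
        boxed = np.asarray(level.astype(object))
        # Expand to full length by integer position; this reuses the
        # already-boxed objects instead of boxing per row.
        columns.append(boxed.take(codes))
    return list(zip(*columns))


lev1 = range(3)
lev2 = pd.date_range('1/1/2014', periods=2)
mi = pd.MultiIndex.from_product([lev1, lev2])
result = tuples_boxing_once(mi)
```

With 10000 x 100 rows, this approach would box only 100 datetime values instead of 1,000,000; whether that recovers the full gap to the numeric case would need to be measured.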

I can send in a PR shortly.
