PERF: regression in reindex. Pandas 0.23.4 is 60x slower than 0.22.0 with a MultiIndex with datetime64 #23735

@apm582

Re-indexing a Series with a two-level MultiIndex where the first level is datetime64 and the second level is int is about 60x slower in 0.23.4 than in 0.22.0 (1.98 seconds vs 0.03 seconds). Output first, then repro code below. The issue persists if you change the first level to int instead of datetime, but the perf difference is smaller (0.40 seconds vs 0.03 seconds).

"""
pandas version: 0.23.4
reindex took 1.9770500659942627 seconds

pandas version: 0.22.0
reindex took 0.0306899547577 seconds
"""


import pandas as pd
import time
import numpy as np


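if __name__ == '__main__':
    # Full index: 300 days x 1440 minutes per day = 432,000 rows.
    n_days = 300
    dr = pd.date_range(end="20181118", periods=n_days)
    mi = pd.MultiIndex.from_product([dr, range(1440)])

    # Random values with roughly 30% of entries set to NaN.
    v = np.random.randn(len(mi))
    mask = np.random.rand(len(v)) < 0.3
    v[mask] = np.nan
    s = pd.Series(v, index=mi)
    s = s.sort_index()

    # Dropping the NaNs leaves a Series whose index is a subset of s.index.
    s2 = s.dropna()

    # Time only the reindex back onto the full index.
    start = time.time()

    s2.reindex(index=s.index)

    end = time.time()
    print("pandas version: %s" % pd.__version__)
    print("reindex took %s seconds" % (end - start))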
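For comparing the datetime64 case against the int-first-level variant mentioned above, the repro could be parameterized roughly as follows. This is only a sketch: `time_reindex` and its parameters (`n_inner`, `nan_fraction`, `seed`) are hypothetical names introduced here, not part of the original script, and the seeded generator is an added assumption to make runs repeatable. Timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd


def time_reindex(first_level, n_inner=1440, nan_fraction=0.3, seed=0):
    """Time reindexing a NaN-dropped Series back onto its full MultiIndex.

    Hypothetical helper: returns (elapsed_seconds, reindexed_series).
    """
    mi = pd.MultiIndex.from_product([first_level, range(n_inner)])
    rng = np.random.RandomState(seed)  # seeded so repeated runs use the same data
    v = rng.randn(len(mi))
    v[rng.rand(len(v)) < nan_fraction] = np.nan
    s = pd.Series(v, index=mi).sort_index()
    s2 = s.dropna()

    # Time only the reindex call, as in the original repro.
    start = time.time()
    out = s2.reindex(index=s.index)
    return time.time() - start, out


if __name__ == '__main__':
    n_days = 300
    cases = [
        ("datetime64", pd.date_range(end="20181118", periods=n_days)),
        ("int", range(n_days)),
    ]
    print("pandas version: %s" % pd.__version__)
    for name, level in cases:
        elapsed, _ = time_reindex(level)
        print("first level %s: reindex took %s seconds" % (name, elapsed))
```

Running the same script under both pandas versions gives one timing per first-level dtype, which makes it easier to see how much of the slowdown is specific to datetime64.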

Labels: Closing Candidate (may be closeable, needs more eyeballs); Performance (memory or execution speed performance)
