
PERF: regression in MultiIndex get_loc indexing performance #29311

Closed
@jorisvandenbossche

From our benchmarks (GetLoc class in https://github.com/pandas-dev/pandas/blob/master/asv_bench/benchmarks/multiindex_object.py):

In [65]: mi_med = pd.MultiIndex.from_product( 
    ...:     [np.arange(1000), np.arange(10), list("A")], names=["one", "two", "three"] 
    ...: ) 

In [66]: %timeit mi_med.get_loc((999, 9, "A"))   
9.58 µs ± 106 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [67]: pd.__version__  
Out[67]: '0.25.0'

vs

In [18]: mi_med = pd.MultiIndex.from_product( 
    ...:     [np.arange(1000), np.arange(10), list("A")], names=["one", "two", "three"] 
    ...: )

In [19]: %timeit mi_med.get_loc((999, 9, "A"))  
34.7 µs ± 454 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [20]: pd.__version__
Out[20]: '0.26.0.dev0+691.g157495696'

I'm not sure offhand what changed recently in the MultiIndex code.
The benchmark suite only narrows this down to a rather broad commit range, 2b28454...7c8c8c8 (presumably because the benchmarks were not running for a while).
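
For anyone who wants to check a given pandas build outside IPython, the comparison above can be sketched as a standalone script. This is only a rough reproduction under some assumptions: the helper names `build_index` and `time_get_loc` are made up for illustration, and the script reports the best of several `timeit` runs rather than the mean ± std. dev. that `%timeit` prints, so the numbers will not match the ones above exactly.

```python
import timeit

import numpy as np
import pandas as pd


def build_index():
    # Same index as in the report: 1000 x 10 x 1 = 10000 unique tuples
    return pd.MultiIndex.from_product(
        [np.arange(1000), np.arange(10), list("A")],
        names=["one", "two", "three"],
    )


def time_get_loc(mi, key, number=2_000, repeat=7):
    # Hypothetical helper: returns the best per-call time in microseconds.
    # Call once first so one-off setup cost is not included in the timing.
    mi.get_loc(key)
    timings = timeit.repeat(lambda: mi.get_loc(key), number=number, repeat=repeat)
    return min(timings) / number * 1e6


mi_med = build_index()
key = (999, 9, "A")
loc = mi_med.get_loc(key)
per_call_us = time_get_loc(mi_med, key)
print(f"pandas {pd.__version__}: get_loc{key} -> {loc}, {per_call_us:.2f} µs per call")
```

Running the same script in two environments (for example 0.25.0 and a current development build) gives a like-for-like comparison of the per-call time.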

Labels: MultiIndex; Performance (memory or execution speed performance); Regression (functionality that used to work in a prior pandas version)