PERF: Indexing a multi-index is a lot slower #31648

Closed
@valtron

Description

Indexing a multi-index seemingly went from O(1) to O(N):

[Image: bug]

I did a bisect, and found this was caused by the _shallow_copy here: b0f33b3#diff-4ffd1c69d47e0ac9f2de4f9e3e4a118cR643.

Code Sample

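from time import perf_counter

import pandas as pd

for N in [1000, 2000, 4000, 8000, 16000, 32000]:
    values = list(range(N))
    df = pd.DataFrame({'a': values})
    df['b'] = 1
    # Build a two-level MultiIndex from columns 'a' and 'b'.
    df = df.set_index(['a', 'b'])

    # Time a list-based .loc lookup of all N keys on the first index level.
    start = perf_counter()
    df.loc[values]
    elapsed = perf_counter() - start
    print(N, elapsed)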
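
To turn the printed (N, time) pairs into a rough growth estimate, one option is to fit a slope on a log-log scale. The sketch below is only an illustration: the helper name estimate_exponent is hypothetical, and the timings are synthetic values generated from exact power laws, not measurements from any pandas version. Since the sample looks up N keys at once, constant cost per key would give a slope near 1, while per-key cost growing with N would push the slope toward 2.

```python
import math


def estimate_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    Hypothetical helper for illustration; a slope of k suggests
    the total time grows roughly like N**k.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


sizes = [1000, 2000, 4000, 8000, 16000, 32000]

# Synthetic timings (NOT measured): exact linear and quadratic growth.
linear_times = [1e-6 * n for n in sizes]
quadratic_times = [1e-9 * n * n for n in sizes]

print(round(estimate_exponent(sizes, linear_times), 3))
print(round(estimate_exponent(sizes, quadratic_times), 3))
```

To apply this to real numbers, collect the N and elapsed values from the loop above into two lists and pass them in place of the synthetic ones. Small N values can be dominated by fixed overhead, so the fitted slope is a rough indicator rather than a proof of complexity.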

Metadata

Labels

Indexing: Related to indexing on series/frames, not to indexes themselves
MultiIndex
Performance: Memory or execution speed performance
Regression: Functionality that used to work in a prior pandas version
