PERF: GroupBy.quantile #51385

Closed
@jbrockmendel

Description

import numpy as np
import pandas as pd

nrows = 10**7
ncols = 10
ngroups = 6

qs = [0.5, 0.75]
arr = np.random.randn(nrows, ncols)
df = pd.DataFrame(arr)
df["A"] = np.random.randint(ngroups, size=nrows)

gb = df.groupby("A")

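%timeit v1 = gb.quantile(qs)
39.6 s ± 1.74 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit v2 = {key: gb.get_group(key).quantile(qs) for key in gb.groups}
3.37 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Names assigned inside a %timeit statement are not kept in the session,
# so compute both results once more before comparing them.
v1 = gb.quantile(qs)
v2 = {key: gb.get_group(key).quantile(qs) for key in gb.groups}

alt = pd.concat(v2).drop("A", axis=1)
alt.index.names = ["A", None]
assert alt.equals(v1)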

The per-group loop returns the same result roughly 12x faster (3.37 s vs. 39.6 s). Is GroupBy.quantile doing dramatically too much work?
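
For anyone who wants to try the comparison without IPython or 10**7 rows, here is a scaled-down sketch as a plain script. The function name `compare_quantile_paths`, the default sizes, and the seed are made up for illustration and are not part of the original report. It also compares values with `np.allclose` rather than `equals`, which is a choice of this sketch to avoid depending on bit-identical floats from the two code paths.

```python
import time

import numpy as np
import pandas as pd


def compare_quantile_paths(nrows=10_000, ncols=4, ngroups=6, qs=(0.5, 0.75), seed=0):
    """Time GroupBy.quantile against a per-group loop and check they agree."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.standard_normal((nrows, ncols)))
    df["A"] = rng.integers(ngroups, size=nrows)
    gb = df.groupby("A")
    qs = list(qs)

    # Path 1: the built-in grouped quantile.
    start = time.perf_counter()
    v1 = gb.quantile(qs)
    t_groupby = time.perf_counter() - start

    # Path 2: DataFrame.quantile on each group, one group at a time.
    start = time.perf_counter()
    v2 = {key: grp.drop(columns="A").quantile(qs) for key, grp in gb}
    t_loop = time.perf_counter() - start

    # Stack the per-group results into the same (group, quantile) layout.
    alt = pd.concat(v2)
    alt.index.names = ["A", None]
    same = alt.shape == v1.shape and np.allclose(alt.to_numpy(), v1.to_numpy())
    return t_groupby, t_loop, bool(same)


if __name__ == "__main__":
    t_groupby, t_loop, same = compare_quantile_paths()
    print(f"GroupBy.quantile: {t_groupby:.4f} s, per-group loop: {t_loop:.4f} s, match: {same}")
```

Single wall-clock measurements at this size are noisy and may not show the same gap as the numbers above; raising `nrows` toward the original 10**7 should be closer to the reported setup.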


Labels: Groupby, Performance (memory or execution speed performance), quantile (quantile method)
