Closed
import numpy as np
import pandas as pd
nrows = 10**7
ncols = 10
ngroups = 6
qs = [0.5, 0.75]
arr = np.random.randn(nrows, ncols)
df = pd.DataFrame(arr)
df["A"] = np.random.randint(ngroups, size=nrows)
gb = df.groupby("A")
%timeit v1 = gb.quantile(qs)
39.6 s ± 1.74 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit v2 = {key: gb.get_group(key).quantile(qs) for key in gb.groups}
3.37 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
alt = pd.concat(v2).drop("A", axis=1)
alt.index.names = ["A", None]
assert alt.equals(v1)
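In this benchmark GroupBy.quantile is roughly 12x slower than calling quantile on each group separately, yet both produce identical results. Is GroupBy.quantile doing dramatically too much work?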
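
The timings above rely on IPython's %timeit. For anyone who wants to reproduce the comparison as a plain script, here is a rough sketch. It assumes a much smaller default nrows so it finishes quickly, and it takes a single wall-clock measurement per approach, so the numbers will be noisier than the %timeit figures. The helper names build_groupby, per_group_quantile and time_call are hypothetical and not part of pandas.

```python
import time

import numpy as np
import pandas as pd


def build_groupby(nrows=10**5, ncols=10, ngroups=6, seed=0):
    """Build a frame shaped like the one in the report (smaller by default)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.standard_normal((nrows, ncols)))
    df["A"] = rng.integers(ngroups, size=nrows)
    return df.groupby("A")


def per_group_quantile(gb, qs):
    """Compute quantiles one group at a time, then reassemble the result."""
    parts = {
        # Drop the grouping column if get_group returns it.
        key: gb.get_group(key).drop(columns="A", errors="ignore").quantile(qs)
        for key in sorted(gb.groups)
    }
    alt = pd.concat(parts)
    alt.index.names = ["A", None]
    return alt


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


qs = [0.5, 0.75]
gb = build_groupby()
v1, t_groupby = time_call(gb.quantile, qs)
alt, t_manual = time_call(per_group_quantile, gb, qs)
print(f"GroupBy.quantile: {t_groupby:.3f} s, per-group loop: {t_manual:.3f} s")
```

Passing nrows=10**7 to build_groupby should approximate the size used in the report, at the cost of a much longer run.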