PERF: investigate numpy's percentile implementation #55535

Open
@jbrockmendel

While profiling for #51722, I found a number of cases where operating group-by-group performed better than our Cython implementation. The group-by-group iteration itself is expensive, which suggests that the non-iteration portion of that call must be performant. That portion goes through DataFrame.quantile, which in turn goes through np.percentile (in core.array_algos.quantile). This suggests that the np.percentile implementation may be doing something that we should try to port to group_quantile.

Copy/pasting from my notes-to-self at the time:

- Investigate numpy's percentile code
	- Our nanmedian does casting and type inference in a way I think is unnecessary
	- Profiling groupby.quantile (xref https://github.com/pandas-dev/pandas/pull/51722) suggests that numpy's percentile may just be much more performant than what we have
	- https://github.com/numpy/numpy/blob/v1.24.0/numpy/lib/function_base.py#L3920-L4206
	- https://github.com/numpy/numpy/blob/v1.24.0/numpy/lib/function_base.py#L3774-L3857
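
To make the comparison concrete, here is a minimal sketch of what the group-by-group path boils down to: one np.percentile call per group. It is not the pandas code path itself. `quantile_per_group` is a hypothetical helper name, and the sketch assumes float data with no NaNs and a single scalar `q` in [0, 1].

```python
import numpy as np


def quantile_per_group(values, labels, q):
    """Hypothetical helper: q-th quantile of `values` within each group.

    Assumes no NaNs and a scalar q in [0, 1]. Illustrative only; this is
    not how pandas' group_quantile is implemented.
    """
    values = np.asarray(values, dtype=np.float64)
    labels = np.asarray(labels)

    # Sort once by label so that each group is a contiguous slice.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    sorted_values = values[order]

    # First index of each distinct label in the sorted array.
    uniques, starts = np.unique(sorted_labels, return_index=True)
    ends = np.append(starts[1:], len(sorted_values))

    out = np.empty(len(uniques), dtype=np.float64)
    for i, (lo, hi) in enumerate(zip(starts, ends)):
        # np.percentile takes a percentage in [0, 100], hence q * 100.
        out[i] = np.percentile(sorted_values[lo:hi], q * 100)
    return uniques, out


groups, medians = quantile_per_group(
    values=[1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0],
    labels=[0, 0, 0, 0, 1, 1, 1],
    q=0.5,
)
```

Timing a loop like this against `df.groupby(key).quantile(q)` on the same data is one way to check whether the per-group np.percentile call accounts for the difference seen in profiling.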
