ENH: IntervalIndex as groups in groupby #37949

Open
@twoertwein

Is your feature request related to a problem?

I often have a list of (variable-size) intervals and want to aggregate multiple statistics over the rows that fall inside each interval. Currently I do this with an explicit loop:

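import numpy as np
import pandas as pd

# 100 rows on a float index running from 0.0 to 9.9 in steps of 0.1
data = pd.DataFrame({'a': [1]*100, 'b': [2]*100}, index=np.arange(0.0, 10, 0.1))
starts = [0.0, 2.5, 5, 7]
ends = [0.7, 3.8, 6.1, 9.5]

# one label-based slice per interval (.loc slicing includes both endpoints)
means = []
for start, end in zip(starts, ends):
    means.append(data.loc[start:end, :].mean())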

Describe the solution you'd like

It would be cool (and probably faster) to do:

groups = pd.IntervalIndex.from_arrays(starts, ends)
data.groupby(groups).mean()  # *** ValueError: Grouper and axis must be same length
# or for multiple statistics
data.groupby(groups).aggregate(['mean', 'sum', lambda x: x.quantile(0.75) - x.quantile(0.25)])
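
Until something like this is supported, a possible workaround is to map each row to its interval first and group by that mapping. The following is only a sketch under a few assumptions: the intervals do not overlap (`IntervalIndex.get_indexer` raises for overlapping intervals), rows outside every interval should be dropped, and `closed='both'` is passed to mirror the inclusive `.loc[start:end]` slices in the loop above (the default for `from_arrays` is `closed='right'`). The names `intervals`, `codes`, `mask`, and `result` are illustrative, not part of any proposed API.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'a': [1]*100, 'b': [2]*100}, index=np.arange(0.0, 10, 0.1))
starts = [0.0, 2.5, 5, 7]
ends = [0.7, 3.8, 6.1, 9.5]

# closed='both' is an assumption chosen to match the inclusive .loc slices
intervals = pd.IntervalIndex.from_arrays(starts, ends, closed='both')

# position of the interval containing each row label, -1 if there is none
codes = intervals.get_indexer(data.index)
mask = codes >= 0

# group the covered rows by interval position, then relabel with the intervals
result = data[mask].groupby(codes[mask]).mean()
result.index = intervals[result.index]
```

The same grouping should also work with `.aggregate([...])` for multiple statistics. It does not cover overlapping intervals, where a row would have to belong to several groups at once.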
