numpy 1.11.0 nonuniform binning #14627

Closed
@blalterman

Binning continuous data into discrete bins is a non-trivial problem, especially when one is trying to capture features at several scales. This astropy example is much clearer than any I could hope to write. The current implementation of pandas.cut does not give users easy access to these tools.

numpy provides a number of simple binning algorithms that capture these features in computationally efficient ways. As is done in astropy, perhaps we can pass the bin edge calculation directly to numpy. To do so, we would need to call histogram_bin_edges (available in numpy 1.15.1, if not earlier), which isolates the calculation of bin edges.

To utilize np.histogram_bin_edges, a user would call

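import numpy as np
import pandas as pd

data = np.random.normal(size=100000)  # size= is required; a bare 100000 sets loc and returns one scalar
edges = np.histogram_bin_edges(data, bins="auto")  # Other strings are acceptable.
cut = pd.cut(data, bins=edges)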
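As a rough illustration of what the two-step approach gives today, the sketch below runs a few of the rule names that np.histogram_bin_edges accepts and feeds the resulting edges to pd.cut. The seed, sample size, and the `summary` dictionary are arbitrary choices made for this example, not part of the proposal.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

summary = {}
for rule in ("auto", "fd", "sturges", "sqrt"):
    # NumPy chooses the number of bins; the edges span [data.min(), data.max()].
    edges = np.histogram_bin_edges(data, bins=rule)
    # pd.cut uses right-closed intervals by default, so without
    # include_lowest=True the minimum value sits on the open left edge
    # of the first bin and comes back as NaN.
    binned = pd.cut(data, bins=edges, include_lowest=True)
    summary[rule] = (len(edges) - 1, int(binned.isna().sum()))

for rule, (n_bins, n_missing) in summary.items():
    print(f"{rule:>8}: {n_bins} bins, {n_missing} unbinned values")
```

One thing to watch when passing NumPy's edges straight through: because the first edge equals the data minimum, include_lowest=True (or an equivalent adjustment) is needed to keep that one point from being dropped.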

We can simplify this by replacing one or two lines in the pandas.cut source:

bins = np.linspace(mn, mx, bins + 1, endpoint=True)

becomes

bins = np.histogram_bin_edges(x, bins=bins)

so that the user can simply call

data = np.random.normal(size=100000)
cut = pd.cut(data, bins="auto")  # Or any of the other strings accepted by numpy

and have access to all the techniques available there.
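Until something like this lands in pandas, the proposed behavior can be approximated with a small wrapper. This is only a sketch: `cut_with_numpy_bins` is a hypothetical helper name, not a pandas function, and it assumes finite, non-missing input. It also checks that, for an integer bin count, np.histogram_bin_edges yields the same equal-width edges as the linspace expression quoted above.

```python
import numpy as np
import pandas as pd


def cut_with_numpy_bins(x, bins="auto", **cut_kwargs):
    """Hypothetical helper: resolve `bins` through NumPy, then call pd.cut."""
    x = np.asarray(x)
    if isinstance(bins, (str, int, np.integer)):
        # A string names a NumPy binning rule; an integer gives
        # that many equal-width bins over [x.min(), x.max()].
        bins = np.histogram_bin_edges(x, bins=bins)
    # Keep the minimum value inside the first (right-closed) bin.
    cut_kwargs.setdefault("include_lowest", True)
    return pd.cut(x, bins=bins, **cut_kwargs)


rng = np.random.default_rng(42)
data = rng.normal(size=10_000)

auto_cut = cut_with_numpy_bins(data, bins="auto")
ten_cut = cut_with_numpy_bins(data, bins=10)

# Integer bin counts: NumPy's edges agree with the linspace call.
int_edges = np.histogram_bin_edges(data, bins=10)
lin_edges = np.linspace(data.min(), data.max(), 10 + 1, endpoint=True)
same_edges = np.allclose(int_edges, lin_edges)
print(same_edges, len(auto_cut.categories), len(ten_cut.categories))
```

Because the integer case reproduces the linspace edges, the swap looks backward compatible for existing integer `bins` arguments, although a real patch would still need to preserve whatever edge adjustments pandas.cut applies after computing them.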

The histogram_bin_edges docs are available here. The source is here.
