ENH: `DataFrame.describe` allows UDFs and/or selectable metrics

`DataFrame.describe()` generates descriptive statistics for the columns of a DataFrame. The set of "descriptive statistics" is hard-coded, and it also depends somewhat on the dtype of each column.

I find it quite odd that the function offers a lot of customisation for which columns to `include` or `exclude` based on dtype, and for which `percentiles` to sample, but doesn't let the user pass a set of functions they want to see (or drop the ones they don't want to see, such as percentiles).

A rough idea is a new argument, e.g. `metrics`, which replaces the default metrics with a specific set of functions, each given as a str or a callable:

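```python
from pandas import DataFrame, Series

def udf_name(s):
    return s.sum()

df = DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
# `metrics` is the proposed argument; it does not exist in pandas today
df.describe(metrics=["sum", Series.mean, lambda s: s.count(), lambda s: s.dtype, udf_name])
# Proposed output:
#               A        B
# sum           4        6
# mean          2        3
# <lambda>      2        2
# <lambda>  int64    int64
# udf_name      4        6
```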
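To make the proposed semantics concrete, here is a minimal sketch of a standalone helper that builds the same kind of table outside of `describe`. The name `describe_with` and its signature are hypothetical and only for illustration; this is not how pandas implements `describe`, and it makes no attempt at the dtype handling that `describe` does.

```python
import pandas as pd


def describe_with(df, metrics):
    """Hypothetical helper: one output row per metric, one column per input column.

    Each metric is either the name of a Series method (str) or a callable
    that takes a Series and returns a scalar.
    """
    names, records = [], []
    for metric in metrics:
        if isinstance(metric, str):
            # Look the metric up as a Series method, e.g. "sum" -> Series.sum
            names.append(metric)
            records.append({col: getattr(s, metric)() for col, s in df.items()})
        else:
            # Label the row with the callable's name ("<lambda>" for lambdas)
            names.append(getattr(metric, "__name__", repr(metric)))
            records.append({col: metric(s) for col, s in df.items()})
    # Using a list of records keeps duplicate row labels such as "<lambda>"
    return pd.DataFrame(records, index=names, columns=df.columns)


def udf_name(s):
    return s.sum()


df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
result = describe_with(df, ["sum", pd.Series.mean, udf_name])
```

For metrics that are all plain reductions, `df.agg(["sum", "mean"])` already returns a similar table; the point of the proposal is to get this from `describe` itself, alongside its `include`/`exclude` handling.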

Note that this came up while trying to add sub-total or additional rows to a Styler based on the underlying data (#43894), and in the following issues:

 - #43875
 - #44751


ENH: `DataFrame.describe` allows UDFs and/or selectable metrics #45737
