Description
`DataFrame.describe()` generates descriptive statistics for the columns of a DataFrame. These "descriptive statistics" have been specifically chosen and hard-coded, and they are also somewhat dtype-dependent.
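The following is a quick check against the current API that shows the fixed, dtype-dependent rows. The column names and data are illustrative only.

```python
import pandas as pd

# A small frame with one numeric and one non-numeric column (made-up data).
df = pd.DataFrame({"num": [1, 2, 3, 4], "txt": ["a", "b", "a", "c"]})

# By default only numeric columns are described, with a fixed set of rows.
numeric_summary = df.describe()
print(numeric_summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# A non-numeric column gets a different, equally fixed set of rows.
text_summary = df["txt"].describe()
print(text_summary.index.tolist())
# ['count', 'unique', 'top', 'freq']
```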
I find it quite odd that the function offers a lot of customisation for which columns to `include` or `exclude` based on dtype, or which `percentiles` to sample, but doesn't let the user choose the set of functions they want to see (or not see, such as the percentiles).
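For reference, the existing options can be used as follows. This is a sketch against the current API with made-up data. These options control which columns are described and which percentiles are sampled, but not which statistics appear.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"num": [1.0, 2.0, 3.0, 4.0], "txt": ["a", "b", "a", "c"]})

# Which columns are described is configurable by dtype...
only_numeric = df.describe(include=[np.number])
only_other = df.describe(exclude=[np.number])

# ...and so is which percentiles are sampled.
custom_pcts = df.describe(percentiles=[0.1, 0.5, 0.9])

# The remaining rows (count, mean, std, min, max) cannot be swapped
# for user-defined statistics such as a sum.
print(custom_pcts.index.tolist())
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']
```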
A rough idea is to add a new argument, e.g. `metrics`, which overrides the default statistics with a specific set of functions, each given as a `str` or a callable:
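```python
from pandas import DataFrame, Series

def udf_name(s):
    return s.sum()

df = DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
# `metrics` is the proposed argument; it does not exist in pandas today.
df.describe(metrics=["sum", Series.mean, lambda s: s.count(), lambda s: s.dtype, udf_name])
```
```
              A      B
sum           4      6
mean          2      3
<lambda>      2      2
<lambda>  int64  int64
udf_name      4      6
```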
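The following is a minimal sketch of the proposed semantics built from today's pandas, not an implementation of the actual proposal. `describe_metrics` and `_metric_label` are hypothetical names and are not part of the pandas API. The sketch assumes that strings name `Series` methods and that row labels come from each callable's `__name__`, which is why both lambdas appear as `<lambda>`.

```python
from typing import Callable, List, Sequence, Union

import pandas as pd

# Assumption: a metric is either the name of a Series method ("sum") or a
# callable that takes a Series and returns a single value.
Metric = Union[str, Callable[[pd.Series], object]]


def _metric_label(metric: Metric) -> str:
    """Row label for a metric: the string itself, or the callable's __name__."""
    if isinstance(metric, str):
        return metric
    # Lambdas report "<lambda>"; functions and methods report their own name.
    return getattr(metric, "__name__", repr(metric))


def describe_metrics(df: pd.DataFrame, metrics: Sequence[Metric]) -> pd.DataFrame:
    """Hypothetical helper: one row per metric, one column per column of df."""
    labels: List[str] = []
    rows: List[dict] = []
    for metric in metrics:
        labels.append(_metric_label(metric))
        row = {}
        for col, s in df.items():
            if isinstance(metric, str):
                # Resolve the name to a Series method, e.g. "sum" -> s.sum().
                row[col] = getattr(s, metric)()
            else:
                row[col] = metric(s)
        rows.append(row)
    # Repeated labels (e.g. two lambdas) are allowed in a DataFrame index.
    return pd.DataFrame(rows, index=labels, columns=df.columns)


def udf_name(s):
    return s.sum()


df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
summary = describe_metrics(
    df, ["sum", pd.Series.mean, lambda s: s.count(), lambda s: s.dtype, udf_name]
)
print(summary)
```

In spirit this is close to passing a list of functions to `DataFrame.agg`. The explicit loop keeps the labelling rule visible.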
Note: this came up while trying to add sub-totals or other additional rows to a `Styler` based on the underlying data (#43894), and in the following issues: