Skip to content

Commit 9e04bc0

Browse files
Central moments (#23)
* Added definition of central order moment * Added stubbed implementation * Add behaviour to return None if the array is empty * Handle edge cases * Added test * First implementation completed * Added test for second order moment (variance) * Implemented test and renamed to central_moment * Using Horner's method for evaluating polynomials * Add algorithm notes to the docs * Added algorithmic complexity to the docs * Added two optimizations * Added signature for bulk central moment computation * Fixed trait signature * Indent * Factored out logic from central_moment * Implemented central_moments * Test implementation correctness * Added skewness and kurtosis * Added docs * Added tests for kurtosis and skewness * Fixed assertions * No need to get to order 4 for skewness * Fixed kurtosis test * Enriched docs * Fmt * Avoid computing mean for p=0,1 central moments * Replace map + clone with mapv * Check for moment order overflowing `i32` In reality, this probably isn't necessary because I'd expect someone to terminate their program when trying to calculate a moment of order greater than `i32::MAX` (because it would take so long), but it's nice to have an explicit check anyway. * Take advantage of IterBinomial from num-integer * Remove unnecessary explicit clones `A: Float` requires `A: Copy`. * Replace .map() with ? I think this is a bit easier to understand at first glance. * Test more cases in central moment tests * Rename central moment tests * Push diff to the lowest possible exponent * Return a Result instead of Option, for consistency * Match impl with trait definition * Match test to trait+impl * Fmt * Converted order to u16 * Fmt * Fix typo. Change casting to use as. * Fix typo. Change casting to use as. * Fix typo. Change casting to use as.
1 parent 86e5ca4 commit 9e04bc0

File tree

4 files changed

+335
-7
lines changed

4 files changed

+335
-7
lines changed

Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ categories = ["data-structures", "science"]
1717
[dependencies]
1818
ndarray = "0.12.1"
1919
noisy_float = "0.1.8"
20+
num-integer = "0.1"
2021
num-traits = "0.2"
2122
rand = "0.6"
2223
itertools = { version = "0.7.0", default-features = false }

src/lib.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828
extern crate itertools;
2929
extern crate ndarray;
3030
extern crate noisy_float;
31+
extern crate num_integer;
3132
extern crate num_traits;
3233
extern crate rand;
3334

src/summary_statistics/means.rs

Lines changed: 254 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
use super::SummaryStatisticsExt;
22
use crate::errors::EmptyInput;
33
use ndarray::{ArrayBase, Data, Dimension};
4+
use num_integer::IterBinomial;
45
use num_traits::{Float, FromPrimitive, Zero};
56
use std::ops::{Add, Div};
67

@@ -36,15 +37,161 @@ where
3637
{
3738
self.map(|x| x.ln()).mean().map(|x| x.exp())
3839
}
40+
41+
fn kurtosis(&self) -> Result<A, EmptyInput>
42+
where
43+
A: Float + FromPrimitive,
44+
{
45+
let central_moments = self.central_moments(4)?;
46+
Ok(central_moments[4] / central_moments[2].powi(2))
47+
}
48+
49+
fn skewness(&self) -> Result<A, EmptyInput>
50+
where
51+
A: Float + FromPrimitive,
52+
{
53+
let central_moments = self.central_moments(3)?;
54+
Ok(central_moments[3] / central_moments[2].sqrt().powi(3))
55+
}
56+
57+
fn central_moment(&self, order: u16) -> Result<A, EmptyInput>
58+
where
59+
A: Float + FromPrimitive,
60+
{
61+
if self.is_empty() {
62+
return Err(EmptyInput);
63+
}
64+
match order {
65+
0 => Ok(A::one()),
66+
1 => Ok(A::zero()),
67+
n => {
68+
let mean = self.mean().unwrap();
69+
let shifted_array = self.mapv(|x| x - mean);
70+
let shifted_moments = moments(shifted_array, n);
71+
let correction_term = -shifted_moments[1];
72+
73+
let coefficients = central_moment_coefficients(&shifted_moments);
74+
Ok(horner_method(coefficients, correction_term))
75+
}
76+
}
77+
}
78+
79+
fn central_moments(&self, order: u16) -> Result<Vec<A>, EmptyInput>
80+
where
81+
A: Float + FromPrimitive,
82+
{
83+
if self.is_empty() {
84+
return Err(EmptyInput);
85+
}
86+
match order {
87+
0 => Ok(vec![A::one()]),
88+
1 => Ok(vec![A::one(), A::zero()]),
89+
n => {
90+
// We only perform these operations once, and then reuse their
91+
// result to compute all the required moments
92+
let mean = self.mean().unwrap();
93+
let shifted_array = self.mapv(|x| x - mean);
94+
let shifted_moments = moments(shifted_array, n);
95+
let correction_term = -shifted_moments[1];
96+
97+
let mut central_moments = vec![A::one(), A::zero()];
98+
for k in 2..=n {
99+
let coefficients =
100+
central_moment_coefficients(&shifted_moments[..=(k as usize)]);
101+
let central_moment = horner_method(coefficients, correction_term);
102+
central_moments.push(central_moment)
103+
}
104+
Ok(central_moments)
105+
}
106+
}
107+
}
108+
}
109+
110+
/// Returns a vector containing all moments of the array elements up to
111+
/// *order*, where the *p*-th moment is defined as:
112+
///
113+
/// ```text
114+
/// 1 n
115+
/// ― ∑ xᵢᵖ
116+
/// n i=1
117+
/// ```
118+
///
119+
/// The returned moments are ordered by power magnitude: 0th moment, 1st moment, etc.
120+
///
121+
/// **Panics** if `A::from_usize()` fails to convert the number of elements in the array.
122+
fn moments<A, S, D>(a: ArrayBase<S, D>, order: u16) -> Vec<A>
123+
where
124+
A: Float + FromPrimitive,
125+
S: Data<Elem = A>,
126+
D: Dimension,
127+
{
128+
let n_elements =
129+
A::from_usize(a.len()).expect("Converting number of elements to `A` must not fail");
130+
let order = order as i32;
131+
132+
// When k=0, we are raising each element to the 0th power
133+
// No need to waste CPU cycles going through the array
134+
let mut moments = vec![A::one()];
135+
136+
if order >= 1 {
137+
// When k=1, we don't need to raise elements to the 1th power (identity)
138+
moments.push(a.sum() / n_elements)
139+
}
140+
141+
for k in 2..=order {
142+
moments.push(a.map(|x| x.powi(k)).sum() / n_elements)
143+
}
144+
moments
145+
}
146+
147+
/// Returns the coefficients in the polynomial expression to compute the *p*th
148+
/// central moment as a function of the sample mean.
149+
///
150+
/// It takes as input all moments up to order *p*, ordered by power magnitude - *p* is
151+
/// inferred to be the length of the *moments* array.
152+
fn central_moment_coefficients<A>(moments: &[A]) -> Vec<A>
153+
where
154+
A: Float + FromPrimitive,
155+
{
156+
let order = moments.len();
157+
IterBinomial::new(order)
158+
.zip(moments.iter().rev())
159+
.map(|(binom, &moment)| A::from_usize(binom).unwrap() * moment)
160+
.collect()
161+
}
162+
163+
/// Uses [Horner's method] to evaluate a polynomial with a single indeterminate.
164+
///
165+
/// Coefficients are expected to be sorted by ascending order
166+
/// with respect to the indeterminate's exponent.
167+
///
168+
/// If the array is empty, `A::zero()` is returned.
169+
///
170+
/// Horner's method can evaluate a polynomial of order *n* with a single indeterminate
171+
/// using only *n-1* multiplications and *n-1* sums - in terms of number of operations,
172+
/// this is an optimal algorithm for polynomial evaluation.
173+
///
174+
/// [Horner's method]: https://en.wikipedia.org/wiki/Horner%27s_method
175+
fn horner_method<A>(coefficients: Vec<A>, indeterminate: A) -> A
176+
where
177+
A: Float,
178+
{
179+
let mut result = A::zero();
180+
for coefficient in coefficients.into_iter().rev() {
181+
result = coefficient + indeterminate * result
182+
}
183+
result
39184
}
40185

41186
#[cfg(test)]
42187
mod tests {
43188
use super::SummaryStatisticsExt;
44189
use crate::errors::EmptyInput;
45-
use approx::abs_diff_eq;
46-
use ndarray::{array, Array1};
190+
use approx::assert_abs_diff_eq;
191+
use ndarray::{array, Array, Array1};
192+
use ndarray_rand::RandomExt;
47193
use noisy_float::types::N64;
194+
use rand::distributions::Uniform;
48195
use std::f64;
49196

50197
#[test]
@@ -90,16 +237,116 @@ mod tests {
90237
// Computed using SciPy
91238
let expected_geometric_mean = 0.4345897639796527;
92239

93-
abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = f64::EPSILON);
94-
abs_diff_eq!(
240+
assert_abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = 1e-9);
241+
assert_abs_diff_eq!(
95242
a.harmonic_mean().unwrap(),
96243
expected_harmonic_mean,
97-
epsilon = f64::EPSILON
244+
epsilon = 1e-7
98245
);
99-
abs_diff_eq!(
246+
assert_abs_diff_eq!(
100247
a.geometric_mean().unwrap(),
101248
expected_geometric_mean,
102-
epsilon = f64::EPSILON
249+
epsilon = 1e-12
103250
);
104251
}
252+
253+
#[test]
254+
fn test_central_moment_with_empty_array_of_floats() {
255+
let a: Array1<f64> = array![];
256+
for order in 0..=3 {
257+
assert_eq!(a.central_moment(order), Err(EmptyInput));
258+
assert_eq!(a.central_moments(order), Err(EmptyInput));
259+
}
260+
}
261+
262+
#[test]
263+
fn test_zeroth_central_moment_is_one() {
264+
let n = 50;
265+
let bound: f64 = 200.;
266+
let a = Array::random(n, Uniform::new(-bound.abs(), bound.abs()));
267+
assert_eq!(a.central_moment(0).unwrap(), 1.);
268+
}
269+
270+
#[test]
271+
fn test_first_central_moment_is_zero() {
272+
let n = 50;
273+
let bound: f64 = 200.;
274+
let a = Array::random(n, Uniform::new(-bound.abs(), bound.abs()));
275+
assert_eq!(a.central_moment(1).unwrap(), 0.);
276+
}
277+
278+
#[test]
279+
fn test_central_moments() {
280+
let a: Array1<f64> = array![
281+
0.07820559, 0.5026185, 0.80935324, 0.39384033, 0.9483038, 0.62516215, 0.90772261,
282+
0.87329831, 0.60267392, 0.2960298, 0.02810356, 0.31911966, 0.86705506, 0.96884832,
283+
0.2222465, 0.42162446, 0.99909868, 0.47619762, 0.91696979, 0.9972741, 0.09891734,
284+
0.76934818, 0.77566862, 0.7692585, 0.2235759, 0.44821286, 0.79732186, 0.04804275,
285+
0.87863238, 0.1111003, 0.6653943, 0.44386445, 0.2133176, 0.39397086, 0.4374617,
286+
0.95896624, 0.57850146, 0.29301706, 0.02329879, 0.2123203, 0.62005503, 0.996492,
287+
0.5342986, 0.97822099, 0.5028445, 0.6693834, 0.14256682, 0.52724704, 0.73482372,
288+
0.1809703,
289+
];
290+
// Computed using scipy.stats.moment
291+
let expected_moments = vec![
292+
1.,
293+
0.,
294+
0.09339920262960291,
295+
-0.0026849636727735186,
296+
0.015403769257729755,
297+
-0.001204176487006564,
298+
0.002976822584939186,
299+
];
300+
for (order, expected_moment) in expected_moments.iter().enumerate() {
301+
assert_abs_diff_eq!(
302+
a.central_moment(order as u16).unwrap(),
303+
expected_moment,
304+
epsilon = 1e-8
305+
);
306+
}
307+
}
308+
309+
#[test]
310+
fn test_bulk_central_moments() {
311+
// Test that the bulk method is coherent with the non-bulk method
312+
let n = 50;
313+
let bound: f64 = 200.;
314+
let a = Array::random(n, Uniform::new(-bound.abs(), bound.abs()));
315+
let order = 10;
316+
let central_moments = a.central_moments(order).unwrap();
317+
for i in 0..=order {
318+
assert_eq!(a.central_moment(i).unwrap(), central_moments[i as usize]);
319+
}
320+
}
321+
322+
#[test]
323+
fn test_kurtosis_and_skewness_is_none_with_empty_array_of_floats() {
324+
let a: Array1<f64> = array![];
325+
assert_eq!(a.skewness(), Err(EmptyInput));
326+
assert_eq!(a.kurtosis(), Err(EmptyInput));
327+
}
328+
329+
#[test]
330+
fn test_kurtosis_and_skewness() {
331+
let a: Array1<f64> = array![
332+
0.33310096, 0.98757449, 0.9789796, 0.96738114, 0.43545674, 0.06746873, 0.23706562,
333+
0.04241815, 0.38961714, 0.52421271, 0.93430327, 0.33911604, 0.05112372, 0.5013455,
334+
0.05291507, 0.62511183, 0.20749633, 0.22132433, 0.14734804, 0.51960608, 0.00449208,
335+
0.4093339, 0.2237519, 0.28070469, 0.7887231, 0.92224523, 0.43454188, 0.18335111,
336+
0.08646856, 0.87979847, 0.25483457, 0.99975627, 0.52712442, 0.41163279, 0.85162594,
337+
0.52618733, 0.75815023, 0.30640695, 0.14205781, 0.59695813, 0.851331, 0.39524328,
338+
0.73965373, 0.4007615, 0.02133069, 0.92899207, 0.79878191, 0.38947334, 0.22042183,
339+
0.77768353,
340+
];
341+
// Computed using scipy.stats.kurtosis(a, fisher=False)
342+
let expected_kurtosis = 1.821933711687523;
343+
// Computed using scipy.stats.skew
344+
let expected_skewness = 0.2604785422878771;
345+
346+
let kurtosis = a.kurtosis().unwrap();
347+
let skewness = a.skewness().unwrap();
348+
349+
assert_abs_diff_eq!(kurtosis, expected_kurtosis, epsilon = 1e-12);
350+
assert_abs_diff_eq!(skewness, expected_skewness, epsilon = 1e-8);
351+
}
105352
}

src/summary_statistics/mod.rs

Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -61,6 +61,85 @@ where
6161
fn geometric_mean(&self) -> Result<A, EmptyInput>
6262
where
6363
A: Float + FromPrimitive;
64+
65+
/// Returns the [kurtosis] `Kurt[X]` of all elements in the array:
66+
///
67+
/// ```text
68+
/// Kurt[X] = μ₄ / σ⁴
69+
/// ```
70+
///
71+
/// where μ₄ is the fourth central moment and σ is the standard deviation of
72+
/// the elements in the array.
73+
///
74+
/// This is sometimes referred to as _Pearson's kurtosis_. Fisher's kurtosis can be
75+
/// computed by subtracting 3 from Pearson's kurtosis.
76+
///
77+
/// If the array is empty, `Err(EmptyInput)` is returned.
78+
///
79+
/// **Panics** if `A::from_usize()` fails to convert the number of elements in the array.
80+
///
81+
/// [kurtosis]: https://en.wikipedia.org/wiki/Kurtosis
82+
fn kurtosis(&self) -> Result<A, EmptyInput>
83+
where
84+
A: Float + FromPrimitive;
85+
86+
/// Returns the [Pearson's moment coefficient of skewness] γ₁ of all elements in the array:
87+
///
88+
/// ```text
89+
/// γ₁ = μ₃ / σ³
90+
/// ```
91+
///
92+
/// where μ₃ is the third central moment and σ is the standard deviation of
93+
/// the elements in the array.
94+
///
95+
/// If the array is empty, `Err(EmptyInput)` is returned.
96+
///
97+
/// **Panics** if `A::from_usize()` fails to convert the number of elements in the array.
98+
///
99+
/// [Pearson's moment coefficient of skewness]: https://en.wikipedia.org/wiki/Skewness
100+
fn skewness(&self) -> Result<A, EmptyInput>
101+
where
102+
A: Float + FromPrimitive;
103+
104+
/// Returns the *p*-th [central moment] of all elements in the array, μₚ:
105+
///
106+
/// ```text
107+
/// 1 n
108+
/// μₚ = ― ∑ (xᵢ-x̅)ᵖ
109+
/// n i=1
110+
/// ```
111+
///
112+
/// If the array is empty, `Err(EmptyInput)` is returned.
113+
///
114+
/// The *p*-th central moment is computed using a corrected two-pass algorithm (see Section 3.5
115+
/// in [Pébay et al., 2016]). Complexity is *O(np)* when *n >> p*, *p > 1*.
116+
///
117+
/// **Panics** if `A::from_usize()` fails to convert the number of elements
118+
/// in the array or if `order` overflows `i32`.
119+
///
120+
/// [central moment]: https://en.wikipedia.org/wiki/Central_moment
121+
/// [Pébay et al., 2016]: https://www.osti.gov/pages/servlets/purl/1427275
122+
fn central_moment(&self, order: u16) -> Result<A, EmptyInput>
123+
where
124+
A: Float + FromPrimitive;
125+
126+
/// Returns the first *p* [central moments] of all elements in the array, see [central moment]
127+
/// for more details.
128+
///
129+
/// If the array is empty, `Err(EmptyInput)` is returned.
130+
///
131+
/// This method reuses the intermediate steps for the *k*-th moment to compute the *(k+1)*-th,
132+
/// being thus more efficient than repeated calls to [central moment] if the computation
133+
/// of central moments of multiple orders is required.
134+
///
135+
/// **Panics** if `A::from_usize()` fails to convert the number of elements
136+
/// in the array or if `order` overflows `i32`.
137+
///
138+
/// [central moments]: https://en.wikipedia.org/wiki/Central_moment
139+
/// [central moment]: #tymethod.central_moment
140+
fn central_moments(&self, order: u16) -> Result<Vec<A>, EmptyInput>
141+
where
142+
A: Float + FromPrimitive;
64143
}
65144

66145
mod means;

0 commit comments

Comments
 (0)