Commit fab7695

fix CI
1 parent b307ad0 commit fab7695

File tree

2 files changed: +520 −191 lines changed


examples/case_studies/LKJ.ipynb

Lines changed: 382 additions & 159 deletions
Large diffs are not rendered by default.

myst_nbs/case_studies/LKJ.myst.md

Lines changed: 138 additions & 32 deletions
@@ -11,54 +11,78 @@ kernelspec:
   name: python3
 ---
 
++++ {"id": "XShKDkNir2PX"}
+
 # LKJ Cholesky Covariance Priors for Multivariate Normal Models
 
-+++
++++ {"id": "QxSKBbjKr2PZ"}
 
 While the [inverse-Wishart distribution](https://en.wikipedia.org/wiki/Inverse-Wishart_distribution) is the conjugate prior for the covariance matrix of a multivariate normal distribution, it is [not very well-suited](https://github.com/pymc-devs/pymc3/issues/538#issuecomment-94153586) to modern Bayesian computational methods. For this reason, the [LKJ prior](http://www.sciencedirect.com/science/article/pii/S0047259X09000876) is recommended when modeling the covariance matrix of a multivariate normal distribution.
 
 To illustrate modelling covariance with the LKJ distribution, we first generate a two-dimensional normally-distributed sample data set.
 
 ```{code-cell} ipython3
+---
+colab:
+  base_uri: https://localhost:8080/
+id: 17Thh2DHr2Pa
+outputId: 90631275-86c9-4f4a-f81a-22465d0c8b8c
+---
+%env MKL_THREADING_LAYER=GNU
 import arviz as az
 import numpy as np
-import pymc3 as pm
+import pymc as pm
 import seaborn as sns
 
 from matplotlib import pyplot as plt
 from matplotlib.lines import Line2D
 from matplotlib.patches import Ellipse
 
-print(f"Running on PyMC3 v{pm.__version__}")
+print(f"Running on PyMC v{pm.__version__}")
 ```
 
 ```{code-cell} ipython3
+:id: Sq6K4Ie4r2Pc
+
 %config InlineBackend.figure_format = 'retina'
 RANDOM_SEED = 8927
 rng = np.random.default_rng(RANDOM_SEED)
 az.style.use("arviz-darkgrid")
 ```
 
 ```{code-cell} ipython3
+---
+colab:
+  base_uri: https://localhost:8080/
+id: eA491vJMr2Pc
+outputId: 30ea38db-0767-4e89-eb09-68927878018e
+---
 N = 10000
 
-μ_actual = np.array([1.0, -2.0])
+mu_actual = np.array([1.0, -2.0])
 sigmas_actual = np.array([0.7, 1.5])
 Rho_actual = np.matrix([[1.0, -0.4], [-0.4, 1.0]])
 
-Σ_actual = np.diag(sigmas_actual) * Rho_actual * np.diag(sigmas_actual)
+Sigma_actual = np.diag(sigmas_actual) * Rho_actual * np.diag(sigmas_actual)
 
-x = rng.multivariate_normal(μ_actual, Σ_actual, size=N)
-Σ_actual
+x = rng.multivariate_normal(mu_actual, Sigma_actual, size=N)
+Sigma_actual
 ```
 
 ```{code-cell} ipython3
-var, U = np.linalg.eig(Σ_actual)
+---
+colab:
+  base_uri: https://localhost:8080/
+  height: 628
+id: ZmFDGQ8Jr2Pd
+outputId: 03ba3248-370c-4ff9-8626-ba601423b9c1
+---
+var, U = np.linalg.eig(Sigma_actual)
 angle = 180.0 / np.pi * np.arccos(np.abs(U[0, 0]))
 
 fig, ax = plt.subplots(figsize=(8, 6))
 
-e = Ellipse(μ_actual, 2 * np.sqrt(5.991 * var[0]), 2 * np.sqrt(5.991 * var[1]), angle=angle)
+e = Ellipse(mu_actual, 2 * np.sqrt(5.991 * var[0]), 2 * np.sqrt(5.991 * var[1]), angle=angle)
 e.set_alpha(0.5)
 e.set_facecolor("C0")
 e.set_zorder(10)
@@ -72,73 +96,119 @@ rect = plt.Rectangle((0, 0), 1, 1, fc="C0", alpha=0.5)
 ax.legend([rect], ["95% density region"], loc=2);
 ```
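
As an aside on the plotting code: the scaling factor `5.991` used for the ellipse axes is the 0.95 quantile of a $\chi^2$ distribution with 2 degrees of freedom, which is what defines the 95% density region of a bivariate normal. A one-line sanity check (a sketch, assuming SciPy is available alongside the notebook's dependencies):

```python
# 0.95 quantile of chi-squared with 2 degrees of freedom: the squared
# Mahalanobis radius that encloses 95% of a bivariate normal distribution.
from scipy import stats

stats.chi2.ppf(0.95, df=2)  # ~= 5.991
```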
 
++++ {"id": "d6320GCir2Pd"}
+
 The sampling distribution for the multivariate normal model is $\mathbf{x} \sim N(\mu, \Sigma)$, where $\Sigma$ is the covariance matrix of the sampling distribution, with $\Sigma_{ij} = \textrm{Cov}(x_i, x_j)$. The density of this distribution is
 
 $$f(\mathbf{x}\ |\ \mu, \Sigma^{-1}) = (2 \pi)^{-\frac{k}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)\right).$$
 
 The LKJ distribution provides a prior on the correlation matrix, $\mathbf{C} = \textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\Sigma$. Since inverting $\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\Sigma$, $\Sigma = \mathbf{L} \mathbf{L}^{\top}$, where $\mathbf{L}$ is a lower-triangular matrix. This decomposition allows computation of the term $(\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.
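
To make the back-substitution point concrete, here is a minimal NumPy/SciPy sketch (not part of the notebook; the example covariance values are illustrative only) that evaluates the quadratic form $(\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)$ with a triangular solve instead of an explicit inverse:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical 2x2 covariance and data, for illustration only
Sigma_example = np.array([[0.49, -0.42], [-0.42, 2.25]])
mu_example = np.array([1.0, -2.0])
x_example = np.random.default_rng(0).multivariate_normal(mu_example, Sigma_example, size=5)

d = x_example - mu_example

# Naive route: form the explicit inverse (less stable, more work)
q_inv = np.einsum("ni,ij,nj->n", d, np.linalg.inv(Sigma_example), d)

# Cholesky route: Sigma = L L^T, solve L z = d by back-substitution, then q = ||z||^2
L_example = np.linalg.cholesky(Sigma_example)
z = solve_triangular(L_example, d.T, lower=True)
q_chol = (z**2).sum(axis=0)

np.allclose(q_inv, q_chol)  # True
```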
 
-PyMC3 supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](../api/distributions/multivariate.rst) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC3 distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.
+PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.
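
To build intuition for `eta`, the following sketch (not part of the notebook; it assumes PyMC v4 behaviour of `pm.LKJCorr` and `pm.sample_prior_predictive`) draws the single off-diagonal correlation of a $2 \times 2$ correlation matrix from the LKJ prior for a few values of `eta` -- larger values concentrate the prior around zero correlation:

```python
import matplotlib.pyplot as plt
import pymc as pm

fig, ax = plt.subplots(figsize=(8, 4))
for eta in (1.0, 2.0, 10.0):
    with pm.Model():
        # For n=2, LKJCorr stores just the one off-diagonal correlation coefficient
        pm.LKJCorr("corr", n=2, eta=eta)
        prior = pm.sample_prior_predictive(samples=2000)
    ax.hist(
        prior.prior["corr"].values.ravel(),
        bins=50,
        density=True,
        histtype="step",
        label=f"eta = {eta}",
    )
ax.set_xlabel("correlation")
ax.set_ylabel("prior density")
ax.legend();
```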
 
 In this example, we model the standard deviations with $\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\mathbf{C} \sim \textrm{LKJ}(\eta = 2)$.
 
 ```{code-cell} ipython3
+:id: 7GcM6oENr2Pe
+
 with pm.Model() as m:
-    packed_L = pm.LKJCholeskyCov("packed_L", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0))
+    packed_L = pm.LKJCholeskyCov(
+        "packed_L", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0, shape=2), compute_corr=False
+    )
 ```
 
++++ {"id": "6Cscu-CRr2Pe"}
+
 Since the Cholesky decomposition of $\Sigma$ is lower triangular, `LKJCholeskyCov` only stores the diagonal and sub-diagonal entries, for efficiency:
 
 ```{code-cell} ipython3
-packed_L.tag.test_value.shape
+---
+colab:
+  base_uri: https://localhost:8080/
+id: JMWeTjDjr2Pe
+outputId: e4f767a3-c1d7-4016-a3cf-91089c925bdb
+---
+packed_L.eval()
 ```
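
As a quick check on the packed shape: an $n \times n$ lower-triangular Cholesky factor has $n(n + 1)/2$ free entries, so for $n = 2$ the packed vector above holds 3 values (a sketch of the arithmetic, not part of the notebook):

```python
# Number of stored entries in a packed lower-triangular factor of an n x n matrix
n = 2
n * (n + 1) // 2  # 3
```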
 
++++ {"id": "59FtijDir2Pe"}
+
 We use [expand_packed_triangular](../api/math.rst) to transform this vector into the lower triangular matrix $\mathbf{L}$, which appears in the Cholesky decomposition $\Sigma = \mathbf{L} \mathbf{L}^{\top}$.
 
 ```{code-cell} ipython3
+---
+colab:
+  base_uri: https://localhost:8080/
+id: YxBbFyUxr2Pf
+outputId: bd37c630-98dd-437b-bb13-89281aeccc44
+---
 with m:
     L = pm.expand_packed_triangular(2, packed_L)
-    Σ = L.dot(L.T)
+    Sigma = L.dot(L.T)
 
-L.tag.test_value.shape
+L.eval().shape
 ```
 
-Often, however, you'll be interested in the posterior distribution of the correlation matrix and of the standard deviations, not in the posterior Cholesky covariance matrix *per se*. Why? Because the correlations and standard deviations are easier to interpret and often have a scientific meaning in the model. As of PyMC 3.9, there is a way to tell PyMC to automatically do these computations and store the posteriors in the trace. You just have to specify `compute_corr=True` in `pm.LKJCholeskyCov`:
++++ {"id": "SwdNd_0Jr2Pf"}
+
+Often, however, you'll be interested in the posterior distribution of the correlation matrix and of the standard deviations, not in the posterior Cholesky covariance matrix *per se*. Why? Because the correlations and standard deviations are easier to interpret and often have a scientific meaning in the model. As of PyMC v4, the `compute_corr` argument is set to `True` by default, which returns a tuple consisting of the Cholesky decomposition, the correlation matrix, and the standard deviations.
 
 ```{code-cell} ipython3
+:id: ac3eQeMJr2Pf
+
 coords = {"axis": ["y", "z"], "axis_bis": ["y", "z"], "obs_id": np.arange(N)}
-with pm.Model(coords=coords) as model:
+with pm.Model(coords=coords, rng_seeder=RANDOM_SEED) as model:
     chol, corr, stds = pm.LKJCholeskyCov(
-        "chol", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
+        "chol", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0, shape=2)
     )
     cov = pm.Deterministic("cov", chol.dot(chol.T), dims=("axis", "axis_bis"))
 ```
 
++++ {"id": "cpEupNzWr2Pg"}
+
 To complete our model, we place independent, weakly regularizing priors, $N(0, 1.5),$ on $\mu$:
 
 ```{code-cell} ipython3
+:id: iTI4uiBdr2Pg
+
 with model:
-    μ = pm.Normal("μ", 0.0, 1.5, testval=x.mean(axis=0), dims="axis")
-    obs = pm.MvNormal("obs", μ, chol=chol, observed=x, dims=("obs_id", "axis"))
+    mu = pm.Normal("mu", 0.0, sigma=1.5, dims="axis")
+    obs = pm.MvNormal("obs", mu, chol=chol, observed=x, dims=("obs_id", "axis"))
 ```
 
-We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/):
++++ {"id": "QOCi1RKvr2Ph"}
+
+We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/) for summarization:
 
 ```{code-cell} ipython3
+---
+colab:
+  base_uri: https://localhost:8080/
+  height: 608
+id: vBPIQDWrr2Ph
+outputId: f039bfb8-1acf-42cb-b054-bc2c97697f96
+---
 with model:
     trace = pm.sample(
-        random_seed=RANDOM_SEED,
-        init="adapt_diag",
         idata_kwargs={"dims": {"chol_stds": ["axis"], "chol_corr": ["axis", "axis_bis"]}},
     )
 az.summary(trace, var_names="~chol", round_to=2)
 ```
 
++++ {"id": "X8ucBpcRr2Ph"}
+
 Sampling went smoothly: no divergences and good r-hats (except for the diagonal elements of the correlation matrix; these are not a concern, because they should be equal to 1 for every sample in every chain, and the variance of a constant value isn't defined. If one of the diagonal elements has `r_hat` defined, it's likely due to tiny numerical errors).
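
If you want to verify the "no divergences" claim programmatically rather than reading it off the summary table, a quick check on the sampling statistics (a sketch, assuming the default NUTS `sample_stats` layout in the returned `InferenceData`) is:

```python
# Count divergent transitions across all chains and draws; 0 means none occurred
int(trace.sample_stats["diverging"].sum())
```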
 
 You can also see that the sampler recovered the true means, correlations and standard deviations. As is often the case, this is clearer in a graph:
 
 ```{code-cell} ipython3
+---
+colab:
+  base_uri: https://localhost:8080/
+  height: 228
+id: dgOKiSLdr2Pi
+outputId: a29bde4b-c4dc-49f4-e65d-c3365c1933e1
+---
 az.plot_trace(
     trace,
     var_names="chol_corr",
@@ -148,42 +218,72 @@ az.plot_trace(
 ```
 
 ```{code-cell} ipython3
+---
+colab:
+  base_uri: https://localhost:8080/
+  height: 628
+id: dtBWyd5Jr2Pi
+outputId: 94ee6945-a564-487a-e447-3c447057f0bf
+---
 az.plot_trace(
     trace,
     var_names=["~chol", "~chol_corr"],
     compact=True,
     lines=[
-        ("μ", {}, μ_actual),
-        ("cov", {}, Σ_actual),
+        ("mu", {}, mu_actual),
+        ("cov", {}, Sigma_actual),
         ("chol_stds", {}, sigmas_actual),
     ],
 );
 ```
 
++++ {"id": "NnLWJyCMr2Pi"}
+
 The posterior expected values are very close to the true value of each component! How close exactly? Let's compute the relative error for $\mu$ and $\Sigma$:
 
 ```{code-cell} ipython3
-μ_post = trace.posterior["μ"].mean(("chain", "draw")).values
-(1 - μ_post / μ_actual).round(2)
+---
+colab:
+  base_uri: https://localhost:8080/
+id: yDlyVSizr2Pj
+outputId: 69c22c57-db27-4f43-ab94-7b88480a21f9
+---
+mu_post = trace.posterior["mu"].mean(("chain", "draw")).values
+(1 - mu_post / mu_actual).round(2)
 ```
 
 ```{code-cell} ipython3
-Σ_post = trace.posterior["cov"].mean(("chain", "draw")).values
-(1 - Σ_post / Σ_actual).round(2)
+---
+colab:
+  base_uri: https://localhost:8080/
+id: rFF947Grr2Pj
+outputId: 398332a0-a142-4ad0-dadf-bde13ef2b00b
+---
+Sigma_post = trace.posterior["cov"].mean(("chain", "draw")).values
+(1 - Sigma_post / Sigma_actual).round(2)
 ```
 
++++ {"id": "DMDwKtp0r2Pj"}
+
 So the posterior means are within 3% of the true values of $\mu$ and $\Sigma$.
 
 Now let's replicate the plot from the beginning, this time overlaying the posterior distribution on top of the true distribution -- you'll see there is excellent visual agreement between the two:
 
 ```{code-cell} ipython3
-var_post, U_post = np.linalg.eig(Σ_post)
+---
+colab:
+  base_uri: https://localhost:8080/
+  height: 628
+id: _dwHYuj1r2Pj
+outputId: 9b53b875-af25-4f79-876f-a02e72bba5a9
+---
+var_post, U_post = np.linalg.eig(Sigma_post)
 angle_post = 180.0 / np.pi * np.arccos(np.abs(U_post[0, 0]))
 
 fig, ax = plt.subplots(figsize=(8, 6))
 
 e = Ellipse(
-    μ_actual,
+    mu_actual,
     2 * np.sqrt(5.991 * var[0]),
     2 * np.sqrt(5.991 * var[1]),
     angle=angle,
@@ -196,7 +296,7 @@ e.set_fill(False)
 ax.add_artist(e)
 
 e_post = Ellipse(
-    μ_post,
+    mu_post,
     2 * np.sqrt(5.991 * var_post[0]),
     2 * np.sqrt(5.991 * var_post[1]),
     angle=angle_post,
@@ -220,6 +320,12 @@ ax.legend(
 ```
 
 ```{code-cell} ipython3
+---
+colab:
+  base_uri: https://localhost:8080/
+id: kJCfuzGtr2Pq
+outputId: da547b05-d812-4959-aff6-cf4a12faca15
+---
 %load_ext watermark
-%watermark -n -u -v -iv -w -p theano,xarray
+%watermark -n -u -v -iv -w -p aesara,xarray
 ```
