# LKJ Cholesky Covariance Priors for Multivariate Normal Models
While the [inverse-Wishart distribution](https://en.wikipedia.org/wiki/Inverse-Wishart_distribution) is the conjugate prior for the covariance matrix of a multivariate normal distribution, it is [not very well-suited](https://github.com/pymc-devs/pymc3/issues/538#issuecomment-94153586) to modern Bayesian computational methods. For this reason, the [LKJ prior](http://www.sciencedirect.com/science/article/pii/S0047259X09000876) is recommended when modeling the covariance matrix of a multivariate normal distribution.
The sampling distribution for the multivariate normal model is $\mathbf{x} \sim N(\mu, \Sigma)$, where $\Sigma$ is the covariance matrix of the sampling distribution, with $\Sigma_{ij} = \textrm{Cov}(x_i, x_j)$. The density of this distribution is

$$f(\mathbf{x}\ |\ \mu, \Sigma) = (2 \pi)^{-\frac{k}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)\right),$$

where $k$ is the dimension of $\mathbf{x}$.

The LKJ distribution provides a prior on the correlation matrix, $\mathbf{C} = \textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\Sigma$. Since inverting $\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\Sigma$, $\Sigma = \mathbf{L} \mathbf{L}^{\top}$, where $\mathbf{L}$ is a lower-triangular matrix. This decomposition allows computation of the term $(\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.

To illustrate modelling covariance with the LKJ distribution, we first generate a two-dimensional normally-distributed sample data set.
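The data-generation cell itself is not reproduced here, so the following is a minimal sketch of what it could look like. The sample size, means, standard deviations, and correlation are illustrative assumptions, chosen only so that the quantities used later in the notebook (`x`, `mu_actual`, `sigmas_actual`, and `Sigma_actual`) are defined:

```{code-cell} ipython3
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm

RANDOM_SEED = 8924  # assumed seed, for reproducibility
rng = np.random.default_rng(RANDOM_SEED)

N = 10_000  # number of observations (illustrative)

# "True" parameters used to simulate the data.
mu_actual = np.array([1.0, -2.0])
sigmas_actual = np.array([0.7, 1.5])
Rho_actual = np.array([[1.0, -0.4], [-0.4, 1.0]])  # true correlation matrix

# Covariance = diag(sigmas) @ correlation @ diag(sigmas)
Sigma_actual = np.diag(sigmas_actual) @ Rho_actual @ np.diag(sigmas_actual)

x = rng.multivariate_normal(mu_actual, Sigma_actual, size=N)

plt.scatter(x[:, 0], x[:, 1], alpha=0.05)
plt.title("Simulated two-dimensional normal data");
```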
PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.
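To make the role of `eta` concrete, here is a quick sketch based only on the proportionality above, specialized to the $2 \times 2$ case where $|\mathbf{C}| = 1 - \rho^2$. It plots the implied density of the single correlation $\rho$ for a few values of $\eta$, showing the flat density at $\eta = 1$ and increasing concentration around zero as $\eta$ grows:

```{code-cell} ipython3
# For a 2x2 correlation matrix C = [[1, rho], [rho, 1]], |C| = 1 - rho**2,
# so the LKJ density over rho is proportional to (1 - rho**2) ** (eta - 1).
rho = np.linspace(-0.99, 0.99, 500)
for eta in (1.0, 2.0, 10.0):
    dens = (1 - rho**2) ** (eta - 1)
    dens /= dens.sum() * (rho[1] - rho[0])  # normalize numerically for comparison
    plt.plot(rho, dens, label=rf"$\eta = {eta:.0f}$")
plt.xlabel(r"$\rho$")
plt.ylabel("density")
plt.legend();
```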
In this example, we model the standard deviations with $\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\mathbf{C} \sim \textrm{LKJ}(\eta = 2)$.
Since the Cholesky decomposition of $\Sigma$ is lower triangular, `LKJCholeskyCov` only stores the diagonal and sub-diagonal entries, for efficiency:
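The cell that defines the model is not reproduced here; a minimal sketch, assuming the names `m` and `packed_L` used in the cells below, could look like this. Note that `compute_corr=False` is assumed so that the raw packed Cholesky factor is returned:

```{code-cell} ipython3
with pm.Model() as m:
    # Packed (flattened) lower-triangular Cholesky factor of the covariance,
    # with Exponential(1.0) priors on the component standard deviations
    # and an LKJ(eta=2) prior on the correlation matrix.
    packed_L = pm.LKJCholeskyCov(
        "packed_L",
        n=2,
        eta=2.0,
        sd_dist=pm.Exponential.dist(1.0, shape=2),
        compute_corr=False,
    )
```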
```{code-cell} ipython3
packed_L.eval()
```
We use [expand_packed_triangular](../api/math.rst) to transform this vector into the lower triangular matrix $\mathbf{L}$, which appears in the Cholesky decomposition $\Sigma = \mathbf{L} \mathbf{L}^{\top}$.
```{code-cell} ipython3
with m:
    L = pm.expand_packed_triangular(2, packed_L)
    Sigma = L.dot(L.T)

L.eval().shape
```
Often, however, you'll be interested in the posterior distribution of the correlation matrix and of the standard deviations, not in the posterior Cholesky covariance matrix *per se*. Why? Because the correlations and standard deviations are easier to interpret and often have a scientific meaning in the model. As of PyMC v4, the `compute_corr` argument is set to `True` by default, which returns a tuple consisting of the Cholesky decomposition, the correlation matrix, and the standard deviations.
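The cell that builds and samples this full model is not reproduced here; the following is a minimal sketch. The variable names (`chol`, `mu`, `cov`, `obs`) are chosen to match the names referenced in the trace plots below (with `compute_corr=True` and `store_in_trace=True`, the defaults, the deterministics `chol_corr` and `chol_stds` are added automatically); the prior scales are illustrative assumptions:

```{code-cell} ipython3
with pm.Model() as model:
    # compute_corr=True (the default) unpacks the result into the Cholesky
    # factor, the correlation matrix, and the standard deviations.
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0, shape=2)
    )
    cov = pm.Deterministic("cov", chol.dot(chol.T))

    mu = pm.Normal("mu", mu=0.0, sigma=1.5, shape=2)  # assumed prior on the means
    obs = pm.MvNormal("obs", mu=mu, chol=chol, observed=x)

    trace = pm.sample(random_seed=RANDOM_SEED)
```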
Sampling went smoothly: no divergences and good r-hats (except for the diagonal elements of the correlation matrix; these are not a concern, however, because they should be equal to 1 for every sample in every chain, and the variance of a constant value isn't defined. If one of the diagonal elements has `r_hat` defined, it's likely due to tiny numerical errors).
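One way to check these diagnostics is with ArviZ's summary; as a sketch, the `"~chol"` exclusion simply hides the raw Cholesky entries:

```{code-cell} ipython3
az.summary(trace, var_names=["~chol"], round_to=2)
```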
You can also see that the sampler recovered the true means, correlations, and standard deviations. As is often the case, this is clearer in a graph:
```{code-cell} ipython3
az.plot_trace(
    trace,
    var_names="chol_corr",
);
```
```{code-cell} ipython3
az.plot_trace(
    trace,
    var_names=["~chol", "~chol_corr"],
    compact=True,
    lines=[
        ("mu", {}, mu_actual),
        ("cov", {}, Sigma_actual),
        ("chol_stds", {}, sigmas_actual),
    ],
);
```
The posterior expected values are very close to the true value of each component! How close exactly? Let's compute the percentage difference from the true values of $\mu$ and $\Sigma$:
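The computation cell is not reproduced here; a minimal sketch, assuming the posterior is stored in `trace` and the true values in `mu_actual` and `Sigma_actual` as above:

```{code-cell} ipython3
mu_post = trace.posterior["mu"].mean(("chain", "draw")).values
Sigma_post = trace.posterior["cov"].mean(("chain", "draw")).values

# Relative (percentage) differences between posterior means and true values.
print(np.abs(1 - mu_post / mu_actual).round(2))
print(np.abs(1 - Sigma_post / Sigma_actual).round(2))
```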
So the posterior means are within 3% of the true values of $\mu$ and $\Sigma$.
Now let's replicate the plot we made at the beginning, but overlay the posterior distribution on top of the true distribution; you'll see there is excellent visual agreement between the two:
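The plotting cell is not reproduced here. As a sketch, one way to overlay the two distributions is to scatter the data and draw the 2-sigma ellipses implied by the true and posterior mean covariance matrices; the helper `cov_ellipse` below is a hypothetical convenience function, not part of PyMC or ArviZ:

```{code-cell} ipython3
from matplotlib.patches import Ellipse


def cov_ellipse(mean, cov, n_std=2.0, **kwargs):
    """Build an n_std ellipse for a 2x2 covariance matrix (hypothetical helper)."""
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]
    vals, vecs = vals[order], vecs[:, order]
    angle = np.degrees(np.arctan2(vecs[1, 0], vecs[0, 0]))
    width, height = 2 * n_std * np.sqrt(vals)
    return Ellipse(xy=mean, width=width, height=height, angle=angle, fill=False, **kwargs)


fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x[:, 0], x[:, 1], alpha=0.05)
ax.add_patch(cov_ellipse(mu_actual, Sigma_actual, edgecolor="C0", lw=2, label="true distribution"))
ax.add_patch(cov_ellipse(mu_post, Sigma_post, edgecolor="C3", lw=2, label="posterior distribution"))
ax.legend();
```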