Commit 9612052

update notebook: Update Priors (#692)
* update notebook
* add warning comments
* add author
* Update examples/howto/updating_priors.myst.md (Co-authored-by: Oriol Abril-Pla <oriol.abril.pla@gmail.com>)
* feedback

Co-authored-by: Oriol Abril-Pla <oriol.abril.pla@gmail.com>
1 parent 7c9d1d2 commit 9612052

File tree

2 files changed: +323, -579 lines changed

examples/howto/updating_priors.ipynb

Lines changed: 215 additions & 524 deletions
Large diffs are not rendered by default.

examples/howto/updating_priors.myst.md

Lines changed: 108 additions & 55 deletions
kernelspec:
  name: python3
---

(updating_priors)=
# Updating Priors

:::{post} January, 2017
:tags: priors
:category: intermediate, how-to
:author: [David Brochart](https://github.com/davidbrochart)
:::

+++

In this notebook, we will show how, in principle, it is possible to update the priors as new data becomes available.

``````{admonition} Words of Caution
:class: warning
This example provides a nice use case for the {class}`~pymc.Interpolated` class, as we will see below. However, this might not be a good idea to do in practice, not only because KDEs are being used to compute pdf values for the posterior, but mostly because Interpolated distributions used as priors are **unidimensional** and **uncorrelated**. So even if they provide a perfect fit *marginally*, they do not really incorporate all the information we have from the previous posterior into the model, especially when posterior variables are correlated. See a nice discussion about the subject in the blog post [Some dimensionality devils](https://oriolabrilpla.cat/en/blog/posts/2022/too-eager-reduction.html#univariate-priors) by [Oriol Abril](https://oriolabrilpla.cat/en/).
``````

```{code-cell} ipython3
import arviz as az
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm
import pytensor.tensor as pt

from scipy import stats
from tqdm.notebook import trange

az.style.use("arviz-white")

%config InlineBackend.figure_format = "retina"
```

```{code-cell} ipython3
rng: np.random.Generator = np.random.default_rng(seed=42)
```

## Generating data

```{code-cell} ipython3
# True parameter values
alpha_true = 5
beta0_true = 7
beta1_true = 13
sigma_true = 2

# Size of dataset
size = 100

# Predictor variable
X1 = rng.normal(size=size)
X2 = rng.normal(size=size) * 0.2

# Simulate outcome variable
Y = alpha_true + beta0_true * X1 + beta1_true * X2 + rng.normal(size=size, scale=sigma_true)
```

## Model specification

Our initial beliefs about the parameters are rather weakly informative (sigma=5) and centered at zero, so somewhat off the true values.

```{code-cell} ipython3
with pm.Model() as model:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=5)
    beta0 = pm.Normal("beta0", mu=0, sigma=5)
    beta1 = pm.Normal("beta1", mu=0, sigma=5)

    sigma = pm.HalfNormal("sigma", sigma=1)

    # Expected value of outcome
    mu = alpha + beta0 * X1 + beta1 * X2

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)

    # draw 2_000 posterior samples
    trace = pm.sample(
        tune=1_500, draws=2_000, target_accept=0.9, progressbar=False, random_seed=rng
    )
```

```{code-cell} ipython3
axes = az.plot_trace(
    data=trace,
    compact=True,
    lines=[
        ("alpha", {}, alpha_true),
        ("beta0", {}, beta0_true),
        ("beta1", {}, beta1_true),
        ("sigma", {}, sigma_true),
    ],
    backend_kwargs={"figsize": (12, 9), "layout": "constrained"},
)
plt.gcf().suptitle("Trace", fontsize=16);
```
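
Following up on the Words of Caution above, it is worth checking how much correlation there actually is in the joint posterior, since this is exactly the structure that unidimensional Interpolated priors will discard later on. The snippet below is only an illustrative sketch (it is not part of the original example) and reuses the `trace` object from the previous cell.

```python
# Illustrative sketch: quantify the posterior correlations that
# unidimensional Interpolated priors cannot carry over to the next fit.
posterior = az.extract(trace, group="posterior", var_names=["alpha", "beta0", "beta1", "sigma"])
samples_matrix = posterior.to_array().values  # shape: (n_variables, n_samples)
print(np.round(np.corrcoef(samples_matrix), 2))

# The same information can be inspected visually with a pair plot:
az.plot_pair(trace, var_names=["alpha", "beta0", "beta1", "sigma"], kind="kde");
```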

In order to update our beliefs about the parameters, we use the posterior distributions, which will serve as the prior distributions for the next inference. The data used for each inference iteration has to be independent from the previous iterations, otherwise the same (possibly wrong) belief is injected over and over into the system, amplifying the errors and misleading the inference. By ensuring the data is independent, the system should converge to the true parameter values.

Because we draw samples from the posterior distribution (shown on the right in the figure above), we need to estimate their probability density (shown on the left in the figure above). [Kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) (KDE) is a way to achieve this, and we will use this technique here. In any case, it is an empirical distribution that cannot be expressed analytically. Fortunately, PyMC provides a way to use custom distributions, via the {class}`~pymc.Interpolated` class.

```{code-cell} ipython3
def from_posterior(param, samples):
    smin, smax = samples.min().item(), samples.max().item()
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)

    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x, y)
```
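
As a quick sanity check (again a sketch rather than part of the original example), we can build a single Interpolated distribution from the posterior samples of `alpha` and compare draws from it with the samples it was fitted to; the two should look nearly identical.

```python
# Sketch: draws from the Interpolated distribution should closely match
# the posterior samples that were used to build it.
alpha_samples = az.extract(trace, group="posterior", var_names=["alpha"])

with pm.Model():
    alpha_interp = from_posterior("alpha", alpha_samples)
    interp_draws = pm.draw(alpha_interp, draws=2_000, random_seed=rng)

az.plot_dist(alpha_samples, label="posterior samples")
az.plot_dist(interp_draws, color="C1", label="draws from Interpolated");
```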

Now we just need to generate more data and build our Bayesian model so that the prior distributions for the current iteration are the posterior distributions from the previous iteration. It is still possible to continue using the NUTS sampling method, because the `Interpolated` class implements the gradient calculations that Hamiltonian Monte Carlo samplers require.
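
To illustrate that point (a sketch, not part of the original example), we can create a throwaway model that only contains an Interpolated prior and evaluate the gradient of its log-density at the model's initial point:

```python
# Sketch: an Interpolated prior exposes a differentiable log-density,
# which is what gradient-based samplers such as NUTS rely on.
with pm.Model() as grad_check_model:
    from_posterior("alpha", az.extract(trace, group="posterior", var_names=["alpha"]))

point = grad_check_model.initial_point()
dlogp_fn = grad_check_model.compile_dlogp()
print(dlogp_fn(point))  # gradient with respect to the (transformed) free variable
```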

```{code-cell} ipython3
traces = [trace]
```
```{code-cell} ipython3
119-
for _ in range(10):
138+
n_iterations = 10
139+
140+
for _ in trange(n_iterations):
120141
# generate more data
121-
X1 = np.random.randn(size)
122-
X2 = np.random.randn(size) * 0.2
123-
Y = alpha_true + beta0_true * X1 + beta1_true * X2 + np.random.randn(size)
142+
X1 = rng.normal(size=size)
143+
X2 = rng.normal(size=size) * 0.2
144+
Y = alpha_true + beta0_true * X1 + beta1_true * X2 + rng.normal(size=size, scale=sigma_true)
124145
125-
model = Model()
126-
with model:
146+
with pm.Model() as model:
127147
# Priors are posteriors from previous iteration
128-
alpha = from_posterior("alpha", trace["alpha"])
129-
beta0 = from_posterior("beta0", trace["beta0"])
130-
beta1 = from_posterior("beta1", trace["beta1"])
148+
alpha = from_posterior("alpha", az.extract(trace, group="posterior", var_names=["alpha"]))
149+
beta0 = from_posterior("beta0", az.extract(trace, group="posterior", var_names=["beta0"]))
150+
beta1 = from_posterior("beta1", az.extract(trace, group="posterior", var_names=["beta1"]))
151+
sigma = from_posterior("sigma", az.extract(trace, group="posterior", var_names=["sigma"]))
131152
132153
# Expected value of outcome
133154
mu = alpha + beta0 * X1 + beta1 * X2
134155
135156
# Likelihood (sampling distribution) of observations
136-
Y_obs = Normal("Y_obs", mu=mu, sigma=1, observed=Y)
157+
Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)
137158
138-
# draw 10000 posterior samples
139-
trace = sample(1000)
159+
# draw 2_000 posterior samples
160+
trace = pm.sample(
161+
tune=1_500, draws=2_000, target_accept=0.9, progressbar=False, random_seed=rng
162+
)
140163
traces.append(trace)
141164
```
142165
143166
```{code-cell} ipython3
144-
print("Posterior distributions after " + str(len(traces)) + " iterations.")
145-
cmap = mpl.cm.autumn
146-
for param in ["alpha", "beta0", "beta1"]:
147-
plt.figure(figsize=(8, 2))
167+
fig, ax = plt.subplots(nrows=4, ncols=1, figsize=(12, 12), sharex=False, sharey=False)
168+
169+
cmap = mpl.cm.viridis
170+
171+
for i, (param, true_value) in enumerate(
172+
zip(["alpha", "beta0", "beta1", "sigma"], [alpha_true, beta0_true, beta1_true, sigma_true])
173+
):
148174
for update_i, trace in enumerate(traces):
149-
samples = trace[param]
175+
samples = az.extract(trace, group="posterior", var_names=param)
150176
smin, smax = np.min(samples), np.max(samples)
151177
x = np.linspace(smin, smax, 100)
152178
y = stats.gaussian_kde(samples)(x)
153-
plt.plot(x, y, color=cmap(1 - update_i / len(traces)))
154-
plt.axvline({"alpha": alpha_true, "beta0": beta0_true, "beta1": beta1_true}[param], c="k")
155-
plt.ylabel("Frequency")
156-
plt.title(param)
157-
158-
plt.tight_layout();
179+
ax[i].plot(x, y, color=cmap(1 - update_i / len(traces)))
180+
ax[i].axvline(true_value, c="k")
181+
ax[i].set(title=param)
159182
```

You can re-execute the last two cells to generate more updates.

What is interesting to note is that the posterior distributions for our parameters tend to get centered on their true values (vertical lines), and the distributions get thinner and thinner. This means that we get more confident each time, and the (false) belief we had at the beginning gets flushed away by the new data we incorporate.
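
One way to make the "thinner and thinner" observation concrete (again as a sketch rather than part of the original example) is to track the posterior standard deviation of a parameter across the update iterations; on average it should decrease:

```python
# Sketch: the posterior standard deviation of alpha should shrink
# (on average) as more independent data is incorporated.
alpha_sds = [
    az.extract(t, group="posterior", var_names=["alpha"]).std().item() for t in traces
]
print(np.round(alpha_sds, 3))
```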

+++

``````{admonition} Not a silver bullet
:class: warning
Observe that, even though the iterations seem to improve, some of them don't look so good, and sometimes the fit even appears to regress. In addition to the reasons noted at the beginning of the notebook, there are a couple of key steps in the process where randomness is involved, so things should only be expected to improve on average.

1. New observations are random. If in the initial iterations we get values closer to the bulk of the distribution and then we get several values in a row from the positive tail, the iterations where we have accumulated a couple of draws from the tail will probably be biased and "look worse" than previous ones.
2. MCMC is random. Even when it converges, MCMC is a random process, so different calls to `pymc.sample` will return values centered around the exact posterior but not always the same; how large a variation we should expect can be checked with {func}`arviz.mcse`. KDEs also incorporate this often negligible yet present source of uncertainty in the posterior estimates, and so will the generated Interpolated distributions.
``````
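
For instance, the Monte Carlo standard error mentioned in the second point can be inspected directly on the most recent posterior (a sketch, not part of the original example):

```python
# Sketch: Monte Carlo standard errors for the most recent posterior.
az.mcse(traces[-1], var_names=["alpha", "beta0", "beta1", "sigma"])
```
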
+++

``````{admonition} An alternative approach
:class: tip
There is an alternative approach in `pymc-experimental`, through the function {func}`~pymc_experimental.utils.prior.prior_from_idata`, which does something similar. This function:
> Creates a prior from posterior using MvNormal approximation.
> The approximation uses MvNormal distribution. Keep in mind that this function will only work well for unimodal
> posteriors and will fail when complicated interactions happen. Moreover, if a retrieved variable is constrained, you
> should specify a transform for the variable, e.g.
> {func}`~pymc.distributions.transforms.log` for standard deviation posterior.
``````
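
A rough sketch of how this could look is given below. It is not part of the original example, it assumes `pymc-experimental` is installed, and the exact call signature (in particular how transforms are passed for constrained variables such as `sigma`) should be checked against the `pymc-experimental` documentation.

```python
# Sketch only: build correlated, MvNormal-based priors from the previous
# posterior using pymc-experimental (import path and call signature as
# documented there; treat this as an assumption, not a tested recipe).
from pymc_experimental.utils.prior import prior_from_idata

with pm.Model() as mvn_model:
    # Assumed to return a dict mapping variable names to new prior variables.
    priors = prior_from_idata(trace, var_names=["alpha", "beta0", "beta1"])
    mu = priors["alpha"] + priors["beta0"] * X1 + priors["beta1"] * X2
```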

+++

## Authors
- Created by [David Brochart](https://github.com/davidbrochart) ([pymc#1878](https://github.com/pymc-devs/pymc/pull/1878)) in May 2017.
- Updated by [Juan Orduz](https://github.com/juanitorduz) in August 2024.

```{code-cell} ipython3
%load_ext watermark
%watermark -n -u -v -iv -w
```

:::{include} ../page_footer.md
:::
