
Commit de1cfc2

juanitorduz committed: update notebook

1 parent 4515f49 commit de1cfc2

File tree

2 files changed: +239 -577 lines changed


examples/howto/updating_priors.ipynb

Lines changed: 159 additions & 522 deletions
Large diffs are not rendered by default.

examples/howto/updating_priors.myst.md

Lines changed: 80 additions & 55 deletions
@@ -10,56 +10,61 @@ kernelspec:
   name: python3
 ---
 
-# Updating priors
+(updating-priors)=
+# Updating Priors
+
+:::{post} January, 2017
+:tags: priors
+:category: intermediate, how-to
+:::
 
 +++
 
-In this notebook, I will show how it is possible to update the priors as new data becomes available. The example is a slightly modified version of the linear regression in the [Getting started with PyMC3](https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/getting_started.ipynb) notebook.
+In this notebook, we will show how it is possible to update the priors as new data becomes available.
 
 ```{code-cell} ipython3
-%matplotlib inline
-import warnings
-
 import arviz as az
 import matplotlib as mpl
 import matplotlib.pyplot as plt
 import numpy as np
-import pymc3 as pm
-import theano.tensor as tt
+import pymc as pm
+import pytensor.tensor as pt
 
-from pymc3 import Model, Normal, Slice, sample
-from pymc3.distributions import Interpolated
 from scipy import stats
-from theano import as_op
+from tqdm.notebook import trange
+
+az.style.use("arviz-darkgrid")
+plt.rcParams["figure.figsize"] = [12, 7]
+plt.rcParams["figure.dpi"] = 100
+plt.rcParams["figure.facecolor"] = "white"
 
-plt.style.use("seaborn-darkgrid")
-print(f"Running on PyMC3 v{pm.__version__}")
+%load_ext autoreload
+%autoreload 2
+%config InlineBackend.figure_format = "retina"
 ```
 
 ```{code-cell} ipython3
-warnings.filterwarnings("ignore")
+rng: np.random.Generator = np.random.default_rng(seed=42)
 ```
 
 ## Generating data
 
 ```{code-cell} ipython3
-# Initialize random number generator
-np.random.seed(93457)
-
 # True parameter values
 alpha_true = 5
 beta0_true = 7
 beta1_true = 13
+sigma_true = 2
 
 # Size of dataset
 size = 100
 
 # Predictor variable
-X1 = np.random.randn(size)
-X2 = np.random.randn(size) * 0.2
+X1 = rng.normal(size=size)
+X2 = rng.normal(size=size) * 0.2
 
 # Simulate outcome variable
-Y = alpha_true + beta0_true * X1 + beta1_true * X2 + np.random.randn(size)
+Y = alpha_true + beta0_true * X1 + beta1_true * X2 + rng.normal(size=size, scale=sigma_true)
 ```
 
 ## Model specification
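For reference, the simulated data above follow the linear model

$$
Y = \alpha + \beta_0 X_1 + \beta_1 X_2 + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2),
$$

with true values $\alpha = 5$, $\beta_0 = 7$, $\beta_1 = 13$ and $\sigma = 2$; this is also the structure the PyMC model below assumes.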
@@ -69,35 +74,48 @@ Y = alpha_true + beta0_true * X1 + beta1_true * X2 + np.random.randn(size)
 Our initial beliefs about the parameters are weakly informative (sigma=5) and centered at zero, a bit off from the true values.
 
 ```{code-cell} ipython3
-basic_model = Model()
-
-with basic_model:
+with pm.Model() as model:
     # Priors for unknown model parameters
-    alpha = Normal("alpha", mu=0, sigma=1)
-    beta0 = Normal("beta0", mu=12, sigma=1)
-    beta1 = Normal("beta1", mu=18, sigma=1)
+    alpha = pm.Normal("alpha", mu=0, sigma=5)
+    beta0 = pm.Normal("beta0", mu=0, sigma=5)
+    beta1 = pm.Normal("beta1", mu=0, sigma=5)
+
+    sigma = pm.HalfNormal("sigma", sigma=1)
 
     # Expected value of outcome
     mu = alpha + beta0 * X1 + beta1 * X2
 
     # Likelihood (sampling distribution) of observations
-    Y_obs = Normal("Y_obs", mu=mu, sigma=1, observed=Y)
+    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)
 
-    # draw 1000 posterior samples
-    trace = sample(1000)
+    # draw 2_000 posterior samples
+    trace = pm.sample(
+        tune=1_500, draws=2_000, target_accept=0.9, progressbar=False, random_seed=rng
+    )
 ```
 
 ```{code-cell} ipython3
-az.plot_trace(trace);
+axes = az.plot_trace(
+    data=trace,
+    compact=True,
+    lines=[
+        ("alpha", {}, alpha_true),
+        ("beta0", {}, beta0_true),
+        ("beta1", {}, beta1_true),
+        ("sigma", {}, sigma_true),
+    ],
+    backend_kwargs={"figsize": (12, 9), "layout": "constrained"},
+)
+plt.gcf().suptitle("Trace", fontsize=16);
 ```
 
 In order to update our beliefs about the parameters, we use the posterior distributions, which will serve as the prior distributions for the next inference step. The data used for each inference iteration has to be independent of the previous iterations, otherwise the same (possibly wrong) belief is injected over and over into the system, amplifying the errors and misleading the inference. By ensuring the data is independent, the system should converge to the true parameter values.
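Formally, for independent data batches $D_1, \dots, D_t$, this sequential scheme is just Bayes' theorem applied with the previous posterior as the new prior:

$$
p(\theta \mid D_1, \dots, D_t) \propto p(D_t \mid \theta)\, p(\theta \mid D_1, \dots, D_{t-1}).
$$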
 
-Because we draw samples from the posterior distribution (shown on the right in the figure above), we need to estimate their probability density (shown on the left in the figure above). [Kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) (KDE) is a way to achieve this, and we will use this technique here. In any case, it is an empirical distribution that cannot be expressed analytically. Fortunately PyMC3 provides a way to use custom distributions, via `Interpolated` class.
+Because we draw samples from the posterior distribution (shown on the right in the figure above), we need to estimate their probability density (shown on the left in the figure above). [Kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) (KDE) is a way to achieve this, and we will use this technique here. In any case, it is an empirical distribution that cannot be expressed analytically. Fortunately, PyMC provides a way to use custom distributions via the {class}`~pymc.Interpolated` class.
 
 ```{code-cell} ipython3
 def from_posterior(param, samples):
-    smin, smax = np.min(samples), np.max(samples)
+    smin, smax = samples.min().item(), samples.max().item()
     width = smax - smin
     x = np.linspace(smin, smax, 100)
     y = stats.gaussian_kde(samples)(x)
@@ -106,7 +124,7 @@ def from_posterior(param, samples):
     # so we'll extend the domain and use linear approximation of density on it
     x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
     y = np.concatenate([[0], y, [0]])
-    return Interpolated(param, x, y)
+    return pm.Interpolated(param, x, y)
 ```
 
 Now we just need to generate more data and build our Bayesian model so that the prior distributions for the current iteration are the posterior distributions from the previous iteration. It is still possible to keep using the NUTS sampling method, because the `Interpolated` class implements the gradient calculations that Hamiltonian Monte Carlo samplers require.
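To see this in isolation, here is a minimal, self-contained sketch (the toy samples and numbers are our own illustration, not taken from the notebook, and it assumes the PyMC v5 API used elsewhere in this diff): it builds an `Interpolated` prior from a KDE grid, as `from_posterior` does, and confirms that NUTS can sample it.

```python
import numpy as np
import pymc as pm
from scipy import stats

rng = np.random.default_rng(0)
# Toy stand-in for posterior draws of a single scalar parameter.
posterior_samples = rng.normal(loc=5.0, scale=0.3, size=4_000)

# KDE evaluated on a grid, as in from_posterior above.
x = np.linspace(posterior_samples.min(), posterior_samples.max(), 100)
y = stats.gaussian_kde(posterior_samples)(x)

with pm.Model():
    theta = pm.Interpolated("theta", x_points=x, pdf_points=y)
    # No likelihood: sampling the prior directly exercises the gradient path.
    idata = pm.sample(tune=500, draws=500, progressbar=False, random_seed=rng)

# The NUTS draws should recover the location of the KDE (about 5.0).
print(idata.posterior["theta"].mean().item())
```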
@@ -116,46 +134,50 @@ traces = [trace]
 ```
 
 ```{code-cell} ipython3
-for _ in range(10):
+n_iterations = 10
+
+for _ in trange(n_iterations):
     # generate more data
-    X1 = np.random.randn(size)
-    X2 = np.random.randn(size) * 0.2
-    Y = alpha_true + beta0_true * X1 + beta1_true * X2 + np.random.randn(size)
+    X1 = rng.normal(size=size)
+    X2 = rng.normal(size=size) * 0.2
+    Y = alpha_true + beta0_true * X1 + beta1_true * X2 + rng.normal(size=size, scale=sigma_true)
 
-    model = Model()
-    with model:
+    with pm.Model() as model:
         # Priors are posteriors from previous iteration
-        alpha = from_posterior("alpha", trace["alpha"])
-        beta0 = from_posterior("beta0", trace["beta0"])
-        beta1 = from_posterior("beta1", trace["beta1"])
+        alpha = from_posterior("alpha", az.extract(trace, group="posterior", var_names=["alpha"]))
+        beta0 = from_posterior("beta0", az.extract(trace, group="posterior", var_names=["beta0"]))
+        beta1 = from_posterior("beta1", az.extract(trace, group="posterior", var_names=["beta1"]))
+        sigma = from_posterior("sigma", az.extract(trace, group="posterior", var_names=["sigma"]))
 
         # Expected value of outcome
         mu = alpha + beta0 * X1 + beta1 * X2
 
         # Likelihood (sampling distribution) of observations
-        Y_obs = Normal("Y_obs", mu=mu, sigma=1, observed=Y)
+        Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)
 
-        # draw 10000 posterior samples
-        trace = sample(1000)
+        # draw 2_000 posterior samples
+        trace = pm.sample(
+            tune=1_500, draws=2_000, target_accept=0.9, progressbar=False, random_seed=rng
+        )
     traces.append(trace)
 ```
 
 ```{code-cell} ipython3
-print("Posterior distributions after " + str(len(traces)) + " iterations.")
-cmap = mpl.cm.autumn
-for param in ["alpha", "beta0", "beta1"]:
-    plt.figure(figsize=(8, 2))
+fig, ax = plt.subplots(nrows=4, ncols=1, figsize=(12, 12), sharex=False, sharey=False)
+
+cmap = mpl.cm.viridis
+
+for i, (param, true_value) in enumerate(
+    zip(["alpha", "beta0", "beta1", "sigma"], [alpha_true, beta0_true, beta1_true, sigma_true])
+):
     for update_i, trace in enumerate(traces):
-        samples = trace[param]
+        samples = az.extract(trace, group="posterior", var_names=param)
         smin, smax = np.min(samples), np.max(samples)
         x = np.linspace(smin, smax, 100)
         y = stats.gaussian_kde(samples)(x)
-        plt.plot(x, y, color=cmap(1 - update_i / len(traces)))
-    plt.axvline({"alpha": alpha_true, "beta0": beta0_true, "beta1": beta1_true}[param], c="k")
-    plt.ylabel("Frequency")
-    plt.title(param)
-
-plt.tight_layout();
+        ax[i].plot(x, y, color=cmap(1 - update_i / len(traces)))
+    ax[i].axvline(true_value, c="k")
+    ax[i].set(title=param)
 ```
 
 You can re-execute the last two cells to generate more updates.
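As an optional final check (our addition, not part of the original notebook; it reuses the `trace` object from the loop above), `az.summary` can quantify how close the last posterior sits to the true values:

```python
# Compare the last posterior against alpha_true=5, beta0_true=7,
# beta1_true=13, sigma_true=2.
az.summary(trace, var_names=["alpha", "beta0", "beta1", "sigma"], kind="stats")
```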
@@ -166,3 +188,6 @@ What is interesting to note is that the posterior distributions for our paramete
 %load_ext watermark
 %watermark -n -u -v -iv -w
 ```
+
+:::{include} ../page_footer.md
+:::
