
Commit 211420c

Pre-commit fixes

1 parent 6b81693 commit 211420c

File tree

2 files changed: +69, -74 lines

examples/variational_inference/bayesian_neural_network_advi.ipynb

Lines changed: 50 additions & 46 deletions
Large diffs are not rendered by default.

examples/variational_inference/bayesian_neural_network_advi.myst.md

Lines changed: 19 additions & 28 deletions
@@ -5,7 +5,7 @@ jupytext:
     format_name: myst
     format_version: 0.13
 kernelspec:
-  display_name: Python 3 (ipykernel)
+  display_name: Python 3.9.10 ('pymc-dev-py39')
   language: python
   name: python3
 ---
@@ -15,8 +15,8 @@ kernelspec:
 
 +++
 
-:::{post} May 30, 2022
-:tags: neural networks, perceptron, variational inference, minibatch
+:::{post} Apr 25, 2022
+:tags: pymc.ADVI, pymc.Bernoulli, pymc.Data, pymc.Minibatch, pymc.Model, pymc.Normal, variational inference
 :category: intermediate
 :author: Thomas Wiecki, updated by Chris Fonnesbeck
 :::
@@ -28,13 +28,13 @@ kernelspec:
 **Probabilistic Programming**, **Deep Learning** and "**Big Data**" are among the biggest topics in machine learning. Inside of PP, a lot of innovation is focused on making things scale using **Variational Inference**. In this example, I will show how to use **Variational Inference** in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.
 
 ### Probabilistic Programming at scale
-**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using {ref}`MCMC sampling algorithms <multilevel_modeling>` we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in several probabilistic programming packages including PyMC, NumPyro and Stan.
+**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using [MCMC sampling algorithms](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/) we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in PyMC, NumPyro and Stan.
 
 Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) (e.g. [random forests](https://en.wikipedia.org/wiki/Random_forest) or [gradient boosted regression trees](https://en.wikipedia.org/wiki/Boosting_(machine_learning)).
 
 ### Deep Learning
 
-Now in its third renaissance, neural networks have been making headlines repeatedly by dominating almost any object recognition benchmark, kicking ass at Atari games {cite:p}`mnih2013playing`, and beating the world-champion Lee Sedol at Go {cite:p}`silver2016masteringgo`. From a statistical point, Neural Networks are extremely good non-linear function approximators and representation learners. While mostly known for classification, they have been extended to unsupervised learning with AutoEncoders {cite:p}`kingma2014autoencoding` and in all sorts of other interesting ways (e.g. [Recurrent Networks](https://en.wikipedia.org/wiki/Recurrent_neural_network), or [MDNs](http://cbonnett.github.io/MDN_EDWARD_KERAS_TF.html) to estimate multimodal distributions). Why do they work so well? No one really knows as the statistical properties are still not fully understood.
+Now in its third renaissance, neural networks have been making headlines repeatadly by dominating almost any object recognition benchmark, kicking ass at Atari games {cite:p}`mnih2013playing`, and beating the world-champion Lee Sedol at Go {cite:p}`silver2016masteringgo`. From a statistical point, Neural Networks are extremely good non-linear function approximators and representation learners. While mostly known for classification, they have been extended to unsupervised learning with AutoEncoders {cite:p}`kingma2014autoencoding` and in all sorts of other interesting ways (e.g. [Recurrent Networks](https://en.wikipedia.org/wiki/Recurrent_neural_network), or [MDNs](http://cbonnett.github.io/MDN_EDWARD_KERAS_TF.html) to estimate multimodal distributions). Why do they work so well? No one really knows as the statistical properties are still not fully understood.
 
 A large part of the innoviation in deep learning is the ability to train these extremely complex models. This rests on several pillars:
 * Speed: facilitating the GPU allowed for much faster processing.
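The variational-inference idea described in the paragraph above maps directly onto PyMC's `pm.fit` API. A minimal, self-contained sketch (the toy model, data, and iteration count are invented for illustration and are not part of this notebook or commit):

```python
import numpy as np
import pymc as pm

# Toy data: noisy draws around an unknown mean (illustration only).
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=0.5, size=200)

with pm.Model():
    mu = pm.Normal("mu", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y)

    # ADVI: fit a factorized normal approximation to the posterior,
    # i.e. solve an optimization problem instead of running MCMC.
    approx = pm.fit(n=20_000, method="advi")

# Draw from the fitted approximation much like from an MCMC posterior.
idata = approx.sample(draws=1_000)
```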
@@ -64,11 +64,12 @@ While this would allow Probabilistic Programming to be applied to a much wider s
 First, lets generate some toy data -- a simple binary classification problem that's not linearly separable.
 
 ```{code-cell} ipython3
+import aesara
+import aesara.tensor as at
 import arviz as az
 import matplotlib.pyplot as plt
 import numpy as np
 import pymc as pm
-import pytensor
 import seaborn as sns
 
 from sklearn.datasets import make_moons
@@ -78,7 +79,7 @@ from sklearn.preprocessing import scale
 
 ```{code-cell} ipython3
 %config InlineBackend.figure_format = 'retina'
-floatX = pytensor.config.floatX
+floatX = aesara.config.floatX
 RANDOM_SEED = 9927
 rng = np.random.default_rng(RANDOM_SEED)
 az.style.use("arviz-darkgrid")
@@ -126,16 +127,12 @@ def construct_nn():
         "hidden_layer_1": np.arange(n_hidden),
         "hidden_layer_2": np.arange(n_hidden),
         "train_cols": np.arange(X_train.shape[1]),
-        "obs_id": np.arange(X_train.shape[0]),
+        # "obs_id": np.arange(X_train.shape[0]),
     }
 
     with pm.Model(coords=coords) as neural_network:
-        # Define minibatch variables
-        minibatch_x, minibatch_y = pm.Minibatch(X_train, Y_train, batch_size=50)
-
-        # Define data variables using minibatches
-        ann_input = pm.Data("ann_input", minibatch_x, mutable=True, dims=("obs_id", "train_cols"))
-        ann_output = pm.Data("ann_output", minibatch_y, mutable=True, dims="obs_id")
+        ann_input = pm.Data("ann_input", X_train, mutable=True)
+        ann_output = pm.Data("ann_output", Y_train, mutable=True)
 
         # Weights from input to hidden layer
         weights_in_1 = pm.Normal(
@@ -160,8 +157,7 @@ def construct_nn():
             "out",
             act_out,
             observed=ann_output,
-            total_size=X_train.shape[0],  # IMPORTANT for minibatches
-            dims="obs_id",
+            total_size=Y_train.shape[0],  # IMPORTANT for minibatches
         )
     return neural_network
 
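For context on the `construct_nn` hunks above: the `pm.Minibatch` / `total_size` pattern they touch is PyMC's standard way to train on subsampled data. A standalone sketch of that pattern (only the `pm.Minibatch(X_train, Y_train, batch_size=50)` call and the `total_size` rescaling mirror the diff; the logistic-regression model and data are invented stand-ins):

```python
import numpy as np
import pymc as pm

# Stand-in training data with the same roles as the notebook's X_train / Y_train.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
Y_train = (X_train[:, 0] + rng.normal(scale=0.3, size=500) > 0).astype("int64")

with pm.Model() as logistic_model:
    # Random 50-row slices of the full data, redrawn at every optimization step.
    minibatch_x, minibatch_y = pm.Minibatch(X_train, Y_train, batch_size=50)

    w = pm.Normal("w", 0.0, 1.0, shape=2)
    p = pm.math.sigmoid(pm.math.dot(minibatch_x, w))

    # total_size rescales the minibatch likelihood so gradients behave as if
    # all X_train.shape[0] observations were used.
    pm.Bernoulli("out", p=p, observed=minibatch_y, total_size=X_train.shape[0])

    approx = pm.fit(n=10_000, method="advi")
```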
@@ -176,9 +172,9 @@ That's not so bad. The `Normal` priors help regularize the weights. Usually we w
 
 ### Variational Inference: Scaling model complexity
 
-We could now just run a MCMC sampler like {class}`pymc.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.
+We could now just run a MCMC sampler like {class}`~pymc.step_methods.hmc.nuts.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.
 
-Instead, we will use the {class}`pymc.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior.
+Instead, we will use the {class}`~pymc.variational.inference.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior.
 
 ```{code-cell} ipython3
 %%time
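The `%%time` cell that follows in the notebook runs the ADVI fit discussed above. Roughly, the end-to-end pattern looks like this (a sketch: `construct_nn` is the notebook helper from the earlier hunks, while the iteration count and plot labels are assumptions):

```python
import matplotlib.pyplot as plt
import pymc as pm

neural_network = construct_nn()  # model built by the function shown above

with neural_network:
    approx = pm.fit(n=30_000, method=pm.ADVI())

# approx.hist tracks the optimization loss (negative ELBO) per iteration;
# a flat tail is a quick convergence check.
plt.plot(approx.hist, alpha=0.3)
plt.ylabel("-ELBO")
plt.xlabel("iteration")

# Turn the fitted approximation into posterior samples for ArviZ / PPC use.
trace = approx.sample(draws=5000)
```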
@@ -199,7 +195,7 @@ plt.xlabel("iteration");
 trace = approx.sample(draws=5000)
 ```
 
-Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation).
+Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sampling.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation).
 
 ```{code-cell} ipython3
 ---
@@ -215,7 +211,7 @@ with neural_network:
 We can average the predictions for each observation to estimate the underlying probability of class 1.
 
 ```{code-cell} ipython3
-pred = ppc.posterior_predictive["out"].mean(("chain", "draw")) > 0.5
+pred = ppc.posterior_predictive["out"].squeeze().mean(axis=0) > 0.5
 ```
 
 ```{code-cell} ipython3
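The prediction line changed in this hunk boils down to averaging 0/1 posterior-predictive draws per observation and thresholding at 0.5. A tiny NumPy illustration with made-up numbers:

```python
import numpy as np

# 5000 hypothetical posterior-predictive label draws for 3 observations.
rng = np.random.default_rng(1)
draws = rng.binomial(n=1, p=[0.9, 0.2, 0.55], size=(5000, 3))

prob_class_1 = draws.mean(axis=0)  # per-observation estimate of P(class == 1)
pred = prob_class_1 > 0.5          # hard labels via a 0.5 cutoff
print(prob_class_1.round(2), pred)
```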
@@ -228,7 +224,7 @@ ax.set(title="Predicted labels in testing set", xlabel="X1", ylabel="X2");
 ```
 
 ```{code-cell} ipython3
-print(f"Accuracy = {(Y_test == pred.values).mean() * 100:.2f}%")
+print(f"Accuracy = {(Y_test == pred.values).mean() * 100}%")
 ```
 
 Hey, our neural network did all right!
@@ -254,13 +250,8 @@ dummy_out = np.ones(grid_2d.shape[0], dtype=np.int8)
 jupyter:
   outputs_hidden: true
 ---
-coords_eval = {
-    "train_cols": np.arange(grid_2d.shape[1]),
-    "obs_id": np.arange(grid_2d.shape[0]),
-}
-
 with neural_network:
-    pm.set_data(new_data={"ann_input": grid_2d, "ann_output": dummy_out}, coords=coords_eval)
+    pm.set_data(new_data={"ann_input": grid_2d, "ann_output": dummy_out})
     ppc = pm.sample_posterior_predictive(trace)
 ```
 
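The `pm.set_data` call edited above uses PyMC's standard mechanism for posterior-predictive evaluation on new inputs registered with `pm.Data`, with dummy outputs of the new size standing in for the observations. A self-contained sketch of the same pattern on an invented regression model (all names and numbers here are illustrative, and `mutable=True` simply mirrors the hunk's own `pm.Data` usage):

```python
import numpy as np
import pymc as pm

x_obs = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = np.array([0.1, 0.9, 2.1, 2.9])

with pm.Model() as model:
    x = pm.Data("x", x_obs, mutable=True)
    y = pm.Data("y", y_obs, mutable=True)
    slope = pm.Normal("slope", 0.0, 1.0)
    pm.Normal("out", mu=slope * x, sigma=0.1, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

# Swap in new inputs (and same-sized dummy outputs), then predict for them.
x_new = np.array([4.0, 5.0])
with model:
    pm.set_data(new_data={"x": x_new, "y": np.zeros_like(x_new)})
    ppc = pm.sample_posterior_predictive(idata)
```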
@@ -274,7 +265,7 @@ y_pred = ppc.posterior_predictive["out"]
 cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)
 fig, ax = plt.subplots(figsize=(16, 9))
 contour = ax.contourf(
-    grid[0], grid[1], y_pred.mean(("chain", "draw")).values.reshape(100, 100), cmap=cmap
+    grid[0], grid[1], y_pred.squeeze().values.mean(axis=0).reshape(100, 100), cmap=cmap
 )
 ax.scatter(X_test[pred == 0, 0], X_test[pred == 0, 1], color="C0")
 ax.scatter(X_test[pred == 1, 0], X_test[pred == 1, 1], color="C1")
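For reference, the `grid`, `grid_2d`, and `reshape(100, 100)` in this final hunk come from evaluating the classifier on a dense 100×100 grid of input points; roughly (the grid bounds are an assumption):

```python
import numpy as np

# Dense 100x100 grid over the feature space (bounds assumed to cover the data).
grid = np.mgrid[-3:3:100j, -3:3:100j]
grid_2d = grid.reshape(2, -1).T                       # (10000, 2): one row per grid point
dummy_out = np.ones(grid_2d.shape[0], dtype=np.int8)  # placeholder targets for pm.set_data

# After sample_posterior_predictive, the per-point mean probability is reshaped
# back onto the grid, e.g. y_pred.mean(("chain", "draw")).values.reshape(100, 100),
# and drawn with ax.contourf(grid[0], grid[1], ...).
```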
