Skip to content

Commit 5288e05

Browse files
authored
Updated 3 survival notebooks to v5 (#501)
1 parent 462700a commit 5288e05

File tree

6 files changed

+525
-329
lines changed

6 files changed

+525
-329
lines changed

examples/survival_analysis/censored_data.ipynb

Lines changed: 53 additions & 34 deletions
Large diffs are not rendered by default.

examples/survival_analysis/censored_data.myst.md

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,9 +5,9 @@ jupytext:
55
format_name: myst
66
format_version: 0.13
77
kernelspec:
8-
display_name: pymc_env
8+
display_name: pymc
99
language: python
10-
name: pymc_env
10+
name: python3
1111
---
1212

1313
(censored_data)=
@@ -241,3 +241,6 @@ As we can see, both censored models appear to capture the mean and variance of t
241241
%load_ext watermark
242242
%watermark -n -u -v -iv -w -p pytensor,aeppl
243243
```
244+
245+
:::{include} ../page_footer.md
246+
:::

examples/survival_analysis/survival_analysis.ipynb

Lines changed: 182 additions & 94 deletions
Large diffs are not rendered by default.

examples/survival_analysis/survival_analysis.myst.md

Lines changed: 32 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -5,30 +5,36 @@ jupytext:
55
format_name: myst
66
format_version: 0.13
77
kernelspec:
8-
display_name: Python 3
8+
display_name: pymc
99
language: python
1010
name: python3
1111
---
1212

13+
(survival_analysis)=
1314
# Bayesian Survival Analysis
1415

15-
Author: Austin Rochford
16-
17-
[Survival analysis](https://en.wikipedia.org/wiki/Survival_analysis) studies the distribution of the time to an event. Its applications span many fields across medicine, biology, engineering, and social science. This tutorial shows how to fit and analyze a Bayesian survival model in Python using PyMC3.
16+
[Survival analysis](https://en.wikipedia.org/wiki/Survival_analysis) studies the distribution of the time to an event. Its applications span many fields across medicine, biology, engineering, and social science. This tutorial shows how to fit and analyze a Bayesian survival model in Python using PyMC.
1817

1918
We illustrate these concepts by analyzing a [mastectomy data set](https://vincentarelbundock.github.io/Rdatasets/doc/HSAUR/mastectomy.html) from `R`'s [HSAUR](https://cran.r-project.org/web/packages/HSAUR/index.html) package.
2019

20+
:::{post} Jan 17, 2023
21+
:tags: censored, survival analysis
22+
:category: intermediate, how-to
23+
:author: Austin Rochford, Chris Fonnesbeck
24+
:::
25+
2126
```{code-cell} ipython3
2227
import arviz as az
2328
import numpy as np
2429
import pandas as pd
25-
import pymc3 as pm
26-
import theano
30+
import pymc as pm
31+
import pytensor
2732
28-
%matplotlib inline
2933
from matplotlib import pyplot as plt
30-
from pymc3.distributions.timeseries import GaussianRandomWalk
31-
from theano import tensor as T
34+
from pymc.distributions.timeseries import GaussianRandomWalk
35+
from pytensor import tensor as T
36+
37+
print(f"Running on PyMC v{pm.__version__}")
3238
```
3339

3440
```{code-cell} ipython3
@@ -189,7 +195,7 @@ ax.set_ylabel("Number of observations")
189195
ax.legend();
190196
```
191197

192-
With the prior distributions on $\beta$ and $\lambda_0(t)$ chosen, we now show how the model may be fit using MCMC simulation with `pymc3`. The key observation is that the piecewise-constant proportional hazard model is [closely related](http://data.princeton.edu/wws509/notes/c7s4.html) to a Poisson regression model. (The models are not identical, but their likelihoods differ by a factor that depends only on the observed data and not the parameters $\beta$ and $\lambda_j$. For details, see Germán Rodríguez's WWS 509 [course notes](http://data.princeton.edu/wws509/notes/c7s4.html).)
198+
With the prior distributions on $\beta$ and $\lambda_0(t)$ chosen, we now show how the model may be fit using MCMC simulation with `pymc`. The key observation is that the piecewise-constant proportional hazard model is [closely related](http://data.princeton.edu/wws509/notes/c7s4.html) to a Poisson regression model. (The models are not identical, but their likelihoods differ by a factor that depends only on the observed data and not the parameters $\beta$ and $\lambda_j$. For details, see Germán Rodríguez's WWS 509 [course notes](http://data.princeton.edu/wws509/notes/c7s4.html).)
193199

194200
We define indicator variables based on whether the $i$-th subject died in the $j$-th interval,
195201

@@ -214,7 +220,7 @@ exposure[patients, last_period] = df.time - interval_bounds[last_period]
214220

215221
Finally, denote the risk incurred by the $i$-th subject in the $j$-th interval as $\lambda_{i, j} = \lambda_j \exp(\mathbf{x}_i \beta)$.
216222

217-
We may approximate $d_{i, j}$ with a Poisson random variable with mean $t_{i, j}\ \lambda_{i, j}$. This approximation leads to the following `pymc3` model.
223+
We may approximate $d_{i, j}$ with a Poisson random variable with mean $t_{i, j}\ \lambda_{i, j}$. This approximation leads to the following `pymc` model.
218224

219225
```{code-cell} ipython3
220226
coords = {"intervals": intervals}
@@ -244,7 +250,6 @@ with model:
244250
n_samples,
245251
tune=n_tune,
246252
target_accept=0.99,
247-
return_inferencedata=True,
248253
random_seed=RANDOM_SEED,
249254
)
250255
```
@@ -326,23 +331,23 @@ fig.suptitle("Bayesian survival model");
326331

327332
We see that the cumulative hazard for metastasized subjects increases more rapidly initially (through about seventy months), after which it increases roughly in parallel with the baseline cumulative hazard.
328333

329-
These plots also show the pointwise 95% high posterior density interval for each function. One of the distinct advantages of the Bayesian model fit with `pymc3` is the inherent quantification of uncertainty in our estimates.
334+
These plots also show the pointwise 95% high posterior density interval for each function. One of the distinct advantages of the Bayesian model fit with `pymc` is the inherent quantification of uncertainty in our estimates.
330335

331336
+++
332337

333338
##### Time varying effects
334339

335340
Another of the advantages of the model we have built is its flexibility. From the plots above, we may reasonable believe that the additional hazard due to metastization varies over time; it seems plausible that cancer that has metastasized increases the hazard rate immediately after the mastectomy, but that the risk due to metastization decreases over time. We can accommodate this mechanism in our model by allowing the regression coefficients to vary over time. In the time-varying coefficient model, if $s_j \leq t < s_{j + 1}$, we let $\lambda(t) = \lambda_j \exp(\mathbf{x} \beta_j).$ The sequence of regression coefficients $\beta_1, \beta_2, \ldots, \beta_{N - 1}$ form a normal random walk with $\beta_1 \sim N(0, 1)$, $\beta_j\ |\ \beta_{j - 1} \sim N(\beta_{j - 1}, 1)$.
336341

337-
We implement this model in `pymc3` as follows.
342+
We implement this model in `pymc` as follows.
338343

339344
```{code-cell} ipython3
340345
coords = {"intervals": intervals}
341346
342347
with pm.Model(coords=coords) as time_varying_model:
343348
344349
lambda0 = pm.Gamma("lambda0", 0.01, 0.01, dims="intervals")
345-
beta = GaussianRandomWalk("beta", tau=1.0, dims="intervals")
350+
beta = GaussianRandomWalk("beta", init_dist=pm.Normal.dist(), sigma=1.0, dims="intervals")
346351
347352
lambda_ = pm.Deterministic("h", lambda0 * T.exp(T.outer(T.constant(df.metastasized), beta)))
348353
mu = pm.Deterministic("mu", exposure * lambda_)
@@ -384,15 +389,15 @@ ax.step(interval_bounds[:-1], beta_hat, color="C0")
384389
385390
ax.scatter(
386391
interval_bounds[last_period[(df.event.values == 1) & (df.metastasized == 1)]],
387-
beta_hpt.isel(intervals=last_period[(df.event.values == 1) & (df.metastasized == 1)]),
392+
beta_hat.isel(intervals=last_period[(df.event.values == 1) & (df.metastasized == 1)]),
388393
color="C1",
389394
zorder=10,
390395
label="Died, cancer metastasized",
391396
)
392397
393398
ax.scatter(
394399
interval_bounds[last_period[(df.event.values == 0) & (df.metastasized == 1)]],
395-
beta_hpt.isel(intervals=last_period[(df.event.values == 0) & (df.metastasized == 1)]),
400+
beta_hat.isel(intervals=last_period[(df.event.values == 0) & (df.metastasized == 1)]),
396401
color="C0",
397402
zorder=10,
398403
label="Censored, cancer metastasized",
@@ -501,11 +506,18 @@ We have really only scratched the surface of both survival analysis and the Baye
501506

502507
This tutorial is available as an [IPython](http://ipython.org/) notebook [here](https://gist.github.com/AustinRochford/4c6b07e51a2247d678d6). It is adapted from a blog post that first appeared [here](http://austinrochford.com/posts/2015-10-05-bayes-survival.html).
503508

509+
+++
510+
511+
## Authors
512+
513+
- Originally authored by [Austin Rochford](https://github.com/AustinRochford).
514+
- Updated by [Fernando Irarrázaval](https://github.com/cuchoi) in June 2022 to PyMC v4 ([pymc-examples#372](https://github.com/pymc-devs/pymc-examples/pull/372)).
515+
- Updated by [Chris Fonnesbeck](https://github.com/fonnesbeck) in January 2023 to PyMC v5.
516+
504517
```{code-cell} ipython3
505518
%load_ext watermark
506519
%watermark -n -u -v -iv -w -p xarray
507520
```
508521

509-
```{code-cell} ipython3
510-
511-
```
522+
:::{include} ../page_footer.md
523+
:::

examples/survival_analysis/weibull_aft.ipynb

Lines changed: 230 additions & 168 deletions
Large diffs are not rendered by default.

examples/survival_analysis/weibull_aft.myst.md

Lines changed: 23 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -5,21 +5,29 @@ jupytext:
55
format_name: myst
66
format_version: 0.13
77
kernelspec:
8-
display_name: Python 3
8+
display_name: pymc
99
language: python
1010
name: python3
1111
---
1212

13+
(weibull_aft)=
14+
1315
# Reparameterizing the Weibull Accelerated Failure Time Model
1416

17+
:::{post} January 17, 2023
18+
:tags: censored, survival analysis, weibull
19+
:category: intermediate, how-to
20+
:author: Junpeng Lao, George Ho, Chris Fonnesbeck
21+
:::
22+
1523
```{code-cell} ipython3
1624
import arviz as az
1725
import numpy as np
18-
import pymc3 as pm
26+
import pymc as pm
27+
import pytensor.tensor as pt
1928
import statsmodels.api as sm
20-
import theano.tensor as tt
2129
22-
print(f"Running on PyMC3 v{pm.__version__}")
30+
print(f"Running on PyMC v{pm.__version__}")
2331
```
2432

2533
```{code-cell} ipython3
@@ -83,8 +91,8 @@ with pm.Model() as model_1:
8391
8492
mu = pm.Normal("mu", mu=0, sigma=100)
8593
alpha_raw = pm.Normal("a0", mu=0, sigma=0.1)
86-
alpha = pm.Deterministic("alpha", tt.exp(alpha_sd * alpha_raw))
87-
beta = pm.Deterministic("beta", tt.exp(mu / alpha))
94+
alpha = pm.Deterministic("alpha", pt.exp(alpha_sd * alpha_raw))
95+
beta = pm.Deterministic("beta", pt.exp(mu / alpha))
8896
8997
y_obs = pm.Weibull("y_obs", alpha=alpha, beta=beta, observed=y[~censored])
9098
y_cens = pm.Potential("y_cens", weibull_lccdf(y[censored], alpha, beta))
@@ -93,7 +101,7 @@ with pm.Model() as model_1:
93101
```{code-cell} ipython3
94102
with model_1:
95103
# Change init to avoid divergences
96-
data_1 = pm.sample(target_accept=0.9, init="adapt_diag", return_inferencedata=True)
104+
data_1 = pm.sample(target_accept=0.9, init="adapt_diag")
97105
```
98106

99107
```{code-cell} ipython3
@@ -114,7 +122,7 @@ For more information, see [this Stan example model](https://github.com/stan-dev/
114122
with pm.Model() as model_2:
115123
alpha = pm.Normal("alpha", mu=0, sigma=10)
116124
r = pm.Gamma("r", alpha=1, beta=0.001, testval=0.25)
117-
beta = pm.Deterministic("beta", tt.exp(-alpha / r))
125+
beta = pm.Deterministic("beta", pt.exp(-alpha / r))
118126
119127
y_obs = pm.Weibull("y_obs", alpha=r, beta=beta, observed=y[~censored])
120128
y_cens = pm.Potential("y_cens", weibull_lccdf(y[censored], r, beta))
@@ -123,7 +131,7 @@ with pm.Model() as model_2:
123131
```{code-cell} ipython3
124132
with model_2:
125133
# Increase target_accept to avoid divergences
126-
data_2 = pm.sample(target_accept=0.9, return_inferencedata=True)
134+
data_2 = pm.sample(target_accept=0.9)
127135
```
128136

129137
```{code-cell} ipython3
@@ -144,7 +152,7 @@ logtime = np.log(y)
144152
145153
def gumbel_sf(y, mu, sigma):
146154
"""Gumbel survival function."""
147-
return 1.0 - tt.exp(-tt.exp(-(y - mu) / sigma))
155+
return 1.0 - pt.exp(-pt.exp(-(y - mu) / sigma))
148156
```
149157

150158
```{code-cell} ipython3
@@ -159,7 +167,7 @@ with pm.Model() as model_3:
159167
```{code-cell} ipython3
160168
with model_3:
161169
# Change init to avoid divergences
162-
data_3 = pm.sample(init="adapt_diag", return_inferencedata=True)
170+
data_3 = pm.sample(init="adapt_diag")
163171
```
164172

165173
```{code-cell} ipython3
@@ -174,8 +182,12 @@ az.summary(data_3, round_to=2)
174182

175183
- Originally collated by [Junpeng Lao](https://junpenglao.xyz/) on Apr 21, 2018. See original code [here](https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/65447fdb431c78b15fbeaef51b8c059f46c9e8d6/PyMC3QnA/discourse_1107.ipynb).
176184
- Authored and ported to Jupyter notebook by [George Ho](https://eigenfoo.xyz/) on Jul 15, 2018.
185+
- Updated for compatibility with PyMC v5 by Chris Fonnesbeck on Jan 16, 2023.
177186

178187
```{code-cell} ipython3
179188
%load_ext watermark
180189
%watermark -n -u -v -iv -w
181190
```
191+
192+
:::{include} ../page_footer.md
193+
:::

0 commit comments

Comments
 (0)