examples/survival_analysis/survival_analysis.myst.md
Lines changed: 32 additions & 20 deletions
@@ -5,30 +5,36 @@ jupytext:
     format_name: myst
     format_version: 0.13
 kernelspec:
-  display_name: Python 3
+  display_name: pymc
   language: python
   name: python3
 ---

+(survival_analysis)=
 # Bayesian Survival Analysis

-Author: Austin Rochford
-
-[Survival analysis](https://en.wikipedia.org/wiki/Survival_analysis) studies the distribution of the time to an event. Its applications span many fields across medicine, biology, engineering, and social science. This tutorial shows how to fit and analyze a Bayesian survival model in Python using PyMC3.
+[Survival analysis](https://en.wikipedia.org/wiki/Survival_analysis) studies the distribution of the time to an event. Its applications span many fields across medicine, biology, engineering, and social science. This tutorial shows how to fit and analyze a Bayesian survival model in Python using PyMC.

 We illustrate these concepts by analyzing a [mastectomy data set](https://vincentarelbundock.github.io/Rdatasets/doc/HSAUR/mastectomy.html) from `R`'s [HSAUR](https://cran.r-project.org/web/packages/HSAUR/index.html) package.

+:::{post} Jan 17, 2023
+:tags: censored, survival analysis
+:category: intermediate, how-to
+:author: Austin Rochford, Chris Fonnesbeck
+:::
+
 ```{code-cell} ipython3
 import arviz as az
 import numpy as np
 import pandas as pd
-import pymc3 as pm
-import theano
+import pymc as pm
+import pytensor

-%matplotlib inline
 from matplotlib import pyplot as plt
-from pymc3.distributions.timeseries import GaussianRandomWalk
-from theano import tensor as T
+from pymc.distributions.timeseries import GaussianRandomWalk
+from pytensor import tensor as T
+
+print(f"Running on PyMC v{pm.__version__}")
 ```

 ```{code-cell} ipython3
@@ -189,7 +195,7 @@ ax.set_ylabel("Number of observations")
 ax.legend();
 ```

-With the prior distributions on $\beta$ and $\lambda_0(t)$ chosen, we now show how the model may be fit using MCMC simulation with `pymc3`. The key observation is that the piecewise-constant proportional hazard model is [closely related](http://data.princeton.edu/wws509/notes/c7s4.html) to a Poisson regression model. (The models are not identical, but their likelihoods differ by a factor that depends only on the observed data and not the parameters $\beta$ and $\lambda_j$. For details, see Germán Rodríguez's WWS 509 [course notes](http://data.princeton.edu/wws509/notes/c7s4.html).)
+With the prior distributions on $\beta$ and $\lambda_0(t)$ chosen, we now show how the model may be fit using MCMC simulation with `pymc`. The key observation is that the piecewise-constant proportional hazard model is [closely related](http://data.princeton.edu/wws509/notes/c7s4.html) to a Poisson regression model. (The models are not identical, but their likelihoods differ by a factor that depends only on the observed data and not the parameters $\beta$ and $\lambda_j$. For details, see Germán Rodríguez's WWS 509 [course notes](http://data.princeton.edu/wws509/notes/c7s4.html).)

 We define indicator variables based on whether the $i$-th subject died in the $j$-th interval,
 Finally, denote the risk incurred by the $i$-th subject in the $j$-th interval as $\lambda_{i, j} = \lambda_j \exp(\mathbf{x}_i \beta)$.

-We may approximate $d_{i, j}$ with a Poisson random variable with mean $t_{i, j}\ \lambda_{i, j}$. This approximation leads to the following `pymc3` model.
+We may approximate $d_{i, j}$ with a Poisson random variable with mean $t_{i, j}\ \lambda_{i, j}$. This approximation leads to the following `pymc` model.
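
The death indicator $d_{i, j}$, the exposure $t_{i, j}$, and the risk $\lambda_{i, j}$ referenced in the lines above can be sketched in plain NumPy. This is a minimal illustration on made-up data, not the notebook's own code: the toy `time`, `event`, and `metastasized` arrays, the 3-month interval length, and the values of `beta` and `lambda0` are all assumptions chosen for the example, and intervals are treated as right-closed so that a time falling exactly on a boundary belongs to the earlier interval.

```python
import numpy as np

# Hypothetical toy data (NOT the mastectomy data set).
time = np.array([2.0, 5.0, 7.0])           # follow-up time per subject, in months
event = np.array([1, 0, 1])                # 1 = death observed, 0 = censored
metastasized = np.array([0.0, 1.0, 1.0])   # single covariate x_i

interval_length = 3.0                      # assumed interval width
interval_bounds = np.arange(0.0, time.max() + interval_length, interval_length)

n_subjects = time.size
n_intervals = interval_bounds.size - 1
subjects = np.arange(n_subjects)

# Index of the interval in which each subject's follow-up ends
# (assumes strictly positive times and right-closed intervals).
last_period = np.ceil(time / interval_length).astype(int) - 1

# death[i, j] = 1 if subject i died in interval j, else 0.
death = np.zeros((n_subjects, n_intervals))
death[subjects, last_period] = event

# exposure[i, j] = time subject i spent at risk in interval j.
exposure = np.clip(time[:, None] - interval_bounds[:-1], 0.0, interval_length)

# Assumed parameter values, for illustration only.
beta = 0.5
lambda0 = np.array([0.01, 0.02, 0.03])     # baseline hazard per interval

# risk[i, j] = lambda_j * exp(x_i * beta)
risk = lambda0 * np.exp(metastasized * beta)[:, None]

# Mean of the Poisson variable that approximates death[i, j].
mu = exposure * risk
```

Each row of `exposure` sums to that subject's follow-up time, which is a quick sanity check on the interval bookkeeping. In a PyMC model, `mu` would be built from random variables rather than fixed numbers and passed as the mean of a Poisson likelihood observed at `death`.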
 We see that the cumulative hazard for metastasized subjects increases more rapidly initially (through about seventy months), after which it increases roughly in parallel with the baseline cumulative hazard.

-These plots also show the pointwise 95% high posterior density interval for each function. One of the distinct advantages of the Bayesian model fit with `pymc3` is the inherent quantification of uncertainty in our estimates.
+These plots also show the pointwise 95% high posterior density interval for each function. One of the distinct advantages of the Bayesian model fit with `pymc` is the inherent quantification of uncertainty in our estimates.

 +++

 ##### Time varying effects

 Another advantage of the model we have built is its flexibility. From the plots above, we may reasonably believe that the additional hazard due to metastasis varies over time; it seems plausible that cancer that has metastasized increases the hazard rate immediately after the mastectomy, but that the risk due to metastasis decreases over time. We can accommodate this mechanism in our model by allowing the regression coefficients to vary over time. In the time-varying coefficient model, if $s_j \leq t < s_{j + 1}$, we let $\lambda(t) = \lambda_j \exp(\mathbf{x} \beta_j).$ The sequence of regression coefficients $\beta_1, \beta_2, \ldots, \beta_{N - 1}$ forms a normal random walk with $\beta_1 \sim N(0, 1)$ and $\beta_j\ |\ \beta_{j - 1} \sim N(\beta_{j - 1}, 1)$.
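
To make the random-walk prior concrete, the following sketch draws one sample path of the coefficients and the hazard it implies. It is only an illustration in NumPy, not the PyMC model from the notebook: the seed, the number of intervals, the constant baseline `lambda0`, and the covariate value `x` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed, for reproducibility only
n_intervals = 10                  # hypothetical number of intervals

# beta_1 ~ N(0, 1) and beta_j | beta_{j-1} ~ N(beta_{j-1}, 1):
# the cumulative sum of independent standard normal draws has this law.
steps = rng.normal(loc=0.0, scale=1.0, size=n_intervals)
beta = np.cumsum(steps)

# Assumed constant baseline hazard and a metastasized subject (x = 1).
lambda0 = np.full(n_intervals, 0.02)
x = 1.0

# Piecewise-constant hazard: lambda_j * exp(x * beta_j) on interval j.
hazard = lambda0 * np.exp(x * beta)
```

Because each step has unit variance, the prior variance of $\beta_j$ grows linearly in $j$, so the prior is much more diffuse for late intervals than for early ones.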

-We implement this model in `pymc3` as follows.
+We implement this model in `pymc` as follows.

 ```{code-cell} ipython3
 coords = {"intervals": intervals}

 with pm.Model(coords=coords) as time_varying_model:
@@ -501,11 +506,18 @@ We have really only scratched the surface of both survival analysis and the Baye

 This tutorial is available as an [IPython](http://ipython.org/) notebook [here](https://gist.github.com/AustinRochford/4c6b07e51a2247d678d6). It is adapted from a blog post that first appeared [here](http://austinrochford.com/posts/2015-10-05-bayes-survival.html).

++++
+
+## Authors
+
+- Originally authored by [Austin Rochford](https://github.com/AustinRochford).
+- Updated by [Fernando Irarrázaval](https://github.com/cuchoi) in June 2022 to PyMC v4 ([pymc-examples#372](https://github.com/pymc-devs/pymc-examples/pull/372)).
+- Updated by [Chris Fonnesbeck](https://github.com/fonnesbeck) in January 2023 to PyMC v5.
 - Originally collated by [Junpeng Lao](https://junpenglao.xyz/) on Apr 21, 2018. See original code [here](https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/65447fdb431c78b15fbeaef51b8c059f46c9e8d6/PyMC3QnA/discourse_1107.ipynb).
 - Authored and ported to Jupyter notebook by [George Ho](https://eigenfoo.xyz/) on Jul 15, 2018.
+- Updated for compatibility with PyMC v5 by Chris Fonnesbeck on Jan 16, 2023.