
Commit c4b942f

fix citations
1 parent 0e083ba commit c4b942f

File tree

3 files changed: +57 -18 lines changed


examples/mixture_models/dirichlet_mixture_of_multinomials.ipynb

Lines changed: 5 additions & 5 deletions
@@ -104,9 +104,10 @@
 "so it is perhaps tautological to fit that model,\n",
 "but rest assured that data like these really do appear in\n",
 "the counts of different:\n",
-"(1) [words in text corpuses](https://doi.org/10.1145/1102351.1102420),\n",
-"(2) [types of RNA molecules in a cell](https://doi.org/10.12688/f1000research.8900.2),\n",
-"(3) [items purchased by shoppers](https://doi.org/10.2307/2981696).\n",
+"\n",
+"1. words in text corpuses {cite:p}`madsen2005modelingdirichlet`,\n",
+"2. types of RNA molecules in a cell {cite:p}`nowicka2016drimseq`,\n",
+"3. items purchased by shoppers {cite:p}`goodhardt1984thedirichlet`.\n",
 "\n",
 "Here we will discuss a community ecology example, pretending that we have observed counts of $k=5$ different\n",
 "tree species in $n=10$ different forests.\n",
@@ -1917,8 +1918,7 @@
 "\n",
 ":::{bibliography}\n",
 ":filter: docname in docnames\n",
-"- \n",
-":::\n"
+":::"
 ]
 },
 {

examples/mixture_models/dp_mix.ipynb

Lines changed: 6 additions & 13 deletions
@@ -12,16 +12,6 @@
 ":::"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {
-"toc": true
-},
-"source": [
-"<h2>Agenda<span class=\"tocSkip\"></span></h2>\n",
-"<div class=\"toc\"><li><span><a href=\"#Dirichlet-processes\" data-toc-modified-id=\"Dirichlet-processes-1\"><span class=\"toc-item-num\">1.&nbsp;&nbsp;</span>Dirichlet processes</a></span></li><li><span><a href=\"#Dirichlet-process-mixtures\" data-toc-modified-id=\"Dirichlet-process-mixtures-2\"><span class=\"toc-item-num\">2.&nbsp;&nbsp;</span>Dirichlet process mixtures</a></span></li></div>"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -68,7 +58,7 @@
 "\n",
 " $$P = \\sum_{i = 1}^\\infty \\mu_i \\delta_{\\omega_i} \\sim \\textrm{DP}(\\alpha, P_0).$$\n",
 " \n",
-"[Suggested Further Reading]: (http://mlg.eng.cam.ac.uk/tutorials/07/ywt.pdf) and (https://www.stats.ox.ac.uk/~teh/research/npbayes/Teh2010a.pdf) for a brief introduction to other flavours of Dirichlet Processes, and their applications.\n",
+"[Suggested Further Reading]: (http://mlg.eng.cam.ac.uk/tutorials/07/ywt.pdf) and {cite:p}`teh2010dirichletprocess` for a brief introduction to other flavours of Dirichlet Processes, and their applications.\n",
 "\n",
 "We can use the stick-breaking process above to easily sample from a Dirichlet process in Python. For this example, $\\alpha = 2$ and the base distribution is $N(0, 1)$."
 ]
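The truncated stick-breaking construction this cell refers to can be sketched in a few lines of NumPy. This is an illustrative sketch under the cell's assumptions ($\alpha = 2$, base distribution $N(0, 1)$); the variable names and truncation level are not from the notebook:

```python
import numpy as np

rng = np.random.default_rng(42)

alpha = 2.0  # concentration parameter
K = 30       # truncation level: number of stick breaks kept

# Stick-breaking: beta_i ~ Beta(1, alpha); the i-th weight is the fraction
# beta_i of the stick remaining after the first i - 1 breaks.
betas = rng.beta(1.0, alpha, size=K)
remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
weights = betas * remaining

# Atom locations are i.i.d. draws from the base distribution P0 = N(0, 1);
# the truncated draw is the discrete measure sum_i weights[i] * delta(atoms[i]).
atoms = rng.normal(loc=0.0, scale=1.0, size=K)
```

By telescoping, the kept weights sum to $1 - \prod_i (1 - \beta_i)$, so for $K = 30$ and $\alpha = 2$ the discarded mass has expectation $(2/3)^{30}$, which is negligible.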
@@ -543,7 +533,7 @@
 "source": [
 "Observant readers will have noted that we have not been continuing the stick-breaking process indefinitely as indicated by its definition, but rather have been truncating this process after a finite number of breaks. Obviously, when computing with Dirichlet processes, it is necessary to only store a finite number of its point masses and weights in memory. This restriction is not terribly onerous, since with a finite number of observations, it seems quite likely that the number of mixture components that contribute non-neglible mass to the mixture will grow slower than the number of samples. This intuition can be formalized to show that the (expected) number of components that contribute non-negligible mass to the mixture approaches $\\alpha \\log N$, where $N$ is the sample size.\n",
 "\n",
-"There are various clever [Gibbs sampling](https://en.wikipedia.org/wiki/Gibbs_sampling) techniques for Dirichlet processes that allow the number of components stored to grow as needed. [Stochastic memoization](http://danroy.org/papers/RoyManGooTen-ICMLNPB-2008.pdf) is another powerful technique for simulating Dirichlet processes while only storing finitely many components in memory. In this introductory example, we take the much less sophistocated approach of simply truncating the Dirichlet process components that are stored after a fixed number, $K$, of components. [Ohlssen, et al.](http://fisher.osu.edu/~schroeder.9/AMIS900/Ohlssen2006.pdf) provide justification for truncation, showing that $K > 5 \\alpha + 2$ is most likely sufficient to capture almost all of the mixture weight ($\\sum_{i = 1}^{K} w_i > 0.99$). In practice, we can verify the suitability of our truncated approximation to the Dirichlet process by checking the number of components that contribute non-negligible mass to the mixture. If, in our simulations, all components contribute non-negligible mass to the mixture, we have truncated the Dirichlet process too early.\n",
+"There are various clever [Gibbs sampling](https://en.wikipedia.org/wiki/Gibbs_sampling) techniques for Dirichlet processes that allow the number of components stored to grow as needed. Stochastic memoization {cite:p}`roy2008npbayes` is another powerful technique for simulating Dirichlet processes while only storing finitely many components in memory. In this introductory example, we take the much less sophistocated approach of simply truncating the Dirichlet process components that are stored after a fixed number, $K$, of components. [Ohlssen, et al.](http://fisher.osu.edu/~schroeder.9/AMIS900/Ohlssen2006.pdf) provide justification for truncation, showing that $K > 5 \\alpha + 2$ is most likely sufficient to capture almost all of the mixture weight ($\\sum_{i = 1}^{K} w_i > 0.99$). In practice, we can verify the suitability of our truncated approximation to the Dirichlet process by checking the number of components that contribute non-negligible mass to the mixture. If, in our simulations, all components contribute non-negligible mass to the mixture, we have truncated the Dirichlet process too early.\n",
 "\n",
 "Our (truncated) Dirichlet process mixture model for the standardized waiting times is\n",
 "\n",
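The truncation bound quoted in this cell can be sanity-checked numerically. The sketch below (illustrative, not part of the notebook) uses the fact that for stick-breaking with concentration $\alpha$, the leftover mass after $K$ breaks has expectation $(\alpha / (1 + \alpha))^K$, since the $\beta_i \sim \textrm{Beta}(1, \alpha)$ are independent with $E[1 - \beta_i] = \alpha / (1 + \alpha)$. For $\alpha = 2$, the smallest $K$ with $K > 5\alpha + 2$ is $K = 13$, and $(2/3)^{13} \approx 0.005$, so on average $\sum_{i=1}^{K} w_i > 0.99$:

```python
import numpy as np

alpha = 2.0
K = int(5 * alpha + 2) + 1  # K = 13, smallest integer with K > 5*alpha + 2

# Expected leftover stick mass after K breaks.
expected_leftover = (alpha / (1.0 + alpha)) ** K

# Monte Carlo check over many truncated stick-breaking draws.
rng = np.random.default_rng(0)
betas = rng.beta(1.0, alpha, size=(100_000, K))
stick = np.concatenate(
    [np.ones((100_000, 1)), np.cumprod(1.0 - betas, axis=1)[:, :-1]], axis=1
)
captured = (betas * stick).sum(axis=1)  # total weight of the first K components
```

`captured.mean()` agrees with `1 - expected_leftover`, consistent with the claim that $K > 5\alpha + 2$ captures more than 0.99 of the mixture weight on average.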
@@ -1440,7 +1430,10 @@
 "metadata": {},
 "source": [
 "## References\n",
-"\n"
+"\n",
+":::{bibliography}\n",
+":filter: docname in docnames\n",
+":::"
 ]
 },
 {

examples/references.bib

Lines changed: 46 additions & 0 deletions
@@ -16,6 +16,26 @@ @article{gelman2006multilevel
 publisher={Taylor \& Francis}
 }
 
+@article{goodhardt1984thedirichlet,
+title={The Dirichlet: A Comprehensive Model of Buying Behaviour},
+author={Goodhardt, G. J. and Ehrenberg, A. S. C. and Chatfield, C.},
+url={https://www.jstor.org/stable/2981696},
+journal={Journal of the Royal Statistical Society. Series A (General)},
+volume={147},
+number={3},
+pages={621--655},
+year={1984}
+}
+
+@inproceedings{madsen2005modelingdirichlet,
+title={Modeling word burstiness using the Dirichlet distribution},
+url={https://dl.acm.org/doi/10.1145/1102351.1102420},
+booktitle={Proceedings of the 22nd International Conference on Machine Learning},
+author={Madsen, Rasmus E. and Kauchak, David and Elkan, Charles},
+year={2005},
+month={Aug}
+}
+
 @book{mcelreath2018statistical,
 title={Statistical rethinking: A Bayesian course with examples in R and Stan},
 author={McElreath, Richard},
@@ -24,3 +44,29 @@ @book{mcelreath2018statistical
 }
 
 
+@article{nowicka2016drimseq,
+title={DRIMSeq: A Dirichlet-Multinomial Framework for Multivariate Count Outcomes in Genomics},
+url={https://f1000research.com/articles/5-1356/v2},
+journal={F1000Research},
+author={Nowicka, Malgorzata and Robinson, Mark D.},
+year={2016},
+month={Dec}
+}
+
+@inproceedings{roy2008npbayes,
+title={A stochastic programming perspective on nonparametric Bayes},
+author={Roy, Daniel M. and Mansinghka, Vikash and Goodman, Noah and Tenenbaum, Joshua},
+booktitle={International Conference on Machine Learning: Workshop on Nonparametric Bayesian},
+year={2008},
+url={http://danroy.org/papers/RoyManGooTen-ICMLNPB-2008.pdf}
+}
+
+@incollection{teh2010dirichletprocess,
+author={Teh, Y. W.},
+booktitle={Encyclopedia of Machine Learning},
+publisher={Springer},
+title={Dirichlet Processes},
+year={2010}
+}
+
+