-    "[Suggested Further Reading]: (http://mlg.eng.cam.ac.uk/tutorials/07/ywt.pdf) and {cite:p}`teh2010dirichletprocess` for a brief introduction to other flavours of Dirichlet Processes, and their applications.\n",
+    "[Suggested Further Reading]: [A Tutorial on Dirichlet Processes\n",
+    "and Hierarchical Dirichlet Processes by Yee Whye Teh](http://mlg.eng.cam.ac.uk/tutorials/07/ywt.pdf) and {cite:p}`teh2010dirichletprocess` for a brief introduction to other flavours of Dirichlet Processes, and their applications.\n",
     "\n",
     "We can use the stick-breaking process above to easily sample from a Dirichlet process in Python. For this example, $\\alpha = 2$ and the base distribution is $N(0, 1)$."
     ]
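The stick-breaking sampler the cell above describes (with $\alpha = 2$ and a $N(0, 1)$ base distribution) can be sketched in a few lines of NumPy; the helper name `dirichlet_process_sample`, the seed, and the truncation level `K=30` are our own illustrative choices, not the notebook's:

```python
import numpy as np

rng = np.random.default_rng(42)


def dirichlet_process_sample(alpha, base_sample, K):
    """Draw truncated stick-breaking weights and atoms from a Dirichlet process."""
    betas = rng.beta(1.0, alpha, size=K)  # stick-breaking proportions beta_i ~ Beta(1, alpha)
    # Length of stick remaining before each break: prod_{j<i} (1 - beta_j)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining           # w_i = beta_i * prod_{j<i} (1 - beta_j)
    atoms = base_sample(K)                # i.i.d. draws from the base distribution
    return weights, atoms


# alpha = 2, base distribution N(0, 1), truncated after K = 30 breaks
weights, atoms = dirichlet_process_sample(2.0, lambda k: rng.standard_normal(k), K=30)
```

Because the process is truncated, the weights sum to slightly less than one; the shortfall is the mass of the unbroken remainder of the stick.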
@@ -533,7 +534,7 @@
     "source": [
     "Observant readers will have noted that we have not been continuing the stick-breaking process indefinitely as indicated by its definition, but rather have been truncating this process after a finite number of breaks. Obviously, when computing with Dirichlet processes, it is necessary to only store a finite number of its point masses and weights in memory. This restriction is not terribly onerous, since with a finite number of observations, it seems quite likely that the number of mixture components that contribute non-negligible mass to the mixture will grow slower than the number of samples. This intuition can be formalized to show that the (expected) number of components that contribute non-negligible mass to the mixture approaches $\\alpha \\log N$, where $N$ is the sample size.\n",
     "\n",
-    "There are various clever [Gibbs sampling](https://en.wikipedia.org/wiki/Gibbs_sampling) techniques for Dirichlet processes that allow the number of components stored to grow as needed. Stochastic memoization {cite:p}`roy2008npbayes` is another powerful technique for simulating Dirichlet processes while only storing finitely many components in memory. In this introductory example, we take the much less sophistocated approach of simply truncating the Dirichlet process components that are stored after a fixed number, $K$, of components. [Ohlssen, et al.](http://fisher.osu.edu/~schroeder.9/AMIS900/Ohlssen2006.pdf) provide justification for truncation, showing that $K > 5 \\alpha + 2$ is most likely sufficient to capture almost all of the mixture weight ($\\sum_{i = 1}^{K} w_i > 0.99$). In practice, we can verify the suitability of our truncated approximation to the Dirichlet process by checking the number of components that contribute non-negligible mass to the mixture. If, in our simulations, all components contribute non-negligible mass to the mixture, we have truncated the Dirichlet process too early.\n",
+    "There are various clever [Gibbs sampling](https://en.wikipedia.org/wiki/Gibbs_sampling) techniques for Dirichlet processes that allow the number of components stored to grow as needed. Stochastic memoization {cite:p}`roy2008npbayes` is another powerful technique for simulating Dirichlet processes while only storing finitely many components in memory. In this introductory example, we take the much less sophisticated approach of simply truncating the Dirichlet process components that are stored after a fixed number, $K$, of components. {cite:t}`ishwaran2002approxdirichlet` provide justification for truncation, showing that $K > 5 \\alpha + 2$ is most likely sufficient to capture almost all of the mixture weight ($\\sum_{i = 1}^{K} w_i > 0.99$). In practice, we can verify the suitability of our truncated approximation to the Dirichlet process by checking the number of components that contribute non-negligible mass to the mixture. If, in our simulations, all components contribute non-negligible mass to the mixture, we have truncated the Dirichlet process too early.\n",
     "\n",
     "Our (truncated) Dirichlet process mixture model for the standardized waiting times is\n",
     "\n",
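The $K > 5 \alpha + 2$ truncation check described in the cell above can be sketched directly; the simulation size (1000 draws) and the $10^{-2}$ "non-negligible" threshold below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = 2.0
n_draws = 1000
K = int(5 * alpha + 3)  # K = 13 satisfies K > 5 * alpha + 2

# Vectorized truncated stick-breaking: one row per independent draw
betas = rng.beta(1.0, alpha, size=(n_draws, K))
sticks = np.concatenate(
    [np.ones((n_draws, 1)), np.cumprod(1.0 - betas[:, :-1], axis=1)], axis=1
)
weights = betas * sticks

# Total captured mass per draw (1 minus the unbroken remainder of the stick)
total_mass = weights.sum(axis=1)

# Components contributing non-negligible (> 1e-2) mass per draw; if this
# hits K for many draws, the truncation level is too aggressive
n_active = (weights > 1e-2).sum(axis=1)
```

With $\alpha = 2$, the expected unbroken mass after $K$ breaks is $(\alpha / (1 + \alpha))^K = (2/3)^{13} \approx 0.005$, so on average the truncated mixture captures about 99.5% of the weight, consistent with the bound.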
@@ -948,7 +949,7 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-    "The Dirichlet process mixture model is incredibly flexible in terms of the family of parametric component distributions $\\{f_{\\theta}\\ |\\ f_{\\theta} \\in \\Theta\\}$. We illustrate this flexibility below by using Poisson component distributions to estimate the density of sunspots per year. This dataset can be downloaded from http://www.sidc.be/silso/datafiles. Source: WDC-SILSO, Royal Observatory of Belgium, Brussels."
+    "The Dirichlet process mixture model is incredibly flexible in terms of the family of parametric component distributions $\\{f_{\\theta}\\ |\\ f_{\\theta} \\in \\Theta\\}$. We illustrate this flexibility below by using Poisson component distributions to estimate the density of sunspots per year. This dataset was curated by {cite:t}`sidc2021sunspot` and can be downloaded from http://www.sidc.be/silso/datafiles."
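To make the Poisson-component idea concrete, here is a minimal sketch of evaluating a truncated mixture of Poisson pmfs; the weights and rates below are entirely hypothetical placeholders, not posterior estimates from the sunspot data:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical truncated-DP mixture parameters, for illustration only;
# in the notebook these would come from the fitted posterior.
weights = np.array([0.5, 0.3, 0.2])
rates = np.array([10.0, 50.0, 110.0])  # assumed sunspots-per-year scale

counts = np.arange(0, 200)
# Mixture pmf: p(x) = sum_i w_i * Poisson(x; lambda_i)
mixture_pmf = (weights[:, None] * poisson.pmf(counts[None, :], rates[:, None])).sum(axis=0)
```

Swapping the Gaussian components for Poisson ones changes nothing structural in the model: only the component family $f_\theta$ and its parameters differ.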