
Commit 0bde4f7

chiral-carbon authored and OriolAbril committed
references
1 parent c098ae7 commit 0bde4f7

File tree: 2 files changed (+29, -9 lines)


examples/mixture_models/dp_mix.ipynb

Lines changed: 4 additions & 3 deletions
@@ -58,7 +58,8 @@
 "\n",
 " $$P = \\sum_{i = 1}^\\infty \\mu_i \\delta_{\\omega_i} \\sim \\textrm{DP}(\\alpha, P_0).$$\n",
 " \n",
-"[Suggested Further Reading]: (http://mlg.eng.cam.ac.uk/tutorials/07/ywt.pdf) and {cite:p}`teh2010dirichletprocess` for a brief introduction to other flavours of Dirichlet Processes, and their applications.\n",
+"[Suggested Further Reading]: [A Tutorial on Dirichlet Processes\n",
+"and Hierarchical Dirichlet Processes by Yee Whye Teh](http://mlg.eng.cam.ac.uk/tutorials/07/ywt.pdf) and {cite:p}`teh2010dirichletprocess` for a brief introduction to other flavours of Dirichlet Processes, and their applications.\n",
 "\n",
 "We can use the stick-breaking process above to easily sample from a Dirichlet process in Python. For this example, $\\alpha = 2$ and the base distribution is $N(0, 1)$."
 ]
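The sentence above describes sampling from a Dirichlet process via stick-breaking with alpha = 2 and an N(0, 1) base distribution. The notebook's own code cell is not shown in this diff; a minimal NumPy sketch of that construction, assuming a hypothetical truncation level K, might look like:

import numpy as np

rng = np.random.default_rng(42)
alpha, K = 2.0, 30  # concentration parameter and a hypothetical truncation level

# Stick-breaking: w_i = beta_i * prod_{j < i} (1 - beta_j)
betas = rng.beta(1.0, alpha, size=K)
weights = betas * np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
atoms = rng.normal(0.0, 1.0, size=K)  # atoms drawn from the N(0, 1) base distribution

# Sample from the (truncated) random measure by picking atoms according to their weights
samples = rng.choice(atoms, size=1000, p=weights / weights.sum())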
@@ -533,7 +534,7 @@
 "source": [
 "Observant readers will have noted that we have not been continuing the stick-breaking process indefinitely as indicated by its definition, but rather have been truncating this process after a finite number of breaks. Obviously, when computing with Dirichlet processes, it is necessary to only store a finite number of its point masses and weights in memory. This restriction is not terribly onerous, since with a finite number of observations, it seems quite likely that the number of mixture components that contribute non-negligible mass to the mixture will grow slower than the number of samples. This intuition can be formalized to show that the (expected) number of components that contribute non-negligible mass to the mixture approaches $\\alpha \\log N$, where $N$ is the sample size.\n",
 "\n",
-"There are various clever [Gibbs sampling](https://en.wikipedia.org/wiki/Gibbs_sampling) techniques for Dirichlet processes that allow the number of components stored to grow as needed. Stochastic memoization {cite:p}`roy2008npbayes` is another powerful technique for simulating Dirichlet processes while only storing finitely many components in memory. In this introductory example, we take the much less sophistocated approach of simply truncating the Dirichlet process components that are stored after a fixed number, $K$, of components. [Ohlssen, et al.](http://fisher.osu.edu/~schroeder.9/AMIS900/Ohlssen2006.pdf) provide justification for truncation, showing that $K > 5 \\alpha + 2$ is most likely sufficient to capture almost all of the mixture weight ($\\sum_{i = 1}^{K} w_i > 0.99$). In practice, we can verify the suitability of our truncated approximation to the Dirichlet process by checking the number of components that contribute non-negligible mass to the mixture. If, in our simulations, all components contribute non-negligible mass to the mixture, we have truncated the Dirichlet process too early.\n",
+"There are various clever [Gibbs sampling](https://en.wikipedia.org/wiki/Gibbs_sampling) techniques for Dirichlet processes that allow the number of components stored to grow as needed. Stochastic memoization {cite:p}`roy2008npbayes` is another powerful technique for simulating Dirichlet processes while only storing finitely many components in memory. In this introductory example, we take the much less sophisticated approach of simply truncating the Dirichlet process components that are stored after a fixed number, $K$, of components. {cite:t}`ishwaran2002approxdirichlet` provide justification for truncation, showing that $K > 5 \\alpha + 2$ is most likely sufficient to capture almost all of the mixture weight ($\\sum_{i = 1}^{K} w_i > 0.99$). In practice, we can verify the suitability of our truncated approximation to the Dirichlet process by checking the number of components that contribute non-negligible mass to the mixture. If, in our simulations, all components contribute non-negligible mass to the mixture, we have truncated the Dirichlet process too early.\n",
 "\n",
 "Our (truncated) Dirichlet process mixture model for the standardized waiting times is\n",
 "\n",
@@ -948,7 +949,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The Dirichlet process mixture model is incredibly flexible in terms of the family of parametric component distributions $\\{f_{\\theta}\\ |\\ f_{\\theta} \\in \\Theta\\}$. We illustrate this flexibility below by using Poisson component distributions to estimate the density of sunspots per year. This dataset can be downloaded from http://www.sidc.be/silso/datafiles. Source: WDC-SILSO, Royal Observatory of Belgium, Brussels."
+"The Dirichlet process mixture model is incredibly flexible in terms of the family of parametric component distributions $\\{f_{\\theta}\\ |\\ f_{\\theta} \\in \\Theta\\}$. We illustrate this flexibility below by using Poisson component distributions to estimate the density of sunspots per year. This dataset was curated by {cite:t}`sidc2021sunspot` and can be downloaded from the SILSO archive."
 ]
 },
 {
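Readers curious about the shape of such a model may find a hedged PyMC sketch of a truncated Dirichlet process mixture of Poissons useful. This is not the notebook's actual cell (which does not appear in this diff): the priors, the stand-in `counts` data, the truncation level `K`, and the PyMC v5 / pytensor API usage are all assumptions.

import numpy as np
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(1)
counts = rng.poisson(50, size=300)  # stand-in for the yearly sunspot counts
K = 30  # assumed truncation level

with pm.Model() as model:
    alpha = pm.Gamma("alpha", 1.0, 1.0)
    beta = pm.Beta("beta", 1.0, alpha, shape=K)

    # Stick-breaking weights, normalized so they sum to one at the truncation level
    raw_w = beta * pt.concatenate([[1.0], pt.cumprod(1.0 - beta)[:-1]])
    w = pm.Deterministic("w", raw_w / raw_w.sum())

    # Component-specific Poisson rates (assumed prior)
    mu = pm.Gamma("mu", 2.0, 0.05, shape=K)

    obs = pm.Mixture("obs", w=w, comp_dists=pm.Poisson.dist(mu), observed=counts)
    idata = pm.sample()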

examples/references.bib

Lines changed: 25 additions & 6 deletions
@@ -94,6 +94,18 @@ @misc{hogg2010data
 primaryClass={astro-ph.IM}
 }
 
+@article{ishwaran2002approxdirichlet,
+author = {Hemant Ishwaran and Lancelot F James},
+title = {Approximate Dirichlet Process Computing in Finite Normal Mixtures},
+journal = {Journal of Computational and Graphical Statistics},
+volume = {11},
+number = {3},
+pages = {508-532},
+year = {2002},
+publisher = {Taylor & Francis},
+url = {https://doi.org/10.1198/106186002411}
+}
+
 @book{ivezić2014astroMLtext,
 author = {Željko Ivezić and Andrew J. Connolly and Jacob T. VanderPlas and Alexander Gray},
 doi = {10.1515/9781400848911},
@@ -232,6 +244,14 @@ @inproceedings{salakhutdinov2008bayesian
 volume={25}
 }
 
+@article{sidc2021sunspot,
+author={SILSO World Data Center},
+address={Royal Observatory of Belgium, Avenue Circulaire 3, 1180 Brussels, Belgium},
+title={The International Sunspot Number},
+journal={International Sunspot Number Monthly Bulletin and online catalogue},
+adsurl={https://wwwbis.sidc.be/silso/datafiles}
+}
+
 @article{silver2016masteringgo,
 title={Mastering the game of Go with deep neural networks and tree search},
 author={D. Silver, A. Huang, C. Maddison et al.},
@@ -252,14 +272,13 @@ @misc{szegedy2014going
 }
 
 @incollection{teh2010dirichletprocess,
-author = {Y. W. Teh},
-booktitle = {Encyclopedia of Machine Learning},
-publisher = {Springer},
-title = {Dirichlet Processes},
-year = {2010}
+author={Y. W. Teh},
+booktitle={Encyclopedia of Machine Learning},
+publisher={Springer},
+title={Dirichlet Processes},
+year={2010}
 }
 
-
 @INPROCEEDINGS{vanderplas2012astroML,
 author={{Vanderplas}, J.T. and {Connolly}, A.J.
 and {Ivezi{\'c}}, {\v Z}. and {Gray}, A.},
