
Commit 19d7ebe

referencing corrections
1 parent c2b4a31 commit 19d7ebe

File tree

1 file changed (+3, -3 lines)


examples/case_studies/probabilistic_matrix_factorization.ipynb

Lines changed: 3 additions & 3 deletions
@@ -85,7 +85,7 @@
 "source": [
 "## Data\n",
 "\n",
-"The {cite:t}`harper2015movielens` was collected by the GroupLens Research Project at the University of Minnesota. This data set consists of 100,000 ratings (1-5) from 943 users on 1682 movies. Each user rated at least 20 movies, and be have basic information on the users (age, gender, occupation, zip). Each movie includes basic information like title, release date, video release date, and genre. We will implement a model that is suitable for collaborative filtering on this data and evaluate it in terms of root mean squared error (RMSE) to validate the results.\n",
+"The MovieLens 100k dataset {cite:p}`harper2015movielens` was collected by the GroupLens Research Project at the University of Minnesota. This data set consists of 100,000 ratings (1-5) from 943 users on 1682 movies. Each user rated at least 20 movies, and we have basic information on the users (age, gender, occupation, zip). Each movie includes basic information like title, release date, video release date, and genre. We will implement a model that is suitable for collaborative filtering on this data and evaluate it in terms of root mean squared error (RMSE) to validate the results.\n",
 "\n",
 "The data was collected through the [MovieLens website](https://movielens.org/) during the seven-month period from September 19th,\n",
 "1997 through April 22nd, 1998. This data has been cleaned up - users\n",
@@ -795,7 +795,7 @@
 "source": [
 "## Probabilistic Matrix Factorization\n",
 "\n",
-"{cite:t}`mnih2008advances` is a probabilistic approach to the collaborative filtering problem that takes a Bayesian perspective. The ratings $R$ are modeled as draws from a Gaussian distribution. The mean for $R_{ij}$ is $U_i V_j^T$. The precision $\\alpha$ is a fixed parameter that reflects the uncertainty of the estimations; the normal distribution is commonly reparameterized in terms of precision, which is the inverse of the variance. Complexity is controlled by placing zero-mean spherical Gaussian priors on $U$ and $V$. In other words, each row of $U$ is drawn from a multivariate Gaussian with mean $\\mu = 0$ and precision which is some multiple of the identity matrix $I$. Those multiples are $\\alpha_U$ for $U$ and $\\alpha_V$ for $V$. So our model is defined by:\n",
+"Probabilistic Matrix Factorization {cite:p}`mnih2008advances` is a probabilistic approach to the collaborative filtering problem that takes a Bayesian perspective. The ratings $R$ are modeled as draws from a Gaussian distribution. The mean for $R_{ij}$ is $U_i V_j^T$. The precision $\\alpha$ is a fixed parameter that reflects the uncertainty of the estimations; the normal distribution is commonly reparameterized in terms of precision, which is the inverse of the variance. Complexity is controlled by placing zero-mean spherical Gaussian priors on $U$ and $V$. In other words, each row of $U$ is drawn from a multivariate Gaussian with mean $\\mu = 0$ and precision which is some multiple of the identity matrix $I$. Those multiples are $\\alpha_U$ for $U$ and $\\alpha_V$ for $V$. So our model is defined by:\n",
 "\n",
 "$\\newcommand\\given[1][]{\\:#1\\vert\\:}$\n",
 "\n",
@@ -916,7 +916,7 @@
 "\n",
 "$$ E = \\frac{1}{2} \\sum_{i=1}^N \\sum_{j=1}^M I_{ij} (R_{ij} - U_i V_j^T)^2 + \\frac{\\lambda_U}{2} \\sum_{i=1}^N \\|U\\|_{Fro}^2 + \\frac{\\lambda_V}{2} \\sum_{j=1}^M \\|V\\|_{Fro}^2, $$\n",
 "\n",
-"where $\\lambda_U = \\alpha_U / \\alpha$, $\\lambda_V = \\alpha_V / \\alpha$, and $\\|\\cdot\\|_{Fro}^2$ denotes the Frobenius norm [3]. Minimizing this objective function gives a local minimum, which is essentially a maximum a posteriori (MAP) estimate. While it is possible to use a fast Stochastic Gradient Descent procedure to find this MAP, we'll be finding it using the utilities built into `pymc3`. In particular, we'll use `find_MAP` with Powell optimization (`scipy.optimize.fmin_powell`). Having found this MAP estimate, we can use it as our starting point for MCMC sampling.\n",
+"where $\\lambda_U = \\alpha_U / \\alpha$, $\\lambda_V = \\alpha_V / \\alpha$, and $\\|\\cdot\\|_{Fro}^2$ denotes the Frobenius norm {cite:p}`mnih2008advances`. Minimizing this objective function gives a local minimum, which is essentially a maximum a posteriori (MAP) estimate. While it is possible to use a fast Stochastic Gradient Descent procedure to find this MAP, we'll be finding it using the utilities built into `pymc3`. In particular, we'll use `find_MAP` with Powell optimization (`scipy.optimize.fmin_powell`). Having found this MAP estimate, we can use it as our starting point for MCMC sampling.\n",
 "\n",
 "Since it is a reasonably complex model, we expect the MAP estimation to take some time. So let's save it after we've found it. Note that we define a function for finding the MAP below, assuming it will receive a namespace with some variables in it. Then we attach that function to the PMF class, where it will have such a namespace after initialization. The PMF class is defined in pieces this way so I can say a few things between each piece to make it clearer."
 ]
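To make the MAP-then-sample workflow described in this hunk concrete, here is a minimal sketch that assumes the `pmf` model context from the previous sketch. It uses `pm.find_MAP(method="Powell")`, the modern `pymc3` spelling of the `scipy.optimize.fmin_powell` route the text mentions; the cache filename and the draw count are arbitrary choices, not the notebook's.

```python
import pickle

import pymc3 as pm

with pmf:  # the pm.Model defined in the sketch above
    # Powell is a derivative-free optimizer, which tolerates the rough
    # shape of this posterior better than gradient-based methods.
    map_estimate = pm.find_MAP(method="Powell")

# The optimization is slow for a model this size, so cache the result.
with open("pmf_map.pkl", "wb") as f:
    pickle.dump(map_estimate, f)

# Later, use the cached MAP point as the starting point for MCMC sampling.
with pmf:
    trace = pm.sample(draws=500, start=map_estimate)
```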
