diff --git a/examples/generalized_linear_models/GLM-out-of-sample-predictions.ipynb b/examples/generalized_linear_models/GLM-out-of-sample-predictions.ipynb new file mode 100644 index 000000000..d02790046 --- /dev/null +++ b/examples/generalized_linear_models/GLM-out-of-sample-predictions.ipynb @@ -0,0 +1,655 @@ +{ + "metadata": { + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.6-final" + }, + "orig_nbformat": 2, + "kernelspec": { + "name": "python3", + "display_name": "Python 3.7.6 64-bit ('website_projects': conda)", + "metadata": { + "interpreter": { + "hash": "fbddea5140024843998ae64bf59a7579a9660d103062604797ea5984366c686c" + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 2, + "cells": [ + { + "source": [ + "# GLM in PyMC3: Out-Of-Sample Predictions\n", + "\n", + "In this notebook I explore the [glm](https://docs.pymc.io/api/glm.html) module of [PyMC3](https://docs.pymc.io/). I am particularly interested in the model definition using [patsy](https://patsy.readthedocs.io/en/latest/) formulas, as it makes the model evaluation loop faster (easier to include features and/or interactions). There are many good resources on this subject, but most of them evaluate the model in-sample. For many applications we require doing predictions on out-of-sample data. This experiment was motivated by the discussion of the thread [\"Out of sample\" predictions with the GLM sub-module](https://discourse.pymc.io/t/out-of-sample-predictions-with-the-glm-sub-module/773) on the (great!) 
forum [discourse.pymc.io/](https://discourse.pymc.io/), thank you all for your input!\n", + "\n", + "**Resources**\n", + "\n", + "\n", + "- [PyMC3 Docs: Example Notebooks](https://docs.pymc.io/nb_examples/index.html)\n", + " \n", + " - In particular check [GLM: Logistic Regression](https://docs.pymc.io/notebooks/GLM-logistic.html)\n", + "\n", + "- [Bambi](https://bambinos.github.io/bambi/), a more complete implementation of the GLM submodule which also allows for mixed-effects models.\n", + "\n", + "- [Bayesian Analysis with Python (Second edition) - Chapter 4](https://github.com/aloctavodia/BAP/blob/master/code/Chp4/04_Generalizing_linear_models.ipynb)\n", + "- [Statistical Rethinking](https://xcelab.net/rm/statistical-rethinking/)" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "source": [ + "## Prepare Notebook" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.\n" + ] + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "\n", + "sns.set_style(style=\"darkgrid\", rc={\"axes.facecolor\": \".9\", \"grid.color\": \".8\"})\n", + "sns.set_palette(palette=\"deep\")\n", + "sns_c = sns.color_palette(palette=\"deep\")\n", + "\n", + "import arviz as az\n", + "import patsy\n", + "import pymc3 as pm\n", + "\n", + "from pymc3 import glm\n", + "\n", + "plt.rcParams[\"figure.figsize\"] = [7, 6]\n", + "plt.rcParams[\"figure.dpi\"] = 100" + ] + }, + { + "source": [ + "## Generate Sample Data\n", + "\n", + "We want to fit a logistic regression model where there is a multiplicative interaction between two numerical features." 
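+ ,
+         "\n",
+         "\n",
+         "Concretely, the data-generating process we have in mind is (a sketch; the $\\beta$ symbols are placeholders for whatever coefficient values the simulation code uses):\n",
+         "\n",
+         "$$\n",
+         "p = \\text{sigmoid}\\left(\\beta_0 + \\beta_1 x_1 + \\beta_2 x_2 + \\beta_{12}\\, x_1 x_2\\right), \\qquad y \\sim \\text{Bernoulli}(p)\n",
+         "$$"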
+ ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " x1 x2 y\n", + "0 0.993428 -2.521768 0\n", + "1 -0.276529 1.835724 0\n", + "2 1.295377 4.244312 1\n", + "3 3.046060 2.064931 1\n", + "4 -0.468307 -3.038740 1" + ], + "text/html": "
<div>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>x1</th>\n      <th>x2</th>\n      <th>y</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr><th>0</th><td>0.993428</td><td>-2.521768</td><td>0</td></tr>\n    <tr><th>1</th><td>-0.276529</td><td>1.835724</td><td>0</td></tr>\n    <tr><th>2</th><td>1.295377</td><td>4.244312</td><td>1</td></tr>\n    <tr><th>3</th><td>3.046060</td><td>2.064931</td><td>1</td></tr>\n    <tr><th>4</th><td>-0.468307</td><td>-3.038740</td><td>1</td></tr>\n  </tbody>\n</table>\n</div>"