From 72e9b905c8cd959c5204227895720bc35e52bd3c Mon Sep 17 00:00:00 2001 From: "Oriol (ZBook)" Date: Thu, 16 Dec 2021 21:30:36 +0200 Subject: [PATCH 1/6] add extra dependencies guidance to jupyter style doc --- docs/source/contributing/jupyter_style.md | 121 +++++++++++++++++++++- 1 file changed, 119 insertions(+), 2 deletions(-) diff --git a/docs/source/contributing/jupyter_style.md b/docs/source/contributing/jupyter_style.md index b3f6c34051..51babada2e 100644 --- a/docs/source/contributing/jupyter_style.md +++ b/docs/source/contributing/jupyter_style.md @@ -58,6 +58,123 @@ Choose a category from [existing categories](https://github.com/pymc-devs/pymc/w Authors should list people who authored, adapted or updated the notebook. See {ref}`jupyter_authors` for more details. +## Extra dependencies +If the notebook uses libraries that are not PyMC dependencies, these extra dependencies should +be indicated together with some advice on how to install them. +This ensures readers know what they'll need to install beforehand and can for example +decide between running it locally or on binder. + +To make things easier for notebook writers and maintainers, pymc-examples contains +a template for this that warns about the extra dependencies and provides specific +installation instructions inside a dropdown. + +Thus, notebooks with extra dependencies should: + +1. list the extra dependencies as notebook metadata using the `myst_substitutions` category + and then either the `extra_dependencies` or the `pip_dependencies` and `conda_dependencies`. + In addition, there is also an `extra_install_notes` to include custom text inside the dropdown.
+ + * notebook metadata can be edited from the menubar `Edit` -> `Edit notebook metadata` + in the dropdown + + This will open a window with json formatted text that might look a bit like: + + ::::{tab-set} + :::{tab-item} No myst_substitutions + + ```json + { + "kernelspec": { + "name": "python3", + "display_name": "Python 3 (ipykernel)", + "language": "python" + }, + "language_info": { + "name": "python", + "version": "3.9.7", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + } + } + ``` + ::: + + :::{tab-item} extra_dependencies key + + ```{code-block} json + :emphasize-lines: 19-21 + { + "kernelspec": { + "name": "python3", + "display_name": "Python 3 (ipykernel)", + "language": "python" + }, + "language_info": { + "name": "python", + "version": "3.9.7", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + }, + "myst_substitutions": { + "extra_dependencies": "bambi seaborn" + } + } + ``` + ::: + + :::{tab-item} pip and conda specific keys + ```{code-block} json + :emphasize-lines: 19-22 + { + "kernelspec": { + "name": "python3", + "display_name": "Python 3 (ipykernel)", + "language": "python" + }, + "language_info": { + "name": "python", + "version": "3.9.7", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + }, + "myst_substitutions": { + "pip_dependencies": "graphviz", + "conda_dependencies": "python-graphviz" + } + } + ``` + + The pip and conda spcific keys overwrite the `extra_installs` one, so it doesn't make + sense to use `extra_installs` is using them. Either both pip and conda substitutions + are defined or none of them is. + ::: + :::: + +1. 
include the warning and installation advise template with the following markdown: + + ```markdown + :::{include} ../extra_installs.md + ::: + ``` + ## Code preamble In a cell just below the cell where you imported matplotlib and/or ArviZ (usually the first one), @@ -185,7 +302,7 @@ References can be cited twice within a single notebook. Two common reference for which can be added inline, within the text itself. At the end of the notebook, add the bibliography with the following markdown -``` +```markdown ## References :::{bibliography} @@ -195,7 +312,7 @@ which can be added inline, within the text itself. At the end of the notebook, a or alternatively, if you wanted to add extra references that have not been cited within the text, use: -``` +```markdown ## References :::{bibliography} From 006bd409d6a670eadfd50e895a7618751c40fe37 Mon Sep 17 00:00:00 2001 From: "Oriol (ZBook)" Date: Sat, 18 Dec 2021 19:16:39 +0200 Subject: [PATCH 2/6] add variable name suggestions --- docs/source/contributing/jupyter_style.md | 109 ++++++++++++++++++++++ 1 file changed, 109 insertions(+) diff --git a/docs/source/contributing/jupyter_style.md b/docs/source/contributing/jupyter_style.md index 51babada2e..531d25128c 100644 --- a/docs/source/contributing/jupyter_style.md +++ b/docs/source/contributing/jupyter_style.md @@ -28,6 +28,115 @@ the repository where the notebook is in (pymc or pymc-examples). * When using non meaningful names such as single letters, add bullet points with a 1-2 sentence description of each variable below the equation where they are first introduced. +Choosing variable names can sometimes be difficult, tedious or annoying. 
+In case it helps, the dropdown below has some suggestions so you can focus on writing the actual content + +::::::{dropdown} Variable name suggestions +:icon: light-bulb + +**Models and sampling results** +* Use `idata` for sampling results, always containing a variable of type InferenceData +* Store inferecedata groups as variables to ease writing and reading of code operating on sampling results. + Use underscore separated 3-5 word abbreviations or the group name. Some examples of `abbrebiation`/`group_name`: + `post`/`posterior`, `const`/`constant_data`, `post_pred`/`posterior_predictive` or `obs_data`/`observed_data` +* For stats and diagnostics, use the ArviZ function name as variable name: `ess = az.ess(...)`, `loo = az.loo(...)` +* If there are multiple models in a notebook, assign a prefix to each model, + and use it throughout to identify which variables map to each model. + Taking the famous eight school as example, with a `centered` and `non_centered` model + to compare parametrizations, use `centered_model` (pm.Model object), `centered_idata`, `centered_post`, `centered_ess`... and `non_centered_model`, `non_centered_idata`... + +**Dimension and random variable names** +* Use singular dimension names, following ArviZ `chain` and `draw`. + For example `sample`, `cluster`, `axis`, `component`, `forest`, `time`... +* If you can't think of a meaningful name for the dimension representing the number of observations such as time, fall back to `obs_id` +* For matrix dimensions, as xarray doesn't allow repeated dimension names, add a `_bis` suffix. i.e. `param, param_bis` +* For the dimension resulting from stacking `chain` and `draw` use `sample`, that is `.stack(sample=("chain", "draw"))` +* We often need to encode a categorical variable as integers. add `_idx` to the name of the variable it's encoding. + i.e. from `floor` and `county` to `floor_idx` and `county_idx`. 
+* To avoid clashes and overwriting variables when using `pm.Data`, use the following pattern: + + ``` + x = np.array(...) + with pm.Model(): + x_ = pm.Data("x", x) + ... + ``` + + This avoids overwriting the original `x` while having `idata.constant_data["x"]`, + and within the model `x_` is still available to play the role of `x`. + Otherwise, always try to use the same variable name as the string name given to the PyMC random variable. + +**Plotting** +* Matplotlib figures and axes. Use: + * `fig` for matplotlib figures + * `ax` for a single matplotlib axes object + * `axs` for arrays of matplotlib axes objects + + When manually working with multiple matplotlib axes, use local `ax` variables: + + ::::{tab-set} + :::{tab-item} Local `ax` variables + ``` + fig, axs = pyplot.subplots() + + ax = axs[0, 1] + ax.plot(...) + ax.set(...) + + ax = axs[1, 2] + ax.scatter(...) + ``` + ::: + :::{tab-item} Instead of subsetting every time + ``` + fig, axs = pyplot.subplots() + + axs[0, 1].plot(...) + axs[0, 1].set(...) + + axs[1, 2].scatter(...) + ``` + ::: + :::: + + This makes editing the code if restructuring the subplots easier, only one change per subplot + is needed instead of one change per matplotlib function call. + +* It is often useful to make a numpy linspace into an {class}`~xarray.DataArray` + for xarray to handle aligning and broadcasting automatically and ease computation. 
+ * If a dimension name is needed, use `x_plot` + * If a variable name is needed for the original array and DataArray to coexist, add `_da` suffix + + Thus, ending up with code like: + + ``` + x = xr.DataArray(np.linspace(0, 10, 100), dims=["x_plot"]) + # or + x = np.linspace(0, 10, 100) + x_da = xr.DataArray(x) + ``` + +**Looping** +* When using enumerate, take the first letter of the variable as the count: + + ``` + for p, person in enumerate(persons): + ``` + +* When looping, if you need to store a variable after subsetting with the loop index, + append the index variable used for looping to the original variable name: + + ``` + variable = np.array(...) + x = np.array(...) + for i in range(N): + variable_i = variable[i] + for j in range(K): + x_j = x[j] + ... + ``` + +:::::: ## First cell The first cell of all example notebooks should have a MyST target, a level 1 markdown title (that is a title with a single `#`) followed by the post directive. From 87294c1aeb057c8ff765874bf35bc663b0b82585 Mon Sep 17 00:00:00 2001 From: "Oriol (ZBook)" Date: Sat, 18 Dec 2021 19:17:40 +0200 Subject: [PATCH 3/6] fix typo --- docs/source/contributing/jupyter_style.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/contributing/jupyter_style.md b/docs/source/contributing/jupyter_style.md index 531d25128c..95215b7707 100644 --- a/docs/source/contributing/jupyter_style.md +++ b/docs/source/contributing/jupyter_style.md @@ -272,7 +272,7 @@ Thus, notebooks with extra dependencies should: ``` The pip and conda spcific keys overwrite the `extra_installs` one, so it doesn't make - sense to use `extra_installs` is using them. Either both pip and conda substitutions + sense to use `extra_installs` if using them. Either both pip and conda substitutions are defined or none of them is. 
::: :::: From fab2b66094892a7335b8ba25520b17fdabc8c29b Mon Sep 17 00:00:00 2001 From: "Oriol (ZBook)" Date: Sat, 18 Dec 2021 19:33:09 +0200 Subject: [PATCH 4/6] extra tweaks --- docs/source/contributing/jupyter_style.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/docs/source/contributing/jupyter_style.md b/docs/source/contributing/jupyter_style.md index 95215b7707..1477b99776 100644 --- a/docs/source/contributing/jupyter_style.md +++ b/docs/source/contributing/jupyter_style.md @@ -31,7 +31,7 @@ the repository where the notebook is in (pymc or pymc-examples). Choosing variable names can sometimes be difficult, tedious or annoying. In case it helps, the dropdown below has some suggestions so you can focus on writing the actual content -::::::{dropdown} Variable name suggestions +:::::::{dropdown} Variable name suggestions :icon: light-bulb **Models and sampling results** @@ -76,7 +76,8 @@ In case it helps, the dropdown below has some suggestions so you can focus on wr ::::{tab-set} :::{tab-item} Local `ax` variables - ``` + ```{code-block} python + :emphasize-lines: 3, 7 fig, axs = pyplot.subplots() ax = axs[0, 1] @@ -126,7 +127,8 @@ In case it helps, the dropdown below has some suggestions so you can focus on wr * When looping, if you need to store a variable after subsetting with the loop index, append the index variable used for looping to the original variable name: - ``` + ```{code-block} python + :emphasize-lines: 4, 6 variable = np.array(...) x = np.array(...) for i in range(N): @@ -136,7 +138,7 @@ In case it helps, the dropdown below has some suggestions so you can focus on wr ... ``` -:::::: +::::::: ## First cell The first cell of all example notebooks should have a MyST target, a level 1 markdown title (that is a title with a single `#`) followed by the post directive. 
@@ -437,12 +439,14 @@ Once you're finished with your NB, add a very last cell with [the watermark pack ```python %load_ext watermark -%watermark -n -u -v -iv -w -p theano,xarray +%watermark -n -u -v -iv -w -p aesara,xarray ``` This second to last code cell should be preceded by a markdown cell with the `## Watermark` title only so it appears in the table of contents. -`watermark` should be in your virtual environment if you installed our `requirements-dev.txt`. Otherwise, just run `pip install watermark`. The `p` flag is optional but should be added if Theano (or Aesara if in `v4`) or xarray are not imported explicitly. +`watermark` should be in your virtual environment if you installed our `requirements-dev.txt`. +Otherwise, just run `pip install watermark`. +The `p` flag is optional but should be added if Aesara or xarray are not imported explicitly. This will also be checked by `pre-commit` (because we all forget to do things sometimes 😳). ## Epilogue From 7f2bdc65417a51102ebd90b8948db128004fc86f Mon Sep 17 00:00:00 2001 From: "Oriol (ZBook)" Date: Sun, 19 Dec 2021 20:46:30 +0200 Subject: [PATCH 5/6] add note on myst --- docs/source/conf.py | 2 + docs/source/contributing/jupyter_style.md | 51 ++++++++++++++++++----- 2 files changed, 42 insertions(+), 11 deletions(-) diff --git a/docs/source/conf.py b/docs/source/conf.py index d4f5dcdf37..d458c616ff 100755 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -153,6 +153,8 @@ "aesara": ("https://aesara.readthedocs.io/en/latest/", None), "numpy": ("https://numpy.org/doc/stable/", None), "nb": ("https://pymc-examples.readthedocs.io/en/latest/", None), + "myst": ("https://myst-parser.readthedocs.io/en/latest", None), + "myst-nb": ("https://myst-nb.readthedocs.io/en/latest/", None), } diff --git a/docs/source/contributing/jupyter_style.md b/docs/source/contributing/jupyter_style.md index 1477b99776..9dbcceebd0 100644 --- a/docs/source/contributing/jupyter_style.md +++ b/docs/source/contributing/jupyter_style.md @@ 
-1,8 +1,13 @@ (jupyter_style)= # Jupyter Style Guide -These guidelines should be followed by all notebooks in the documentation, independently of -the repository where the notebook is in (pymc or pymc-examples). +These guidelines should be followed by notebooks in the documentation. +All notebooks in pymc-examples must follow this to the letter, the style +is more permissive for the ones on pymc where not everything is available. + +The documentation websites are generated by Sphinx, which uses +{doc}`myst:index` and {doc}`myst-nb:index` +to parse the notebooks. ## General guidelines @@ -10,12 +15,35 @@ the repository where the notebook is in (pymc or pymc-examples). * Explain the reasoning behind each step. -* Use the glossary whenever possible. If you use a term that is defined in the Glossary, link to it the first time that term appears in a significant manner. Use [this syntax](https://jupyterbook.org/content/content-blocks.html?highlight=glossary#glossaries) to add a term reference. [Link to glossary source](https://github.com/pymc-devs/pymc/blob/main/docs/source/glossary.md) where new terms should be added. - * Attribute quoted text or code, and link to relevant references. * Keep notebooks short: 20/30 cells for content aimed at beginners or intermediate users, longer notebooks are fine at the advanced level. +### MyST guidelines +Using MyST allows taking advantage of all sphinx features from markdown cells in the notebooks. +All markdown should be valid MyST (note that MyST is a superset of recommonmark). +This guide does not teach nor cover MyST extensively, only gives some opinionated guidelines. + +* **Never** use url links to refer to other notebooks, PyMC documentation or other python + libraries documentations. Use [sphinx cross-references](https://docs.readthedocs.io/en/stable/guides/cross-referencing-with-sphinx.html) + instead. + + :::{caution} + Using urls links breaks self referencing in versioned docs! 
And at the same time they are + less robust than sphinx cross-references. + ::: + + * When linking to other notebooks, always use a `ref` type cross-reference pointing + to the target in the {ref}`jupyter_style_first_cell`. + +* If the output (or even code and output) of a cell is not necessary to follow the + notebook or it is very long and can break the flow of reading, consider hiding + it with a {doc}`toggle button ` + +* Consider using {ref}`myst:syntax/figures` to add captions to images used in the notebook. + +* Use the glossary whenever possible. If you use a term that is defined in the Glossary, link to it the first time that term appears in a significant manner. Use [this syntax](https://jupyterbook.org/content/content-blocks.html?highlight=glossary#glossaries) to add a term reference. [Link to glossary source](https://github.com/pymc-devs/pymc/blob/main/docs/source/glossary.md) where new terms should be added. + ### Variable names * Above all, stay consistent with variable names within the notebook. Notebooks using multiple names for the same variable will not be merged. @@ -35,7 +63,7 @@ In case it helps, the dropdown below has some suggestions so you can focus on wr :icon: light-bulb **Models and sampling results** -* Use `idata` for sampling results, always containing a variable of type InferenceData +* Use `idata` for sampling results, always containing a variable of type InferenceData. * Store inferecedata groups as variables to ease writing and reading of code operating on sampling results. Use underscore separated 3-5 word abbreviations or the group name. Some examples of `abbrebiation`/`group_name`: `post`/`posterior`, `const`/`constant_data`, `post_pred`/`posterior_predictive` or `obs_data`/`observed_data` @@ -47,10 +75,10 @@ In case it helps, the dropdown below has some suggestions so you can focus on wr **Dimension and random variable names** * Use singular dimension names, following ArviZ `chain` and `draw`. 
- For example `sample`, `cluster`, `axis`, `component`, `forest`, `time`... -* If you can't think of a meaningful name for the dimension representing the number of observations such as time, fall back to `obs_id` -* For matrix dimensions, as xarray doesn't allow repeated dimension names, add a `_bis` suffix. i.e. `param, param_bis` -* For the dimension resulting from stacking `chain` and `draw` use `sample`, that is `.stack(sample=("chain", "draw"))` + For example `cluster`, `axis`, `component`, `forest`, `time`... +* If you can't think of a meaningful name for the dimension representing the number of observations such as time, fall back to `obs_id`. +* For matrix dimensions, as xarray doesn't allow repeated dimension names, add a `_bis` suffix. i.e. `param, param_bis`. +* For the dimension resulting from stacking `chain` and `draw` use `sample`, that is `.stack(sample=("chain", "draw"))`. * We often need to encode a categorical variable as integers. add `_idx` to the name of the variable it's encoding. i.e. from `floor` and `county` to `floor_idx` and `county_idx`. * To avoid clashes and overwriting variables when using `pm.Data`, use the following pattern: @@ -140,6 +168,7 @@ In case it helps, the dropdown below has some suggestions so you can focus on wr ::::::: +(jupyter_style_first_cell)= ## First cell The first cell of all example notebooks should have a MyST target, a level 1 markdown title (that is a title with a single `#`) followed by the post directive. The syntax is as follows: @@ -157,7 +186,7 @@ The syntax is as follows: The date should correspond to the latest update/execution date, at least roughly (it's not a problem if the date is a few days off due to the review process before merging the PR). This will allow users to see which notebooks have been updated lately and will help the PyMC team make sure no notebook is left outdated for too long. 
-The [MyST target](https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html#targets-and-cross-referencing) +The {ref}`MyST target ` is important to ease referencing and linking notebooks between each other. Tags can be anything, but we ask you to try to use [existing tags](https://github.com/pymc-devs/pymc/wiki/Categories-and-Tags-for-PyMC-Examples) @@ -439,7 +468,7 @@ Once you're finished with your NB, add a very last cell with [the watermark pack ```python %load_ext watermark -%watermark -n -u -v -iv -w -p aesara,xarray +%watermark -n -u -v -iv -w -p aesara,aeppl,xarray ``` This second to last code cell should be preceded by a markdown cell with the `## Watermark` title only so it appears in the table of contents. From 20f778b6d4458ea4827bc9ca53375b90f90179af Mon Sep 17 00:00:00 2001 From: "Oriol (ZBook)" Date: Sun, 19 Dec 2021 20:49:55 +0200 Subject: [PATCH 6/6] add note on extra dependencies --- docs/source/contributing/jupyter_style.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/source/contributing/jupyter_style.md b/docs/source/contributing/jupyter_style.md index 9dbcceebd0..fc0f6c9ef6 100644 --- a/docs/source/contributing/jupyter_style.md +++ b/docs/source/contributing/jupyter_style.md @@ -308,7 +308,8 @@ Thus, notebooks with extra dependencies should: ::: :::: -1. include the warning and installation advise template with the following markdown: +1. include the warning and installation advise template with the following markdown right before + the extra dependencies are imported: ```markdown :::{include} ../extra_installs.md