Commit 006bd40

add variable name suggestions
1 parent 72e9b90 commit 006bd40

1 file changed (+109 -0 lines changed)

docs/source/contributing/jupyter_style.md

Lines changed: 109 additions & 0 deletions
@@ -28,6 +28,115 @@ the repository where the notebook is in (pymc or pymc-examples).

* When using non-meaningful names such as single letters, add bullet points with a 1-2 sentence description of each variable below the equation where they are first introduced.

Choosing variable names can sometimes be difficult, tedious or annoying.
In case it helps, the dropdown below has some suggestions so you can focus on writing the actual content.

::::::{dropdown} Variable name suggestions
:icon: light-bulb

**Models and sampling results**
* Use `idata` for sampling results; it should always hold an `InferenceData` object.
* Store InferenceData groups as variables to make code operating on sampling results easier to write and read.
  Use underscore-separated abbreviations of 3-5 letters per word, or the full group name. Some examples of `abbreviation`/`group_name`:
  `post`/`posterior`, `const`/`constant_data`, `post_pred`/`posterior_predictive` or `obs_data`/`observed_data`
* For stats and diagnostics, use the ArviZ function name as the variable name: `ess = az.ess(...)`, `loo = az.loo(...)`
* If there are multiple models in a notebook, assign a prefix to each model,
  and use it throughout to identify which variables map to each model.
  Taking the famous eight schools model as an example, with a `centered` and a `non_centered` model
  to compare parametrizations, use `centered_model` (the `pm.Model` object), `centered_idata`, `centered_post`, `centered_ess`... and `non_centered_model`, `non_centered_idata`...
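A minimal sketch of these conventions, assuming the ArviZ 0.x API. `az.from_dict` builds a small `InferenceData` from random numbers that stand in for the output of `pm.sample`, so the `mu`, `theta` and `y` variables and the `school` dimension are hypothetical. The groups are then stored under their abbreviations, the effective sample size is named after `az.ess`, and every name carries the `centered_` model prefix.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for `centered_idata = pm.sample(...)`: random draws, 4 chains x 200 draws
centered_idata = az.from_dict(
    posterior={"mu": rng.normal(size=(4, 200)), "theta": rng.normal(size=(4, 200, 8))},
    observed_data={"y": rng.normal(size=8)},
    dims={"theta": ["school"], "y": ["school"]},
)

# Store groups as variables, using the abbreviation plus the model prefix
centered_post = centered_idata.posterior
centered_obs_data = centered_idata.observed_data

# Name diagnostics after the ArviZ function that computes them
centered_ess = az.ess(centered_idata)

print(centered_post["theta"].dims)  # ('chain', 'draw', 'school')
```

With a second `non_centered_idata`, the same lines would simply swap the prefix, which keeps side-by-side comparisons easy to follow.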

**Dimension and random variable names**
* Use singular dimension names, following ArviZ `chain` and `draw`.
  For example `sample`, `cluster`, `axis`, `component`, `forest`, `time`...
* If you can't think of a meaningful name (such as `time`) for the dimension that indexes the observations, fall back to `obs_id`.
* For matrix dimensions, as xarray doesn't allow repeated dimension names, add a `_bis` suffix, e.g. `param` and `param_bis`.
* For the dimension resulting from stacking `chain` and `draw`, use `sample`, that is `.stack(sample=("chain", "draw"))`.
* We often need to encode a categorical variable as integers. Add `_idx` to the name of the variable it's encoding,
  e.g. from `floor` and `county` to `floor_idx` and `county_idx`.
* To avoid clashes and overwriting variables when using `pm.Data`, use the following pattern:

  ```python
  import numpy as np
  import pymc as pm

  x = np.array([0.5, 1.2, 2.3])  # placeholder for the original data
  with pm.Model():
      x_ = pm.Data("x", x)
      ...
  ```

  This avoids overwriting the original `x` while having `idata.constant_data["x"]`,
  and within the model `x_` is still available to play the role of `x`.
  Otherwise, always try to use the same variable name as the string name given to the PyMC random variable.
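The dimension conventions above can be sketched with plain xarray and pandas. The `post` Dataset is a hypothetical stand-in for a posterior group (random numbers, made-up `mu` and `weights` variables), and the county labels are invented for illustration; `pd.factorize` is one common way to get the integer codes, not the only one.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical posterior group: singular dim names, matching ArviZ's `chain` and `draw`
post = xr.Dataset(
    {
        "mu": (("chain", "draw", "cluster"), rng.normal(size=(4, 100, 3))),
        # Square matrix variable: xarray needs two distinct dim names, hence `_bis`
        "weights": (("chain", "draw", "param", "param_bis"), rng.normal(size=(4, 100, 2, 2))),
    }
)

# Stack `chain` and `draw` into a single `sample` dimension
post_stacked = post.stack(sample=("chain", "draw"))
print(post_stacked["mu"].dims)  # ('cluster', 'sample')

# Observations without a more meaningful name are indexed by `obs_id`
county = np.array(["Aitkin", "Anoka", "Aitkin", "Becker"])  # hypothetical labels
obs_data = xr.Dataset({"county": ("obs_id", county)})

# Integer-encode the categorical variable; the codes get the `_idx` suffix
county_idx, county_names = pd.factorize(county)
print(county_idx)  # [0 1 0 2]
```

Keeping `county` and `county_idx` side by side means the labels stay available, for example as coordinate values, while the codes are what indexes a model parameter.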

**Plotting**
* Matplotlib figures and axes. Use:
  * `fig` for matplotlib figures
  * `ax` for a single matplotlib axes object
  * `axs` for arrays of matplotlib axes objects

When manually working with multiple matplotlib axes, use local `ax` variables:

::::{tab-set}
:::{tab-item} Local `ax` variables
```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 3)

ax = axs[0, 1]
ax.plot([0, 1, 2], [0, 1, 4])
ax.set(title="Top middle")

ax = axs[1, 2]
ax.scatter([0, 1, 2], [2, 1, 0])
```
:::
:::{tab-item} Instead of subsetting every time
```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 3)

axs[0, 1].plot([0, 1, 2], [0, 1, 4])
axs[0, 1].set(title="Top middle")

axs[1, 2].scatter([0, 1, 2], [2, 1, 0])
```
:::
::::

This makes the code easier to edit when restructuring the subplots: only one change per subplot
is needed instead of one change per matplotlib function call.
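The split between `ax` and `axs` follows from what `plt.subplots` returns with its default `squeeze=True`: a single Axes object for a 1x1 grid, and a numpy array of Axes otherwise. A quick check, assuming the non-interactive Agg backend so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# One subplot: plt.subplots() returns a single Axes object, hence `ax`
fig, ax = plt.subplots()

# Several subplots: plt.subplots(2, 3) returns a 2D numpy array of Axes, hence `axs`
fig, axs = plt.subplots(2, 3)

print(type(axs).__name__, axs.shape)  # ndarray (2, 3)
```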

* It is often useful to turn a numpy linspace into an {class}`~xarray.DataArray`
  so that xarray handles alignment and broadcasting automatically, which eases computation.
* If a dimension name is needed, use `x_plot`.
* If a variable name is needed for the original array and the DataArray to coexist, add a `_da` suffix.

Thus, ending up with code like:

```python
import numpy as np
import xarray as xr

x = xr.DataArray(np.linspace(0, 10, 100), dims=["x_plot"])
# or, keeping the original numpy array around
x = np.linspace(0, 10, 100)
x_da = xr.DataArray(x, dims=["x_plot"])
```
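To show what the `x_plot` DataArray buys, the sketch below evaluates a hypothetical linear model `alpha + beta * x` over the grid for every posterior draw. The `post` Dataset is built from random numbers as an assumed stand-in for `idata.posterior`; because xarray matches dimensions by name, no manual reshaping or `np.newaxis` is needed.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical posterior draws for an intercept and a slope, with ArviZ-style dims
post = xr.Dataset(
    {
        "alpha": (("chain", "draw"), rng.normal(size=(4, 100))),
        "beta": (("chain", "draw"), rng.normal(size=(4, 100))),
    }
)

# Evaluation grid: keep the numpy array and add a DataArray with the `x_plot` dim
x = np.linspace(0, 10, 50)
x_da = xr.DataArray(x, dims=["x_plot"])

# Broadcasting by dimension name: (chain, draw) combined with (x_plot,)
mu = post["alpha"] + post["beta"] * x_da
print(mu.dims)  # ('chain', 'draw', 'x_plot')

# Posterior mean line, reducing over the sampling dimensions only
mu_mean = mu.mean(dim=("chain", "draw"))
```

`mu_mean` then has one value per grid point and can be passed directly to `ax.plot(x, mu_mean)`.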

**Looping**
* When using `enumerate`, use the first letter of the loop variable as the counter:

  ```python
  for p, person in enumerate(persons):
      ...
  ```

* When looping, if you need to store a variable after subsetting with the loop index,
  append the index variable used for looping to the original variable name:

  ```python
  import numpy as np

  variable = np.array([[1.0, 2.0], [3.0, 4.0]])  # placeholder data
  x = np.array([5.0, 6.0, 7.0])  # placeholder data
  N, K = len(variable), len(x)
  for i in range(N):
      variable_i = variable[i]
      for j in range(K):
          x_j = x[j]
          ...
  ```

::::::

## First cell
The first cell of all example notebooks should have a MyST target, a level 1 markdown title (that is a title with a single `#`) followed by the post directive.
