docs/source/contributing/jupyter_style.md: 109 additions & 0 deletions
@@ -28,6 +28,115 @@ the repository where the notebook is in (pymc or pymc-examples).

* When using non-meaningful names such as single letters, add bullet points with a 1-2 sentence description of each variable below the equation where they are first introduced.

Choosing variable names can sometimes be difficult, tedious or annoying.
In case it helps, the dropdown below has some suggestions so you can focus on writing the actual content.

::::::{dropdown} Variable name suggestions
:icon: light-bulb

**Models and sampling results**
* Use `idata` for sampling results; this variable should always hold an InferenceData object.
* Store InferenceData groups as variables to ease writing and reading of code operating on sampling results.
  Use the group name or an underscore-separated abbreviation of it (3-5 letters per word). Some examples of `abbreviation`/`group_name`:
  `post`/`posterior`, `const`/`constant_data`, `post_pred`/`posterior_predictive` or `obs_data`/`observed_data`
* For stats and diagnostics, use the ArviZ function name as the variable name: `ess = az.ess(...)`, `loo = az.loo(...)`
* If there are multiple models in a notebook, assign a prefix to each model,
  and use it throughout to identify which variables map to each model.
  Taking the famous eight schools example, with a `centered` and a `non_centered` model
  to compare parametrizations, use `centered_model` (pm.Model object), `centered_idata`, `centered_post`, `centered_ess`... and `non_centered_model`, `non_centered_idata`...
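
As a rough sketch of the naming scheme above, the snippet below loads the two eight schools example datasets bundled with ArviZ instead of sampling the models, so there are no `centered_model`/`non_centered_model` variables here. It assumes `az.load_arviz_data` provides the `centered_eight` and `non_centered_eight` datasets and that each one has a `posterior` group.

```python
import arviz as az

# Pre-computed sampling results for both parametrizations
# (stand-ins for what pm.sample would return for each model)
centered_idata = az.load_arviz_data("centered_eight")
non_centered_idata = az.load_arviz_data("non_centered_eight")

# Groups stored as variables, prefixed with the model they belong to
centered_post = centered_idata.posterior
non_centered_post = non_centered_idata.posterior

# Diagnostics named after the ArviZ function that computes them
centered_ess = az.ess(centered_idata)
non_centered_ess = az.ess(non_centered_idata)
```

With the prefixes in place, it stays clear which model each object belongs to even when both sets of results are used in the same cell.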

**Dimension and random variable names**
* Use singular dimension names, following ArviZ `chain` and `draw`.
  For example `sample`, `cluster`, `axis`, `component`, `forest`, `time`...
* If you can't think of a meaningful name for the dimension that indexes the observations (such as `time`), fall back to `obs_id`
* For matrix dimensions, as xarray doesn't allow repeated dimension names, add a `_bis` suffix, e.g. `param, param_bis`
* For the dimension resulting from stacking `chain` and `draw` use `sample`, that is `.stack(sample=("chain", "draw"))`
* We often need to encode a categorical variable as integers. Add `_idx` to the name of the variable it's encoding,
  e.g. from `floor` and `county` to `floor_idx` and `county_idx`.
* To avoid clashes and overwriting variables when using `pm.Data`, use the following pattern:

```
import numpy as np
import pymc as pm

x = np.array(...)
with pm.Model():
    # trailing underscore keeps the original array `x` intact
    x_ = pm.Data("x", x)
    ...
```

This avoids overwriting the original `x` while still having `idata.constant_data["x"]`,
and within the model `x_` is still available to play the role of `x`.
Otherwise, always try to use the same variable name as the string name given to the PyMC random variable.
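
The dimension naming suggestions above (`_idx`, `sample` and `_bis`) can be sketched with plain NumPy and xarray. The county labels, the `post` dataset and the `beta` variable are made up for illustration, and the covariance computation is just one possible case where a matrix with two `param` dimensions shows up.

```python
import numpy as np
import xarray as xr

# Categorical variable encoded as integers: `county` -> `county_idx`
county = np.array(["AITKIN", "ANOKA", "AITKIN", "BECKER"])
county_names, county_idx = np.unique(county, return_inverse=True)

# Hypothetical posterior draws, using singular dimension names
rng = np.random.default_rng(0)
post = xr.Dataset(
    {"beta": (("chain", "draw", "param"), rng.normal(size=(4, 50, 3)))},
    coords={"param": ["a", "b", "c"]},
)

# Stack `chain` and `draw` into a single `sample` dimension
beta = post["beta"].stack(sample=("chain", "draw"))

# A param-by-param matrix needs two distinct dimension names,
# so the second copy of the dimension is renamed to `param_bis`
beta_centered = beta - beta.mean("sample")
beta_centered_bis = beta_centered.rename(param="param_bis")
n_sample = beta.sizes["sample"]
beta_cov = (beta_centered * beta_centered_bis).sum("sample") / (n_sample - 1)
```

Because the two copies differ only in the name of one dimension, xarray broadcasts them against each other and the result has dimensions `param` and `param_bis`.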

**Plotting**
* Matplotlib figures and axes. Use:
  * `fig` for matplotlib figures
  * `ax` for a single matplotlib axes object
  * `axs` for arrays of matplotlib axes objects

  When manually working with multiple matplotlib axes, use local `ax` variables:

::::{tab-set}
:::{tab-item} Local `ax` variables
```
from matplotlib import pyplot

fig, axs = pyplot.subplots(2, 3)

ax = axs[0, 1]
ax.plot(...)
ax.set(...)

ax = axs[1, 2]
ax.scatter(...)
```
:::
:::{tab-item} Instead of subsetting every time
```
from matplotlib import pyplot

fig, axs = pyplot.subplots(2, 3)

axs[0, 1].plot(...)
axs[0, 1].set(...)

axs[1, 2].scatter(...)
```
:::
::::

This makes the code easier to edit when restructuring the subplots: only one change per subplot
is needed instead of one change per matplotlib function call.

* It is often useful to make a numpy linspace into an {class}`~xarray.DataArray`
  for xarray to handle aligning and broadcasting automatically and ease computation.
  * If a dimension name is needed, use `x_plot`
  * If a variable name is needed for the original array and DataArray to coexist, add the `_da` suffix

  Thus, ending up with code like:

```
import numpy as np
import xarray as xr

x = xr.DataArray(np.linspace(0, 10, 100), dims=["x_plot"])
# or
x = np.linspace(0, 10, 100)
x_da = xr.DataArray(x, dims=["x_plot"])
```
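
To illustrate why the DataArray version eases computation, here is a minimal sketch that evaluates a straight line for every posterior draw. The `post` dataset with `intercept` and `slope` variables is hypothetical; only the `x`/`x_da` naming and the `x_plot` dimension come from the suggestions above.

```python
import numpy as np
import xarray as xr

# Hypothetical posterior draws of a line's intercept and slope
rng = np.random.default_rng(0)
post = xr.Dataset(
    {
        "intercept": (("chain", "draw"), rng.normal(size=(4, 50))),
        "slope": (("chain", "draw"), rng.normal(size=(4, 50))),
    }
)

x = np.linspace(0, 10, 100)
x_da = xr.DataArray(x, dims=["x_plot"])

# xarray broadcasts (chain, draw) against (x_plot) by dimension name,
# so no manual reshaping or np.newaxis indexing is needed
mu = post["intercept"] + post["slope"] * x_da

# Average over the sampling dimensions to get one curve along `x_plot`
mu_mean = mu.mean(("chain", "draw"))
```

The plain `x` array remains available for matplotlib calls, while `x_da` is the one used in computations with the posterior.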

**Looping**
* When using enumerate, take the first letter of the variable as the count:

```
for p, person in enumerate(persons):
    ...
```

* When looping, if you need to store a variable after subsetting with the loop index,
  append the index variable used for looping to the original variable name:

```
variable = np.array(...)
x = np.array(...)
for i in range(N):
    variable_i = variable[i]
    for j in range(K):
        x_j = x[j]
        ...
```

::::::

## First cell
The first cell of all example notebooks should have a MyST target, a level 1 markdown title (that is a title with a single `#`) followed by the post directive.