The following figure shows two samples from the posterior of $\mu$. We can see that these functions are not smooth. This is fine and is a direct consequence of using regression trees. Trees can be seen as a way to represent stepwise functions, and a sum of stepwise functions is just another stepwise function. Thus, when using BART we just need to know that we are assuming that a stepwise function is a good enough approximation for our problem. In practice this is often the case because we sum over many trees, usually 50, 100, or 200, and we also average over the posterior distribution. All this makes the "steps" smoother, even though we never actually obtain a smooth function as we would with, for example, Gaussian processes or splines. A nice theoretical result tells us that in the limit of $m \to \infty$ the BART prior converges to a [nowhere-differentiable](https://en.wikipedia.org/wiki/Weierstrass_function) Gaussian process.
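To see why summing many stepwise functions yields something that looks smoother, here is a small NumPy sketch (not part of BART itself) where each "tree" is mimicked by a random stepwise function with a few split points, a deliberately crude stand-in for a real regression tree:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)

def random_step_function(x, n_splits=3):
    """A crude stand-in for one regression tree: a stepwise function
    with a few random split points and random leaf values."""
    splits = np.sort(rng.uniform(0, 1, n_splits))
    leaves = rng.normal(0, 1, n_splits + 1)
    # searchsorted maps each x to the leaf of the interval it falls in
    return leaves[np.searchsorted(splits, x)]

# A single "tree" is coarse: it can take at most n_splits + 1 distinct values.
single = random_step_function(x)

# Averaging many such functions is still stepwise, but with many more,
# much smaller steps, which is why BART fits look smoother as m grows.
ensemble = np.mean([random_step_function(x) for _ in range(100)], axis=0)

print(len(np.unique(single)), len(np.unique(ensemble)))
```

The averaged function takes far more distinct values than any single stepwise component, even though it is still, strictly speaking, stepwise.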
To gain further intuition, the next figures show 3 of the `m` trees. As we can see, these are definitely not very good approximators by themselves. Inspecting individual trees is generally not necessary; we show them here only to build intuition about BART.
```{code-cell} ipython3
with pm.Model() as model_bikes:
    σ = pm.HalfNormal("σ", Y.std())
    μ = pmb.BART("μ", X, Y, m=50)
    y = pm.Normal("y", μ, σ, observed=Y)
    idata_bikes = pm.sample(random_seed=RANDOM_SEED)
```
To help us interpret the results of our model we are going to use partial dependence plots. This type of plot shows the marginal effect that one covariate has on the predicted variable. That is, the effect that a covariate $X_i$ has on $Y$ while we average over all the other covariates ($X_j, \forall j \neq i$). Partial dependence plots are not exclusive to BART, but they are often used in the BART literature. PyMC provides a utility function to make this plot from the inference data.
From this plot we can see the main effect of each covariate on the predicted value. This is very useful, as we can recover complex relationships beyond monotonically increasing or decreasing effects. For example, for the `hour` covariate we can see two peaks, around 8 and 17 hours, and a minimum at midnight.
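The computation behind a partial dependence plot is simple enough to sketch directly. Below is a minimal, library-agnostic version (the `partial_dependence` helper and the toy `predict` function are hypothetical, for illustration only; in practice you would use the utility function mentioned above with the fitted model):

```python
import numpy as np

def partial_dependence(predict, X, i, grid):
    """Average prediction as covariate i sweeps over `grid`, holding the
    empirical distribution of all the other covariates fixed.
    `predict` is any function mapping an (n, p) array to n predictions."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, i] = value                      # force covariate i to this value in every row
        pd_values.append(predict(X_mod).mean())  # average over the other covariates
    return np.array(pd_values)

# Toy check with a known function: f(x0, x1) = x0**2 + x1.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
grid = np.linspace(-1, 1, 5)
pd0 = partial_dependence(lambda X: X[:, 0] ** 2 + X[:, 1], X, 0, grid)
# pd0 traces x0**2, shifted by the (roughly zero) sample mean of x1.
```

Because the toy predictor is additive, the partial dependence on $X_0$ recovers its quadratic shape exactly; for a BART model the same averaging is done over posterior predictions.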
Additionally, we provide a novel method to assess the variable importance.