
Update GLM robust #205


Merged: 3 commits into pymc-devs:main on Aug 26, 2021

Conversation

@chiral-carbon (Collaborator) commented Aug 6, 2021

Addresses issue #88 and aims to update the notebook to:

  • follow best practices
  • use numpy.random.default_rng()
  • use bambi
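One way the numpy.random.default_rng() switch could look is sketched below. This is a hypothetical data-generation cell: the variable names mirror the notebook's x_out / y_out / true_regression_line, but the exact values are illustrative.

```python
import numpy as np

# Reproducible data generation with the modern Generator API,
# replacing legacy np.random.seed / np.random.normal calls
rng = np.random.default_rng(42)

size = 100
true_intercept, true_slope = 1.0, 2.0

x = np.linspace(0, 1, size)
true_regression_line = true_intercept + true_slope * x
y = true_regression_line + rng.normal(scale=0.5, size=size)

# Add a few outliers, as in the robust-regression example
x_out = np.append(x, [0.1, 0.15, 0.2])
y_out = np.append(y, [8.0, 6.0, 9.0])
```

Unlike the legacy global-state API, the Generator object makes the seed explicit and local to the notebook cell.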

@review-notebook-app
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.

@chiral-carbon (Collaborator, Author) commented via ReviewNB:

The StudentT distribution is currently only available in the development version of Bambi, so I thought I should wait until the next stable release and then update this again.

@chiral-carbon (Collaborator, Author) commented via ReviewNB:

Same doubt as before about using ArviZ to plot instead of plot_posterior_predictive_glm() and getting rid of the DeprecationWarning.

Reviewer comment (via ReviewNB):

The same code used above should work here.

```python
model_robust = bmb.Model("y ~ x", data, family="t")
idata_robust = model_robust.fit(draws=2000, cores=2)

model_robust.predict(idata_robust, data=new_data)

# Stack the chains and draws into one dimension
posterior_y_mean = idata_robust.posterior["y_mean"].values.reshape((2 * 2000, 100))

# Create the plot
plt.figure(figsize=(7, 5))

# The [:, :100] selects the 100 samples shown in your original plot
plt.plot(new_data["x"], posterior_y_mean.T[:, :100], color="k", alpha=0.3, lw=0.5)
plt.plot(x_out, y_out, "x", label="data")
plt.plot(x, true_regression_line, label="true regression line", lw=3.0, c="y")
plt.legend(loc=0);
```

Reviewer comment (via ReviewNB):

I think this is also a good instance to show how to override the default priors. You could do something like the following, and then proceed as usual:

```python
# Use a Gamma prior for the degrees of freedom
model_robust = bmb.Model("y ~ x", data, family="t")
model_robust.set_priors({"nu": bmb.Prior("Gamma", alpha=3, beta=1)})
model_robust
```

@MarcoGorelli (Contributor) commented:

Hi @chiral-carbon - shall we mark this as "draft" until StudentT is available in Bambi?

@chiral-carbon (Collaborator, Author) commented:

@MarcoGorelli yeah, sure. Actually, PRs #204 and #196 will also be updated with the new release of Bambi, so I suppose they can be marked as drafts as well.

@MarcoGorelli MarcoGorelli marked this pull request as draft August 13, 2021 13:25
@tomicapretto commented:
Hey @MarcoGorelli @chiral-carbon, the new release of Bambi is available.

Just ping me if you need help with any Bambi code 😄

@chiral-carbon (Collaborator, Author) commented:

Thanks for letting us know @tomicapretto! Will be sure to ping you if I face problems 😄

@chiral-carbon chiral-carbon marked this pull request as ready for review August 21, 2021 08:49
@chiral-carbon (Collaborator, Author) commented via ReviewNB:

I wanted to convert

```python
pm.plot_posterior_predictive_glm(trace, samples=100, label="posterior predictive regression lines")
```

to use ArviZ and get rid of the DeprecationWarning. I figured I could use az.plot_ppc(trace) after calling model.predict(), but that ends up creating two different figures/plots, so how do I work around that?

Reviewer comment (via ReviewNB):

First some context: this plot shows the posterior distribution of the regression lines. What do these regression lines represent? The mean value of the conditional distribution of Y given a certain value of X. So what we actually have above is not the posterior predictive distribution, because the posterior predictive distribution is not on the scale of the mean, but on the scale of the observed variable Y (I come back to this in the next comment).

So, what we need to get this posterior distribution of the mean in Bambi is the following:

```python
# Simply a grid going from 0 to 1.
# This isn't strictly necessary, but it is helpful
# since the values of x are then already sorted.
new_data = pd.DataFrame({"x": np.linspace(0, 1, 100)})

# Compute the posterior regression lines; the values are
# stored in idata.posterior["y_mean"]
model.predict(idata, data=new_data)

# Stack the chains (2 on my computer) and draws into one dimension
posterior_y_mean = idata.posterior["y_mean"].values.reshape((2 * 2000, 100))

# Create the plot
plt.figure(figsize=(7, 5))

# The [:, :100] selects the 100 samples shown in your original plot
plt.plot(new_data["x"], posterior_y_mean.T[:, :100], color="k", alpha=0.3, lw=0.5)
plt.plot(x_out, y_out, "x", label="data")
plt.plot(x, true_regression_line, label="true regression line", lw=3.0, c="y")
plt.legend(loc=0);
```

Unfortunately it is not as easy as using pm.plot_posterior_predictive_glm yet... Hopefully this kind of plot will be built into Bambi in the future.
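As an aside, a reshape((2 * 2000, 100)) hard-codes 2 chains of 2000 draws; a shape-agnostic flatten avoids silent breakage if the sampler settings change. A pure-NumPy sketch, where the zeros array is a stand-in for idata.posterior["y_mean"].values:

```python
import numpy as np

# Stand-in for idata.posterior["y_mean"].values, shape (chains, draws, points)
chains, draws, points = 2, 2000, 100
y_mean = np.zeros((chains, draws, points))

# Flatten the chain and draw dimensions without hard-coding their sizes
posterior_y_mean = y_mean.reshape(-1, y_mean.shape[-1])
```

The result has shape (chains * draws, points) regardless of how many chains or draws were sampled.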

Reviewer comment (via ReviewNB):

On the other hand, if you want the posterior predictive distribution, you do:

```python
model.predict(idata, kind="pps")
az.plot_ppc(idata)
```

This overlays the observed distribution of the data with the posterior predictive distribution. Ideally, the posterior predictive samples should not deviate much from the shape of the observed data.
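The "should not deviate much" criterion can also be checked numerically, for example with a posterior predictive p-value for a summary statistic. A pure-NumPy sketch, with simulated draws standing in for a real model's posterior predictive samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins: observed data and posterior predictive draws
y_obs = rng.normal(loc=1.0, scale=0.5, size=100)
y_pp = rng.normal(loc=1.0, scale=0.5, size=(4000, 100))  # (draws, observations)

# Fraction of simulated datasets whose mean exceeds the observed mean.
# Values near 0 or 1 signal misfit; values near 0.5 suggest the statistic
# is well captured by the model.
ppp = (y_pp.mean(axis=1) > y_obs.mean()).mean()
```

This complements the visual overlay from az.plot_ppc rather than replacing it.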

@chiral-carbon (Collaborator, Author) commented:

Thanks a lot for the explanation, I will use this code.

```python
model.predict(idata, kind="pps")
az.plot_ppc(idata)
```

is what I was trying to do, without realizing that we are not trying to plot the posterior predictive distribution here.

Project member comment:

This same kind of spaghetti plot can already be generated with ArviZ; see https://arviz-devs.github.io/arviz/api/generated/arviz.plot_lm.html and https://utkarsh-maheshwari.medium.com/how-to-use-plot-lm-part-i-1f1cb099503e. The function is available in the ArviZ development version and should be released soon.
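For a sense of what az.plot_lm expects, here is a self-contained sketch with simulated arrays standing in for a fitted model's InferenceData. The variable names and shapes are illustrative, and the call assumes an ArviZ release that includes plot_lm:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y_obs = 2 + 3 * x + rng.normal(0, 0.3, size=50)

# Fake posterior draws of the regression mean and posterior predictive
# samples, each with shape (chains, draws, points)
y_model = 2 + 3 * x + rng.normal(0, 0.1, size=(2, 100, 50))
y_pp = y_model + rng.normal(0, 0.3, size=(2, 100, 50))

idata = az.from_dict(
    observed_data={"y": y_obs},
    posterior={"y_model": y_model},
    posterior_predictive={"y": y_pp},
)

# Spaghetti plot: regression lines from the posterior plus
# posterior predictive samples over the observed data
axes = az.plot_lm(y="y", idata=idata, x=x, y_model="y_model", num_samples=50)
```

With a real Bambi fit, the posterior group would already contain the mean variable and no az.from_dict step would be needed.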

Reviewer comment (via ReviewNB):

With Bambi 0.6.0, it's no longer necessary to use a custom Family; you can just pass family="t".

Reviewer comment (via ReviewNB):

Note that here you're not passing a custom family. You're telling Bambi to use the family named "t", which has been a built-in family since version 0.6.0. Custom families are created with the Family() class.

@chiral-carbon (Collaborator, Author) commented:

Got it, thanks for clarifying that.

@OriolAbril (Member) left a review:

We can use arviz.plot_lm in the next iteration. Everything else looks good.

@MarcoGorelli (Contributor) left a review.

@OriolAbril OriolAbril merged commit ea5f7a1 into pymc-devs:main Aug 26, 2021
@chiral-carbon chiral-carbon deleted the robust branch August 27, 2021 14:14
4 participants