Update GLM robust #205
Conversation
The StudentT distribution is currently only available in the development version of bambi, so I thought I should wait until the next stable release and then update this again.
Same doubt here about using ArviZ to plot instead of `plot_posterior_predictive_glm()` and getting rid of the DeprecationWarning, as mentioned before.
The same code used above should work here.
```python
model_robust = bmb.Model("y ~ x", data, family="t")
idata_robust = model_robust.fit(draws=2000, cores=2)
model_robust.predict(idata_robust, data=new_data)

# Stack chains and draws into a single sample dimension
posterior_y_mean = idata_robust.posterior["y_mean"].values.reshape((2 * 2000, 100))

# Create the plot
plt.figure(figsize=(7, 5))

# The [:, :100] selects the 100 samples in your original plot
plt.plot(new_data["x"], posterior_y_mean.T[:, :100], color="k", alpha=0.3, lw=0.5)
plt.plot(x_out, y_out, "x", label="data")
plt.plot(x, true_regression_line, label="true regression line", lw=3.0, c="y")
plt.legend(loc=0);
```
I think this is also a good instance to show how to override default priors. You could do something like the following, and then proceed as usual:

```python
# Use a Gamma prior for the degrees of freedom
model_robust = bmb.Model("y ~ x", data, family="t")
model_robust.set_priors({"nu": bmb.Prior("Gamma", alpha=3, beta=1)})
model_robust
```
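As a quick numerical sanity check (a NumPy-only sketch, not Bambi code, assuming the usual shape/rate parameterization for Gamma), a `Gamma(alpha=3, beta=1)` prior on the degrees of freedom has mean `alpha / beta = 3` and puts nearly all its mass on small values of `nu`, i.e. on heavy-tailed likelihoods:

```python
import numpy as np

# Simulate the Gamma(alpha=3, beta=1) prior on nu.
# NumPy's gamma() takes shape and scale, where scale = 1 / rate.
rng = np.random.default_rng(42)
alpha, beta = 3.0, 1.0
nu_samples = rng.gamma(shape=alpha, scale=1.0 / beta, size=100_000)

print(nu_samples.mean())       # close to alpha / beta = 3
print((nu_samples < 10).mean())  # almost all prior mass on small nu
```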
Hi @chiral-carbon - shall we mark this as "draft" until StudentT is available in bambi?

@MarcoGorelli yeah sure. Actually, PRs #204 and #196 will also be updated with the new release of bambi, so I suppose they can be marked as drafts as well.

Hey @MarcoGorelli @chiral-carbon, the new release of Bambi is available. Just ping me if you need help with any Bambi code 😄

Thanks for letting us know @tomicapretto! Will be sure to ping you if I face problems 😄
I wanted to convert `pm.plot_posterior_predictive_glm(trace, samples=100, label="posterior predictive regression lines")` to use ArviZ and get rid of the DeprecationWarning. I figured I could use `az.plot_ppc(trace);` after calling `model.predict()`, but that ends up creating two different figures/plots, so how do I work around that?
First some context: this plot is the posterior distribution of the regression lines. What do these regression lines represent? The mean value of the conditional distribution of Y given a certain value of X. So what we actually have above is not the posterior predictive distribution, because the posterior predictive distribution is not on the scale of the mean, but on the scale of the observed variable Y (I come back to this in the next comment).
So, what we need to get this posterior distribution of the mean in Bambi is the following:

```python
# Simply a grid going from 0 to 1.
# This isn't strictly necessary, but will be helpful
# since values of x are already sorted.
new_data = pd.DataFrame({"x": np.linspace(0, 1, 100)})

# Computes the posterior regression lines, and stores the values in posterior["y_mean"]
model.predict(idata, data=new_data)

# Stack chains (2 in my computer)
posterior_y_mean = idata.posterior["y_mean"].values.reshape((2 * 2000, 100))

# Create the plot
plt.figure(figsize=(7, 5))

# The [:, :100] selects the 100 samples in your original plot
plt.plot(new_data["x"], posterior_y_mean.T[:, :100], color="k", alpha=0.3, lw=0.5)
plt.plot(x_out, y_out, "x", label="data")
plt.plot(x, true_regression_line, label="true regression line", lw=3.0, c="y")
plt.legend(loc=0);
```
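The reshape step above simply flattens the `(chain, draw)` dimensions into a single sample axis. A self-contained NumPy sketch of the same idea, with illustrative shapes matching 2 chains × 2000 draws × 100 x-values:

```python
import numpy as np

# Illustrative stand-in for idata.posterior["y_mean"].values,
# with shape (chains, draws, points) = (2, 2000, 100)
rng = np.random.default_rng(0)
y_mean = rng.normal(size=(2, 2000, 100))

# Flatten chains and draws into a single sample dimension
stacked = y_mean.reshape((2 * 2000, 100))
print(stacked.shape)  # (4000, 100)

# Transposing gives x-values along the rows and one column per
# posterior sample, which is what plt.plot expects for spaghetti lines
print(stacked.T[:, :100].shape)  # (100, 100)
```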
Unfortunately it is not as easy as using `pm.plot_posterior_predictive_glm` yet... Hopefully, this kind of plot will be built into Bambi in the future.
On the other hand, if you want the posterior predictive distribution, you do:

```python
model.predict(idata, kind="pps")
az.plot_ppc(idata)
```
This overlays the observed distribution of the data with the posterior predictive distribution. Ideally, the posterior predictive samples should not deviate much from the shape of the observed data.
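To make the mean-vs-predictive distinction concrete, here is a NumPy-only sketch (not Bambi's internals; the posterior draws below are made up for illustration): given posterior draws of the mean and scale, posterior predictive samples add observation noise on top, so their spread is much wider than the posterior of the mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 10_000

# Pretend posterior draws for the mean and scale of Y at one value of x
mu_post = rng.normal(loc=2.0, scale=0.1, size=n_draws)       # posterior of the mean
sigma_post = rng.gamma(shape=10.0, scale=0.1, size=n_draws)  # posterior of sigma (mean 1)

# Posterior predictive: for each posterior draw, sample one observation of Y
y_pred = rng.normal(loc=mu_post, scale=sigma_post)

print(mu_post.std())  # small: only uncertainty about the mean
print(y_pred.std())   # large: observation noise is added on top
```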
Thanks a lot for the explanation, I will use this code. `model.predict(idata, kind="pps")` followed by `az.plot_ppc(idata)` is what I was trying to do, without realizing that we are not trying to plot the posterior predictive distribution here.
This same kind of spaghetti plot can already be generated with ArviZ, see https://arviz-devs.github.io/arviz/api/generated/arviz.plot_lm.html and https://utkarsh-maheshwari.medium.com/how-to-use-plot-lm-part-i-1f1cb099503e. The function is already available in the ArviZ dev version, which should be released soon.
With Bambi 0.6.0, it's not necessary to use a custom `Family` anymore. You can just pass `family="t"`.
Note that here you're not passing a custom family. You're telling Bambi to use the family named "t", which is a built-in family since version 0.6.0. Custom families are things created with the `Family()` class.
Got it, thanks for clarifying that.
We can use `arviz.plot_lm` in the next iteration. Everything else looks good.
Addresses issue #88 and aims to advance it to:
- `numpy.random.default_rng()`
- `bambi`