
Update GLM robust #205


Merged: 3 commits into pymc-devs:main on Aug 26, 2021

Conversation

@chiral-carbon (Collaborator) commented Aug 6, 2021

Addresses issue #88 and aims to update the notebook to:

  • follow best practices
  • use numpy.random.default_rng()
  • use bambi
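One way the numpy.random.default_rng() switch could look is sketched below. This is a hypothetical data-generation cell: the variable names mirror the notebook's x_out / y_out / true_regression_line, but the exact values are illustrative.

```python
import numpy as np

# Reproducible data generation with the modern Generator API,
# replacing legacy np.random.seed / np.random.normal calls
rng = np.random.default_rng(42)

size = 100
true_intercept, true_slope = 1.0, 2.0

x = np.linspace(0, 1, size)
true_regression_line = true_intercept + true_slope * x
y = true_regression_line + rng.normal(scale=0.5, size=size)

# Add a few outliers, as in the robust-regression example
x_out = np.append(x, [0.1, 0.15, 0.2])
y_out = np.append(y, [8.0, 6.0, 9.0])
```

Unlike the legacy global-state API, the Generator object makes the seed explicit and local to the notebook cell.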

@review-notebook-app
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.

@chiral-carbon (Collaborator, Author) commented via ReviewNB:

The StudentT distribution is currently only available in the development version of Bambi, so I thought I should wait until the next stable release and then update this again.

@chiral-carbon (Collaborator, Author) commented via ReviewNB:

Same doubt as before about using ArviZ to plot instead of plot_posterior_predictive_glm() and getting rid of the DeprecationWarning.

Reviewer comment (via ReviewNB):

The same code used above should work here.

```python
model_robust = bmb.Model("y ~ x", data, family="t")
idata_robust = model_robust.fit(draws=2000, cores=2)

model_robust.predict(idata_robust, data=new_data)

# Stack the chains and draws into one dimension
posterior_y_mean = idata_robust.posterior["y_mean"].values.reshape((2 * 2000, 100))

# Create the plot
plt.figure(figsize=(7, 5))

# The [:, :100] selects the 100 samples shown in your original plot
plt.plot(new_data["x"], posterior_y_mean.T[:, :100], color="k", alpha=0.3, lw=0.5)
plt.plot(x_out, y_out, "x", label="data")
plt.plot(x, true_regression_line, label="true regression line", lw=3.0, c="y")
plt.legend(loc=0);
```

Reviewer comment (via ReviewNB):

I think this is also a good instance to show how to override the default priors. You could do something like the following, and then proceed as usual:

```python
# Use a Gamma prior for the degrees of freedom
model_robust = bmb.Model("y ~ x", data, family="t")
model_robust.set_priors({"nu": bmb.Prior("Gamma", alpha=3, beta=1)})
model_robust
```

@MarcoGorelli (Contributor) commented:

Hi @chiral-carbon - shall we mark this as "draft" until StudentT is available in Bambi?

@chiral-carbon (Collaborator, Author) commented:

@MarcoGorelli yeah, sure. Actually, PRs #204 and #196 will also be updated with the new release of Bambi, so I suppose they can be marked as drafts as well.

@MarcoGorelli MarcoGorelli marked this pull request as draft August 13, 2021 13:25
@tomicapretto commented:
Hey @MarcoGorelli @chiral-carbon, the new release of Bambi is available.

Just ping me if you need help with any Bambi code 😄

@chiral-carbon (Collaborator, Author) commented:

Thanks for letting us know @tomicapretto! Will be sure to ping you if I face problems 😄

@chiral-carbon chiral-carbon marked this pull request as ready for review August 21, 2021 08:49
@chiral-carbon (Collaborator, Author) commented via ReviewNB:

I wanted to convert

```python
pm.plot_posterior_predictive_glm(trace, samples=100, label="posterior predictive regression lines")
```

to use ArviZ and get rid of the DeprecationWarning. I figured I could use az.plot_ppc(trace) after calling model.predict(), but that ends up creating two different figures/plots, so how do I work around that?

Reviewer comment (via ReviewNB):

First some context: this plot shows the posterior distribution of the regression lines. What do these regression lines represent? The mean value of the conditional distribution of Y given a certain value of X. So what we actually have above is not the posterior predictive distribution, because the posterior predictive distribution is not on the scale of the mean, but on the scale of the observed variable Y (I come back to this in the next comment).

So, what we need to get this posterior distribution of the mean in Bambi is the following:

```python
# Simply a grid going from 0 to 1.
# This isn't strictly necessary, but it is helpful
# since the values of x are then already sorted.
new_data = pd.DataFrame({"x": np.linspace(0, 1, 100)})

# Compute the posterior regression lines; the values are
# stored in idata.posterior["y_mean"]
model.predict(idata, data=new_data)

# Stack the chains (2 on my computer) and draws into one dimension
posterior_y_mean = idata.posterior["y_mean"].values.reshape((2 * 2000, 100))

# Create the plot
plt.figure(figsize=(7, 5))

# The [:, :100] selects the 100 samples shown in your original plot
plt.plot(new_data["x"], posterior_y_mean.T[:, :100], color="k", alpha=0.3, lw=0.5)
plt.plot(x_out, y_out, "x", label="data")
plt.plot(x, true_regression_line, label="true regression line", lw=3.0, c="y")
plt.legend(loc=0);
```

Unfortunately it is not as easy as using pm.plot_posterior_predictive_glm yet... Hopefully this kind of plot will be built into Bambi in the future.
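As an aside, a reshape((2 * 2000, 100)) hard-codes 2 chains of 2000 draws; a shape-agnostic flatten avoids silent breakage if the sampler settings change. A pure-NumPy sketch, where the zeros array is a stand-in for idata.posterior["y_mean"].values:

```python
import numpy as np

# Stand-in for idata.posterior["y_mean"].values, shape (chains, draws, points)
chains, draws, points = 2, 2000, 100
y_mean = np.zeros((chains, draws, points))

# Flatten the chain and draw dimensions without hard-coding their sizes
posterior_y_mean = y_mean.reshape(-1, y_mean.shape[-1])
```

The result has shape (chains * draws, points) regardless of how many chains or draws were sampled.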

Reviewer comment (via ReviewNB):

On the other hand, if you want the posterior predictive distribution, you do:

```python
model.predict(idata, kind="pps")
az.plot_ppc(idata)
```

This overlays the observed distribution of the data with the posterior predictive distribution. Ideally, the posterior predictive samples should not deviate much from the shape of the observed data.
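The "should not deviate much" criterion can also be checked numerically, for example with a posterior predictive p-value for a summary statistic. A pure-NumPy sketch, with simulated draws standing in for a real model's posterior predictive samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins: observed data and posterior predictive draws
y_obs = rng.normal(loc=1.0, scale=0.5, size=100)
y_pp = rng.normal(loc=1.0, scale=0.5, size=(4000, 100))  # (draws, observations)

# Fraction of simulated datasets whose mean exceeds the observed mean.
# Values near 0 or 1 signal misfit; values near 0.5 suggest the statistic
# is well captured by the model.
ppp = (y_pp.mean(axis=1) > y_obs.mean()).mean()
```

This complements the visual overlay from az.plot_ppc rather than replacing it.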

@chiral-carbon (Collaborator, Author) commented:

Thanks a lot for the explanation, I will use this code.

```python
model.predict(idata, kind="pps")
az.plot_ppc(idata)
```

is what I was trying to do, without realizing that we are not trying to plot the posterior predictive distribution here.

Project member comment:

This same kind of spaghetti plot can already be generated with ArviZ; see https://arviz-devs.github.io/arviz/api/generated/arviz.plot_lm.html and https://utkarsh-maheshwari.medium.com/how-to-use-plot-lm-part-i-1f1cb099503e. The function is available in the ArviZ development version and should be released soon.
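For a sense of what az.plot_lm expects, here is a self-contained sketch with simulated arrays standing in for a fitted model's InferenceData. The variable names and shapes are illustrative, and the call assumes an ArviZ release that includes plot_lm:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y_obs = 2 + 3 * x + rng.normal(0, 0.3, size=50)

# Fake posterior draws of the regression mean and posterior predictive
# samples, each with shape (chains, draws, points)
y_model = 2 + 3 * x + rng.normal(0, 0.1, size=(2, 100, 50))
y_pp = y_model + rng.normal(0, 0.3, size=(2, 100, 50))

idata = az.from_dict(
    observed_data={"y": y_obs},
    posterior={"y_model": y_model},
    posterior_predictive={"y": y_pp},
)

# Spaghetti plot: regression lines from the posterior plus
# posterior predictive samples over the observed data
axes = az.plot_lm(y="y", idata=idata, x=x, y_model="y_model", num_samples=50)
```

With a real Bambi fit, the posterior group would already contain the mean variable and no az.from_dict step would be needed.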

Reviewer comment (via ReviewNB):

With Bambi 0.6.0, it's no longer necessary to use a custom Family; you can just pass family="t".

Reviewer comment (via ReviewNB):

Note that here you're not passing a custom family. You're telling Bambi to use the family named "t", which has been a built-in family since version 0.6.0. Custom families are created with the Family() class.

@chiral-carbon (Collaborator, Author) commented:

Got it, thanks for clarifying that.

@OriolAbril (Member) left a review:

We can use arviz.plot_lm in the next iteration. Everything else looks good.

@MarcoGorelli (Contributor) left a review.

@OriolAbril OriolAbril merged commit ea5f7a1 into pymc-devs:main Aug 26, 2021
@chiral-carbon chiral-carbon deleted the robust branch August 27, 2021 14:14
4 participants