
Update GLM predictions #204


Closed
wants to merge 4 commits into from

Conversation

chiral-carbon
Collaborator

Addresses issue #85 and aims to advance it to best practices: use the arviz-darkgrid style and use bambi.


@review-notebook-app

review-notebook-app bot commented Aug 6, 2021


chiral-carbon commented on 2021-08-06T17:11:58Z
----------------------------------------------------------------

@tomicapretto had a doubt here as to how to convert this to use bambi instead. We have to pass a dataframe rather than x and y in bambi.Model(), so should I concat x with y and pass that? Would that be correct?

tomicapretto commented on 2021-08-06T17:26:26Z
----------------------------------------------------------------

Yep! Bambi only works with data frames so far. So you need to put all the data in a pandas data frame and then pass the data as the data argument.
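As a concrete illustration of this point, combining the predictors and the response into a single data frame might look like the sketch below. The column names, the simulated data, and the bernoulli formula are invented for the example, not taken from the PR; the actual bambi call is left commented out.

```python
import numpy as np
import pandas as pd

# Simulated predictors and binary outcome (names x1, x2, y are assumptions)
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = (x1 + x2 + rng.normal(size=100) > 0).astype(int)

# Bambi takes one data frame plus a formula, not separate x/y arrays
df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})

# import bambi as bmb
# model = bmb.Model("y ~ x1 + x2", df, family="bernoulli")
```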

OriolAbril commented on 2021-08-06T17:39:27Z
----------------------------------------------------------------

does it support out of sample posterior predictive sampling though?

tomicapretto commented on 2021-08-06T19:58:18Z
----------------------------------------------------------------

Yes, it does (in the dev version)

model.predict(idata, kind="pps", data=new_data)

Where

model is the Bambi model.

idata is the inference data returned by the sampling process.

new_data is a new pandas data frame that has the same structure as the data frame used to fit the model, except for the variable that indicates the number of successes.

If you have Model("p(y, n) ~ x1*x2", data, family="binomial")

Then the new data frame must have columns for x1, x2 and n
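Putting the steps above together, a minimal sketch of the out-of-sample frame for the binomial example could look like this. The values are invented, and the predict call is commented out since, per the comment, it needs the dev version of Bambi at the time of writing.

```python
import numpy as np
import pandas as pd

# For Model("p(y, n) ~ x1*x2", data, family="binomial"), the new frame
# needs columns x1, x2 and the trials column n, but not the successes y.
new_data = pd.DataFrame({
    "x1": np.linspace(-2, 2, 5),
    "x2": np.linspace(-1, 1, 5),
    "n": np.repeat(20, 5),  # number of trials per row (invented value)
})

# model.predict(idata, kind="pps", data=new_data)  # dev-version Bambi
```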


@chiral-carbon chiral-carbon mentioned this pull request Aug 13, 2021
@MarcoGorelli MarcoGorelli marked this pull request as draft August 13, 2021 13:25
@review-notebook-app

review-notebook-app bot commented Aug 24, 2021


OriolAbril commented on 2021-08-24T16:56:14Z
----------------------------------------------------------------

I would not compute the decision boundary with the mean of the posterior but instead use xarray; again, everything should broadcast automatically. If we do this we can delete this remark.

Also, the Deterministic is only valid in pure pymc3, whereas postprocessing with xarray only needs the InferenceData.
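A sketch of the xarray postprocessing described here, using a fabricated posterior in place of idata.posterior. The coefficient names (Intercept, x1, x2) and the logistic-regression boundary formula are assumptions about the model, not taken from the notebook.

```python
import numpy as np
import xarray as xr

# Fake posterior with (chain, draw) dims standing in for idata.posterior
chains, draws = 2, 50
rng = np.random.default_rng(1)
post = xr.Dataset(
    {
        "Intercept": (("chain", "draw"), rng.normal(0.0, 0.1, (chains, draws))),
        "x1": (("chain", "draw"), rng.normal(1.0, 0.1, (chains, draws))),
        "x2": (("chain", "draw"), rng.normal(-1.0, 0.1, (chains, draws))),
    }
)

# Grid of x1 values; xarray broadcasts (chain, draw) against (x1_grid,)
x1_grid = xr.DataArray(np.linspace(-3, 3, 101), dims="x1_grid")

# Decision boundary of a logistic regression: x2 where the logit is zero,
# computed per posterior sample with no Deterministic inside the model
boundary = -(post["Intercept"] + post["x1"] * x1_grid) / post["x2"]

# boundary has dims (chain, draw, x1_grid): one curve per posterior sample
```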


@chiral-carbon chiral-carbon marked this pull request as ready for review September 1, 2021 12:42
@review-notebook-app

review-notebook-app bot commented Sep 1, 2021


chiral-carbon commented on 2021-09-01T12:44:44Z
----------------------------------------------------------------

I get a memory error here, but the shape of the dataset being passed is 90000x4, not 90000x25000 as printed.

so what is the mistake here?


tomicapretto commented on 2021-09-01T13:48:28Z
----------------------------------------------------------------

This is definitely a problem with Bambi trying to create a very large object within the predict method. I will try to replicate it on my side to understand what is going on.

What I'm not sure about is why you are using design_matrices and then calling as_dataframe(). I think this can be simplified to:

grid = np.linspace(start=-9, stop=9, num=300)
x1, x2 = np.meshgrid(grid, grid)
x_grid = np.stack(arrays=[x1.flatten(), x2.flatten()], axis=1)
new_data = pd.DataFrame(x_grid, columns=["x1", "x2"])

model.predict(trace, kind="pps", data=new_data, draws=1000)

chiral-carbon commented on 2021-09-01T14:28:22Z
----------------------------------------------------------------

I think I simply replaced the previous patsy usage with formulae.design_matrices(), but thanks, this looks much simpler. I do get the same error though when running the code you provided.

Sayam753 commented on 2021-09-19T08:46:21Z
----------------------------------------------------------------

so what is the mistake here?

I don't think there is any mistake. The shape of dataset is (90k, 4).

The shape (90k, 25k) can be interpreted as -

  • There are 90k rows in the dataset (generated by the meshgrid)
  • We also have 25k samples (5 chains with 5000 draws each) for each random variable in the PyMC3 model.
  • For each row (a tuple of (Intercept, x1, x2, x1*x2)), we have 25k ways of generating y.
  • So, for all the rows in the dataset, the resulting shape of y will be (90k, 25k), which is a pretty big array.

To solve this, two obvious ways would be to

  • either reduce the number of draws/chains in pm.sample while making sure the sampler converged,
  • or reduce the number of data points to generate a smaller artificial dataset.
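The arithmetic above can be checked directly: at float64 precision, the (90k, 25k) posterior predictive array alone needs roughly 17 GiB, which is consistent with the memory error.

```python
# Rough size of the (90_000, 25_000) posterior predictive array
# described above: float64 needs 8 bytes per value.
n_rows = 300 * 300       # 90k grid points from the 300x300 meshgrid
n_samples = 5 * 5000     # 5 chains x 5000 draws = 25k samples
gib = n_rows * n_samples * 8 / 2**30
print(round(gib, 1))     # ~16.8 GiB
```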

chiral-carbon commented on 2021-09-20T17:24:38Z
----------------------------------------------------------------

I'll try this and see if it works

@chiral-carbon chiral-carbon changed the title [WIP] Update GLM predictions Update GLM predictions Sep 1, 2021
@review-notebook-app

review-notebook-app bot commented Sep 1, 2021


chiral-carbon commented on 2021-09-01T12:49:50Z
----------------------------------------------------------------

I think the model is overfitting? Have I made any mistakes in the model definition step?


@review-notebook-app


OriolAbril commented on 2021-09-08T15:46:55Z
----------------------------------------------------------------

I don't understand this sentence, and there is still a link to the old glm module code in the pymc3 repo.


@review-notebook-app


Sayam753 commented on 2021-09-19T03:59:53Z
----------------------------------------------------------------

The first sentence should use bambi.




@review-notebook-app


chiral-carbon commented on 2021-10-10T07:47:06Z
----------------------------------------------------------------

@OriolAbril the decision boundary does not look accurate to me, and I think it could be because of the reduced sampling size and the reduced dataset size generated by np.meshgrid()? How should I correct it?


@fonnesbeck
Member

Is this close to being ready? At this point we should go ahead and convert to v4.

@chiral-carbon
Collaborator Author

@fonnesbeck Hi, sorry to leave it here without updating. Will work on it this week and finish it up.

@fonnesbeck
Member

@chiral-carbon any chance of pushing this out, or do you want to hand it over to someone?

@chiral-carbon
Collaborator Author

chiral-carbon commented May 11, 2022 via email

@chiral-carbon
Collaborator Author

chiral-carbon commented May 15, 2022

@fonnesbeck should I try to update this to v4 now in this PR, or update to v3 only and leave the v4 update for a new PR? The same question applies to the other open PRs I have pending.

@ghost

ghost commented May 17, 2022

Definitely v4, as we are trying to get as many examples ported before the v4 release as possible. Let me know if you need a hand!

@chiral-carbon
Collaborator Author

@cfonnesbeck okay! I will need some help in that case. You could just point out what I should refer to/start with, or anything else that should be done first.

@fonnesbeck
Member

It looks like this one is in a bit of a holding pattern until the update to Bambi is released, but if you want to get a head start you'd have to build Bambi from its v4 branch, switch from pymc3 to pymc and see what breaks.

@tomicapretto

It looks like this one is in a bit of a holding pattern until the update to Bambi is released, but if you want to get a head start you'd have to build Bambi from its v4 branch, switch from pymc3 to pymc and see what breaks.

@chiral-carbon @fonnesbeck most things should work fine in the new v4 branch. The problems are related to some specific cases. I can assist if you need help!

@chiral-carbon
Collaborator Author

@tomicapretto @fonnesbeck thanks! I will update here as and when I need help.

@drbenvincent
Contributor

This notebook was updated to v4 by #370, so closing this PR
