Missing data and Bayesian Imputation #500
…t FIML Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
I think this is ready for review now. It's quite long and covers a number of approaches to imputation.
(i) We discuss the taxonomy of missing-ness: MCAR, MAR and MNAR. I try to set it up as a prelude to considerations about causal inference.
(ii) FIML and MLE approaches to estimating a multivariate model given missing data. Each of the approaches so far is presented in the Enders book and our estimates match those presented there.
(v) I apply the missing data imputation to a hierarchical model and estimate the values of the missing data informed by the structure of "team" clusters in our employee data set. The model is estimated using the blackjax sampler and shows divergences, but converges nicely with good Rhat numbers. I use the differences in imputation patterns between the hierarchical model and the simpler regression models to argue for why we need to be aware of heterogeneous patterns of imputation and how this is analogous to concerns in causal inference about heterogeneous treatment effects.
We finish with a wrap-up and a celebration of the flexibility of Bayesian modelling in an enterprise that has to work with confounding and complexity.
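For readers of this thread who want a quick feel for the three missing-ness mechanisms mentioned above, here is a minimal sketch using only the Python standard library. It is not taken from the notebook: the variable names `tenure` and `satisfaction`, the coefficients, and the cut-offs are all hypothetical, chosen only to make the contrast visible.

```python
import random


def simulate_missingness(n=10_000, seed=0):
    """Compare the complete-case mean of a variable under MCAR, MAR and MNAR.

    All names and numbers here are illustrative assumptions, not the
    employee data set used in the notebook.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tenure = rng.gauss(0, 1)  # hypothetical covariate, always observed
        satisfaction = 0.8 * tenure + rng.gauss(0, 0.6)  # may go missing
        rows.append((tenure, satisfaction))

    def observed_mean(is_missing):
        # Mean of satisfaction over the rows that are NOT flagged as missing.
        kept = [s for t, s in rows if not is_missing(t, s)]
        return sum(kept) / len(kept)

    return {
        "full": sum(s for _, s in rows) / n,
        # MCAR: missingness ignores both variables.
        "MCAR": observed_mean(lambda t, s: rng.random() < 0.3),
        # MAR: missingness depends only on the observed covariate.
        "MAR": observed_mean(lambda t, s: t < -0.5),
        # MNAR: missingness depends on the value that goes missing.
        "MNAR": observed_mean(lambda t, s: s < -0.5),
    }


if __name__ == "__main__":
    for mechanism, mean in simulate_missingness().items():
        print(f"{mechanism:>4}: {mean:+.3f}")
```

Under these assumptions the complete-case mean stays close to the full-data mean only for MCAR. Under MAR the bias could in principle be removed by conditioning on `tenure`, whereas under MNAR the observed data alone do not carry enough information to do so.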
fonnesbeck commented on 2023-01-24T02:41:15Z: The table looks janky. Does it need to be placed in a code block to enforce monospace?
NathanielF commented on 2023-01-24T10:55:40Z: Fair. It was a bit needless. I've taken another approach, just adding the patterns of missing-ness as a pandas dataframe:
fonnesbeck commented on 2023-01-24T02:41:16Z: Should add a legend if possible.
NathanielF commented on 2023-01-24T11:02:17Z: Done.
fonnesbeck commented on 2023-01-24T02:41:17Z: Perhaps add a sentence or two interpreting these plots?
NathanielF commented on 2023-01-24T10:56:09Z: Updated and added some more explanatory text.
fonnesbeck commented on 2023-01-24T02:41:18Z: Line #15. `pm.Potential("x_logp", pm.logp(rv=pm.MvNormal.dist(mus, chol=cov_flat_prior), value=x))` Why are potentials being constructed here rather than just imputing with the MvNormal likelihood? Does that not work anymore? (Perhaps I'm missing something obvious.)
NathanielF commented on 2023-01-24T10:57:31Z: Yes, I think it's broken or not implemented in the latest version. I was getting the same error discussed here: https://discourse.pymc.io/t/automatic-imputation-of-multivariate-models/11029
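To make explicit what the potential in the line quoted above contributes, here is a sketch of the resulting log density. It assumes that `x` is the data matrix with its missing cells filled by free parameters $x_{\text{mis}}$, and that `cov_flat_prior` is a Cholesky factor $L$ (it is passed as `chol=`), with $\theta = (\mu, L)$; neither assumption is confirmed in this thread.

```latex
\log p(\theta, x_{\text{mis}} \mid x_{\text{obs}})
  = \log p(\theta, x_{\text{mis}})
  + \underbrace{\sum_{i=1}^{n} \log \mathcal{N}\!\left(x_i \mid \mu,\; L L^{\top}\right)}_{\text{term added by the potential}}
  + \text{const}
```

`pm.Potential` adds an arbitrary term to the model's log-probability, so under these assumptions the sampler targets the same joint density that an observed `MvNormal` with automatic imputation would define.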
fonnesbeck commented on 2023-01-24T02:41:19Z: Lower case y in "PyMC"
NathanielF commented on 2023-01-24T10:57:43Z: Adjusted!
fonnesbeck commented on 2023-01-24T02:41:19Z: I'm not sure printing out the entire idata object is helpful, given how large and verbose it is. Maybe pull a few elements that are interesting?
NathanielF commented on 2023-01-24T10:59:00Z: Removed the idata_uniform entirely as it was a bit overkill. I left the idata_normal. I like having the ability to inspect the model output. Makes reproductions easier to check for consistency.
Great tutorial!
Thank you for taking the time to review! Glad you liked it.
Left some comments via nbreview.
Overall really cool. Bit of a vague comment, but I'd be tempted to add in a little more explanation. But that's just my own style, so feel free to ignore. For this more advanced level, it's quite possibly the case that people don't need more hand holding. Nevertheless, if you wanted to add some, it could make it more accessible to a broader range of readers.
House move is ongoing... the actual move won't happen for another month or so :)
Perfect, thanks @drbenvincent. Will adjust this evening.
Done
…d regression notebook Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
Yes
I thought he just meant the legend in the picture, i.e. the color labels for Empowerment etc., which were missing at the time he commented but are there now for me.
Changed this
Linked to that notebook too.
Done
…ot by team Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
…text Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>
That should be good to go now @drbenvincent. I've tidied a few things and added some more explanatory text to signpost what I'm doing a bit more. I think I've also addressed all comments above.
Sorry if I'm missing it, but I can't see a reference to the example.
Thanks! A quick find shows up some remaining examples which are not actual L2 markdown headings. `## Percentage Missing` in this cell. A bunch in cell 31, one in cell 11.
Ah yes, Chris meant legend, but I meant figure caption :)
Looks good. I added in a few minor replies to comments, but happy to approve after that.
Just above introducing the employee data set, below the MNAR definition.
👍🏻
Agh... sorry. I think I've got them all now.
Great
Great stuff 👍🏻
A notebook on Missing Data methods and Bayesian imputation
Related to #461
This notebook aims to showcase methods for imputing missing data, primarily Bayesian ones. We focus on a dataset of employee satisfaction metrics drawn from the book Applied Missing Data Analysis. We demonstrate how FIML and Bayesian imputation methods based on the multivariate normal distribution differ, and we show how to approximate the multivariate distribution using sequential chained-equation methods.
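For orientation, FIML under a multivariate normal model is usually stated as maximising the observed-data log-likelihood, in which each row contributes only through the variables it actually records. The notation below ($o_i$ for the set of observed variables in row $i$, and $\mu$, $\Sigma$ for the mean and covariance) is introduced here for illustration and may differ from the notebook's.

```latex
\ell(\mu, \Sigma)
  = \sum_{i=1}^{n} \log \mathcal{N}\!\left(y_{i, o_i} \mid \mu_{o_i},\; \Sigma_{o_i o_i}\right)
```

Here $\mu_{o_i}$ and $\Sigma_{o_i o_i}$ are the sub-vector and sub-matrix for the observed variables of row $i$. In this formulation the missing cells are integrated out rather than filled in, which is the main contrast with Bayesian imputation, where they are treated as unknowns and sampled alongside the parameters.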