added draft of the prevalence of malaria example #691

Merged

Conversation

Contributor

@Dekermanjian Dekermanjian commented Aug 3, 2024

Relates to proposal #684

I have put together the first draft. Tagging @bwengals; if you can review the example, that would be wonderful.

This is my first time adding an example to pymc-examples and I have a few questions:

  • I added two data files (1 csv (tabular data) format, 1 tif (raster) format) to the data folder. I am not sure if that was the correct thing to do?
  • The data processing section is a little bit long. I can save the processed data as a csv file and skip a lot of the processing. What are your thoughts on this?
  • There are 2 interactive plots that use folium. I don't know if that will be supported on the website?

📚 Documentation preview 📚: https://pymc-examples--691.org.readthedocs.build/en/691/


@bwengals
Collaborator

bwengals commented Aug 6, 2024

Hi @Dekermanjian sorry for the delay. I think this looks great, short and to the point. I think it could benefit from a little more narrative, but I wouldn't object as it is. You could also provide a little guidance on how you chose the Matern32 or m=[40, 40], c=2.5. Or link to the other HSGP example nb that covers this.
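
For concreteness, a rough sketch of where those choices enter the model (placeholder data and priors, just to show how Matern32, m=[40, 40], and c=2.5 are wired up in an HSGP):

import numpy as np
import pymc as pm

# Placeholder 2-D coordinates, only so the sketch runs end to end
X = np.random.default_rng(0).uniform(0, 1, size=(50, 2))

with pm.Model() as sketch_model:
    eta = pm.Exponential("eta", scale=1.0)
    ls = pm.Gamma("ls", mu=0.2, sigma=0.1)
    cov_func = eta**2 * pm.gp.cov.Matern32(input_dim=2, ls=ls)

    # m is the number of basis functions per input dimension,
    # c expands the approximation domain beyond the range of the data
    gp = pm.gp.HSGP(m=[40, 40], c=2.5, cov_func=cov_func)
    f = gp.prior("f", X=X)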

I added two data files (1 csv (tabular data) format, 1 tif (raster) format) to the data folder. I am not sure if that was the correct thing to do?

Shouldn't be an issue, 82kB and 25kB isn't bad.

The data processing section is a little bit long. I can save the processed data as a csv file and skip a lot of the processing. What are your thoughts on this?

I think it's great as is, and it'd be helpful to include for people building a spatial model.

There are 2 interactive plots that use folium. I don't know if that will be supported on the website?

This I don't know... You should be able to build the documentation locally and check if it works that way. I think if it works locally you're alright. @OriolAbril would know the answer to this.

:::{post} Aug 04, 2024
:tags: spatial, autoregressive, count data
:category: beginner, tutorial
:author: Jonathan Dekermanjian, bwengals (please add your name here)
Member

Suggested change
:author: Jonathan Dekermanjian, bwengals (please add your name here)
:author: Jonathan Dekermanjian, Bill Engels

Collaborator

Oh no need to add me as an author. It's great work and I wouldn't mind being associated with it, but I didn't write a single word!

Contributor Author

We are collaborating on this example. Even though you did not write a single word, your feedback and ideas are being incorporated into the work. I won't force your name to be added, but I do think credit should be given to all contributors to a piece of work.

# The prevalence of malaria in the Gambia

:::{post} Aug 04, 2024
:tags: spatial, autoregressive, count data
Member

These are too detailed; please take a look at existing tags and re-use those. A GP one should be added.

Contributor Author

Hey @twiecki, I had copied those tags from the notebook "Conditional Autoregressive (CAR) Models for Spatial Data". Would it make more sense if the tags were just "spatial" and "gp"?

Member

GP should be added. I have no strong opinion on the others.

@Dekermanjian
Contributor Author

Dekermanjian commented Aug 7, 2024

@bwengals

You could also provide a little guidance on how you chose the Matern32 or m=[40, 40], c=2.5. Or link to the other HSGP example nb that covers this.

Yes, that is a great idea. In fact I used that example to choose my m and c values. As for the Matern32, the Matern family of covariance functions is popular for spatially related data, and I selected v=3/2 rather than 5/2 because I thought 5/2 would overly smooth the transitions (I may be wrong, since I have not tested that out). I can add a section about why the Matern family of covariance functions is typically used for spatially related data and talk about the smoothing as we change v. Do you think that would be helpful?

Another thing I would like to talk about is the length scale. I always struggle with this concept and would really like to offer the reader a simple explanation of how to think about it. Is it correct to say this: In the notebook you'll see that the posterior mean of "ls" is 0.187. Since we are passing in degrees of longitude and latitude to the Matern, is it fair to interpret the length scale this way: We can expect the Gaussian mean to decay towards 0 (since we set a 0 mean function) as we move 0.187 degrees away from any sampled point on the map?

@bwengals
Collaborator

bwengals commented Aug 8, 2024

Yes, that is a great idea. In fact I used that example to choose my m and c values. As for the Matern32, the Matern family of covariance functions is popular for spatially related data, and I selected v=3/2 rather than 5/2 because I thought 5/2 would overly smooth the transitions (I may be wrong, since I have not tested that out). I can add a section about why the Matern family of covariance functions is typically used for spatially related data and talk about the smoothing as we change v. Do you think that would be helpful?

Sure, as much as you'd like! Why Matern covariances are used for spatial data is super interesting, and might steer people away from ExpQuad. Or even include the reasoning you give here (that you thought Matern52 would be too smooth, etc.).

is it fair to interpret the length scale this way: We can expect the Gaussian mean to decay towards 0 (since we set a 0 mean function) as we move 0.187 degrees away from any sampled point on the map?

Yes exactly this. But it's not a hard cut off, and usually the lengthscale posterior is poorly constrained by the data so it still has a large standard deviation. Like if you plot:

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm
x = np.linspace(0, 10, 100)
K = pm.gp.cov.Matern52(input_dim=1, ls=1)(x[:, None]).eval()
plt.plot(x, K[:, 0]);

Then you'll see the covariance between x and x' it gives with a specific lengthscale ls=1.
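
The same kind of plot also makes the smoothness comparison concrete; a quick sketch overlaying ExpQuad, Matern52, and Matern32 at the same lengthscale:

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm

x = np.linspace(0, 10, 100)
kernels = {
    "ExpQuad": pm.gp.cov.ExpQuad(input_dim=1, ls=1),
    "Matern52": pm.gp.cov.Matern52(input_dim=1, ls=1),
    "Matern32": pm.gp.cov.Matern32(input_dim=1, ls=1),
}
for name, cov in kernels.items():
    K = cov(x[:, None]).eval()
    plt.plot(x, K[:, 0], label=name)  # covariance with the point at x=0
plt.legend();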

@Dekermanjian
Contributor Author

@bwengals Thank you for that explanation; it was very helpful. Okay, so what I will do is:

  • Link to the HSGP first steps example
  • Add a section talking about why we use the Matern and why we wouldn't want to use the ExpQuad
  • Talk about how to interpret the posterior mean of the length scale parameter

I will do my best to have these done in the next couple of days.

@bwengals
Collaborator

Sure, no rush, thanks! I saw this recently; it might be helpful for the lengthscale interpretation.

Member

@OriolAbril OriolAbril left a comment

Plots seem to render correctly: https://pymcio--691.org.readthedocs.build/projects/examples/en/691/spatial/malaria_prevalence_PyMCv5.html#data-processing, so kudos to pydata-sphinx-theme team

A couple of extra general comments. The filename makes it into the URL for the page (as seen above), so I would remove pymcv5 from the name.

You can use the authors section for a more detailed authorship description and acknowledgements, but I agree that if you wrote everything you should be the only author (for now). At the bottom you can link to the original source/inspiration and say you were the one to adapt it, with extensive feedback from Bill, or something like that.

…xplanation to why we use matern vs expquad, added acknowledgements section under authors
@Dekermanjian
Contributor Author

@OriolAbril Thank you for checking the interactive plots! I removed pymcv5 from the filename and I added a sub-section for Acknowledgements within the Authors section.

@bwengals I added the items that I listed earlier:

  • Mentioned and linked the HSGP first steps example, which covers in detail how to choose the m and c parameters of the HSGP
  • Put in a little blurb about how to think about the estimated length scale.
  • Added a section right before the conclusions highlighting the advantages of the Matern covariance function compared to the ExpQuad covariance function and why we would want that in spatial analyses.

If you don't mind, whenever you have the chance, could you review the new additions?

Thank you both for your help with this example!

@Dekermanjian
Contributor Author

Sure, no rush, thanks! I saw this recently; it might be helpful for the lengthscale interpretation.

Thank you for sharing this with me! I will give it a read and see if I can provide an enhanced interpretation of the length scale.

@Dekermanjian
Contributor Author

Hey @bwengals I read through the article you shared with me. Here are my high level take-aways:

  • For shorter length scales, the choice of prior placed on the length scale has a negligible effect on the posterior mean and HDI coverage (they specifically tested Cauchy, half normal, gamma, and Weibull; the rationale was to vary the density mass near 0 and in the tails).
  • For longer length scales, the Cauchy and Weibull priors in their simulations resulted in large posterior mean bias relative to the true function and poorer HDI coverage, while the differences between the half normal and gamma priors were negligible.

Their testing conditions were: fixing the signal-to-noise ratio across varying length scales, simulating data using a uniform distribution, and attempting to recover both the true length scale and the true function using a 1-D ExpQuad GP.
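
If any of that ends up in the notebook, the prior families they compared are all available directly in PyMC; a minimal sketch of placing them on a lengthscale (the hyperparameter values are placeholders, and a half-Cauchy stands in for the Cauchy since the lengthscale is positive):

import pymc as pm

with pm.Model():
    ls_halfcauchy = pm.HalfCauchy("ls_halfcauchy", beta=1.0)
    ls_halfnormal = pm.HalfNormal("ls_halfnormal", sigma=1.0)
    ls_gamma = pm.Gamma("ls_gamma", alpha=2.0, beta=1.0)
    ls_weibull = pm.Weibull("ls_weibull", alpha=2.0, beta=1.0)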


review-notebook-app bot commented Aug 23, 2024


aloctavodia commented on 2024-08-23T17:11:42Z
----------------------------------------------------------------

To add a little bit more narrative and make the example friendlier to non-experts, you could turn the comments into text in markdown cells.



review-notebook-app bot commented Aug 23, 2024


aloctavodia commented on 2024-08-23T17:11:43Z
----------------------------------------------------------------

Again, for non-experts, could you explain what EPSG is and why we care in particular about 4326?
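
(For context: EPSG codes are numeric identifiers for coordinate reference systems, and EPSG:4326 is the WGS 84 longitude/latitude system. With geopandas the conversion is roughly the following; the file name is a placeholder:)

import geopandas as gpd

# Illustrative only: reproject whatever CRS the file comes in to EPSG:4326
gdf = gpd.read_file("some_spatial_data.shp")  # placeholder path
print(gdf.crs)                                # the CRS the data was read in with
gdf_wgs84 = gdf.to_crs(epsg=4326)             # WGS 84 longitude/latitude, in degrees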



review-notebook-app bot commented Aug 23, 2024


aloctavodia commented on 2024-08-23T17:11:43Z
----------------------------------------------------------------

maybe pass kind="stats" argument
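
(Possibly referring to az.summary, where kind="stats" keeps the posterior summary columns and drops the sampler diagnostics; a toy illustration:)

import numpy as np
import arviz as az

# Toy posterior just to show the call; kind="stats" omits the diagnostics columns
idata = az.from_dict(posterior={"mu": np.random.default_rng(0).normal(size=(2, 100))})
print(az.summary(idata, kind="stats"))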



review-notebook-app bot commented Aug 23, 2024


aloctavodia commented on 2024-08-23T17:11:44Z
----------------------------------------------------------------

Not entirely clear to me. Can we overlap the map or the observations?



review-notebook-app bot commented Aug 23, 2024


bwengals commented on 2024-08-23T17:21:46Z
----------------------------------------------------------------

Just a couple small spelling things. Otherwise, thanks for adding this explanation!

Quadtratic -> Quadratic

paramter -> parameter

smooting -> smoothing


@bwengals
Collaborator

I'm good with the nb as is, pending @aloctavodia's suggestions.

@Dekermanjian
Contributor Author

@aloctavodia thank you very much for your review and suggestions. I will make the requested additions.

@bwengals Thank you again for reviewing this example. I will clean up those spelling typos.

I appreciate all the feedback, thank you!

…rative, made the graph after inference clearer, fixed typos
@Dekermanjian
Contributor Author

@aloctavodia I made the changes that you suggested. I added a paragraph describing coordinate reference systems, added more narrative alongside the comments in the data processing section, and plotted the actual prevalences side by side with the inferred prevalences. When you have a chance, please let me know if that looks good to you.


review-notebook-app bot commented Aug 26, 2024


bwengals commented on 2024-08-26T06:12:59Z
----------------------------------------------------------------

I wonder if the divergences are caused by the really large m and c values. Since the data size is small, what if you use pm.gp.Latent here instead, so that the HSGP approximation doesn't come into play and it's only about the difference in the Matern kernels?


Dekermanjian commented on 2024-08-27T01:28:10Z
----------------------------------------------------------------

Hey @bwengals, I swapped out the HSGP for an actual GP but I am still getting divergences. I tried increasing the sample size, swapping out the priors on the length scale and on eta, but nothing seems to get rid of the divergences.

@aloctavodia
Member

thanks @Dekermanjian! It reads much clearer now.

@Dekermanjian
Contributor Author

thanks @Dekermanjian! It reads much clearer now.

@aloctavodia Thank you for your feedback. I really appreciate it.

@bwengals
Collaborator

bwengals commented Aug 28, 2024

Try this code instead:

# Fit a GP to model the simulated data
with pm.Model() as matern_model:

    eta = pm.Exponential("eta", scale=10.0)
    ls = pm.Lognormal("ls", mu=0.5, sigma=0.75)
    cov_func = eta**2 * pm.gp.cov.Matern32(input_dim=1, ls=ls)
    gp = pm.gp.Latent(cov_func=cov_func)
    s = gp.prior("s", X=x[:, None])

    measurement_error = pm.Exponential("measurement_error", scale=5.0)
    pm.Normal("likelihood", mu=s, sigma=measurement_error, observed=y)
    matern_idata = pm.sample(tune=2000, nuts_sampler="numpyro", target_accept=0.98)

The divergences aren't completely gone, but the different priors on the hyperparameters are helping to tamp them down. GP models are really sensitive to how those are set. To get rid of divergences completely, you can switch to pm.gp.Marginal instead of pm.gp.Latent, which is only possible because the likelihood is Gaussian. Notice you don't need to turn target_accept up here. That code is:

# Fit a GP to model the simulated data
with pm.Model() as matern_model:

    eta = pm.Exponential("eta", scale=10.0)
    ls = pm.Lognormal("ls", mu=0.5, sigma=0.75)
    cov_func = eta**2 * pm.gp.cov.Matern32(input_dim=1, ls=ls)
    
    gp = pm.gp.Marginal(cov_func=cov_func)
    
    measurement_error = pm.Exponential("measurement_error", scale=5.0)
    gp.marginal_likelihood("likelihood", X=x[:, None], y=y, sigma=measurement_error)
    
    matern_idata = pm.sample(tune=2000, nuts_sampler="numpyro")

Contributor Author

@bwengals thank you! That certainly reduced the divergences quite a bit. I added in your changes and pushed up a new version.

@bwengals bwengals self-requested a review August 29, 2024 00:36
@bwengals
Collaborator

I'd approve, as long as @aloctavodia is good with things?

@Dekermanjian
Contributor Author

Hey @aloctavodia, is there anything in the example you'd like me to update or anything you think I should add to make it a better example? Thank you for taking the time to review the example. I really appreciate it!

@aloctavodia aloctavodia merged commit 2d9fec2 into pymc-devs:main Sep 7, 2024
2 checks passed
@aloctavodia
Member

Thanks, @Dekermanjian, this is a great addition to the example gallery.

@Dekermanjian
Contributor Author

Thank you very much, @aloctavodia!!
