Skip to content

Incorrect computation in Posterior Predictive Checking section of BYM Notebook #675

Closed
@jessegrabowski

Description

@jessegrabowski

Notebook title: The Besag-York-Mollie Model for Spatial Data
Notebook url:https://www.pymc.io/projects/examples/en/latest/spatial/nyc_bym.html

Issue description

I believe this computation is invalid:

phi_pred = idata.posterior.phi.mean(("chain", "draw")).values
beta0_pred = idata.posterior.beta0.mean(("chain", "draw")).values
sigma_pred = idata.posterior.sigma.mean(("chain", "draw")).values
y_predict = np.exp(log_E + beta0_pred + sigma_pred * (1 / scaling_factor) * phi_pred)

Expected output

This code computes f(E[x]), when the mean prediction conditioned on rho = 1. The correct quantity to compute, however, is E[f(x)]. That is, the expectation should be taken last, only after computing the exp of the samples. The result is biased downward (since Jensen's inequality says E[f(x)] < f(E[x]) )

Proposed solution

This is actually a really great place to use pm.do. I propose:

with pm.do(BYM_model, {'rho':1.0}):
    y_predict_rho_1 = pm.sample_posterior_predictive(idata, var_names=['rho', 'mixture', 'mu'], predictions=True, extend_inferencedata=False)
y_predict = y_predict_rho_1.predictions.mu.mean(dim=['chain', 'draw'])

This would require wrapping mixture and mu in pm.Deterministic in the model code as well.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions