Mediation: new example notebook #158


Merged · 9 commits · Jun 6, 2021

Conversation

drbenvincent (Contributor)

Here is a new example notebook on mediation. Note that this is not a modification of my previous moderation notebook; this is a brand new one on a new topic. I believe I'm following best practice at this point, but I'm very happy to get corrections or suggestions.


@OriolAbril (Member)

I don't know what is happening, but ReviewNB is not working for me: I can write reviews, but posting errors out. I tried yesterday and it's still happening today. I can use it on other PRs, though, so maybe some weird caching. I basically have only two comments, and I don't think ReviewNB is really necessary here:

- I am not completely sure that the model will do what we want it to do. I think this will only use m to add the corresponding log-likelihood term and won't sample from it. Thus, m is effectively a constant equal to its observed values when sampling, and this is what is used in y. However, from the description above, I think m should be sampled when used in y, so that the definitions of the different effects hold for every sample. Can you check (or have you already checked) that the total effect is the same for this model and for the equivalent model that estimates only the total effect, without mediation?

- Can you also add xarray to the watermark? It should be ... -p theano,xarray

@drbenvincent (Contributor, Author)

Thanks for checking this. I think I know what you mean, but I'm not 100% sure, so I went back to the Yuan and MacKinnon paper. Their model (in WinBUGS/JAGS) is:

```
model {
    # y[i], m[i] and x[i] denote data vectors
    for(i in 1:N)
    {
        m[i] ~ dnorm(mean.m[i], prec.m)
        mean.m[i] <- beta2 + alpha*x[i]

        y[i] ~ dnorm(mean.y[i], prec.y)
        mean.y[i] <- beta3 + beta*m[i] + tau.prime*x[i]
    }
    # prior distribution of parameters. Huge variances, essentially noninformative.
    beta2 ~ dnorm(0, 1.0E-6)
    beta3 ~ dnorm(0, 1.0E-6)
    alpha ~ dnorm(0, 1.0E-6)
    beta ~ dnorm(0, 1.0E-6)
    tau.prime ~ dnorm(0, 1.0E-6)
    # dgamma(a, b) is a gamma distribution with shape parameter a and inverse scale parameter b.
    prec.y ~ dgamma(0.001, 0.001)
    prec.m ~ dgamma(0.001, 0.001)
    # define the mediated effect as a function of parameters
    theta <- alpha*beta
}
```

As far as I can tell I've basically implemented that model. But to be sure I changed the latter parts to

```python
μm = im + a * x
m = pm.Normal("m", mu=μm, sd=σm, observed=m)

μy = iy + b * m + cprime * x
y = pm.Normal("y", mu=μy, sd=σy, observed=y)
```

which gives the same results. Implementing the total-effect-only model gives the same estimate, i.e. the posterior for c is the same as for c'. I also double-checked the posteriors against the true effects and they seem like good estimates (direct effect = 0.3, indirect effect = 0.3, total effect = 0.6).
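[Editorial aside: the agreement between the total-effect-only model and c' + a·b is not a coincidence for linear models. A minimal sketch with made-up simulated data (coefficients a = 0.6, b = 0.5, c' = 0.3 are assumptions, and OLS stands in for MCMC): with intercepts included, the coefficient of x in the reduced regression of y on x equals c' + a·b exactly, which is the classic equivalence of the "difference" and "product of coefficients" methods.]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical mediation data: x -> m (slope a) and x, m -> y (slopes c', b)
a_true, b_true, cprime_true = 0.6, 0.5, 0.3
x = rng.normal(size=n)
m = a_true * x + rng.normal(scale=0.5, size=n)
y = cprime_true * x + b_true * m + rng.normal(scale=0.5, size=n)

def ols(y, *cols):
    """Least-squares coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols(m, x)[1]             # x -> m path
_, cprime, b = ols(y, x, m)  # direct effect c' and m -> y path b
c = ols(y, x)[1]             # total effect from the reduced model

indirect = a * b
# For OLS with intercepts, c == cprime + a * b holds as an algebraic identity
```

The Bayesian posteriors in the notebook should show the same agreement up to sampling noise.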

But I think what you are suggesting is something different, and if it's an issue then it would presumably also be an issue with the Yuan and MacKinnon paper, which could be interesting.

@OriolAbril (Member) commented May 18, 2021

Indeed, I think the main challenge is actually knowing what needs to be implemented. I don't know JAGS, so I am not 100% sure what is happening in that source, but I see 3 (and a half) possibilities:

  1. Take the m term in the mean of y as the observations of m. This doesn't feel right to me, and I believe it is what is happening in both of the sample codes you provided, because you are using m directly (see for example https://discourse.pymc.io/t/plotting-traces-from-time-series/4998). You can double-check whether this is what is happening by storing m as a deterministic. From the comment in the JAGS code, however, it may be what is actually desired. If so, I would use the actual data instead of using m, to make sure there is no confusion on that end.
  2. Sample from the distribution of m at each draw and use those values to calculate the mean of y. This is what I would do intuitively; it probably requires duplicating m with .dist in order to sample from it, though I'm not completely sure how well that would work. Another option could be defining m as unobserved, so it is sampled as part of the posterior, and adding the likelihood as a potential right afterwards, but I think that defines a slightly different model.
  3. (Instead of using observed or sampled m, use its mean as a "middle ground". 🤷)
  4. Use an alternative parametrization that avoids using m at all, as in the "total effect" bullet point. It may still be possible to retrieve the parameters we care about afterwards 🤔
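[Editorial aside: the difference between options 1 and 2 can be made concrete without PyMC. A hypothetical numpy-only sketch (the path coefficients, noise scale, and predictor value are all made up): fixing m at a single value and drawing m from its distribution give the same mean for y, but the sampled version carries extra spread of roughly b·σm.]

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, cprime = 0.6, 0.5, 0.3   # made-up path coefficients
sigma_m = 0.5                  # made-up noise scale for m
x = 1.0                        # a single predictor value

# Option 1: m fixed at its observed value (here, taken to be its mean)
m_obs = a * x
y_fixed = cprime * x + b * m_obs

# Option 2: draw m from its distribution at each draw
m_draws = rng.normal(loc=a * x, scale=sigma_m, size=100_000)
y_draws = cprime * x + b * m_draws

# The point estimates agree, but option 2 propagates m's noise into y,
# adding a spread of about b * sigma_m
```

This is why the choice matters for the spread of the effect posteriors even when the point estimates coincide.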

@fonnesbeck (Member)

You definitely don't want to assign the normal likelihoods to m and y. In fact, they don't need to be assigned to anything: because they are likelihoods, the return value is not used elsewhere. JAGS parses the code and treats the left-hand side as data; PyMC does not do this.

@drbenvincent (Contributor, Author)

Thanks both. The original model I was working from was defined as:

[equation image omitted]

where it is clear that the M in the second equation is data. So in the model description in the notebook:

[screenshot of the notebook's model description omitted]

the m_i should be seen as data, not a random draw. Maybe this clears up the confusion? That is, the m in y is not meant to be a random sample. I've added a bit of text to that effect.

Following @fonnesbeck's suggestion, I changed the model to

```python
pm.Normal("m_likelihood", mu=im + a * x, sigma=σm, observed=m)
pm.Normal("y_likelihood", mu=iy + b * m + cprime * x, sigma=σy, observed=y)
```

which avoids the confusion and removes m as a parent of y in the graphviz figure. I'm not sure the previous version was actually overwriting the 'data m', though, because the results are identical. Either way, this makes much more sense.

So maybe this resolves the concerns?

On a separate note, I also added y and m as pm.Data objects, which seems like the right thing to do. This worked fine and again gives exactly the same results, but it produces a rather busy graphviz diagram. For example, I'm not sure why the data nodes have parents.

[screenshot of the graphviz diagram omitted]

@fonnesbeck (Member)

Yeah, I'm not exactly sure what happens if you assign the likelihood to m, thereby overwriting the data with the Normal likelihood. It probably works fine in most cases, since the value of the variable (which is fixed to m) should be used, but you never know.

@OriolAbril (Member)

> On a separate note, I also added y and m as pm.Data objects, which seems like the right thing to do. This worked fine and again gives exactly the same results, but it results in a bit of a crazy graphviz diagram. For example, I'm not sure why the data nodes have parents.

Data nodes can be either predictors/fixed data on which we condition our inference, or observations. The first kind won't have parents (like x), whereas the second won't have children, and their parent will be the corresponding likelihood term (like y). If a variable plays both roles, it will have both parents and children, like m. It is fine to add m and y as data, but in 3.x this basically has no effect on what you can do with the model.

pm.Data generates theano shared variables, so that one can change the data and rerun the model without having to recompile it (compilation is generally fast for pymc, but it can still take significant time in many cases). However, in v3 shapes are fixed at compilation; therefore, you can only change the observed data and successfully refit the model if the new observations have the same shape as the original ones (with x and posterior predictive the situation is different, but posterior sampling has this limitation). Using Data for y and m adds this arguably marginal new capability to the model. In v4, however, shapes will still be symbolic at compilation, so using Data will allow refitting the model on any new/different data, for example for cross-validation.

review-notebook-app bot commented May 22, 2021

OriolAbril commented on 2021-05-22T10:48:50Z
----------------------------------------------------------------

We should avoid using "sanity check". Here are some alternatives: https://gist.github.com/seanmhanson/fe370c2d8bd2b3228680e38899baf5cc


drbenvincent commented on 2021-05-22T11:41:33Z
----------------------------------------------------------------

Will change to "double check"

review-notebook-app bot commented May 22, 2021

OriolAbril commented on 2021-05-22T10:48:51Z
----------------------------------------------------------------

I would overlay the distributions instead. I think something like the following will work; it may need some extra work on colors:

```python
ax = az.plot_posterior(..., point_estimate=None, hdi_prob="hide", label="Total...")
az.plot_posterior(..., ax=ax)
ax.legend();
```

drbenvincent commented on 2021-05-22T12:08:59Z
----------------------------------------------------------------

I've given this a go. It was throwing an error when providing the label kwarg, but I have explained the colours of the curves instead.

OriolAbril commented on 2021-05-22T12:38:54Z
----------------------------------------------------------------

This is either an issue with ArviZ or a documentation error. According to the docstring, **kwargs should be passed to plt.plot. I have opened https://github.com/arviz-devs/arviz/issues/1703
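[Editorial aside: until the label kwarg issue is resolved, the overlay-with-legend idea can be approximated with plain matplotlib. A hypothetical sketch with made-up posterior draws, using histograms in place of ArviZ's KDE curves:]

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, safe for scripts/CI
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Made-up draws standing in for the notebook's posterior samples
total = rng.normal(0.6, 0.05, size=4000)
direct = rng.normal(0.3, 0.05, size=4000)

fig, ax = plt.subplots()
ax.hist(total, bins=50, density=True, alpha=0.5, label="Total effect")
ax.hist(direct, bins=50, density=True, alpha=0.5, label="Direct effect")
ax.set_xlabel("effect size")
ax.legend()
```

This gives labeled, overlaid distributions on one axis without relying on the kwarg forwarding that was erroring out.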


@OriolAbril OriolAbril merged commit 2011be1 into pymc-devs:main Jun 6, 2021
@drbenvincent drbenvincent deleted the mediation branch June 7, 2021 09:44