Mediation: new example notebook #158


Merged · 9 commits · Jun 6, 2021

Conversation

drbenvincent (Contributor)

Here is a new example notebook on mediation. Note that this is not a modification of my previous moderation notebook; this is a brand new one on a new topic. I believe I'm following best practice at this point, but I'm very happy to get corrections or suggestions.


@OriolAbril (Member)

I don't know what is happening, but ReviewNB is not working for me: I can write reviews, but posting errors out. I tried yesterday and it's still happening today. I can use it on other PRs, though, so maybe some weird caching. I basically have only two comments, and I don't think ReviewNB is really necessary here:

- I am not completely sure that the model will do what we want it to do. I think this will only use m to add the corresponding log-likelihood term and won't sample from it. Thus, m is effectively a constant equal to its observed values when sampling, and this is what is used in y. However, from the description above, I think m should be sampled when used in y, so that the definitions of the different effects hold for every sample. Can you check (or have you already checked) that the total effect is the same for this model and for the equivalent model that estimates only the total effect, without mediation?

- Can you also add xarray to the watermark? It should be ... -p theano,xarray

@drbenvincent (Contributor, Author)

Thanks for checking this. I think I know what you mean, but I'm not 100% sure, so I went back to the Yuan and MacKinnon paper. Their model (in WinBUGS/JAGS) is:

```
model {
    # y[i], m[i] and x[i] denote data vectors
    for(i in 1:N)
    {
        m[i] ~ dnorm(mean.m[i], prec.m)
        mean.m[i] <- beta2 + alpha*x[i]

        y[i] ~ dnorm(mean.y[i], prec.y)
        mean.y[i] <- beta3 + beta*m[i] + tau.prime*x[i]
    }
    # prior distribution of parameters. Huge variances, essentially noninformative.
    beta2 ~ dnorm(0, 1.0E-6)
    beta3 ~ dnorm(0, 1.0E-6)
    alpha ~ dnorm(0, 1.0E-6)
    beta ~ dnorm(0, 1.0E-6)
    tau.prime ~ dnorm(0, 1.0E-6)
    # dgamma(a, b) is a gamma distribution with shape parameter a and inverse scale parameter b.
    prec.y ~ dgamma(0.001, 0.001)
    prec.m ~ dgamma(0.001, 0.001)
    # define the mediated effect as a function of parameters
    theta <- alpha*beta
}
```

As far as I can tell I've basically implemented that model. But to be sure I changed the latter parts to

```python
μm = im + a * x
m = pm.Normal("m", mu=μm, sd=σm, observed=m)

μy = iy + b * m + cprime * x
y = pm.Normal("y", mu=μy, sd=σy, observed=y)
```

which gives the same results. Implementing the total-effect-only model gives the same estimate, i.e. the posterior for c is the same as for c'. I also double-checked the posteriors against the true effects and they seem like good estimates (direct effect = 0.3, indirect effect = 0.3, total effect = 0.6).
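[Editorial aside: the agreement between the total-effect-only model and c' + a·b is not a coincidence for linear models. A minimal sketch with made-up simulated data (coefficients a = 0.6, b = 0.5, c' = 0.3 are assumptions, and OLS stands in for MCMC): with intercepts included, the coefficient of x in the reduced regression of y on x equals c' + a·b exactly, which is the classic equivalence of the "difference" and "product of coefficients" methods.]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical mediation data: x -> m (slope a) and x, m -> y (slopes c', b)
a_true, b_true, cprime_true = 0.6, 0.5, 0.3
x = rng.normal(size=n)
m = a_true * x + rng.normal(scale=0.5, size=n)
y = cprime_true * x + b_true * m + rng.normal(scale=0.5, size=n)

def ols(y, *cols):
    """Least-squares coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols(m, x)[1]             # x -> m path
_, cprime, b = ols(y, x, m)  # direct effect c' and m -> y path b
c = ols(y, x)[1]             # total effect from the reduced model

indirect = a * b
# For OLS with intercepts, c == cprime + a * b holds as an algebraic identity
```

The Bayesian posteriors in the notebook should show the same agreement up to sampling noise.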

But I think what you are suggesting is something different, and if it's an issue then it would presumably also be an issue with the Yuan and MacKinnon paper, which could be interesting.

@OriolAbril (Member) commented May 18, 2021

Indeed, I think the main challenge is actually knowing what needs to be implemented. I don't know JAGS, so I am not 100% sure what is happening in that source, but I see 3 (and a half) possibilities:

  1. Take the m term in the mean of y as the observations of m. This doesn't feel right to me, and I believe it is what is happening in both of the sample codes you provided, because you are using m directly (see for example https://discourse.pymc.io/t/plotting-traces-from-time-series/4998). You can double-check whether this is what is happening by storing m as a deterministic. From the comment in the JAGS code, however, it may be what is actually desired. If so, I would use the actual data instead of using m, to make sure there is no confusion on that end.
  2. Sample from the distribution of m at each draw and use those values to calculate the mean of y. This is what I would do intuitively; it probably requires duplicating m with .dist in order to sample from it, though I'm not completely sure how well that would work. Another option could be defining m as unobserved, so it is sampled as part of the posterior, and adding the likelihood as a potential right afterwards, but I think that defines a slightly different model.
  3. (Instead of using observed or sampled m, use its mean as a "middle ground". 🤷)
  4. Use an alternative parametrization that avoids using m at all, as in the "total effect" bullet point. It may still be possible to retrieve the parameters we care about afterwards 🤔
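[Editorial aside: the difference between options 1 and 2 can be made concrete without PyMC. A hypothetical numpy-only sketch (the path coefficients, noise scale, and predictor value are all made up): fixing m at a single value and drawing m from its distribution give the same mean for y, but the sampled version carries extra spread of roughly b·σm.]

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, cprime = 0.6, 0.5, 0.3   # made-up path coefficients
sigma_m = 0.5                  # made-up noise scale for m
x = 1.0                        # a single predictor value

# Option 1: m fixed at its observed value (here, taken to be its mean)
m_obs = a * x
y_fixed = cprime * x + b * m_obs

# Option 2: draw m from its distribution at each draw
m_draws = rng.normal(loc=a * x, scale=sigma_m, size=100_000)
y_draws = cprime * x + b * m_draws

# The point estimates agree, but option 2 propagates m's noise into y,
# adding a spread of about b * sigma_m
```

This is why the choice matters for the spread of the effect posteriors even when the point estimates coincide.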

@fonnesbeck (Member)

You definitely don't want to assign the normal likelihoods to m and y. In fact, they don't need to be assigned to anything: because they are likelihoods, the return value is not used elsewhere. JAGS parses the code and treats the left-hand side as data; PyMC does not do this.

@drbenvincent (Contributor, Author)

Thanks both. The original model I was working from was defined as:

[equation image omitted]

where it is clear that the M in the second equation is data. So in the model description in the notebook:

[screenshot of the notebook's model description omitted]

the m_i should be seen as data, not a random draw. Maybe this clears up the confusion? That is, the m in y is not meant to be a random sample. I've added a bit of text to that effect.

Following @fonnesbeck's suggestion, I changed the model to

```python
pm.Normal("m_likelihood", mu=im + a * x, sigma=σm, observed=m)
pm.Normal("y_likelihood", mu=iy + b * m + cprime * x, sigma=σy, observed=y)
```

which avoids the confusion and removes m as a parent of y in the graphviz figure. I'm not sure the previous version was actually overwriting the 'data m', though, because the results are identical. Either way, this makes much more sense.

So maybe this resolves the concerns?

On a separate note, I also added y and m as pm.Data objects, which seems like the right thing to do. This worked fine and again gives exactly the same results, but it produces a rather busy graphviz diagram. For example, I'm not sure why the data nodes have parents.

[screenshot of the graphviz diagram omitted]

@fonnesbeck (Member)

Yeah, I'm not exactly sure what happens if you assign the likelihood to m, thereby overwriting the data with the Normal likelihood. It probably works fine in most cases, since the value of the variable (which is fixed to m) should be used, but you never know.

@OriolAbril (Member)

> On a separate note, I also added y and m as pm.Data objects, which seems like the right thing to do. This worked fine and again gives exactly the same results, but it results in a bit of a crazy graphviz diagram. For example, I'm not sure why the data nodes have parents.

Data nodes can be either predictors/fixed data on which we condition our inference, or observations. The first kind won't have parents (like x), whereas the second won't have children, and their parent will be the corresponding likelihood term (like y). If a variable plays both roles, it will have both parents and children, like m. It is fine to add m and y as data, but in 3.x this basically has no effect on what you can do with the model.

pm.Data generates theano shared variables, so that one can change the data and rerun the model without having to recompile it (compilation is generally fast for pymc, but it can still take significant time in many cases). However, in v3 shapes are fixed at compilation; therefore, you can only change the observed data and successfully refit the model if the new observations have the same shape as the original ones (with x and posterior predictive the situation is different, but posterior sampling has this limitation). Using Data for y and m adds this arguably marginal new capability to the model. In v4, however, shapes will still be symbolic at compilation, so using Data will allow refitting the model on any new/different data, for example for cross-validation.

review-notebook-app bot commented May 22, 2021

OriolAbril commented on 2021-05-22T10:48:50Z
----------------------------------------------------------------

We should avoid using "sanity check". Here are some alternatives: https://gist.github.com/seanmhanson/fe370c2d8bd2b3228680e38899baf5cc


drbenvincent commented on 2021-05-22T11:41:33Z
----------------------------------------------------------------

Will change to "double check"

review-notebook-app bot commented May 22, 2021

OriolAbril commented on 2021-05-22T10:48:51Z
----------------------------------------------------------------

I would overlay the distributions instead. I think something like the following will work; it may need some extra work on colors:

```python
ax = az.plot_posterior(..., point_estimate=None, hdi_prob="hide", label="Total...")
az.plot_posterior(..., ax=ax)
ax.legend();
```

drbenvincent commented on 2021-05-22T12:08:59Z
----------------------------------------------------------------

I've given this a go. It was throwing an error when providing the label kwarg, but I have explained the colours of the curves instead.

OriolAbril commented on 2021-05-22T12:38:54Z
----------------------------------------------------------------

This is either an issue with ArviZ or a documentation error. According to the docstring, **kwargs should be passed to plt.plot. I have opened https://github.com/arviz-devs/arviz/issues/1703
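[Editorial aside: until the label kwarg issue is resolved, the overlay-with-legend idea can be approximated with plain matplotlib. A hypothetical sketch with made-up posterior draws, using histograms in place of ArviZ's KDE curves:]

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, safe for scripts/CI
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Made-up draws standing in for the notebook's posterior samples
total = rng.normal(0.6, 0.05, size=4000)
direct = rng.normal(0.3, 0.05, size=4000)

fig, ax = plt.subplots()
ax.hist(total, bins=50, density=True, alpha=0.5, label="Total effect")
ax.hist(direct, bins=50, density=True, alpha=0.5, label="Direct effect")
ax.set_xlabel("effect size")
ax.legend()
```

This gives labeled, overlaid distributions on one axis without relying on the kwarg forwarding that was erroring out.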


@OriolAbril OriolAbril merged commit 2011be1 into pymc-devs:main Jun 6, 2021
@drbenvincent drbenvincent deleted the mediation branch June 7, 2021 09:44