Non-reproducible random draws with sample_prior_predictive and set random seed #4490

Closed
@alcrene

Description

If I have a code file with the following:

# non_reproducible.py
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    pm.Normal('a')
    pm.Normal('b')
    pm.Normal('c')
    pm.Normal('d')

pp = pm.sample_prior_predictive(samples=1, model=model,
                                random_seed=100)
print({k:pp[k] for k in sorted(pp)})

and run it multiple times, I get different results. (In the example below, values for a and c are swapped.)

python non_reproducible.py
# {'c': array([-1.74976547]), 'b': array([0.3426804]), 'a': array([1.1530358]), 'd': array([-0.25243604])}
python non_reproducible.py
# {'a': array([-1.74976547]), 'b': array([0.3426804]), 'c': array([1.1530358]), 'd': array([-0.25243604])}

In my opinion this defeats the whole purpose of being able to set the random seed.
(Note: it is essential to run the code in separate Python sessions to see the effect.)

The reason is that sample_prior_predictive uses a set to collect the unique variable names, and the random draws are assigned in the iteration order of that set, which is undefined and can change from one Python session to the next.
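The session-to-session variation can be seen without PyMC3 at all. The sketch below assumes the variation comes from Python's string hash randomization, which is controlled by the PYTHONHASHSEED environment variable. It launches fresh interpreters with different hash seeds and prints the iteration order of the same set of names. The helper name set_order is made up for this illustration.

```python
import os
import subprocess
import sys

# One-liner executed in each fresh interpreter: print the iteration
# order of a set holding the four variable names from the report.
SNIPPET = "print(list({'a', 'b', 'c', 'd'}))"


def set_order(hash_seed):
    """Return the printed set order from a new Python session.

    PYTHONHASHSEED fixes the string hash seed for that session, so the
    same seed gives the same order, while different seeds may not.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Compare the order across a few simulated "sessions".
for seed in range(1, 6):
    print(seed, set_order(seed))
```

Whenever two of the printed orders differ, a seeded sampler that walks the set in iteration order hands the same sequence of draws to different variables.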

A simple fix might be to sort the variable names before sampling, as I've done here. This has the side effect of making the result independent of the order in which the variables are specified, which I think is nice.
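To illustrate why sorting helps, here is a minimal sketch that uses the standard library's random module as a stand-in for the sampler. It is not PyMC3's actual implementation, and draw_in_order is a hypothetical helper. With a fixed seed the sequence of draws is always the same, so the only thing that changes between runs is which name receives which draw.

```python
import random


def draw_in_order(names, seed):
    """Assign one seeded standard-normal draw to each name, in the order given."""
    rng = random.Random(seed)
    return {name: rng.gauss(0.0, 1.0) for name in names}


# Two possible iteration orders of the same set of names,
# mimicking what two separate sessions might produce.
order_1 = ["a", "b", "c", "d"]
order_2 = ["c", "b", "a", "d"]

unsorted_1 = draw_in_order(order_1, seed=100)
unsorted_2 = draw_in_order(order_2, seed=100)

# Same seed, same draws, but 'a' and 'c' have swapped values.
print(unsorted_1["a"] == unsorted_2["c"])
print(unsorted_1 == unsorted_2)

# Sorting the names first makes the mapping depend only on the seed.
sorted_1 = draw_in_order(sorted(order_1), seed=100)
sorted_2 = draw_in_order(sorted(order_2), seed=100)
print(sorted_1 == sorted_2)
```

Sorting is one way to get a canonical order. Iterating over the variables in declaration order would also be deterministic, but the result would then depend on how the model is written.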

I'm not proposing my fix as a PR because figuring out and testing consistency with other functions goes beyond my current needs. For example, sample_posterior_predictive is (I think) reproducible given a random seed, because it just takes the list of variable names. But it likely does depend on the order in which variables are specified.

Versions and main components

  • PyMC3 Version: master (27 Feb 2021)
  • Theano Version: Theano-PyMC 1.1.2
  • Python Version: 3.8.6
  • Operating system: OpenSUSE LEAP 15.2
  • How did you install PyMC3: pip (dependencies with conda)
