Description
If I have a code file with the following:
# non_reproducible.py
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    pm.Normal('a')
    pm.Normal('b')
    pm.Normal('c')
    pm.Normal('d')

pp = pm.sample_prior_predictive(samples=1, model=model,
                                random_seed=100)
print({k: pp[k] for k in sorted(pp)})
and run it multiple times, I get different results. (In the example below, the values for a and c are swapped.)
python non_reproducible.py
# {'c': array([-1.74976547]), 'b': array([0.3426804]), 'a': array([1.1530358]), 'd': array([-0.25243604])}
python non_reproducible.py
# {'a': array([-1.74976547]), 'b': array([0.3426804]), 'c': array([1.1530358]), 'd': array([-0.25243604])}
In my opinion this defeats the whole purpose of being able to set the random seed.
(Note: it is essential to run the code in separate Python sessions to see the effect.)
The reason is that sample_prior_predictive uses set to get a unique collection of variable names, and the random draws depend on the (undefined) iteration order of that set.
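The session dependence can be reproduced without PyMC3. The following is a minimal sketch, assuming the variation comes from Python's per-process string hash randomization (controlled by the PYTHONHASHSEED environment variable). The helper name set_order is made up for this illustration.

```python
import os
import subprocess
import sys

# Code run in each fresh interpreter: list a set of four variable names.
SNIPPET = "print(list({'a', 'b', 'c', 'd'}))"


def set_order(hash_seed):
    """Return the set's iteration order in a new interpreter with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Each seed stands in for a separate Python session.
orders = {set_order(seed) for seed in range(20)}
print(f"{len(orders)} distinct orderings:")
for order in sorted(orders):
    print(order)
```

Fixing the hash seed makes the ordering repeatable, so running the script as PYTHONHASHSEED=0 python non_reproducible.py is a possible workaround, although I have not checked that nothing else in the sampling path varies between sessions.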
A simple fix might be to sort the variable names before sampling, as I've done here. This has the side effect of making the result independent of the order in which the variables are specified, which I think is nice.
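To illustrate the idea, here is a rough sketch using only the standard library. The function draw_prior is a hypothetical stand-in for the sampling loop, not PyMC3's actual code; it draws one standard normal value per name from a seeded generator, in whatever order the names are visited.

```python
import random


def draw_prior(var_names, random_seed, sort_names):
    """Draw one N(0, 1) value per name, consuming the seeded stream in visiting order."""
    rng = random.Random(random_seed)
    # Sorting pins down which draw each variable receives.
    names = sorted(var_names) if sort_names else list(var_names)
    return {name: rng.gauss(0.0, 1.0) for name in names}


# Two visiting orders, standing in for two different set orderings.
order_1 = ["a", "b", "c", "d"]
order_2 = ["c", "b", "a", "d"]

unsorted_same = draw_prior(order_1, 100, False) == draw_prior(order_2, 100, False)
sorted_same = draw_prior(order_1, 100, True) == draw_prior(order_2, 100, True)

print(unsorted_same)  # False: a and c swap values, as in the output above
print(sorted_same)    # True: sorting removes the dependence on visiting order
```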
I'm not proposing my fix as a PR because figuring out and testing consistency with other functions goes beyond my current needs. For example, sample_posterior_predictive is (I think) reproducible given a random seed, because it just takes the list of variable names. But it likely does depend on the order in which variables are specified.
Versions and main components
- PyMC3 Version: master (27 Feb 2021)
- Theano Version: Theano-PyMC 1.1.2
- Python Version: 3.8.6
- Operating system: OpenSUSE LEAP 15.2
- How did you install PyMC3: pip (dependencies with conda)