Description
Reading #4108 and #4249, it seems that there was some back and forth to try to match the behavior of scipy.stats.hypergeom
with that of HyperGeometric
, specially to pass the default unit tests when given invalid parameters. Unfortunately the current solution misbehaves often with invalid parameters.
Here is an example:
pm.HyperGeometric.dist(N=3, k=3, n=4).logp(3).eval() # returns array(1.38629434)
st.hypergeom(M=3, n=3, N=4).logpmf(3) # returns nan
(n=4
is impossible since it must be smaller or equal than N
)
The initial PR draft had a bound
to deal with invalid parameters, but it was removed so that logp
would return the same values as the scipy counterpart (otherwise we would sometimes get pymc=-inf
vs scipy=nan
). Unfortunately, as shown in the example above, we cannot trust the pymc3 implementation to reliably return nan
for invalid parameters. Scipy seems to have some parameter validation method, but it is not clear to me if and how it is used by the logpmf
method:
Besides not being reliable, returning nan
also seems to deviate from the usual pymc behavior of getting -inf
for invalid configurations. In addition, the logcdf
method for the HyperGeometric (see #4331) will also be incompatible with results of the scipy counterpart, which actually gives wrong results for some configurations of impossible parameters:
st.hypergeom(M=3, n=3, N=4).logpmf(np.arange(5)) # returns array([nan, nan, nan, nan, nan])
st.hypergeom(M=3, n=3, N=4).logcdf(np.arange(5)) # returns array([nan, nan, nan, 0., 0.])
Given these issues, I suggest we depart from the scipy implementation and return to using bound
to replace impossible configurations with -inf
. I am willing to do a PR (or give it to @Harivallabha if he is interested). To overcome the original issue with the unit tests, we can change the check_logp
method in pymc3_matches_scipy
to replace scipy nan
with -inf
(perhaps with an optional argument, in case this conflicts with other distributions tests):
https://github.com/pymc-devs/pymc3/blob/9dbf9eabf3b18bf2529a8855c086e90249be0b5e/pymc3/tests/test_distributions.py#L543-L551
What do you think?