Skip to content

propose new ConstrainedUniform distribution #32

Open
@drbenvincent

Description

@drbenvincent

Following the discussion #5066 (with input from @aseyboldt), I propose a new ConstrainedUniform distribution. An ideal use case for this is as a distribution for cutpoints in an ordinal regression.

The idea is that the distribution would be a vector of N>2 uniformly distributed values where the first entry is constrained to take on a given min value and the final entry is constrained to take on a given max value. The first value is hard constrained to take on a particular value, and the final value is also constrained to take on a particular value - because of the use of the Dirichlet (which sums to 1). Use of the cumsum was also found to be necessary to enforce ordering and avoid divergences in sampling.

A ConstrainedUniform(N) would have N-2 degrees of freedom.

Minimum working example

The new distribution would be along these lines

def constrainedUniform(N, min=0, max=1):
    return pm.Deterministic('theta',
                             at.concatenate([
                                 np.ones(1)*min,
                                 at.extra_ops.cumsum(pm.Dirichlet("theta_unknown", a=np.ones(N-2))) (min+(max-min))
                             ])
                           )

If you had the following observations

y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3,
       3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6])

K = len(np.unique(y))
print(f"K = {K} categories")

Then you could have a pleasingly concise ordinal regression model

with pm.Model() as model:
    theta = constrainedUniform(K)
    mu = pm.Normal('mu', mu=K/2, sigma=K)
    sigma = pm.HalfNormal("sigma", 1)
    pm.OrderedProbit("y_obs", cutpoints=theta, eta=mu, sigma=sigma, observed=y)

Note that this amounts to a very simple 'intercept only' model, where mu is this intercept. There are no predictor variables in this example

Implementation issues

When converting that function constrainedUniform into a PyMC distribution, then at the very least, you'd have to consider:

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions