Specifying priors

This tutorial demonstrates how to specify parameter priors, including:

  • uniform and log-uniform distributions

  • gaussian and more complicated distributions

  • multi-dimensional priors (not factorized)

  • non-analytic priors

  • priors on fractions

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

Cumulative prior distributions

Any 1-dimensional probability distribution is normalised so that its integral is 1. That is, the cumulative distribution goes from 0 to 1. For example:

gaussdistribution = scipy.stats.norm(2, 0.3)
uniformdistribution = scipy.stats.uniform(3.5, 1.2)
x = np.linspace(0, 5, 400)
plt.plot(x, gaussdistribution.pdf(x), '--', label='density (Gauss)')
plt.plot(x, gaussdistribution.cdf(x), label='cumulative (Gauss)')
plt.plot(x, uniformdistribution.pdf(x), '--', label='density (uniform)')
plt.plot(x, uniformdistribution.cdf(x), label='cumulative (uniform)')
plt.xlabel('Model parameter x')

Transforming from the unit interval

We invert the cumulative probability distribution mapping quantiles (0…1) to the corresponding model parameter value.

Lets start with the uniform distribution:

def transform_1d_uniform(quantile):
    lower_bound = 3.5
    width = 1.2
    return quantile * width + lower_bound

Scipy provides the inverse cumulative probability distributions:

def transform_1d(quantile):
    return gaussdistribution.ppf(quantile)

UltraNest samples from the unit interval to obtain prior samples. Lets try drawing a few examples from our function:

uniform_samples = transform_1d_uniform(np.random.uniform(0, 1, size=100000))
gauss_samples = transform_1d(np.random.uniform(0, 1, size=100000))
plt.hist(uniform_samples, bins=20, histtype='step', density=True, label='Uniform')
plt.hist(gauss_samples, bins=100, histtype='step', density=True, label='Gauss')
plt.xlabel('Model parameter x')

Beautiful! We obtained nice samples that follow the prior distribution.

Specifying priors

Lets specify a prior for UltraNest with multiple parameters:

  • a uniform distribution from 3.5 to 4.7

  • a log-uniform distribution from 0.01 to 100

  • a gaussian distribution around 2.0 +- 0.3

# out transform function will receive one quantile corresponding to each of the three parameter
def transform(quantile_cube):
    # prepare the output array, which has the same shape
    transformed_parameters = np.empty_like(quantile_cube)
    # first parameter: a uniform distribution
    transformed_parameters[0] = 3.5 + 1.2 * quantile_cube[0]
    # second parameter: a log-uniform distribution
    transformed_parameters[1] = 10**(-2 + 4 * quantile_cube[1])
    # third parameter: Gaussian
    transformed_parameters[2] = mydistribution.ppf(quantile_cube[2])

    return transformed_parameters

Some recommendations:

  • scipy.stats provides many 1-d distributions that can be used like this.

  • avoid building scipy.stats objects in the transform, because this is slow – build them outside first, then only invoke the .ppf method in the transform.

  • If you are looking for a distribution that is not implemented yet, try to follow a random number generator recipe (see the Dirichlet prior for an example, below).

Dependent priors

Incorporating covariances

In some cases, a previous experiment gives informative priors which we want to incorporate, and they may be inter-dependent. For example, consider a two-dimensional gaussian prior distribution.

means = np.array([2, 3])
cov = np.array([[1, 0.6], [0.6, 0.4]])
rv = scipy.stats.multivariate_normal(means, cov)
x, y = np.linspace(-1, 5, 400), np.linspace(1.5, 5, 400)
X, Y = np.meshgrid(x, y)
Z = rv.pdf(np.transpose([X.flatten(), Y.flatten()])).reshape(X.shape)
plt.title('Correlated prior')
plt.contourf(X, Y, Z, cmap='magma_r')
plt.xlabel('Parameter 1')
plt.ylabel('Parameter 2');

We recall:

  • Parameter 1 has a cumulative distribution

  • At each value of Parameter 1, Parameter 2 has a cumulative distribution.

  • We can thus specify a dependent distribution using the unit hypercube

a = np.linalg.inv(cov)
l, v = np.linalg.eigh(a)
rotation_matrix = np.dot(v, np.diag(1. / np.sqrt(l)))

def transform_correlated(quantiles):
    # sample a independent multivariate gaussian
    independent_gaussian = scipy.stats.norm.ppf(quantiles)
    # rotate and shift
    return means + np.einsum('ij,kj->ki', rotation_matrix, independent_gaussian)

Lets try sampling!

samples = transform_correlated(np.random.uniform(0, 1, size=(100, 2)))

plt.title('Correlated prior')
plt.contourf(X, Y, Z, cmap='magma_r')
plt.plot(samples[:,0], samples[:,1], 'o', mew=1, mfc='w', mec='k')
plt.xlabel('Parameter 1')
plt.ylabel('Parameter 2');

Conditional prior approach

Another approach is to sample the second parameter conditional on the first parameter, already transformed. This is akin to Gibbs sampling.

For an example, we have a first parameter with a Gaussian prior, and a second parameter, with a Gaussian prior centred around the first parameter’s value. Therefore, its value shifts with the first parameter:

gauss1 = scipy.stats.norm(2, 1)
gauss2 = scipy.stats.norm(0, 0.1)

def transform_correlated_gibbs(quantiles):
    parameters = np.empty_like(quantiles)
    # first parameter is independent
    parameters[:,0] = gauss1.ppf(quantiles[:,0])
    # second parameter depends on first parameter, here with a shift
    parameters[:,1] = parameters[:,0] + gauss2.ppf(quantiles[:,1])
    return parameters
samples = transform_correlated_gibbs(np.random.uniform(0, 1, size=(100, 2)))

plt.title('Gibbs prior')
plt.plot(samples[:,0], samples[:,1], 'o', mew=1, mfc='w', mec='k')
plt.xlabel('Parameter 1')
plt.ylabel('Parameter 2');

As you can see, we also achieve a correlated prior. However, this is different from the previous example.

Complicated constraints and rejection in the likelihood

In some situations, you may have more constraints than parameters, such as:

parameter_1_lower < parameter_1 < parameter_1_upper
parameter_2_lower < parameter_2 < parameter_2_upper
parameter_1 + parameter_2 < constant

In that case, move either the first two or the last constraint into the likelihood function, whichever option is more relaxed (i.e., causes fewer rejections). This is achieved by returning a very low likelihood (e.g., -1e100), when the constraint is not met.

It is beneficial for the sampler if you can add a slight slope towards the good region of the constraint. e.g., -1e100 * (1 + parameter_1 + parameter_2) or similar. This is because if you use the exact same constant, this is a likelihood plateau, and the live points have to be reduced until the plateau is traversed.

Non-analytic priors

Sometimes, the prior may not be easily invertable. For example, when it is given as posterior samples from a previous analysis. Lets say as a prior, we want a posterior from another experiment that looks like this:

posterior_samples = np.hstack((np.random.uniform(0, 3, 2000), np.random.normal(3, 0.2, 2000)))

plt.hist(posterior_samples, histtype='step', bins=100);

In this case, we can compute the cumulative distribution numerically and invert it. Lets try implementing this and sampling from it:

hist, bin_edges = np.histogram(posterior_samples, bins=100)
hist_cumulative = np.cumsum(hist / hist.sum())
bin_middle = (bin_edges[:-1] + bin_edges[1:]) / 2

def transform_histogram(quantile):
    return np.interp(quantile, hist_cumulative, bin_middle)

samples = transform_histogram(np.random.uniform(size=1000))
plt.hist(posterior_samples, histtype='step', bins=100, density=True);
plt.hist(samples, histtype='step', bins=100, density=True);

Fraction parameters

Some parameters such as fractions (or abundances) may be required to sum to 1. How can we specify such parameters?

One option is to use absolute numbers. For example, instead of specifying the total mass and mass fractions for each chemical element, parameterise the mass of each element. This avoids the <=1 constraint, and may be easier to infer. A drawback is that the prior ranges for each element mass may be wide.

The other option is to use the right distribution exactly made for this, which samples unbiased under the constraint (sum<=1): The Dirichlet distribution. Here we assume that the prior on the individual fraction is flat (flat Dirichlet distribution, \(\alpha=1\)).

def transform_dirichlet(quantiles):
    # https://en.wikipedia.org/wiki/Dirichlet_distribution#Random_number_generation
    # first inverse transform sample from Gamma(alpha=1,beta=1), which is Exponential(1)
    gamma_quantiles = -np.log(quantiles)
    # dirichlet variables
    return gamma_quantiles / gamma_quantiles.sum(axis=1).reshape((-1, 1))

Lets have a look at the samples, and whether the three fractions look uniform and sum up to 1:

samples = transform_dirichlet(np.random.uniform(0, 1, size=(400, 3)))

from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=10., azim=-30)
ax.plot(samples[:,0], samples[:,1], samples[:,2], 'x ');

The samples nicely lie on the plane where the sum is 1.

Further topics

Check out the rest of the documentation and the tutorials.

They illustrate how to: