Model comparison

Model

We again consider the sine model with Gaussian measurement errors.

\[y = A_1 \sin\left(2 \pi \left(\frac{t}{P_1} + t_1\right)\right) + B + \epsilon\]

where \(\epsilon \sim \mathrm{Normal}(0, \sigma)\)

We want to test whether this model is preferred over pure noise around a constant offset.

[1]:
import numpy as np
from numpy import pi, sin

def sine_model1(t, B, A1, P1, t1):
    return A1 * sin((t / P1 + t1) * 2 * pi) + B

def sine_model0(t, B):
    return B + t*0

The sine model has four unknown parameters:

  • the signal offset \(B\)

  • the amplitude \(A_1\)

  • the period \(P_1\)

  • the time offset \(t_1\)

The noise-only model has a single parameter, the offset \(B\).

Generating data

Let's generate some data following this model:

[2]:
np.random.seed(42)

n_data = 50

# time of observations
t = np.random.uniform(0, 5, size=n_data)
# measurement values
yerr = 1.0
y = np.random.normal(sine_model1(t, B=1.0, A1=0.9, P1=3, t1=0), yerr)

Visualise the data

Let's plot the data first to see what is going on:

[3]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.figure()
plt.xlabel('x')
plt.ylabel('y')
plt.errorbar(x=t, y=y, yerr=yerr,
             marker='o', ls=' ', color='orange')
t_range = np.linspace(0, 5, 1000)

_images/example-sine-modelcomparison_6_0.svg

A beautiful noisy data set, with some hints of a modulation.

Now the question is: what model parameters are allowed under these data?

First, we need to define the parameter ranges through a prior:

[4]:
parameters1 = ['B', 'A1', 'P1', 't1']

def prior_transform1(cube):
    # the argument, cube, consists of values from 0 to 1
    # we have to convert them to physical scales

    params = cube.copy()
    # let background level go from -10 to +10
    params[0] = cube[0] * 20 - 10
    # let amplitude go from 0.1 to 100
    params[1] = 10**(cube[1] * 3 - 1)
    # let period go from 1 to 100
    params[2] = 10**(cube[2] * 2)
    # let time go from 0 to 1
    params[3] = cube[3]
    return params

parameters0 = ['B']

def prior_transform0(cube):
    # the argument, cube, consists of values from 0 to 1
    # we have to convert them to physical scales

    params = cube.copy()
    # let background level go from -10 to +10
    params[0] = cube[0] * 20 - 10
    return params
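
As a quick sanity check of the transform, one can push the corners and the centre of the unit cube through it and confirm that the ranges stated in the comments come out. The sketch below repeats `prior_transform1` so that it runs on its own; the variable names `lower`, `upper` and `middle` are only illustrative.

```python
import numpy as np

def prior_transform1(cube):
    # same mapping as in the cell above, repeated so this sketch is self-contained
    params = cube.copy()
    params[0] = cube[0] * 20 - 10        # B: uniform on [-10, 10]
    params[1] = 10**(cube[1] * 3 - 1)    # A1: log-uniform on [0.1, 100]
    params[2] = 10**(cube[2] * 2)        # P1: log-uniform on [1, 100]
    params[3] = cube[3]                  # t1: uniform on [0, 1]
    return params

lower = prior_transform1(np.zeros(4))         # all cube coordinates at 0
upper = prior_transform1(np.ones(4))          # all cube coordinates at 1
middle = prior_transform1(np.full(4, 0.5))    # prior medians

print("lower :", lower)
print("upper :", upper)
print("median:", middle)
```

Because amplitude and period are log-uniform, their prior medians sit at the geometric centre of the range (about 3.16 and 10), not at the arithmetic centre.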

Define the likelihood, which measures how far the data are from the model predictions. More precisely, it measures how probable the data are under the given parameters. We assume Gaussian measurement errors of known size (yerr).

\[\chi^2 = \sum\left(\frac{m_i-y_i}{\sigma}\right)^2\]
\[\log \cal{L} = -\chi^2 / 2\]

where \(m_i\) is the model prediction (sine_model1 or sine_model0 from above) at time \(t_i\), and \(\sigma\) is the known measurement error yerr.

[5]:
def log_likelihood1(params):
    # unpack the current parameters:
    B, A1, P1, t1 = params

    # compute for each x point, where it should lie in y
    y_model = sine_model1(t, B=B, A1=A1, P1=P1, t1=t1)
    # compute likelihood
    loglike = -0.5 * (((y_model - y) / yerr)**2).sum()

    return loglike

def log_likelihood0(params):
    B, = params

    y_model = sine_model0(t, B=B)
    # compute likelihood
    loglike = -0.5 * (((y_model - y) / yerr)**2).sum()

    return loglike
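
Both likelihoods drop the Gaussian normalisation constant. This is harmless for the comparison, since both models see the same data and the same yerr, so the constant shifts both log-evidences equally and cancels in their difference. The following sketch illustrates this on hypothetical toy data (`y_toy`, `sigma`, `model_a` and `model_b` are made up for the illustration and are not part of the tutorial):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical toy data, only for illustration
y_toy = rng.normal(1.0, 1.0, size=20)
sigma = 1.0

def loglike_short(y_model):
    # -chi^2 / 2, the form used in this tutorial
    return -0.5 * (((y_model - y_toy) / sigma)**2).sum()

def loglike_full(y_model):
    # full Gaussian log-density, including the normalisation term
    norm = -0.5 * np.log(2 * np.pi * sigma**2) * len(y_toy)
    return norm + loglike_short(y_model)

# two competing constant predictions
model_a = np.full_like(y_toy, 1.0)
model_b = np.full_like(y_toy, 0.0)

diff_short = loglike_short(model_a) - loglike_short(model_b)
diff_full = loglike_full(model_a) - loglike_full(model_b)
print("log-likelihood difference, short form:", diff_short)
print("log-likelihood difference, full form: ", diff_full)
```

The absolute values of logZ reported below would change if the normalisation were included, but the difference between the two models would not.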

Solve the problem:

[6]:
import ultranest

sampler1 = ultranest.ReactiveNestedSampler(parameters1, log_likelihood1, prior_transform1)

sampler0 = ultranest.ReactiveNestedSampler(parameters0, log_likelihood0, prior_transform0)

[7]:
result1 = sampler1.run(min_num_live_points=400)
sampler1.print_results()
[ultranest] Sampling 400 live points from prior ...
[ultranest] Explored until L=-2e+01
[ultranest] Likelihood function evaluations: 153568
[ultranest]   logZ = -33.07 +- 0.09983
[ultranest] Effective samples strategy satisfied (ESS = 2566.1, need >400)
[ultranest] Posterior uncertainty strategy is satisfied (KL: 0.46+-0.06 nat, need <0.50 nat)
[ultranest] Evidency uncertainty strategy is satisfied (dlogz=0.10, need <0.5)
[ultranest]   logZ error budget: single: 0.16 bs:0.10 tail:0.01 total:0.10 required:<0.50
[ultranest] done iterating.

logZ = -33.043 +- 0.219
  single instance: logZ = -33.043 +- 0.157
  bootstrapped   : logZ = -33.074 +- 0.219
  tail           : logZ = +- 0.010
insert order U test : converged: True correlation: inf iterations

    B                   : -7.11 │ ▁      ▁ ▁ ▁▁▁▁▁▁▇▃▁▁▁▁▁▁ ▁▁ ▁ ▁ ▁▁▁▁ │9.74      1.02 +- 0.33
    A1                  : 0.10  │▁▂▅▇▅▂▁▁▁▁▁▁▁  ▁ ▁▁▁▁ ▁▁     ▁   ▁▁▁▁▁ │8.52      0.88 +- 0.36
    P1                  : 1.0   │▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  ▁ ▁│100.0     4.0 +- 7.2
    t1                  : 0.00  │▇▇▆▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▃▄▆▇│1.00      0.45 +- 0.44

[8]:
result0 = sampler0.run(min_num_live_points=400)
sampler0.print_results()
[ultranest] Sampling 400 live points from prior ...
[ultranest] Explored until L=-3e+01
[ultranest] Likelihood function evaluations: 3375
[ultranest]   logZ = -35.69 +- 0.06188
[ultranest] Effective samples strategy satisfied (ESS = 1277.1, need >400)
[ultranest] Posterior uncertainty strategy is satisfied (KL: 0.46+-0.07 nat, need <0.50 nat)
[ultranest] Evidency uncertainty strategy is satisfied (dlogz=0.07, need <0.5)
[ultranest]   logZ error budget: single: 0.09 bs:0.06 tail:0.04 total:0.07 required:<0.50
[ultranest] done iterating.

logZ = -35.701 +- 0.162
  single instance: logZ = -35.701 +- 0.093
  bootstrapped   : logZ = -35.688 +- 0.158
  tail           : logZ = +- 0.039
insert order U test : converged: True correlation: inf iterations

    B                   : 0.61  │ ▁▁▁▁▁▁▁▂▂▃▄▅▅▅▆▇▇▇▅▅▅▄▃▂▁▁▁▁▁▁▁ ▁   ▁ │1.85      1.15 +- 0.14

Plot the parameter posterior probability distribution

A classic corner plot of the parameter pairs and the marginal distributions:

[9]:
from ultranest.plot import cornerplot
cornerplot(result1)
[9]:
_images/example-sine-modelcomparison_16_0.svg
_images/example-sine-modelcomparison_16_1.svg
[10]:
cornerplot(result0)
[10]:
_images/example-sine-modelcomparison_17_0.svg
_images/example-sine-modelcomparison_17_1.svg

If you want, you can also play with the posterior as a pandas frame:

[11]:
import pandas as pd
df = pd.DataFrame(data=result1['samples'], columns=result1['paramnames'])
df.describe()
[11]:
                 B           A1           P1           t1
count  7192.000000  7192.000000  7192.000000  7192.000000
mean      1.024002     0.876746     4.048194     0.445307
std       0.329628     0.358254     7.162398     0.436872
min      -6.339979     0.101254     1.059841     0.000036
25%       0.906667     0.702384     2.900260     0.045471
50%       1.010382     0.872435     3.049651     0.127899
75%       1.120236     1.025389     3.207816     0.947903
max       8.977825     8.122305    99.868593     0.999924

Plot the fit:

To evaluate whether the results make any sense, we check whether the fitted function goes through the data points.

[12]:
plt.figure()
plt.title("1-sine fit")
plt.xlabel('x')
plt.ylabel('y')
plt.errorbar(x=t, y=y, yerr=yerr,
             marker='o', ls=' ', color='orange')


t_grid = np.linspace(0, 5, 400)

from ultranest.plot import PredictionBand
band = PredictionBand(t_grid)

# go through the solutions
for B, A1, P1, t1 in sampler1.results['samples']:
    # compute for each time the y value
    band.add(sine_model1(t_grid, B=B, A1=A1, P1=P1, t1=t1))

band.line(color='k')
# add 1 sigma quantile
band.shade(color='k', alpha=0.3)
# add wider quantile (0.01 .. 0.99)
band.shade(q=0.49, color='gray', alpha=0.2)


[12]:
<matplotlib.collections.PolyCollection at 0x7f422f708170>
_images/example-sine-modelcomparison_22_1.svg
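
Conceptually, the band is built by evaluating the model once per posterior sample and taking quantiles across those curves at each time. A minimal numpy-only sketch of that idea follows. It uses made-up posterior draws (`fake_samples`) in place of `sampler1.results['samples']`, and it is meant to illustrate the calculation, not to reproduce the internals of `PredictionBand`.

```python
import numpy as np
from numpy import pi, sin

def sine_model1(t, B, A1, P1, t1):
    return A1 * sin((t / P1 + t1) * 2 * pi) + B

rng = np.random.default_rng(0)
# hypothetical posterior draws with columns (B, A1, P1, t1);
# in the tutorial these would be the rows of sampler1.results['samples']
fake_samples = np.column_stack([
    rng.normal(1.0, 0.15, 500),
    rng.normal(0.9, 0.2, 500),
    rng.normal(3.0, 0.1, 500),
    rng.normal(0.0, 0.02, 500),
])

t_grid = np.linspace(0, 5, 400)
# one model curve per posterior sample: shape (n_samples, n_times)
curves = np.array([sine_model1(t_grid, *row) for row in fake_samples])

# median curve and the central 68.2% interval at each time
median = np.quantile(curves, 0.5, axis=0)
lo, hi = np.quantile(curves, [0.5 - 0.341, 0.5 + 0.341], axis=0)
print(curves.shape, median.shape)
```

Plotting `median` as a line and filling between `lo` and `hi` gives a picture comparable to the 1 sigma band above.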

Model comparison methods

We now want to know:

Is the model with one sine component better than the model with no component (pure noise)?

What do we mean by “better” (“it fits better”, “the component is significant”)?

  1. Which model is better at predicting data it has not seen yet?

  2. Which model is more probably the true one, given this data, these models, and their parameter spaces?

  3. Which model is simplest, but complex enough to capture the information complexity of the data?

Bayesian model comparison

Here we will focus on question 2, and apply Bayesian model comparison.

For simplicity, we will assume equal a priori model probabilities.

The Bayes factor is:

[13]:
K = np.exp(result1['logz'] - result0['logz'])
print("K = %.2f" % K)
print("The 1-sine model is %.2f times more probable than the no-signal model" % K)
print("assuming the models are equally probable a priori.")
K = 14.28
The 1-sine model is 14.28 times more probable than the no-signal model
assuming the models are equally probable a priori.
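
The same numbers can be turned into a posterior model probability, and the logZ uncertainties can be propagated to K. The sketch below copies the logZ values and errors from the print_results() output above. Adding the two errors in quadrature assumes they are independent, which is an assumption made here, not something stated by UltraNest.

```python
import numpy as np

# log-evidences and uncertainties, copied from the print_results() output above
logz1, logz1_err = -33.043, 0.219   # 1-sine model
logz0, logz0_err = -35.701, 0.162   # no-signal model

log_K = logz1 - logz0
# assumption: the two logZ errors are independent
log_K_err = np.hypot(logz1_err, logz0_err)

K = np.exp(log_K)
K_lo = np.exp(log_K - log_K_err)
K_hi = np.exp(log_K + log_K_err)

# posterior probability of the 1-sine model for equal prior model probabilities
p_model1 = K / (1 + K)

print("K = %.2f (roughly %.1f to %.1f)" % (K, K_lo, K_hi))
print("P(1-sine model | data) = %.3f" % p_model1)
```

With equal prior probabilities, a Bayes factor of about 14 corresponds to a posterior probability of about 93% for the 1-sine model.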

N.B.: Bayes factors are influenced by parameter and model priors. It is a good idea to vary them and see how sensitive the result is.

For making decisions, thresholds are needed. They can be calibrated to a desired low false decision rate with simulations (generate data under the simpler model and look at the resulting distribution of K).

Calibrating Bayes factor thresholds

Let's generate some data sets under the null hypothesis (noise-only model) and see how often we would get a large Bayes factor. For this, we need to fit with both models.

[14]:
import logging
logging.getLogger('ultranest').setLevel(logging.FATAL)
[15]:
K_simulated = []

import logging
logging.getLogger('ultranest').handlers[-1].setLevel(logging.FATAL)

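# go through 10 posterior samples of the noise-only model
# (B is not used below: each data set is simulated with the offset fixed at 1.0)
for B in sampler0.results['samples'][:10]:
    # generate new data under the null hypothesis
    y = np.random.normal(sine_model0(t, B=1.0), yerr)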

    # analyse with sine model
    sampler1 = ultranest.ReactiveNestedSampler(parameters1, log_likelihood1, prior_transform1)
    Z1 = sampler1.run(viz_callback=False)['logz']
    # analyse with noise-only model
    sampler0 = ultranest.ReactiveNestedSampler(parameters0, log_likelihood0, prior_transform0)
    Z0 = sampler0.run(viz_callback=False)['logz']
    # store the log Bayes factor (difference of the log-evidences)
    K_here = Z1 - Z0
    K_simulated.append(K_here)
    print()
    print("Bayes factor: %.2f" % np.exp(K_here))

Z=-35.1(90.67%) | Like=-29.61..-28.10 [-29.6114..-29.6110]*| it/evals=3245/215621 eff=1.5078% N=400
/home/user/Downloads/UltraNest/ultranest/integrator.py:1903: UserWarning: Sampling from region seems inefficient (0/40 accepted in iteration 2500). To improve efficiency, modify the transformation so that the current live points are ellipsoidal, or use a stepsampler, or set frac_remain to a lower number (e.g., 0.5) to terminate earlier.
  u, v, logl, nc, quality = self._refill_samples(Lmin, ndraw, nit)
Z=-33.7(96.25%) | Like=-29.66..-29.66 [-29.6594..-29.6594]*| it/evals=2920/3424 eff=96.5608% N=400
Bayes factor: 0.27
Z=-34.2(96.74%) | Like=-30.14..-30.14 [-30.1385..-30.1385]*| it/evals=2970/3492 eff=96.0543% N=400
Bayes factor: 0.45
Z=-30.3(96.09%) | Like=-26.28..-26.28 [-26.2831..-26.2831]*| it/evals=2880/3410 eff=95.6811% N=400
Bayes factor: 0.25
Z=-33.7(96.04%) | Like=-29.62..-29.62 [-29.6243..-29.6243]*| it/evals=2920/3415 eff=96.8491% N=400
Bayes factor: 0.26
Z=-35.2(84.63%) | Like=-27.78..-26.61 [-27.7774..-27.7768]*| it/evals=3828/153481 eff=2.5006% N=400
/home/user/Downloads/UltraNest/ultranest/integrator.py:1903: UserWarning: Sampling from region seems inefficient (0/40 accepted in iteration 2500). To improve efficiency, modify the transformation so that the current live points are ellipsoidal, or use a stepsampler, or set frac_remain to a lower number (e.g., 0.5) to terminate earlier.
  u, v, logl, nc, quality = self._refill_samples(Lmin, ndraw, nit)
Z=-34.3(96.50%) | Like=-30.21..-30.21 [-30.2132..-30.2132]*| it/evals=2946/3456 eff=96.4005% N=400
Bayes factor: 0.44
Z=-26.1(91.43%) | Like=-19.74..-19.42 [-19.7385..-19.7376]*| it/evals=3547/161172 eff=2.2062% N=400
/home/user/Downloads/UltraNest/ultranest/integrator.py:1903: UserWarning: Sampling from region seems inefficient (0/40 accepted in iteration 2500). To improve efficiency, modify the transformation so that the current live points are ellipsoidal, or use a stepsampler, or set frac_remain to a lower number (e.g., 0.5) to terminate earlier.
  u, v, logl, nc, quality = self._refill_samples(Lmin, ndraw, nit)
Z=-25.8(95.36%) | Like=-21.65..-21.65 [-21.6545..-21.6545]*| it/evals=2880/3364 eff=97.1660% N=400
Bayes factor: 0.78
Z=-28.3(96.14%) | Like=-24.17..-24.17 [-24.1710..-24.1710]*| it/evals=2920/3414 eff=96.8812% N=400
Bayes factor: 0.28
Z=-31.2(70.66%) | Like=-21.41..-20.38 [-21.4062..-21.4052]*| it/evals=4440/207995 eff=2.1388% N=400
/home/user/Downloads/UltraNest/ultranest/integrator.py:1903: UserWarning: Sampling from region seems inefficient (0/40 accepted in iteration 2500). To improve efficiency, modify the transformation so that the current live points are ellipsoidal, or use a stepsampler, or set frac_remain to a lower number (e.g., 0.5) to terminate earlier.
  u, v, logl, nc, quality = self._refill_samples(Lmin, ndraw, nit)
Z=-32.5(96.39%) | Like=-28.54..-28.54 [-28.5450..-28.5450]*| it/evals=2880/3389 eff=96.3533% N=400
Bayes factor: 4.73
Z=-34.6(80.76%) | Like=-28.96..-27.77 [-28.9568..-28.9556]*| it/evals=2921/200786 eff=1.4577% N=400
/home/user/Downloads/UltraNest/ultranest/integrator.py:1903: UserWarning: Sampling from region seems inefficient (0/40 accepted in iteration 2500). To improve efficiency, modify the transformation so that the current live points are ellipsoidal, or use a stepsampler, or set frac_remain to a lower number (e.g., 0.5) to terminate earlier.
  u, v, logl, nc, quality = self._refill_samples(Lmin, ndraw, nit)
Z=-33.5(96.29%) | Like=-29.29..-29.29 [-29.2901..-29.2901]*| it/evals=2970/3467 eff=96.8373% N=400
Bayes factor: 0.36
Z=-31.6(52.41%) | Like=-23.95..-21.34 [-23.9522..-23.9509]*| it/evals=3451/207093 eff=1.6696% N=400
/home/user/Downloads/UltraNest/ultranest/integrator.py:1903: UserWarning: Sampling from region seems inefficient (0/40 accepted in iteration 2500). To improve efficiency, modify the transformation so that the current live points are ellipsoidal, or use a stepsampler, or set frac_remain to a lower number (e.g., 0.5) to terminate earlier.
  u, v, logl, nc, quality = self._refill_samples(Lmin, ndraw, nit)
Z=-31.2(96.31%) | Like=-27.18..-27.18 [-27.1839..-27.1839]*| it/evals=2920/3421 eff=96.6567% N=400
Bayes factor: 1.35
[16]:
plt.figure()
plt.hist(np.exp(K_simulated), histtype='step', label='From simulated noise data')
ylo, yhi = plt.ylim()
plt.vlines(K, ylo, yhi, label='From our real data')
plt.xscale('log')
plt.xlabel('Bayes factor')
plt.ylabel('Frequency')
plt.legend(loc='upper center');
_images/example-sine-modelcomparison_30_0.svg

Running more simulations would fill in the histogram better, but some conclusions can already be drawn.

We are using simulations to measure how often, by chance, we would see a Bayes factor higher than the one we observe. By building up a histogram, we can get a p-value, telling us our false decision rate for any Bayes factor threshold. Thus, we are putting a frequentist property on our Bayesian inference-based decision.
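
As a rough illustration of this calculation, the ten Bayes factors printed by the simulations above can be compared with the observed one. The `(n_exceed + 1) / (n_sim + 1)` estimate used here is one common conservative choice that never returns exactly zero; it is an assumption of this sketch, not part of the tutorial.

```python
import numpy as np

# Bayes factors printed by the ten noise-only simulations above
K_noise = np.array([0.27, 0.45, 0.25, 0.26, 0.44, 0.78, 0.28, 4.73, 0.36, 1.35])
# Bayes factor obtained from the real data
K_observed = 14.28

n_sim = len(K_noise)
n_exceed = int((K_noise >= K_observed).sum())

# conservative empirical p-value estimate
p_value = (n_exceed + 1) / (n_sim + 1)
print("%d of %d simulations reach K >= %.2f" % (n_exceed, n_sim, K_observed))
print("empirical p-value estimate: %.3f" % p_value)

# Bayes factor threshold with a false positive rate of about 10%
threshold = np.quantile(K_noise, 0.9)
print("threshold for ~10%% false positives: K > %.2f" % threshold)
```

With only ten simulations the p-value cannot be resolved below about 0.09, so many more simulations are needed to calibrate thresholds for small false decision rates.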

So I would say: in these ten simulations, pure noise does not produce a Bayes factor as high as the one we see in the real data.

Calibrating Bayes factor thresholds reduces the dependence on model priors and model parameter priors.

Further reading
