# Bayesian workflow

This tutorial highlights some ideas from https://arxiv.org/abs/1709.01449, including:

• Making a flip book of the prior predictions

• Posterior predictive checks

## Model

We again consider the sine model with Gaussian measurement errors.

$y = A_1 \sin\left(2 \pi \left(\frac{t}{P_1} + t_1\right)\right) + B + \epsilon$

where $$\epsilon \sim \mathrm{Normal}(0, \sigma)$$


import numpy as np
from numpy import pi, sin

def sine_model1(t, B, A1, P1, t1):
    return A1 * sin((t / P1 + t1) * 2 * pi) + B



The model has four unknown parameters per component:

• the signal offset $$B$$

• the amplitude $$A_1$$

• the period $$P_1$$

• the time offset $$t_1$$
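As a quick, hypothetical spot check (not part of the original tutorial), we can confirm what each parameter does: at a time where the sine term vanishes, the model returns just the offset $$B$$, and a quarter period later it returns $$B + A_1$$:

```python
import numpy as np
from numpy import pi, sin

def sine_model1(t, B, A1, P1, t1):
    return A1 * sin((t / P1 + t1) * 2 * pi) + B

# at t = 0 (with t1 = 0) the sine term vanishes, leaving only the offset B
print(sine_model1(0.0, B=1.0, A1=0.9, P1=3.0, t1=0.0))   # -> 1.0
# a quarter period later (t = P1/4) the amplitude A1 adds on top of B
print(sine_model1(0.75, B=1.0, A1=0.9, P1=3.0, t1=0.0))  # -> 1.9
```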

## Generating data

Let's generate some data following this model:


np.random.seed(42)

n_data = 50

# time of observations
t = np.random.uniform(0, 5, size=n_data)
# measurement values
yerr = 1.0
y = np.random.normal(sine_model1(t, B=1.0, A1=0.9, P1=3, t1=0), yerr)

y[0] += 3  # turn the first data point into an outlier



## Visualise the data

Let's plot the data first to see what is going on:


%matplotlib inline
import matplotlib.pyplot as plt

plt.figure()
plt.xlabel('x')
plt.ylabel('y')
plt.errorbar(x=t, y=y, yerr=yerr,
             marker='o', ls=' ', color='orange')


A beautiful noisy data set, with some hints of a modulation.

Given this data set, we should come up with a sensible model.


parameters = ['B', 'A1', 'P1', 't1']

def prior_transform(cube):
    # the argument, cube, consists of values from 0 to 1
    # we have to convert them to physical scales

    params = cube.copy()
    # let background level go from -10 to +10
    params[0] = cube[0] * 20 - 10
    # let amplitude go from 0.1 to 100
    params[1] = 10**(cube[1] * 3 - 1)
    # let period go from 1 to 100
    params[2] = 10**(cube[2] * 2)
    # let time offset go from 0 to 1
    params[3] = cube[3]
    return params

def log_likelihood(params):
    # unpack the current parameters:
    B, A1, P1, t1 = params

    # compute for each x point, where it should lie in y
    y_model = sine_model1(t, B=B, A1=A1, P1=P1, t1=t1)
    # compute likelihood
    loglike = -0.5 * (((y_model - y) / yerr)**2).sum()

    return loglike
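A cheap way to catch bugs in a prior transform is to push the corners of the unit cube through it and check that the physical ranges come out as intended. This is a hypothetical sanity check, repeating the transform in self-contained form:

```python
import numpy as np

def prior_transform(cube):
    # map unit-cube values to the physical parameters (B, A1, P1, t1)
    params = cube.copy()
    params[0] = cube[0] * 20 - 10      # offset B: -10 .. +10
    params[1] = 10**(cube[1] * 3 - 1)  # amplitude A1: 0.1 .. 100 (log-uniform)
    params[2] = 10**(cube[2] * 2)      # period P1: 1 .. 100 (log-uniform)
    params[3] = cube[3]                # time offset t1: 0 .. 1
    return params

# the corners of the unit cube map to the edges of the prior ranges
lo = prior_transform(np.zeros(4))
hi = prior_transform(np.ones(4))
print(lo)  # lower edges: B=-10, A1=0.1, P1=1, t1=0
print(hi)  # upper edges: B=10, A1=100, P1=100, t1=1
```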



Let's take a few samples from the prior and see whether they look anything like data we would expect.


for i in range(8):
    plt.figure()
    cube = np.random.uniform(size=len(parameters))
    params = prior_transform(cube)
    B, A1, P1, t1 = params
    y_model = sine_model1(t, B=B, A1=A1, P1=P1, t1=t1)
    y_sim = np.random.normal(y_model, yerr)

    plt.xlabel('x')
    plt.ylabel('y')
    plt.errorbar(x=t, y=y_sim, yerr=yerr,
                 marker='o', ls=' ')

That looks quite OK, actually – the y-ranges are of the right magnitude, and the periods are sometimes short, sometimes long.

## Make posteriors


import ultranest

sampler = ultranest.ReactiveNestedSampler(parameters, log_likelihood, prior_transform)
result = sampler.run()


[ultranest] Sampling 400 live points from prior ...

Z=-40.5(29.83%) | Like=-33.63..-27.78 [-33.8121..-32.7049] | it/evals=3406/237562 eff=1.4361% N=400

/home/user/.local/lib/python3.6/site-packages/ultranest/integrator.py:1243: UserWarning: Sampling from region seems inefficient. You can try increasing nlive, frac_remain, dlogz, dKL, decrease min_ess). [0/40 accepted, it=2500]
" [%d/%d accepted, it=%d]" % (accepted.sum(), ndraw, nit))

[ultranest] Explored until L=-3e+01
[ultranest] Likelihood function evaluations: 274592
[ultranest]   logZ = -39.02 +- 0.09239
[ultranest] Effective samples strategy satisfied (ESS = 3394.6, need >400)
[ultranest] Posterior uncertainty strategy is satisfied (KL: 0.45+-0.05 nat, need <0.50 nat)
[ultranest] Evidency uncertainty strategy is satisfied (dlogz=0.18, need <0.5)
[ultranest]   logZ error budget: single: 0.14 bs:0.09 tail:0.01 total:0.09 required:<0.50
[ultranest] done iterating.
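For reference, the equally weighted posterior draws are available as `result['samples']` (one row per draw, one column per parameter). Below is a sketch of turning them into summary numbers; the `samples` array is a random stand-in so the snippet runs without an actual UltraNest run:

```python
import numpy as np

# stand-in for result['samples']: rows are posterior draws,
# columns are (B, A1, P1, t1); values here are made up for illustration
rng = np.random.default_rng(0)
samples = rng.normal([1.0, 0.9, 3.0, 0.0], 0.1, size=(1000, 4))

# median and 1-sigma-equivalent percentile interval per parameter
for name, col in zip(['B', 'A1', 'P1', 't1'], samples.T):
    lo, mid, hi = np.percentile(col, [15.9, 50, 84.1])
    print('%-3s = %.2f -%.2f +%.2f' % (name, mid, mid - lo, hi - mid))
```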


## Posterior predictive checks

The idea of posterior predictive checks (PPC) is to generate new data from each posterior sample, and then verify if they look like the data.

This is a bit circular, but if the model is too rigid, then some data points cannot be reproduced.
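The check can also be made quantitative with a posterior predictive p-value: pick a discrepancy statistic, compute it for the observed data, and compare against replicated data drawn from posterior samples. The toy example below (not part of the tutorial; it uses a deliberately rigid constant-mean Gaussian model and a fabricated posterior) shows how an outlier drives such a p-value towards zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy data: constant-mean Gaussian model with known sigma, plus one outlier
sigma = 1.0
data = rng.normal(0.0, sigma, size=50)
data[0] += 7.0  # a 7-sigma outlier the model cannot explain

# fabricated "posterior" over the mean, just for illustration
post_means = rng.normal(data.mean(), sigma / np.sqrt(len(data)), size=1000)

# discrepancy statistic: largest absolute standardized residual
T_obs = np.abs((data - data.mean()) / sigma).max()

# replicate the data from each posterior draw and recompute the statistic
T_rep = np.array([
    np.abs((rng.normal(mu, sigma, size=len(data)) - mu) / sigma).max()
    for mu in post_means
])
p_value = (T_rep >= T_obs).mean()
print(p_value)  # very small: replicated data almost never shows such an outlier
```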


from ultranest.plot import PredictionBand

t_range = np.linspace(0, 5, 100)
band = PredictionBand(t_range)

plt.figure()

for params in result['samples']:
    B, A1, P1, t1 = params
    y_model = sine_model1(t_range, B=B, A1=A1, P1=P1, t1=t1)
    y_sim = np.random.normal(y_model, yerr)
    band.add(y_sim)

# shade the 99.5% probability range of the simulated data
band.shade(q=0.4975, color='g', alpha=0.3)
band.line(color='g')

plt.scatter(t, y, marker='o')
plt.xlabel('x')
plt.ylabel('y')

We can see nicely that most data points follow the PPC 99.5% probability range (light green).

Except for one point (near x=2, y=4). This is the outlier I snuck in when generating the data.

At this point we could adjust the model.
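One common adjustment (a sketch, not part of this tutorial) is to swap the Gaussian likelihood for a heavier-tailed Student-t, which down-weights single outliers instead of letting them dominate the fit. The helper below writes out the Student-t log density directly to avoid a scipy dependency; `nu` is a hypothetical degrees-of-freedom choice:

```python
import numpy as np
from math import lgamma, log, pi

def student_t_logpdf(x, nu):
    # log density of the standard Student-t distribution
    return (lgamma((nu + 1) / 2) - lgamma(nu / 2)
            - 0.5 * log(nu * pi)
            - (nu + 1) / 2 * np.log1p(x**2 / nu))

def robust_log_likelihood(params, t_obs, y, yerr, nu=4.0):
    # same sine model as before, but with Student-t measurement errors
    B, A1, P1, t1 = params
    y_model = A1 * np.sin((t_obs / P1 + t1) * 2 * np.pi) + B
    return student_t_logpdf((y - y_model) / yerr, nu).sum()
```

A 5-sigma residual has log density about -5.9 under a Student-t with nu=4, versus about -13.4 under a Gaussian, so one stray point no longer forces the fit away from the bulk of the data.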

## Plot the parameter posterior probability distribution

A classic corner plot:


from ultranest.plot import cornerplot
cornerplot(result) 