Probing the AGN geometry
with Bayesian techniques
Santiago de Chile, May 2015
Johannes Buchner (MPE$\rightarrow$PUC)
in collaboration with A. Georgakakis, K. Nandra, L. Hsu, C. Rangel, M. Brightman, A. Merloni and M. Salvato
Buchner et al. 2014 - arxiv:1402.0004
About me, Johannes Buchner
- Studied Astronomy & Computer Science in Vienna
- developed MCMC software for stellar pulsation analysis
- Master: Computer Science / Machine Learning in NZ, at a Radio Astronomy institute
- developed PyMultiNest -- Python nested sampling package
- Doctorate: Astronomy @ Max Planck Institute for Extraterrestrial Physics (MPE)
- Bayesian X-ray spectral analysis with
Bayesian model comparison of physically motivated models (today)
- Population analysis of Active Galaxies (MCMC/Stan)
- published a nested sampling paper in Statistics & Computing (Springer)
Astronomical problem
- Black holes in the centers of all massive galaxies
- matter falls in, spirals, heats up and creates radiation
- $E = M \cdot c^2$
- luminosity $\rightarrow$ mass inflow
Observables
X-ray telescopes record spectra as counts drawn from a Poisson process, so the likelihood is
$$\mathcal{L}=\prod_i \frac{\lambda_i^{k_i} \exp\{-\lambda_i\}}{k_i!}$$
where $k_i$ are the observed counts and $\lambda_i$ the model-predicted counts in channel $i$.
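A minimal sketch of this likelihood in Python; the `k` and `lam` arrays are made-up placeholders for the observed and model-predicted counts per channel:

```python
import numpy as np
from scipy.special import gammaln  # log(k!) = gammaln(k + 1)

def poisson_loglike(k, lam):
    """Poisson log-likelihood: sum_i [ k_i log(lam_i) - lam_i - log(k_i!) ].
    -2 times this (dropping the constant log k! term) is the Cash/C statistic
    commonly used in X-ray spectral fitting."""
    return np.sum(k * np.log(lam) - lam - gammaln(k + 1))

# toy example with made-up numbers
k = np.array([3, 0, 5, 2])            # observed counts per channel
lam = np.array([2.5, 0.4, 4.8, 2.1])  # model-predicted counts per channel
print(poisson_loglike(k, lam))
```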
Example - Source 179 z=0.605, 2485 counts
powerlaw
Spectral models
Bayesian inference
$$ p(A|B) p(B) = p(B|A) p(A) $$
$$ p(\theta|D,M) = {p(\theta|M)p(D|\theta,M)\over p(D|M)} $$
parameter posterior probability distributions.
assume $\int p(\theta|D,M)\,d\theta = 1 \rightarrow \int p(\theta|M)p(D|\theta,M)\,d\theta = p(D|M) = Z$
closed world assumption within defined parameter space of model M.
$$p(M|D)={p(M)p(D|M)\over p(D)}$$
assume $\sum_M p(M|D)=1 \rightarrow p(D)=\sum_M p(D|M)p(M)$
closed world assumption among models.
Bayesian inference
Nice properties:
- can distinguish between physical effects (models) and
constrain physical parameters consistently
- models with large predictive diversity are punished
- (Py)MultiNest makes it easy to compute $Z$ and the posterior distribution
for up to 40 parameters, for arbitrary models (see the sketch after this list)
- now becoming more common practice in Exoplanet searches, Cosmology, etc.
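As an illustration, a PyMultiNest run can be set up roughly as follows; the two-parameter Gaussian toy likelihood and the `chains/toy-` file prefix are placeholders, not the spectral models of the paper:

```python
import os
import pymultinest

os.makedirs('chains', exist_ok=True)

def prior(cube, ndim, nparams):
    # map the unit cube to the parameter ranges (here: two uniform parameters)
    cube[0] = cube[0] * 10 - 5   # parameter 1 in [-5, 5]
    cube[1] = cube[1] * 10 - 5   # parameter 2 in [-5, 5]

def loglike(cube, ndim, nparams):
    # toy Gaussian likelihood centred on (1, 2)
    return -0.5 * ((cube[0] - 1)**2 + (cube[1] - 2)**2)

pymultinest.run(loglike, prior, 2,
                outputfiles_basename='chains/toy-', resume=False, verbose=True)

a = pymultinest.Analyzer(n_params=2, outputfiles_basename='chains/toy-')
stats = a.get_stats()
print('ln Z =', stats['global evidence'], '+/-', stats['global evidence error'])
posterior = a.get_equal_weighted_posterior()[:, :-1]  # samples (last column is lnL)
```

For the actual spectral fits, BXA (github.com/JohannesBuchner/BXA) provides this connection to the X-ray fitting tools.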
Parameters of the torus+scattering model
- Normalisation ~ luminosity
- Column density $N_H={10}^{20-26}\text{cm}^{-2}$
- Soft scattering normalisation: $f_{scat} < 10\%$
- Photon index: $\Gamma$
- Redshift
Parameters of the torus+scattering model: priors
- Uniform: log Normalisation ~ log Luminosity
- Uniform: Column density $\log N_H/\text{cm}^{-2} = 20 ... 26$
- Uniform: Soft scattering normalisation: $\log f_{scat} = -10 ... -1$
- Photon index: informed, from nearby objects, $\Gamma \sim 1.95 \pm 0.15$ (Nandra&Pounds 1994)
- Redshift: informed probability distribution, from other observations.
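A sketch of how these priors could be encoded as a unit-cube transform (the form used by nested sampling); the Gaussian redshift prior below is a simple stand-in for the actual photometric-redshift distribution, and the normalisation range is illustrative:

```python
import numpy as np
from scipy.stats import norm

def prior_transform(u):
    """Map a unit-cube vector u (5 values in [0, 1]) to physical parameters."""
    lognorm  = -10.0 + 10.0 * u[0]                    # uniform in log normalisation (illustrative range)
    logNH    = 20.0 + 6.0 * u[1]                      # uniform: log NH/cm^-2 in [20, 26]
    logfscat = -10.0 + 9.0 * u[2]                     # uniform: log f_scat in [-10, -1]
    Gamma    = norm.ppf(u[3], loc=1.95, scale=0.15)   # informed: Gamma ~ 1.95 +/- 0.15
    z        = norm.ppf(u[4], loc=0.6, scale=0.05)    # stand-in for the photo-z distribution
    return np.array([lognorm, logNH, logfscat, Gamma, z])
```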
unobscured powerlaw
absorbed with tiny blob in the LOS
torus+scattering
torus+pexmon+scattering
wabs+pexmon+scattering
Model comparison results
Physically motivated models
Models to compare
Model comparison results
Model comparison
- Can distinguish some models, others are equally probable
- Have made use of
- local AGN information $\leftrightarrow$ $\Gamma$
- other $\lambda$ information $\leftrightarrow$ $z$
- Have derived parameter distributions and $Z$ simultaneously
- incorporating uncertainty in $\Gamma$, $z$
- Can tell when it does not know (models remain equally probable)
How can geometries be distinguished?
- How to increase strength of data?
- Combine many objects
- Traditionally: stacking
- lose information in averaging
- averaging of different redshifts?
- Here: multiply the evidences $Z$ of the individual objects to combine their information
$$Z_1 = \prod_\mu Z_1^{(\mu)}$$
(product over objects $\mu$, here for model 1; see the sketch below)
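In practice this is done with log-evidences to avoid numerical underflow; a minimal sketch with hypothetical per-object values:

```python
import numpy as np

# ln Z of one model, one entry per object mu (hypothetical values)
lnZ_model1 = np.array([-104.2, -98.7, -233.1])
lnZ_model2 = np.array([-105.0, -99.9, -231.8])

# Z = prod_mu Z^(mu)  is equivalent to  ln Z = sum_mu ln Z^(mu)
lnZ1, lnZ2 = lnZ_model1.sum(), lnZ_model2.sum()
print('ln Bayes factor of model 1 over model 2:', lnZ1 - lnZ2)
```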
Results for the full CDFS sample
Bootstrapping
Conclusions
- Possible geometries:
- Absorbed powerlaw necessary
- Soft scattering powerlaw detected
- Additional Compton-reflection detected
- Torus model preferred (also in CT AGN)
- first use of nested sampling for model comparison & parameter estimation
in X-ray astronomy
- Buchner et al. 2014 - arxiv:1402.0004
- coming next: population analysis of the $L$, $N_H$ distribution of
Active Galactic Nuclei with non-parametric Bayesian analysis.
Nested Sampling
draw 200 points randomly and uniformly from the prior
always remove the least likely point and replace it with a new draw of higher likelihood
the live points converge towards the maximum likelihood; the run stops when the likelihood is flat (the remaining contribution to $Z$ is negligible)
how to draw new points efficiently?
MultiNest does it via clustering and ellipses
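To make the algorithm concrete, a heavily simplified nested sampling loop, drawing replacement points by brute-force rejection from the whole prior (the step that MultiNest replaces with clustering and ellipsoidal draws); `loglike` and `prior_transform` are user-supplied functions as in the sketches above:

```python
import numpy as np

def nested_sampling(loglike, prior_transform, ndim, nlive=200, tol=0.01):
    u = np.random.rand(nlive, ndim)                     # live points in the unit cube
    L = np.array([loglike(prior_transform(ui)) for ui in u])
    logZ, logX = -np.inf, 0.0                           # evidence, log prior volume left
    while True:
        i = np.argmin(L)                                # least likely live point
        logX_new = logX - 1.0 / nlive                   # expected shrinkage of prior volume
        logw = np.log(np.exp(logX) - np.exp(logX_new))  # weight of the removed point
        logZ = np.logaddexp(logZ, logw + L[i])          # accumulate evidence Z
        logX = logX_new
        # stop when the remaining live points cannot change Z appreciably ("flat")
        if np.max(L) + logX < logZ + np.log(tol):
            break
        # replace with a new prior draw at higher likelihood (rejection sampling)
        while True:
            u_new = np.random.rand(ndim)
            L_new = loglike(prior_transform(u_new))
            if L_new > L[i]:
                break
        u[i], L[i] = u_new, L_new
    return logZ
```

The rejection step above becomes hopeless once the likelihood constraint tightens; replacing it with draws from ellipsoids fitted to clusters of the live points is what makes MultiNest efficient.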
Nested Sampling with MultiNest
explores the problem in
- high dimensions (3-20)
- handles multiple maxima
- handles peculiar shapes
- runs efficiently to convergence
(typically 10000-40000 points)
measures and describes the shape of the posterior (like MCMC)
model inadequacy
Have cool tools now, but:
- Is the model right?
- Where is the model wrong?
- Systematic effects?
- Discover new physics beyond the model
Common route: residuals
New idea: Q-Q plot (no binning)
(Appendix 1)
good fit if straight line
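A sketch of the idea: plot the cumulative observed counts against the cumulative model-predicted counts, channel by channel, without any binning; a good fit tracks the 1:1 line. The arrays here are placeholders, and the paper's implementation in BXA may differ in detail.

```python
import numpy as np
import matplotlib.pyplot as plt

# per-channel observed and model-predicted counts (placeholder arrays)
observed = np.random.poisson(5.0, size=300)
predicted = np.full(300, 5.0)

qq_obs = np.cumsum(observed)     # cumulative observed counts
qq_mod = np.cumsum(predicted)    # cumulative predicted counts

m = max(qq_obs[-1], qq_mod[-1])
plt.plot(qq_mod, qq_obs, label='data vs model')
plt.plot([0, m], [0, m], ':', label='perfect agreement (straight line)')
plt.xlabel('cumulative predicted counts')
plt.ylabel('cumulative observed counts')
plt.legend()
plt.show()
```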
Q-Q plot primer
Generate ideas for new models
Summary
- Parameter estimation: explore multiple maxima; general solution with nested sampling (see 5.1, Appendix 3)
- Model comparison: the likelihood ratio is less effective than $Z$ ratios; $Z$ is computed by nested sampling and has the right interpretation (see 5.2, Appendix 2)
- Model discovery: Q-Q plots + model comparison (see 5.3, Appendix 1)
arxiv:1402.0004
github.com/JohannesBuchner/BXA
Likelihood ratio tests / F-test
- valid only in special cases:
- not when the parameter lies at a border of its allowed range (e.g. feature detection)
- only in the limit $n\rightarrow\infty$
- otherwise do not compute it; the interpretation is wrong -- criticize papers mis-interpreting the statistic
Pragmatic viewpoint
To distinguish two models via the data, any statistic can be used;
it does not have to be probabilistic
- e.g. the counts in the 6 keV bin
- determine the discriminating threshold via simulations (see the sketch after this list)
- false association rate ($B\rightarrow A$, $A\rightarrow B$)
- correct association rate ($A\rightarrow A$, $B\rightarrow B$)
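A minimal sketch of such a calibration; `stat`, `simulate_A` and `simulate_B` are hypothetical user-supplied functions (a summary statistic and fake-data generators for the two models):

```python
import numpy as np

def calibrate(stat, simulate_A, simulate_B, threshold, nsim=1000):
    """Estimate association rates for the rule: choose A if stat(data) > threshold."""
    stat_A = np.array([stat(simulate_A()) for _ in range(nsim)])
    stat_B = np.array([stat(simulate_B()) for _ in range(nsim)])
    correct_A = np.mean(stat_A > threshold)   # A -> A (correct association)
    false_A   = np.mean(stat_B > threshold)   # B -> A (false association)
    return correct_A, false_A
```

Scanning the threshold then trades the false association rate against the correct association rate.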
Comparison $\hat{L}$ vs. $Z$
red: falsely choose powerlaw, for wabs input
red: falsely choose wabs, for powerlaw input
(Appendix 2)
$Z$ more effective than $\hat{L}$
What is $Z$? The integral of the likelihood over the prior, i.e. the prior-weighted average of the likelihood.
Bayesian evidence
$$ {P(A|D)\over P(B|D)} = {P(A)\over P(B)} \times {Z_A\over Z_B} $$
interpretation of the $Z$-ratio under equal model priors:
- the probability that this model is the right one, rather than the other
- outcomes: A preferred, B preferred, or both equally probable
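A worked example with equal prior odds and a hypothetical evidence ratio of 100:
$$\frac{P(A|D)}{P(B|D)} = \frac{P(A)}{P(B)} \times \frac{Z_A}{Z_B} = 1 \times 100
\quad\Rightarrow\quad P(A|D) = \frac{100}{100+1} \approx 0.99,$$
assuming A and B are the only models considered (closed world assumption among models).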
How to compute integral?
Connecting
Need to connect the C-stat (Poisson likelihood) calculation of the fitting tool with the algorithm
BXA: Bayesian X-ray Analysis
Analysis
Likelihood value evaluated "everywhere"
ML analysis: confidence intervals
- defined via: how often the estimator (the maximum) gives the right answer
- different for different estimators; a property of the method
Bayesian analysis: credible intervals
- defined via: the probability that the true value is inside this range, rather than outside, is x%
The results coincide for some choice of prior (usually "flat").
How?
posterior "chain" from MCMC/nested sampling:
representation through point density
norm | $N_H$ | $z$
-4.1 | 22.4  | 2.3
-4.3 | 22.45 | 2.4
-4.3 | 22.38 | 2.5
...  | ...   | ...
just make a histogram of one or two columns (see the sketch below)
the chain contains all correlations
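A sketch of this step; the file name is a placeholder for wherever the chain is stored, with one row per posterior sample and columns (log norm, log $N_H$, $z$):

```python
import numpy as np
import matplotlib.pyplot as plt

chain = np.loadtxt('posterior_chain.txt')          # placeholder file name
lognorm, logNH, z = chain[:, 0], chain[:, 1], chain[:, 2]

# 1-d marginal: histogram of a single column
plt.hist(logNH, bins=40, histtype='step')
plt.xlabel('log NH / cm^-2')

# 2-d marginal: histogram of two columns; correlations come for free
plt.figure()
plt.hist2d(logNH, z, bins=40)
plt.xlabel('log NH / cm^-2')
plt.ylabel('z')
plt.show()

# central 68% credible interval of one parameter
print(np.percentile(logNH, [16, 50, 84]))
```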
Error propagation: example
norm | $N_H$ | $z$
-4.1 | 22.4  | 2.3
-4.3 | 22.45 | 2.4
-4.3 | 22.38 | 2.5
...  | ...   | ...
for every parameter vector
(norm, $N_H$, $z$, ...) in the chain, set the model parameters, then
- set $N_H = 0$
- compute the intrinsic flux
- compute the luminosity
using $z$ and the flux
this incorporates the uncertainty in $z$ and the parameters!
- just do your calculation with every sample instead of a single best-fit value (see the sketch below)
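A sketch of this propagation, assuming the same chain layout as above; the `intrinsic_flux` helper is a hypothetical stand-in (a 2-10 keV power-law integral with fixed $\Gamma$), since in practice the unabsorbed flux comes from the spectral model in the fitting tool:

```python
import numpy as np
import astropy.units as u
from astropy.cosmology import Planck15

def intrinsic_flux(norm, Gamma=1.95):
    """Hypothetical stand-in for the NH = 0 (intrinsic) 2-10 keV flux in erg/s/cm^2."""
    E = np.linspace(2.0, 10.0, 200)                       # keV
    keV_to_erg = 1.602e-9
    return np.trapz(norm * E**-Gamma * E, E) * keV_to_erg

chain = np.loadtxt('posterior_chain.txt')                 # columns: log norm, log NH, z
luminosities = []
for lognorm, logNH, z in chain:
    F = intrinsic_flux(10**lognorm)                       # NH set to 0: intrinsic flux
    dL = Planck15.luminosity_distance(z).to(u.cm).value   # luminosity distance in cm
    luminosities.append(4 * np.pi * dL**2 * F)            # L = 4 pi d_L^2 F  [erg/s]

# the spread of these values carries the uncertainty in z and all other parameters
print(np.percentile(luminosities, [16, 50, 84]))
```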