Probing the AGN geometry
with Bayesian techniques

Santiago de Chile, May 2015

Johannes Buchner (MPE$\rightarrow$PUC)

in collaboration with A. Georgakakis, K. Nandra, L. Hsu, C. Rangel, M. Brightman, A. Merloni and M. Salvato

Buchner et al. 2014 - arxiv:1402.0004

About me, Johannes Buchner

  • Studied Astronomy & Computer Science in Vienna
    • developed MCMC software for stellar pulsation analysis
  • Master: Computer Science / Machine Learning in NZ, at a Radio Astronomy institute
    • developed PyMultiNest -- Python nested sampling package
  • Doctorate: Astronomy @ Max Planck institute for extr.Physics
    1. Bayesian X-ray spectral analysis with Bayesian model comparison of physically motivated models (today)
    2. Population analysis of Active Galaxies (MCMC/Stan)
    3. published a nested sampling paper in Statistics&Computing (Springer)

Astronomical problem

  • Black holes in the centers of all massive galaxies
  • matter falls in, spirals, heats, creates radiation. drawing
  • $E = M \cdot c^2$
  • luminosity $\rightarrow$ mass inflow

Observables

X-ray telescopes produce spectra via Poisson process. $$\cal{L}=\prod_i \lambda_i^{k_i} \cdot \exp\{-\lambda_i\}/k_i!$$

Example - Source 179 z=0.605, 2485 counts

powerlaw


Spectral models

Bayesian inference

$$ p(A|B) p(B) = p(B|A) p(A) $$

$$ p(\theta|D,M) = {p(\theta|M)p(D|\theta,M)\over p(D|\theta,M)} $$ parameter posterior probability distributions. assume $\int p(\theta|D,M)d\theta = 1 \rightarrow \int p(\theta|M)p(D|\theta,M)=p(D|\theta,M)=Z $ closed world assumption within defined parameter space of model M.

$$p(M|D)={p(M)p(D|M)\over p(D)}$$ assume $\sum p(M|D)=1 \rightarrow p(D)=\sum p(D|M)p(M)$ closed world assumption among models.

Bayesian inference

Nice properties:
  • can distinguish between physical effects (models) and constrain physical parameters consistently
  • models with large predictive diversity are punished
  • (Py)MultiNest makes it easy to compute Z and posterior distribution for up to 40 parameters, for arbitrary models
  • now becoming more common practice in Exoplanet searches, Cosmology, etc.

Parameters: of torus+scattering

  • Normalisation ~ luminosity
  • Column density $N_H={10}^{20-26}\text{cm}^{-2}$
  • Soft scattering normalisation: $f_{scat} < 10\%$
  • Photon index: $\Gamma$
  • Redshift

Parameters: of torus+scattering

  • Uniform: log Normalisation ~ log Luminosity
  • Uniform: Column density $\log N_H/\text{cm}^{-2} = {20-26}$
  • Uniform: Soft scattering normalisation: $\log f_{scat}={10}^{-10 ... -1}$
  • Photon index: informed, from nearby objects, $\Gamma \sim 1.95 \pm 0.15$ (Nandra&Pounds 1994)
  • Redshift: informed probability distribution, from other observations.

unobscured powerlaw


absorbed with tiny blob in the LOS


torus+scattering


torus+pexmon+scattering


wabs+pexmon+scattering


Model comparison results

Physically motivated models

Models to compare

Model comparison results

Model comparison

  • Can distinguish some models, others are equally probable
  • Have made use of
    • local AGN information $\leftrightarrow$ $\Gamma$
    • other $\lambda$ information $\leftrightarrow$ $z$
  • Have derived parameter distributions and $Z$ simultaneously
  • incorporating uncertainty in $\Gamma$, $z$
  • Can tell that it does not know

How can geometries be distinguished?

  • How to increase strength of data?
  • Combine many objects
  • Traditionally: stacking
    • lose information in averaging
    • averaging of different redshifts?
  • Here, multiply Z to combine information $$Z_1 = \prod_\mu Z_1^\mu$$

Results for the full CDFS sample

  • Bootstrapping
  • Conclusions

    • Possible geometries:
      • Absorbed powerlaw necessary
      • Soft scattering powerlaw detected
      • Additional Compton-reflection detected
      • Torus model preferred (also in CT AGN)
    • first use of nested sampling for model comparison & parameter estimation in X-ray astronomy
    • Buchner et al. 2014 - arxiv:1402.0004
    • coming next: population analysis of the $L$, $N_H$ distribution of Active Galactic Nuclei with non-parametric Bayesian analysis.

    Nested Sampling

      draw randomly uniformly 200 points
      always remove least likely point, replace with a new draw of higher likelihood

      converges to maximum likelihood, stops when flat

      how to draw new points efficiently?
      MultiNest does it via clustering and ellipses

    Nested Sampling with MultiNest

    explores the problem in

    • high dimensions (3-20)
    • handles multiple maxima
    • handles peculiar shapes
    • runs efficiently to convergence
      (typically 10000-40000 points)

    measures and describes shapes (like MCMC)

    model inadequacy

    Have cool tools now, but:
    • Is the model right?
    • Where is the model wrong?
    • Systematic effects?
    • Discover new physics beyond the model

    Common route: residuals

    New idea: Q-Q plot (no binning)
    (Appendix 1)

    good fit if straight line

    Q-Q plot primer

    Generate ideas for new models

    Summary

    • (see 5.1, Appendix 3)
      Parameter estimation:
      explore multiple maxima
      general solution with nested sampling
    • Model comparison:
      Likelihood ratio is less effective than Z ratios
      (see 5.2, Appendix 2)

      computed by nested sampling; has right interpretation
    • (see 5.3, Appendix 1)
      Model discovery:
      Q-Q plots + model comparison

    arxiv:1402.0004 github.com/JohannesBuchner/BXA

    Likelihood ratio tests / F-test

    • only if special case
    • not at borders (feature detection)
    • only if $n\rightarrow\infty$
    • do not compute it, interpretation is wrong; critisize papers mis-interpreting the statistic

    Pragmatic viewpoint

    distinguish two models via data can use any statistic

    does not have to be probabilistic

    • can be counts in the 6keV bin
    • determine discriminating threshold via simulations
      • false association rate ($B\rightarrow A$, $A\rightarrow B$)
      • correct association rate ($A\rightarrow A$, $B\rightarrow B$)

    Comparison $\hat{L}$ vs. $Z$

    red: falsely choose powerlaw, for wabs input

    red: falsely choose wabs, for powerlaw input

    (Appendix 2)

    $Z$ more effective than $\hat{L}$

    what is $Z$? Integral of likelihood / average

    Bayesian evidence $$ {P(A|D)\over P(B|D)} = {P(A)\over P(B)} \times {Z_A\over Z_B} $$

    interpretation of Z-ratio under flat priors:

    • prob. that this model is the right one, rather than the other.
    • A, B or equal
    How to compute integral?

    Connecting

    Need to connect C-stat calculation with algorithm

    BXA: Bayesian X-ray Analysis

    Analysis

    Likelihood value evaluated "everywhere"

    • ML analysis: find confidence intervals
      • defined via: how often the estimator (maximum) gives the right answer
      • different for different estimators - property of the method
    • credible intervals
      • defined via: prob. that the true value is inside this range rather than outside is x%.

    results coincide for some choice of prior (usually "flat").

    How?

    posterior "chain" from MCMC/nested sampling: representation through point density
    norm$N_H$$z$
    -4.122.42.3
    -4.322.452.4
    -4.322.382.5
    ...... 

    just make histogram of 1/2 columns

    contains all correlations

    Error propagation: example

    norm$N_H$$z$
    -4.122.42.3
    -4.322.452.4
    -4.322.382.5
    ...... 
    for every parameter vector
    (norm, $N_H$, $z$, ...), set the model
    • set $N_H$ = 0
    • compute intrinsic flux
    • compute luminosity
      using $z$ and flux

    incorporates uncertainty in $z$ and the parameters!

    • just do your calculation with every value instead of one

    model inadequacy

    Have cool tools now, but:
    • Is the model right?
    • Where is the model wrong?
    • Systematic effects?
    • Discover new physics beyond the model

    Common route: residuals

    New idea: Q-Q plot (no binning)
    (Appendix 1)

    good fit if straight line

    Q-Q plot primer

    Generate ideas for new models

    Summary

    • (see 5.1, Appendix 3)
      Parameter estimation:
      explore multiple maxima
      general solution with nested sampling
    • Model comparison:
      Likelihood ratio is less effective than Z ratios
      (see 5.2, Appendix 2)

      computed by nested sampling; has right interpretation
    • (see 5.3, Appendix 1)
      Model discovery:
      Q-Q plots + model comparison

    arxiv:1402.0004 github.com/JohannesBuchner/BXA

    Pragmatic viewpoint

    distinguish two models via data can use any statistic

    does not have to be probabilistic

    • can be counts in the 6keV bin
    • determine discriminating threshold via simulations
      • false association rate ($B\rightarrow A$, $A\rightarrow B$)
      • correct association rate ($A\rightarrow A$, $B\rightarrow B$)

    Comparison $\hat{L}$ vs. $Z$

    red: falsely choose powerlaw, for wabs input

    red: falsely choose wabs, for powerlaw input

    (Appendix 2)

    $Z$ more effective than $\hat{L}$

    what is $Z$? Integral of likelihood / average

    Bayesian evidence $$ {P(A|D)\over P(B|D)} = {P(A)\over P(B)} \times {Z_A\over Z_B} $$

    interpretation of Z-ratio under flat priors:

    • prob. that this model is the right one, rather than the other.
    • A, B or equal