askcarl package

Submodules

askcarl.gaussian module

Multivariate Gaussians with support for upper limits and missing data.

askcarl.gaussian.pdfcdf(x, mask, mean, cov)[source]

Compute the mixed PDF and CDF for a multivariate Gaussian distribution.

Parameters:
  • x (array) – The point (vector) at which to evaluate the probability.

  • mask (array) – A boolean mask of the same shape as x, indicating whether the entry is a value (True) or an upper bound (False).

  • mean (array) – mean vector of the multivariate normal distribution.

  • cov (array) – covariance matrix of the multivariate normal distribution.

Returns:

pdf – Probability density

Return type:

float
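To illustrate what a mixed PDF and CDF can mean, the following minimal sketch handles a two-dimensional Gaussian whose first entry is a measured value and whose second entry is an upper bound. It assumes the result is the density of the value dimension multiplied by the conditional probability that the other dimension lies below its bound. The helper names are hypothetical, and this is not the library's implementation.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a 1D normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution of a 1D normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixed_pdf_cdf_2d(value, upper, mean, cov):
    """p(x0 = value) * P(x1 < upper | x0 = value) for a 2D Gaussian."""
    (m0, m1) = mean
    (c00, c01), (_, c11) = cov
    # Marginal density of the measured dimension.
    density = normal_pdf(value, m0, math.sqrt(c00))
    # Conditional distribution of the bounded dimension given the value.
    cond_mean = m1 + c01 / c00 * (value - m0)
    cond_var = c11 - c01 * c01 / c00
    return density * normal_cdf(upper, cond_mean, math.sqrt(cond_var))


# Uncorrelated unit Gaussian: density at 0 times P(x1 < 0) = 0.5.
result = mixed_pdf_cdf_2d(0.0, 0.0, mean=(0.0, 0.0), cov=((1.0, 0.0), (0.0, 1.0)))
```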

class askcarl.gaussian.Gaussian(mean, cov, precision_cholesky=None)[source]

Bases: object

Multivariate Gaussians with support for upper limits and missing data.

Parameters:
  • mean (array) – mean vector of the multivariate normal distribution.

  • cov (array) – covariance matrix of the multivariate normal distribution.

  • precision_cholesky (array) – Cholesky factors of the precision matrix.

get_conditional_rv(mask)[source]

Build conditional distribution.

Parameters:

mask (array) – A boolean mask, indicating whether the entry is a value (True) or an upper bound (False).

Returns:

  • cov_cross (array) – Covariance matrix part of upper bound and value dimensions.

  • cov_exact (array) – Covariance matrix part of value dimensions.

  • inv_cov_exact (array) – Inverse covariance matrix part of value dimensions.

  • rv (scipy.stats.multivariate_normal) – Multivariate Normal Distribution of the upper bound dimensions, conditioned with mask.
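The returned blocks correspond to the standard formulas for conditioning a Gaussian on a subset of its dimensions. The sketch below works through those formulas with NumPy only. The function name `conditional_gaussian` and its return values are illustrative assumptions, not the method's actual code.

```python
import numpy as np


def conditional_gaussian(x_exact, mask, mean, cov):
    """Condition a Gaussian on the entries flagged True in `mask`.

    Returns the mean and covariance of the remaining (upper-bound)
    dimensions, given the exact values.
    """
    mask = np.asarray(mask, dtype=bool)
    cov_exact = cov[np.ix_(mask, mask)]      # value x value block
    cov_cross = cov[np.ix_(~mask, mask)]     # upper-bound x value block
    cov_upper = cov[np.ix_(~mask, ~mask)]    # upper-bound x upper-bound block
    inv_cov_exact = np.linalg.inv(cov_exact)
    cond_mean = mean[~mask] + cov_cross @ inv_cov_exact @ (x_exact - mean[mask])
    cond_cov = cov_upper - cov_cross @ inv_cov_exact @ cov_cross.T
    return cond_mean, cond_cov


mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
mask = np.array([True, False])  # first entry is a value, second an upper bound
cond_mean, cond_cov = conditional_gaussian(np.array([2.0]), mask, mean, cov)
```

For this input the conditional mean is 0.5 * 2.0 = 1.0 and the conditional variance is 2.0 - 0.25 = 1.75.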

conditional_pdf(x, mask=Ellipsis)[source]

Compute conditional PDF.

Parameters:
  • x (array) – The points (vector) at which to evaluate the probability.

  • mask (array) – A boolean mask of length x.shape[1], indicating whether the entry is a value (True) or an upper bound (False).

Returns:

pdf – Probability density. One value for each x.

Return type:

array

conditional_logpdf(x, mask=Ellipsis)[source]

Compute conditional log-PDF.

Parameters:
  • x (array) – The points (vector) at which to evaluate the probability.

  • mask (array) – A boolean mask of length x.shape[1], indicating whether the entry is a value (True) or an upper bound (False).

Returns:

logpdf – logarithm of the probability density. One value for each x.

Return type:

array

pdf(x, mask)[source]

Compute conditional PDF.

Parameters:
  • x (array) – The points (vector) at which to evaluate the probability.

  • mask (array) – A boolean mask of the same shape as x, indicating whether the entry is a value (True) or an upper bound (False).

Returns:

pdf – probability density. One value for each x.

Return type:

array

logpdf(x, mask)[source]

Compute conditional log-PDF.

Parameters:
  • x (array) – The points (vector) at which to evaluate the probability.

  • mask (array) – A boolean mask of the same shape as x, indicating whether the entry is a value (True) or an upper bound (False).

Returns:

logpdf – logarithm of the probability density. One value for each x.

Return type:

array

askcarl.lightgmm module

An extremely fast-to-train GMM.

class askcarl.lightgmm.LightGMM(n_components, refine_weights=False, init_kwargs={'init': 'random', 'max_iter': 1, 'n_init': 1}, warm_start=False, covariance_type='full')[source]

Bases: object

Wrapper which transforms KMeans results into a GMM.

Initialise.

Parameters:
  • n_components (int) – number of Gaussian components.

  • refine_weights (bool) – whether to include an E step at the end.

  • init_kwargs (dict) – arguments passed to KMeans

  • warm_start (bool) – not supported, has to be False

  • covariance_type (str) – only “full” is supported
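As a rough illustration of how hard cluster assignments can be turned into a Gaussian mixture, the sketch below computes per-cluster fractions, means and full covariances. The labels are hard-coded as a stand-in for a KMeans run, and the helper `gmm_from_labels` is hypothetical. How LightGMM does this internally is not documented here.

```python
import numpy as np


def gmm_from_labels(X, labels, n_components):
    """Estimate GMM weights, means and full covariances from hard labels."""
    weights, means, covs = [], [], []
    for k in range(n_components):
        members = X[labels == k]
        weights.append(len(members) / len(X))
        means.append(members.mean(axis=0))
        # rowvar=False: each column is a variable, each row an observation.
        covs.append(np.cov(members, rowvar=False))
    return np.array(weights), np.array(means), np.array(covs)


rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=-5.0, size=(60, 2)),
    rng.normal(loc=+5.0, size=(40, 2)),
])
# Stand-in for the cluster assignments a KMeans run would provide.
labels = (X[:, 0] > 0).astype(int)
weights, means, covs = gmm_from_labels(X, labels, n_components=2)
```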

fit(X, sample_weight=None, rng=numpy.random)[source]

Fit.

Parameters:
  • X (array) – data, of shape (N, D)

  • sample_weight (array) – weights of observations. shape (N,)

  • rng (object) – Random number generator

to_sklearn()[source]

Convert to a scikit-learn GaussianMixture object.

Returns:

gmm – scikit-learn GaussianMixture

Return type:

object

score_samples(X)[source]

Compute score of samples.

Parameters:

X (array) – data, of shape (N, D)

Returns:

logprob – log-probabilities, one entry for each entry in X, of shape (N)

Return type:

array

score(X, sample_weight=None)[source]

Compute score of samples.

Parameters:
  • X (array) – data, of shape (N, D)

  • sample_weight (array) – weights of observations. shape (N,)

Returns:

logprob – average log-probability of the entries in X (a single value)

Return type:

float

sample(N)[source]

Generate samples from model.

Parameters:

N (int) – number of samples

Returns:

X – data, of shape (N, D)

Return type:

array
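Sampling from a Gaussian mixture is commonly done in two stages: choose how many samples each component contributes, then draw from each Gaussian. The sketch below shows that generic procedure with NumPy. The function `sample_gmm` and its arguments are assumptions for illustration and may differ from what this method does.

```python
import numpy as np


def sample_gmm(weights, means, covs, n_samples, rng):
    """Draw samples by picking a component, then sampling its Gaussian."""
    counts = rng.multinomial(n_samples, weights)
    chunks = [
        rng.multivariate_normal(mean, cov, size=count)
        for mean, cov, count in zip(means, covs, counts)
    ]
    samples = np.vstack(chunks)
    rng.shuffle(samples)  # remove the ordering by component
    return samples


rng = np.random.default_rng(42)
weights = np.array([0.7, 0.3])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
covs = np.array([np.eye(2), 0.5 * np.eye(2)])
samples = sample_gmm(weights, means, covs, n_samples=100, rng=rng)
```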

class askcarl.lightgmm.LightGMM2(n_components, init_kwargs={'init': 'random', 'max_iter': 1, 'n_init': 1}, warm_start=False, covariance_type='full')[source]

Bases: object

Wrapper which fits two LightGMMs on a two-fold split of the data and combines the results.

The training data is split into two halves, and a mixture is built from each half. Then, the weights of the mixture are optimized with the other half. This should avoid overfitting (compared to building a GMM and optimizing on the same data set).
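The idea of building components on one half and optimizing weights on the other can be sketched in one dimension. The example below keeps the components fixed and applies EM updates to the weights only, using held-out data. It shows one direction of the split, and the roles of the halves would be swapped for the second mixture. The crude split at zero replaces the actual component construction, and `refine_weights` here is a hypothetical helper.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def refine_weights(x, mus, sigmas, n_iter=50):
    """EM updates of the mixture weights only; components stay fixed."""
    dens = normal_pdf(x[:, None], mus[None, :], sigmas[None, :])  # (N, K)
    weights = np.full(len(mus), 1.0 / len(mus))
    for _ in range(n_iter):
        resp = weights * dens
        resp /= resp.sum(axis=1, keepdims=True)  # E step: responsibilities
        weights = resp.mean(axis=0)              # M step for the weights
    return weights


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(4.0, 1.0, 100)])
rng.shuffle(x)
half_a, half_b = x[:200], x[200:]

# Components are built from half A (here: a crude split at zero) ...
mus = np.array([half_a[half_a < 0].mean(), half_a[half_a >= 0].mean()])
sigmas = np.array([half_a[half_a < 0].std(), half_a[half_a >= 0].std()])
# ... and their weights are optimized on the held-out half B.
weights = refine_weights(half_b, mus, sigmas)
```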

Initialise.

Parameters:
  • n_components (int) – number of Gaussian components.

  • refine_weights (bool) – whether to include an E step at the end.

  • init_kwargs (dict) – arguments passed to KMeans

  • warm_start (bool) – not supported, has to be False

  • covariance_type (str) – only “full” is supported

fit(X, sample_weight=None, rng=numpy.random)[source]

Fit.

Parameters:
  • X (array) – data, of shape (N, D)

  • sample_weight (array) – weights of observations. shape (N,)

  • rng (object) – Random number generator

to_sklearn()[source]

Convert to a scikit-learn GaussianMixture object.

Returns:

gmm – scikit-learn GaussianMixture

Return type:

object

score_samples(X)[source]

Compute score of samples.

Parameters:

X (array) – data, of shape (N, D)

Returns:

logprob – log-probabilities, one entry for each entry in X, of shape (N)

Return type:

array

score(X, sample_weight=None)[source]

Compute score of samples.

Parameters:
  • X (array) – data, of shape (N, D)

  • sample_weight (array) – weights of observations. shape (N,)

Returns:

logprob – average log-probability of the entries in X (a single value)

Return type:

float

sample(N)[source]

Generate samples from model.

Parameters:

N (int) – number of samples

Returns:

X – data, of shape (N, D)

Return type:

array

class askcarl.lightgmm.LightBaggingGMM(n_gmms, **kwargs)[source]

Bases: object

Wrapper which fits B LightGMMs, and averages likelihoods.

The training data is split into two halves, and a mixture is built from each half. Then, the weights of the mixture are optimized with the other half. This should avoid overfitting (compared to building a GMM and optimizing on the same data set).
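Averaging likelihoods over B models is usually done in log space to avoid underflow. The sketch below assumes each model provides one log-density per sample and returns the log of the mean likelihood. The function name and the (B, N) array layout are illustrative assumptions.

```python
import numpy as np


def average_loglikelihoods(logprobs):
    """Log of the mean likelihood over B models.

    logprobs has shape (B, N): one row of log-densities per model.
    """
    logprobs = np.asarray(logprobs)
    # np.logaddexp.reduce is a numerically stable log-sum-exp.
    return np.logaddexp.reduce(logprobs, axis=0) - np.log(logprobs.shape[0])


# Two models, two samples: likelihoods (0.2, 0.6) and (0.4, 0.4).
logprobs = np.log([[0.2, 0.4], [0.6, 0.4]])
averaged = average_loglikelihoods(logprobs)
```

Both samples end up with an averaged likelihood of 0.4.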

Initialise.

Parameters:
  • n_gmms (int) – number of GMMs.

  • kwargs (dict) – passed to LightGMM.

fit(X, sample_weight=None, rng=numpy.random)[source]

Fit.

Parameters:
  • X (array) – data, of shape (N, D)

  • sample_weight (array) – weights of observations. shape (N,)

  • rng (object) – Random number generator

to_sklearn()[source]

Convert to a scikit-learn GaussianMixture object.

Returns:

gmm – scikit-learn GaussianMixture

Return type:

object

score_samples(X)[source]

Compute score of samples.

Parameters:

X (array) – data, of shape (N, D)

Returns:

logprob – log-probabilities, one entry for each entry in X, of shape (N)

Return type:

array

score(X, sample_weight=None)[source]

Compute score of samples.

Parameters:
  • X (array) – data, of shape (N, D)

  • sample_weight (array) – weights of observations. shape (N,)

Returns:

logprob – average log-probability of the entries in X (a single value)

Return type:

float

sample(N)[source]

Generate samples from model.

Parameters:

N (int) – number of samples

Returns:

X – data, of shape (N, D)

Return type:

array

askcarl.mixture module

Mixture of Gaussians.

class askcarl.mixture.GaussianMixture(weights, means, covs, precisions_cholesky=None)[source]

Bases: object

Mixture of Gaussians.

Parameters:
  • weights (list) – weight for each Gaussian component

  • means (list) – mean vector for each Gaussian component.

  • covs (list) – covariance matrix for each Gaussian component.

weights

weight for each Gaussian component

Type:

list

components

list of Gaussian components.

Type:

list

static from_pypmc(mix)[source]

Initialize from a pypmc Gaussian mixture model (GMM).

Parameters:

mix (pypmc.density.mixture.GaussianMixture) – Gaussian mixture.

Returns:

mix – Generalized Gaussian mixture.

Return type:

GaussianMixture

static from_sklearn(skgmm)[source]

Initialize from a scikit-learn Gaussian mixture model (GMM).

Parameters:

skgmm (sklearn.mixture.GaussianMixture) – Gaussian mixture.

Returns:

mix – Generalized Gaussian mixture.

Return type:

GaussianMixture

pdf(x, mask)[source]

Compute probability density at x.

Parameters:
  • x (array) – The points (vector) at which to evaluate the probability.

  • mask (array) – A boolean mask of the same shape as x, indicating whether the entry is a value (True) or an upper bound (False).

Returns:

pdf – probability density. One value for each x.

Return type:

array

logpdf(x, mask)[source]

Compute logarithm of probability density.

Parameters:
  • x (array) – The points (vector) at which to evaluate the probability.

  • mask (array) – A boolean mask of the same shape as x, indicating whether the entry is a value (True) or an upper bound (False).

Returns:

logpdf – logarithm of the probability density. One value for each x.

Return type:

array
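For a mixture, the log-density is the log of a weighted sum of component densities, which is typically evaluated with the log-sum-exp trick. The following one-dimensional sketch ignores masks (every entry is treated as a value) and uses hypothetical helper names. It illustrates the formula only, not this method's code.

```python
import math


def normal_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_logpdf(x, weights, mus, sigmas):
    """log sum_k w_k N(x | mu_k, sigma_k), evaluated stably."""
    terms = [
        math.log(w) + normal_logpdf(x, mu, sigma)
        for w, mu, sigma in zip(weights, mus, sigmas)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Two equal-weight unit Gaussians at -1 and +1, evaluated halfway between.
value = mixture_logpdf(0.0, weights=[0.5, 0.5], mus=[-1.0, 1.0], sigmas=[1.0, 1.0])
```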

askcarl.utils module

Utility functions for dealing with Gaussians and their covariances.

askcarl.utils.mvn_logpdf(X, mean, prec_chol)[source]

Compute log-prob of a Gaussian.

Parameters:
  • X (array) – data, of shape (N, D)

  • mean (array) – Mean of Gaussian, of shape (D)

  • prec_chol (array) – Cholesky factor of the precision matrix, of shape (D, D)

Returns:

logprob – log-probability, one entry for each entry in X, of shape (N)

Return type:

array
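A Gaussian log-density can be evaluated from a Cholesky factor of the precision matrix without inverting the covariance for every call. The sketch below assumes the scikit-learn convention, in which the factor P satisfies P @ P.T = precision. Whether mvn_logpdf follows exactly this convention is an assumption, and `logpdf_from_prec_chol` is a hypothetical name.

```python
import numpy as np


def logpdf_from_prec_chol(X, mean, prec_chol):
    """Gaussian log-density, given a factor with prec_chol @ prec_chol.T = precision."""
    D = X.shape[1]
    y = (X - mean) @ prec_chol                      # whitened residuals, (N, D)
    log_det = np.sum(np.log(np.diag(prec_chol)))    # equals -0.5 * log det(cov)
    return -0.5 * (D * np.log(2 * np.pi) + np.sum(y ** 2, axis=1)) + log_det


cov = np.array([[2.0, 0.3], [0.3, 1.0]])
mean = np.array([1.0, -1.0])
# Upper-triangular factor of the precision matrix, built from chol(cov).
prec_chol = np.linalg.inv(np.linalg.cholesky(cov)).T
X = np.array([[1.0, -1.0], [0.0, 0.5]])
logp = logpdf_from_prec_chol(X, mean, prec_chol)
```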

askcarl.utils.mvn_pdf(X, mean, prec_chol)[source]

Compute probability density of a Gaussian.

Parameters:
  • X (array) – data, of shape (N, D)

  • mean (array) – Mean of Gaussian, of shape (D)

  • prec_chol (array) – Cholesky factor of the precision matrix, of shape (D, D)

Returns:

prob – probability density, one entry for each entry in X, of shape (N)

Return type:

array

askcarl.utils.is_positive_definite(cov, tol=1e-10, condthresh=1000000.0)[source]

Check that the covariance matrix is well behaved.

Parameters:
  • cov (array) – covariance matrix. shape (D, D)

  • tol (float) – smallest eigenvalue allowed (as computed by eigvalsh)

  • condthresh (float) – threshold on the matrix condition number

Returns:

True if the matrix is invertible and positive definite

Return type:

bool
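A check of this kind can be sketched with an eigenvalue decomposition. The version below assumes that tol is a lower limit on the smallest eigenvalue and that condthresh is an upper limit on the ratio of largest to smallest eigenvalue. The symmetry test is an extra assumption. The function name is hypothetical and the library's actual criteria may differ.

```python
import numpy as np


def looks_well_behaved(cov, tol=1e-10, condthresh=1e6):
    """Symmetric, strictly positive eigenvalues, moderate condition number."""
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        return False
    eigvals = np.linalg.eigvalsh(cov)  # ascending eigenvalues of a symmetric matrix
    if eigvals[0] <= tol:
        return False
    return bool(eigvals[-1] / eigvals[0] <= condthresh)


ok = looks_well_behaved(np.eye(2))
singular = looks_well_behaved(np.array([[1.0, 1.0], [1.0, 1.0]]))
```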

askcarl.utils.cov_to_prec_cholesky(cov)[source]

Convert covariance matrix to Cholesky factors of the precision matrix.

Parameters:

cov (array) – covariance matrix. shape (D, D)

Returns:

prec_cholesky – Cholesky factors of the precision matrix. shape (D, D)

Return type:

array
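One common way to obtain such a factor is to take the Cholesky factor of the covariance and invert it by solving a triangular system. The sketch below follows the scikit-learn convention, where the result P satisfies P @ P.T = inverse of cov. That this function uses the same convention is an assumption, and `prec_cholesky_from_cov` is a hypothetical name.

```python
import numpy as np


def prec_cholesky_from_cov(cov):
    """Return P with P @ P.T equal to the inverse of `cov`."""
    chol = np.linalg.cholesky(cov)                 # lower triangular, cov = chol @ chol.T
    identity = np.eye(cov.shape[0])
    # Solving chol @ Z = I gives Z = chol^-1; its transpose is upper triangular.
    return np.linalg.solve(chol, identity).T


cov = np.array([[4.0, 1.0], [1.0, 3.0]])
prec_chol = prec_cholesky_from_cov(cov)
```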

Module contents

Multivariate Gaussians with support for upper limits and missing data.