LightGMM
LightGMM is an extremely fast method for constructing Gaussian mixture models (GMMs) on large data sets.
Features:
based on jax
supports weighted observations
can be converted into a scikit-learn GaussianMixture
Method
If refine_weights=False, LightGMM relies on scikit-learn's KMeans for initialisation, using kmeans++ or random data samples ('random', very fast) to initially guess the cluster centers.
The KMeans iterations (controlled by the n_iter and max_iter parameters) give the cluster centers. Regularized covariance matrices are then computed from the members of each cluster. These set the locations and shapes of the Gaussian mixture components.
The GMM component weights are assigned proportional to the number of cluster members.
Alternatively, if refine_weights=True, a single E step is performed and the component weights are refined to maximize the training set likelihood. A minimal sketch of the whole construction follows below.
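The following is a minimal sketch of this construction using scikit-learn and NumPy, not the actual LightGMM implementation (which is jax-based); the function name build_gmm and its default values are assumptions for illustration:

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.cluster import KMeans

def build_gmm(X, n_components, reg_covar=1e-6, refine_weights=False):
    # 1. KMeans clustering; init can be "k-means++" or "random" (very fast).
    km = KMeans(n_clusters=n_components, init="k-means++", n_init=1, max_iter=10)
    labels = km.fit_predict(X)
    means = km.cluster_centers_
    n_features = X.shape[1]
    # 2. Regularized covariance of each cluster's members sets the component
    #    shape (assumes every cluster has at least two members).
    covs = np.empty((n_components, n_features, n_features))
    for k in range(n_components):
        covs[k] = np.cov(X[labels == k], rowvar=False) + reg_covar * np.eye(n_features)
    # 3. Component weights proportional to the number of cluster members.
    weights = np.bincount(labels, minlength=n_components) / len(X)
    if refine_weights:
        # 4. One E step: responsibilities under the current components,
        #    then weights re-estimated from the mean responsibility.
        resp = np.column_stack([w * multivariate_normal.pdf(X, mean=m, cov=c)
                                for w, m, c in zip(weights, means, covs)])
        resp /= resp.sum(axis=1, keepdims=True)
        weights = resp.mean(axis=0)
    return weights, means, covs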
Requirements
You need scikit-learn and jax installed. These are not installed by default with askcarl.
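Both can be installed via pip, for example:

pip install scikit-learn jax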
Usage
The API is scikit-learn-like:
gmm = LightGMM(**kwargs)
gmm.fit(X_train)
logprob = gmm.score_samples(X_test)
score = gmm.score(X_test)
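For instance, on synthetic data (the import path and the n_components keyword are assumptions; adjust them to your installation):

import numpy as np
from askcarl import LightGMM  # import path assumed

rng = np.random.default_rng(42)
X_train = rng.normal(size=(50000, 15))
X_test = rng.normal(size=(1000, 15))

gmm = LightGMM(n_components=10)      # constructor keyword assumed
gmm.fit(X_train)
logprob = gmm.score_samples(X_test)  # per-sample log-likelihoods
score = gmm.score(X_test)            # mean log-likelihood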
You can also convert to a scikit-learn GaussianMixture object:
gmmsk = gmm.to_sklearn()
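The converted object is a regular fitted scikit-learn GaussianMixture, so its standard methods become available, e.g.:

# gmmsk and X_test from the examples above
X_new, components = gmmsk.sample(1000)  # draw new samples from the mixture
labels = gmmsk.predict(X_test)          # hard component assignments
proba = gmmsk.predict_proba(X_test)     # soft responsibilities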
Performance
LightGMM achieves sub-100 ms construction for data sets with 50,000 samples and ~15 features.
See this benchmark: https://github.com/JohannesBuchner/GMM-benchmark
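To get a rough measurement on your own machine, a simple timing harness (illustrative; X_train and the constructor keyword as in the usage example above):

import time
t0 = time.perf_counter()
gmm = LightGMM(n_components=10)  # parameters assumed
gmm.fit(X_train)
print("construction took %.1f ms" % ((time.perf_counter() - t0) * 1e3))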