9.5.1. sklearn.mixture.GMM
- class sklearn.mixture.GMM(n_components=1, cvtype='diag', random_state=None, thresh=0.01, min_covar=0.001)
Gaussian Mixture Model
Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
Initializes parameters such that every mixture component has zero mean and identity covariance.
Parameters : n_components : int, optional
Number of mixture components. Defaults to 1.
cvtype : string (read-only), optional
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
random_state : numpy.random object, optional
Must support the full numpy random number generator API.
min_covar : float, optional
Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
thresh : float, optional
Convergence threshold. Defaults to 0.01.
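To make the role of min_covar concrete, the sketch below estimates per-feature variances for a component that has collapsed onto a single repeated point. Without a floor the variance is zero and the Gaussian density would be unbounded. The helper name `diag_variances` is hypothetical, and clamping with `np.maximum` is an assumed reading of "floor"; it is not necessarily how GMM applies `min_covar` internally.

```python
import numpy as np


def diag_variances(X, min_covar=1e-3):
    """Per-feature variances of X, floored at min_covar (hypothetical helper).

    Clamping with np.maximum is an assumed reading of "floor", not
    necessarily how GMM applies min_covar.
    """
    return np.maximum(np.asarray(X, dtype=float).var(axis=0), min_covar)


# A component that has collapsed onto copies of a single point:
X = np.array([[2.0, 5.0], [2.0, 5.0], [2.0, 5.0]])
raw = X.var(axis=0)          # both variances are 0.0: the density would be unbounded
floored = diag_variances(X)  # both variances are raised to min_covar = 1e-3
```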
See also
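- DPGMM : Infinite Gaussian mixture model, using the Dirichlet process, fit with a variational algorithm.
- VBGMM : Finite Gaussian mixture model fit with a variational algorithm; better suited to situations where there may be too little data to get a good estimate of the covariance matrix.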
Examples
>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.75,  0.25])
>>> np.round(g.means, 2)
array([[ 10.05],
       [  0.06]])
>>> np.round(g.covars, 2)
array([[[ 1.02]],
       [[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0])
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] + 20 * [[10]])
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.5,  0.5])
Attributes
cvtype : Covariance type of the model.
weights : Mixing weights for each mixture component.
means : Mean parameters for each mixture component.
covars : Covariance parameters for each mixture component, returned as full matrices.
n_features : int
Dimensionality of the Gaussians.
n_states : int (read-only)
Number of mixture components.
converged_ : bool
True when convergence was reached in fit(), False otherwise.
Methods
decode(X) : Find the most likely mixture component for each point in X.
eval(X) : Compute the log likelihood of X under the model and the posterior distribution over mixture components.
fit(X) : Estimate model parameters from X using the EM algorithm.
predict(X) : Like decode, find the most likely mixture component for each observation in X.
rvs(n_samples=1, random_state=None) : Generate n_samples samples from the model.
score(X) : Compute the log likelihood of X under the model.
- __init__(n_components=1, cvtype='diag', random_state=None, thresh=0.01, min_covar=0.001)
- covars
Return covars as a full matrix.
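The following sketch shows one way covariance parameters stored per cvtype could be expanded into the full-matrix form described above, giving an array of shape (n_components, n_features, n_features). The function name `covars_to_full` and the input layouts listed in its docstring are assumptions for illustration; GMM's internal storage may differ.

```python
import numpy as np


def covars_to_full(covars, cvtype, n_components, n_features):
    """Expand covariance parameters to shape (n_components, n_features, n_features).

    Assumed input layouts (illustrative, not necessarily GMM's storage):
      'spherical': (n_components,)                         one variance per component
      'tied':      (n_features, n_features)                one matrix shared by all
      'diag':      (n_components, n_features)              per-feature variances
      'full':      (n_components, n_features, n_features)  already full
    """
    covars = np.asarray(covars, dtype=float)
    if cvtype == "spherical":
        return np.array([v * np.eye(n_features) for v in covars])
    if cvtype == "tied":
        return np.tile(covars, (n_components, 1, 1))
    if cvtype == "diag":
        return np.array([np.diag(v) for v in covars])
    if cvtype == "full":
        return covars.copy()
    raise ValueError("cvtype must be one of 'spherical', 'tied', 'diag', 'full'")


# One-dimensional 'diag' parameters for two components become two 1x1 matrices.
full = covars_to_full([[1.02], [0.96]], "diag", n_components=2, n_features=1)
```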
- cvtype
Covariance type of the model.
Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
- decode(obs)
Find the most likely mixture component for each point in obs.
Parameters : obs : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprobs : array_like, shape (n_samples,)
Log probability of each point in obs under the model.
components : array_like, shape (n_samples,)
Index of the most likely mixture component for each observation.
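A minimal NumPy sketch of what decode computes, assuming diagonal covariances: each point's log probability under the whole mixture (a log-sum-exp over components) and the index of the component with the highest weighted log density. The helpers `log_gaussian_diag` and `decode` are hypothetical stand-ins, not the GMM methods; the parameters in the usage lines are rounded values borrowed from the example above.

```python
import numpy as np


def log_gaussian_diag(X, means, variances):
    """log N(x_i | means[k], diag(variances[k])), shape (n_samples, n_components)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n_features = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]       # (n_samples, n_components, n_features)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)    # (n_components,)
    return -0.5 * (n_features * np.log(2 * np.pi) + log_det[None, :] + mahal)


def decode(X, weights, means, variances):
    """Hypothetical stand-in for decode: (log p(x_i), most likely component)."""
    # Weighted log densities: log w_k + log N(x_i | component k)
    lpr = log_gaussian_diag(X, means, variances) + np.log(weights)[None, :]
    # log p(x_i) via a numerically stable log-sum-exp over components
    top = lpr.max(axis=1)
    logprob = top + np.log(np.exp(lpr - top[:, None]).sum(axis=1))
    return logprob, lpr.argmax(axis=1)


weights = np.array([0.75, 0.25])
means = np.array([[10.05], [0.06]])
variances = np.array([[1.02], [0.96]])
logprob, components = decode([[0], [2], [9], [10]], weights, means, variances)
print(components)  # [1 1 0 0]
```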
- eval(obs, return_log=False)
Evaluate the model on data.
Compute the log probability of obs under the model and return the posterior distribution (responsibilities) of each mixture component for each element of obs.
Parameters : obs : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
return_log : boolean, optional
If True, the posteriors returned are log-probabilities. Defaults to False.
Returns : logprob : array_like, shape (n_samples,)
Log probabilities of each data point in obs.
posteriors : array_like, shape (n_samples, n_components)
Posterior probabilities of each mixture component for each observation.
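Given the per-component log joint terms log w_k + log N(x_i | component k), both outputs of eval follow from a stable log-sum-exp and a normalization. This is a sketch under that assumption: `eval_from_log_joint` is a hypothetical helper that starts from those terms rather than from raw data, and its `return_log` flag mirrors the parameter documented above.

```python
import numpy as np


def eval_from_log_joint(lpr, return_log=False):
    """Return (logprob, posteriors) from lpr[i, k] = log w_k + log N(x_i | component k).

    Hypothetical helper: it starts from the log joint terms rather than raw data.
    """
    lpr = np.asarray(lpr, dtype=float)
    top = lpr.max(axis=1, keepdims=True)
    # log p(x_i) = log sum_k exp(lpr[i, k]), shifted by the row maximum to avoid overflow
    logprob = top[:, 0] + np.log(np.exp(lpr - top).sum(axis=1))
    log_post = lpr - logprob[:, None]  # log p(k | x_i); each row sums to 1 after exp
    return logprob, (log_post if return_log else np.exp(log_post))


logprob, posteriors = eval_from_log_joint(np.log([[0.2, 0.6]]))
# logprob is approximately [log(0.8)]; posteriors approximately [[0.25, 0.75]]
```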
- fit(X, n_iter=10, thresh=0.01, params='wmc', init_params='wmc')
Estimate model parameters with the expectation-maximization algorithm.
An initialization step is performed before entering the EM algorithm. To skip this step, set the keyword argument init_params to the empty string ''. Conversely, to run only the initialization, call this method with n_iter=0.
Parameters : X : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
n_iter : int, optional
Number of EM iterations to perform. Defaults to 10.
thresh : float, optional
Convergence threshold. Defaults to 0.01.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
init_params : string, optional
Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
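The EM iteration that fit repeats can be sketched for diagonal covariances as follows. The `params` argument imitates the documented 'w'/'m'/'c' convention by skipping the update for any letter that is absent. Everything else is an assumption made for illustration: the function name `em_step`, the diagonal-only model, and flooring the variances with `np.maximum(..., min_covar)`. It is not GMM's implementation and performs no initialization step.

```python
import numpy as np


def em_step(X, weights, means, variances, params="wmc", min_covar=1e-3):
    """One EM iteration for a diagonal-covariance GMM (illustrative sketch).

    Returns updated (weights, means, variances) and the log-likelihood of X
    under the parameters that were passed in.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]

    # E-step: lpr[i, k] = log w_k + log N(x_i | means[k], diag(variances[k]))
    diff = X[:, None, :] - means[None, :, :]
    lpr = np.log(weights)[None, :] - 0.5 * (
        n_features * np.log(2 * np.pi)
        + np.sum(np.log(variances), axis=1)[None, :]
        + np.sum(diff ** 2 / variances[None, :, :], axis=2)
    )
    top = lpr.max(axis=1, keepdims=True)
    log_px = top[:, 0] + np.log(np.exp(lpr - top).sum(axis=1))
    resp = np.exp(lpr - log_px[:, None])  # responsibilities; rows sum to 1

    # M-step: responsibility-weighted maximum-likelihood updates
    nk = resp.sum(axis=0) + 10 * np.finfo(float).eps  # soft counts, kept > 0
    new_means = (resp.T @ X) / nk[:, None]
    if "c" in params:
        # Measure spread around the means this step will end with.
        centers = new_means if "m" in params else means
        sq = (X[:, None, :] - centers[None, :, :]) ** 2
        new_var = (resp[:, :, None] * sq).sum(axis=0) / nk[:, None]
        variances = np.maximum(new_var, min_covar)  # assumed floor via clamping
    if "m" in params:
        means = new_means
    if "w" in params:
        weights = nk / nk.sum()
    return weights, means, variances, log_px.sum()


X = np.array([[-1.0], [0.0], [1.0], [9.0], [10.0], [11.0]])
w, m, v = np.array([0.5, 0.5]), np.array([[1.0], [8.0]]), np.ones((2, 1))
for _ in range(5):
    w, m, v, loglik = em_step(X, w, m, v)
# m is now close to [[0.], [10.]] and v close to [[2/3], [2/3]]
```

Each call reports the log-likelihood under the parameters it started from, so across iterations the reported values should not decrease, apart from any effect of the variance floor.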
- means
Mean parameters for each mixture component.
- predict(X)
Predict label for data.
Parameters : X : array-like, shape = [n_samples, n_features]
Returns : C : array, shape = (n_samples,)
- predict_proba(X)
Predict posterior probability of data under each Gaussian in the model.
Parameters : X : array-like, shape = [n_samples, n_features]
Returns : T : array-like, shape = (n_samples, n_components)
Returns the probability of the sample for each Gaussian (state) in the model.
- rvs(n_samples=1, random_state=None)
Generate random samples from the model.
Parameters : n_samples : int, optional
Number of samples to generate. Defaults to 1.
Returns : obs : array_like, shape (n_samples, n_features)
List of samples
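Sampling from a mixture is a two-stage process: choose a component according to the mixing weights, then draw from that component's Gaussian. The sketch below assumes diagonal covariances and seeds a `np.random.RandomState` from `random_state`; both choices, and the helper name `sample_gmm_diag`, are assumptions for illustration rather than the rvs implementation.

```python
import numpy as np


def sample_gmm_diag(weights, means, variances, n_samples=1, random_state=None):
    """Draw n_samples rows from a diagonal-covariance GMM (hypothetical helper)."""
    rng = np.random.RandomState(random_state)  # assumed seeding convention
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Stage 1: choose a component for every sample using the mixing weights.
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    # Stage 2: draw from the chosen component's Gaussian (independent features).
    noise = rng.standard_normal((n_samples, means.shape[1]))
    return means[comps] + noise * np.sqrt(variances[comps])


samples = sample_gmm_diag([0.75, 0.25], [[10.0], [0.0]], [[1.0], [1.0]],
                          n_samples=1000, random_state=0)
print(samples.shape)  # (1000, 1)
```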
- score(obs)
Compute the log probability under the model.
Parameters : obs : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprob : array_like, shape (n_samples,)
Log probabilities of each data point in obs
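For reference, the per-row quantity that score returns can be written as follows, with w_k, \mu_k and \Sigma_k standing for the weight, mean and covariance of component k out of K; the posteriors returned by eval are the normalized terms of the same sum. This notation is introduced here for illustration.

```latex
\log p(x_i) = \log \sum_{k=1}^{K} w_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k),
\qquad
p(k \mid x_i) = \frac{w_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{p(x_i)}
```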
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns : self :
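The `<component>__<parameter>` naming convention can be illustrated with a small routing helper that separates an object's own parameters from those meant for nested components. `group_nested_params` is hypothetical and is not scikit-learn's implementation of set_params; the keys `gmm` and `verbose` in the usage lines are made-up examples. Splitting on the first `__` only lets deeper names such as `a__b__c` be passed down one level at a time.

```python
def group_nested_params(params):
    """Route '<component>__<parameter>' keys to their component (hypothetical helper).

    Keys without '__' belong to the outer object itself. Only the first '__'
    is split, so 'a__b__c' yields component 'a' with key 'b__c', which can be
    routed again one level down.
    """
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = group_nested_params({"gmm__n_components": 3, "gmm__thresh": 1e-3, "verbose": True})
# own == {"verbose": True}
# nested == {"gmm": {"n_components": 3, "thresh": 0.001}}
```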
- weights
Mixing weights for each mixture component.