8.17.1. sklearn.mixture.GMM
- class sklearn.mixture.GMM(n_components=1, cvtype='diag', random_state=None, thresh=0.01, min_covar=0.001)
Gaussian Mixture Model
Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
Initializes parameters such that every mixture component has zero mean and identity covariance.
Parameters : n_components : int, optional
Number of mixture components. Defaults to 1.
cvtype : string (read-only), optional
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
random_state : numpy.random object, optional
Must support the full numpy random number generator API.
min_covar : float, optional
Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
thresh : float, optional
Convergence threshold. Defaults to 0.01.
Examples
>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.75, 0.25])
>>> np.round(g.means, 2)
array([[ 10.05],
       [ 0.06]])
>>> np.round(g.covars, 2)
array([[[ 1.02]],
       [[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0])
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] + 20 * [[10]])
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.5, 0.5])
Attributes
weights  Mixing weights for each mixture component.
means  Mean parameters for each mixture component.
cvtype  Covariance type of the model.
covars  Covariance parameters for each mixture component.
n_features : int
Dimensionality of the Gaussians.
n_states : int (read-only)
Number of mixture components.
converged_ : bool
True when convergence was reached in fit(), False otherwise.
Methods
decode(obs)  Find most likely mixture components for each point in obs.
eval(obs)  Evaluate the model on data.
fit(X[, n_iter, thresh, params, init_params])  Estimate model parameters with the expectation-maximization algorithm.
predict(X)  Predict label for data.
predict_proba(X)  Predict posterior probability of data under each Gaussian.
rvs([n_samples, random_state])  Generate random samples from the model.
score(obs)  Compute the log probability under the model.
set_params(**params)  Set the parameters of the estimator.
- __init__(n_components=1, cvtype='diag', random_state=None, thresh=0.01, min_covar=0.001)
- covars
Covariance parameters for each mixture component. The shape depends on cvtype:
(`n_states`,) if 'spherical',
(`n_features`, `n_features`) if 'tied',
(`n_states`, `n_features`) if 'diag',
(`n_states`, `n_features`, `n_features`) if 'full'
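The four shapes can be compared by expanding each compact form into one full covariance matrix per component. The following is a minimal NumPy sketch; expand_covars is a hypothetical helper written for illustration, not part of sklearn.mixture, and it assumes the shapes listed above.

```python
import numpy as np

def expand_covars(covars, cvtype, n_states, n_features):
    """Hypothetical helper: return full covariance matrices,
    shape (n_states, n_features, n_features), for any cvtype."""
    covars = np.asarray(covars, dtype=float)
    if cvtype == 'spherical':
        # One variance per component, shape (n_states,).
        return covars[:, None, None] * np.eye(n_features)
    if cvtype == 'tied':
        # One matrix shared by every component, shape (n_features, n_features).
        return np.tile(covars, (n_states, 1, 1))
    if cvtype == 'diag':
        # One variance per component and feature, shape (n_states, n_features).
        return np.array([np.diag(row) for row in covars])
    if cvtype == 'full':
        # Already one full matrix per component.
        return covars
    raise ValueError("cvtype must be one of 'spherical', 'tied', 'diag', 'full'")

# Two components in three dimensions, spherical parameterization.
full = expand_covars([1.0, 2.0], 'spherical', n_states=2, n_features=3)
print(full.shape)
```

The compact forms trade flexibility for fewer free parameters: 'spherical' stores n_states numbers, while 'full' stores n_states * n_features * n_features.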
- cvtype
Covariance type of the model. String describing the type of covariance parameters used by the GMM. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
- decode(obs)
Find most likely mixture components for each point in obs.
Parameters : obs : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprobs : array_like, shape (n_samples,)
Log probability of each point in obs under the model.
components : array_like, shape (n_samples,)
Index of the most likely mixture component for each observation.
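The relationship between the two return values can be sketched as follows, assuming the most likely component is the one with the largest posterior probability. decode_from_eval is a hypothetical helper, and this is an assumption about the behavior described here, not the library source.

```python
import numpy as np

def decode_from_eval(logprob, posteriors):
    """Hypothetical helper: pick the most likely component per observation.

    logprob    : shape (n_samples,), log probability of each point
    posteriors : shape (n_samples, n_components), responsibilities
    """
    posteriors = np.asarray(posteriors, dtype=float)
    # The most likely component is the column with the largest posterior.
    components = posteriors.argmax(axis=1)
    return np.asarray(logprob, dtype=float), components

logprobs, components = decode_from_eval([-1.0, -2.0], [[0.9, 0.1], [0.2, 0.8]])
print(components)
```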
- eval(obs)
Evaluate the model on data.
Compute the log probability of obs under the model and return the posterior distribution (responsibilities) of each mixture component for each element of obs.
Parameters : obs : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprob : array_like, shape (n_samples,)
Log probabilities of each data point in obs
posteriors : array_like, shape (n_samples, n_components)
Posterior probabilities of each mixture component for each observation
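To illustrate what the two return values represent, here is a minimal NumPy sketch for a mixture with diagonal covariances. eval_diag_gmm is a hypothetical stand-alone function, not the library implementation; it assumes weights of shape (n_states,) and means and covars of shape (n_states, n_features).

```python
import numpy as np

def eval_diag_gmm(obs, weights, means, covars):
    """Hypothetical sketch: log probability and responsibilities
    for a Gaussian mixture with diagonal covariances."""
    obs = np.asarray(obs, dtype=float)          # (n_samples, n_features)
    weights = np.asarray(weights, dtype=float)  # (n_states,)
    means = np.asarray(means, dtype=float)      # (n_states, n_features)
    covars = np.asarray(covars, dtype=float)    # (n_states, n_features)

    # Log density of every point under every component.
    diff = obs[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.log(2 * np.pi * covars).sum(axis=1)
                        + (diff ** 2 / covars).sum(axis=2))

    # Add the log mixing weights, then normalize with log-sum-exp.
    joint = np.log(weights) + log_gauss          # (n_samples, n_states)
    top = joint.max(axis=1, keepdims=True)
    logprob = top[:, 0] + np.log(np.exp(joint - top).sum(axis=1))
    posteriors = np.exp(joint - logprob[:, None])
    return logprob, posteriors

logprob, posteriors = eval_diag_gmm(
    [[0.0], [10.0]], weights=[0.5, 0.5],
    means=[[0.0], [10.0]], covars=[[1.0], [1.0]])
print(posteriors.sum(axis=1))  # each row of responsibilities sums to 1
```

Subtracting the row maximum before exponentiating keeps the computation stable when a point lies far from every component.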
- fit(X, n_iter=10, thresh=0.01, params='wmc', init_params='wmc')
Estimate model parameters with the expectation-maximization algorithm.
An initialization step is performed before entering the EM algorithm. To skip this step, set the keyword argument init_params to the empty string ‘’. Likewise, to perform only the initialization, call this method with n_iter=0.
Parameters : X : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
n_iter : int, optional
Number of EM iterations to perform.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
init_params : string, optional
Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
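The role of the params string can be illustrated with a single EM iteration for a one-dimensional mixture. This is a sketch under stated assumptions, not the library code: em_step is a hypothetical function, and applying min_covar as a lower bound with np.maximum is one plausible reading of "floor".

```python
import numpy as np

def em_step(x, weights, means, variances, params='wmc', min_covar=1e-3):
    """Hypothetical sketch of one EM iteration for a 1-D Gaussian mixture.

    Returns the updated parameters and the log-likelihood of x under
    the parameters that were passed in.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # E-step: responsibilities of each component for each point.
    log_joint = (np.log(weights)
                 - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
    top = log_joint.max(axis=1, keepdims=True)
    log_norm = top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True))
    resp = np.exp(log_joint - log_norm)
    loglik = log_norm.sum()

    # M-step: update only the parameters named in `params`.
    nk = resp.sum(axis=0)
    new_w, new_m, new_v = weights, means, variances
    if 'w' in params:
        new_w = nk / nk.sum()
    if 'm' in params:
        new_m = resp.T @ x / nk
    if 'c' in params:
        new_v = (resp * (x[:, None] - new_m) ** 2).sum(axis=0) / nk
        new_v = np.maximum(new_v, min_covar)  # assumed form of the floor
    return new_w, new_m, new_v, loglik

rng = np.random.RandomState(0)
x = np.concatenate([rng.randn(100), 10 + rng.randn(300)])
w, m, v = np.array([0.5, 0.5]), np.array([1.0, 8.0]), np.array([1.0, 1.0])
history = []
for _ in range(10):
    w, m, v, loglik = em_step(x, w, m, v)
    history.append(loglik)
print(np.round(w, 2), np.round(m, 2))
```

Passing params='m' in this sketch re-estimates only the means and leaves the weights and variances at their input values, which mirrors the description of params above.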
- means
Mean parameters for each mixture component. Array of shape (n_states, n_features).
- predict(X)
Predict label for data.
Parameters : X : array-like, shape = [n_samples, n_features]
Returns : C : array, shape = (n_samples,)
- predict_proba(X)
Predict posterior probability of data under each Gaussian in the model.
Parameters : X : array-like, shape = [n_samples, n_features]
Returns : T : array-like, shape = (n_samples, n_components)
Returns the probability of the sample for each Gaussian (state) in the model.
- rvs(n_samples=1, random_state=None)
Generate random samples from the model.
Parameters : n_samples : int, optional
Number of samples to generate. Defaults to 1.
Returns : obs : array_like, shape (n_samples, n_features)
List of samples
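Sampling from a mixture is commonly done in two stages: draw a component index according to the mixing weights, then draw from that component's Gaussian. The sketch below assumes diagonal covariances; sample_diag_gmm is a hypothetical function and may differ from what rvs does internally.

```python
import numpy as np

def sample_diag_gmm(weights, means, covars, n_samples=1, random_state=None):
    """Hypothetical sketch: draw samples from a diagonal-covariance mixture.

    Returns an array of shape (n_samples, n_features).
    """
    rng = np.random.RandomState(random_state)
    means = np.asarray(means, dtype=float)    # (n_states, n_features)
    covars = np.asarray(covars, dtype=float)  # (n_states, n_features)

    # Stage 1: choose a component for every sample.
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    # Stage 2: scale and shift standard normal noise per chosen component.
    noise = rng.randn(n_samples, means.shape[1])
    return means[comps] + noise * np.sqrt(covars[comps])

obs = sample_diag_gmm([0.25, 0.75], [[0.0], [10.0]], [[1.0], [1.0]],
                      n_samples=500, random_state=0)
print(obs.shape)
```

Fixing random_state to an integer makes the draw reproducible in this sketch.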
- score(obs)
Compute the log probability under the model.
Parameters : obs : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprob : array_like, shape (n_samples,)
Log probabilities of each data point in obs
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns : self
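The <component>__<parameter> naming can be illustrated with a small pure-Python sketch that separates an estimator's own parameters from those addressed to a nested component. split_nested_params and the component name 'gmm' are hypothetical; this shows the naming convention only, not how set_params is implemented.

```python
def split_nested_params(params):
    """Hypothetical helper: separate plain keys from '<component>__<parameter>' keys."""
    own, nested = {}, {}
    for key, value in params.items():
        name, sep, sub_key = key.partition('__')
        if sep:
            # Key addresses a parameter of a nested component.
            nested.setdefault(name, {})[sub_key] = value
        else:
            # Key addresses a parameter of the estimator itself.
            own[name] = value
    return own, nested

own, nested = split_nested_params({'thresh': 0.001, 'gmm__n_components': 3})
print(own, nested)
```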
- weights
Mixing weights for each mixture component. Array of shape (n_states,).