8.18.1. sklearn.mixture.GMM
- class sklearn.mixture.GMM(n_components=1, covariance_type='diag', random_state=None, thresh=0.01, min_covar=0.001)
Gaussian Mixture Model
Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
Initializes parameters such that every mixture component has zero mean and identity covariance.
Parameters : n_components : int, optional
Number of mixture components. Defaults to 1.
covariance_type : string (read-only), optional
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
random_state : numpy.random object, optional
Random number generator. Must support the full numpy random number generator API.
min_covar : float, optional
Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
thresh : float, optional
Convergence threshold. Defaults to 0.01.
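The four covariance_type options differ in how many covariance values are stored per model. The following is a small sketch in plain Python that mirrors the shapes documented for the covars_ attribute on this page. The helper names covars_shape and n_stored_values are hypothetical and are not part of sklearn.mixture.

```python
def covars_shape(covariance_type, n_components, n_features):
    """Return the documented shape of covars_ (hypothetical helper)."""
    shapes = {
        "spherical": (n_components,),
        "tied": (n_features, n_features),
        "diag": (n_components, n_features),
        "full": (n_components, n_features, n_features),
    }
    if covariance_type not in shapes:
        raise ValueError("covariance_type must be one of %s" % sorted(shapes))
    return shapes[covariance_type]


def n_stored_values(shape):
    """Multiply the dimensions of a shape tuple to count stored entries."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# Compare storage for 3 components in 4 dimensions.
for name in ("spherical", "tied", "diag", "full"):
    shape = covars_shape(name, n_components=3, n_features=4)
    print(name, shape, n_stored_values(shape))
```

'spherical' and 'diag' keep storage small when n_features is large, while 'full' grows quadratically with n_features for every component.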
Examples
>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(covariance_type=None, min_covar=0.001, n_components=2, random_state=None, thresh=0.01)
>>> np.round(g.weights_, 2)
array([ 0.75, 0.25])
>>> np.round(g.means_, 2)
array([[ 10.05], [ 0.06]])
>>> np.round(g.covars_, 2)
array([[[ 1.02]], [[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0])
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] + 20 * [[10]])
GMM(covariance_type=None, min_covar=0.001, n_components=2, random_state=None, thresh=0.01)
>>> np.round(g.weights_, 2)
array([ 0.5, 0.5])
Attributes
covariance_type : string
String describing the type of covariance parameters used by the GMM. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
weights_ : array, shape (n_components,)
Mixing weights for each mixture component.
means_ : array, shape (n_components, n_features)
Mean parameters for each mixture component.
covars_ : array
Covariance parameters for each mixture component. The shape depends on covariance_type:
(n_components,) if 'spherical',
(n_features, n_features) if 'tied',
(n_components, n_features) if 'diag',
(n_components, n_features, n_features) if 'full'
converged_ : bool
True when convergence was reached in fit(), False otherwise.
Methods
aic(X) : Akaike information criterion for the current model fit
bic(X) : Bayesian information criterion for the current model fit
decode(*args, **kwargs) : DEPRECATED: will be removed in v0.12;
eval(X) : Evaluate the model on data
fit(X[, n_iter, n_init, thresh, params, ...]) : Estimate model parameters with the expectation-maximization algorithm.
get_params([deep]) : Get parameters for the estimator
predict(X) : Predict label for data.
predict_proba(X) : Predict posterior probability of data under each Gaussian
rvs(*args, **kwargs) : DEPRECATED: will be removed in v0.12;
sample([n_samples, random_state]) : Generate random samples from the model.
score(X) : Compute the log probability under the model.
set_params(**params) : Set the parameters of the estimator.
- __init__(n_components=1, covariance_type='diag', random_state=None, thresh=0.01, min_covar=0.001)
- aic(X)
Akaike information criterion for the current model fit and the proposed data
Parameters : X : array of shape (n_samples, n_dimensions)
Returns : aic : float (the lower the better)
- bic(X)
Bayesian information criterion for the current model fit and the proposed data
Parameters : X : array of shape (n_samples, n_dimensions)
Returns : bic : float (the lower the better)
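Both criteria penalize the log-likelihood by model size. As a rough sketch, assuming the standard textbook definitions AIC = -2 log L + 2k and BIC = -2 log L + k ln(n), where L is the likelihood of the data, k the number of free parameters and n the number of samples. The function names and the way k is counted below are illustrative assumptions and may differ from what GMM does internally.

```python
import math


def aic_value(total_log_likelihood, n_parameters):
    """Textbook AIC: -2 log L + 2 k (illustrative, not the library code)."""
    return -2.0 * total_log_likelihood + 2.0 * n_parameters


def bic_value(total_log_likelihood, n_parameters, n_samples):
    """Textbook BIC: -2 log L + k ln(n) (illustrative, not the library code)."""
    return -2.0 * total_log_likelihood + n_parameters * math.log(n_samples)


# Per-sample log probabilities, e.g. as returned by score(X).
log_probs = [-2.19, -4.58, -1.75, -1.21]
total = sum(log_probs)

# Assumed parameter count for a 1-D, 2-component mixture:
# 2 means + 2 variances + 1 free mixing weight.
k = 5

print("AIC:", aic_value(total, k))
print("BIC:", bic_value(total, k, len(log_probs)))
```

When comparing fits with different n_components on the same data, the model with the lowest value is preferred under either criterion.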
- decode(*args, **kwargs)
DEPRECATED in version 0.10; will be removed in version 0.12. Use the score or predict method instead, depending on the question.
Find the most likely mixture component for each point in X.
Parameters : X : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprobs : array_like, shape (n_samples,)
Log probability of each point in X under the model.
components : array_like, shape (n_samples,)
Index of the most likely mixture component for each observation.
- eval(X)
Evaluate the model on data
Compute the log probability of X under the model and return the posterior distribution (responsibilities) of each mixture component for each element of X.
Parameters : X : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprob : array_like, shape (n_samples,)
Log probabilities of each data point in X
responsibilities : array_like, shape (n_samples, n_components)
Posterior probabilities of each mixture component for each observation
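The two return values can be illustrated for a one-dimensional mixture in plain Python. This is a minimal sketch under stated assumptions, not the library's implementation: the helper names log_gaussian_1d and eval_point are hypothetical, and the parameter values are the rounded ones printed in the class example.

```python
import math


def log_gaussian_1d(x, mean, var):
    """Log density of a 1-D normal distribution with the given variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def eval_point(x, weights, means, variances):
    """Return (logprob, responsibilities) for one scalar observation."""
    # Joint log probability of x and each component: log w_k + log N(x | k).
    joint = [math.log(w) + log_gaussian_1d(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    # Log-sum-exp over components gives log p(x), computed stably.
    top = max(joint)
    logprob = top + math.log(sum(math.exp(j - top) for j in joint))
    # Posterior probability of each component given x.
    responsibilities = [math.exp(j - logprob) for j in joint]
    return logprob, responsibilities


weights = [0.75, 0.25]
means = [10.05, 0.06]
variances = [1.02, 0.96]

for x in (0.0, 2.0, 9.0, 10.0):
    logprob, resp = eval_point(x, weights, means, variances)
    label = resp.index(max(resp))  # index of the most responsible component
    print(x, round(logprob, 2), label)
```

Each responsibilities row sums to 1, and the index of its largest entry is the most probable component for that observation.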
- fit(X, n_iter=100, n_init=1, thresh=0.01, params='wmc', init_params='wmc')
Estimate model parameters with the expectation-maximization algorithm.
An initialization step is performed before entering the EM algorithm. To skip this step, set the keyword argument init_params to the empty string ‘’. Likewise, to perform only the initialization, call this method with n_iter=0.
Parameters : X : array_like, shape (n, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
n_iter : int, optional
Number of EM iterations to perform.
n_init : int, optional
Number of initializations to perform. The best result is kept.
thresh : float, optional
Convergence threshold.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
init_params : string, optional
Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
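To make the EM loop concrete, here is a minimal one-dimensional sketch in plain Python. It is an illustration under several assumptions, not the code behind fit: the function name em_1d is hypothetical, the starting means are supplied by the caller instead of coming from an initialization step, convergence is taken to mean that the total log-likelihood changed by less than thresh, and min_covar is applied as a simple lower bound on each variance.

```python
import math


def em_1d(xs, means, n_iter=100, thresh=0.01, min_covar=1e-3):
    """Fit a 1-D Gaussian mixture by EM from caller-supplied starting means."""
    k = len(means)
    means = list(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    prev_ll = None
    for _ in range(n_iter):
        # E-step: responsibilities and total log-likelihood.
        resp = []
        ll = 0.0
        for x in xs:
            joint = [math.log(w)
                     - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
                     for w, m, v in zip(weights, means, variances)]
            top = max(joint)
            logprob = top + math.log(sum(math.exp(j - top) for j in joint))
            ll += logprob
            resp.append([math.exp(j - logprob) for j in joint])
        # Stop once the log-likelihood no longer improves by thresh (assumed rule).
        if prev_ll is not None and abs(ll - prev_ll) < thresh:
            break
        prev_ll = ll
        # M-step: re-estimate weights ('w'), means ('m') and variances ('c').
        for j in range(k):
            # Guard against an empty component to avoid division by zero.
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_covar)
    return weights, means, variances


# Even split between two modes, as in the refit step of the class example.
data = [0.0] * 20 + [10.0] * 20
weights, means, variances = em_1d(data, means=[1.0, 9.0])
print(weights, means, variances)
```

Because every point in each mode is identical here, the estimated variances collapse and are held at min_covar, which is the kind of degenerate fit the floor is meant to prevent.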
- get_params(deep=True)
Get parameters for the estimator
Parameters : deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- predict(X)
Predict label for data.
Parameters : X : array-like, shape = [n_samples, n_features]
Returns : C : array, shape = (n_samples,)
- predict_proba(X)
Predict posterior probability of data under each Gaussian in the model.
Parameters : X : array-like, shape = [n_samples, n_features]
Returns : responsibilities : array-like, shape = (n_samples, n_components)
Returns the probability of the sample for each Gaussian (state) in the model.
- rvs(*args, **kwargs)
DEPRECATED in version 0.11; will be removed in version 0.12. Use sample instead.
Generate random samples from the model.
- sample(n_samples=1, random_state=None)
Generate random samples from the model.
Parameters : n_samples : int, optional
Number of samples to generate. Defaults to 1.
Returns : X : array_like, shape (n_samples, n_features)
List of samples
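Drawing from a mixture is commonly done in two stages: pick a component according to the mixing weights, then draw from that component's Gaussian. The sketch below shows this for a one-dimensional mixture using only the standard library. The function name sample_1d and its seed argument are hypothetical, and whether sample follows exactly this procedure is an assumption.

```python
import random


def sample_1d(weights, means, variances, n_samples=1, seed=None):
    """Draw n_samples values from a 1-D Gaussian mixture (illustrative)."""
    rng = random.Random(seed)
    component_ids = range(len(weights))
    samples = []
    for _ in range(n_samples):
        # Stage 1: choose a component with probability equal to its weight.
        k = rng.choices(component_ids, weights=weights)[0]
        # Stage 2: draw from that component; gauss() takes a standard deviation.
        samples.append(rng.gauss(means[k], variances[k] ** 0.5))
    return samples


draws = sample_1d([0.75, 0.25], [10.0, 0.0], [1.0, 1.0], n_samples=5, seed=0)
print(draws)
```

Passing the same seed reproduces the same draws, which plays the role that random_state plays in the method signature above.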
- score(X)
Compute the log probability under the model.
Parameters : X : array_like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprob : array_like, shape (n_samples,)
Log probabilities of each data point in X
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns : self