6.6.1. scikits.learn.mixture.GMM
- class scikits.learn.mixture.GMM(n_states=1, cvtype='diag')
- Gaussian Mixture Model
- Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
- Initializes parameters such that every mixture component has zero mean and identity covariance.
- Parameters :
  - n_states : int - Number of mixture components.
  - cvtype : string (read-only) - String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
- Examples
  >>> import numpy as np
  >>> from scikits.learn import mixture
  >>> g = mixture.GMM(n_states=2)

  >>> # Generate random observations with two modes centered on 0
  >>> # and 10 to use for training.
  >>> np.random.seed(0)
  >>> obs = np.concatenate((np.random.randn(100, 1),
  ...                       10 + np.random.randn(300, 1)))
  >>> g.fit(obs)
  GMM(cvtype='diag', n_states=2)
  >>> g.weights
  array([ 0.25, 0.75])
  >>> g.means
  array([[ 0.05980802], [ 9.94199467]])
  >>> g.covars
  [array([[ 1.01682662]]), array([[ 0.96080513]])]
  >>> np.round(g.weights, 2)
  array([ 0.25, 0.75])
  >>> np.round(g.means, 2)
  array([[ 0.06], [ 9.94]])
  >>> np.round(g.covars, 2)
  ...
  array([[[ 1.02]], [[ 0.96]]])
  >>> g.predict([[0], [2], [9], [10]])
  array([0, 0, 1, 1])
  >>> np.round(g.score([[0], [2], [9], [10]]), 2)
  array([-2.32, -4.16, -1.65, -1.19])

  >>> # Refit the model on new data (initial parameters remain the
  >>> # same), this time with an even split between the two modes.
  >>> g.fit(20 * [[0]] + 20 * [[10]])
  GMM(cvtype='diag', n_states=2)
  >>> np.round(g.weights, 2)
  array([ 0.5, 0.5])
- Attributes
  - cvtype
  - n_states
  - weights
  - means
  - covars
  - n_features : int - Dimensionality of the Gaussians.
- Methods
  - decode(X) - Find most likely mixture components for each point in X.
  - eval(X) - Compute the log likelihood of X under the model and the posterior distribution over mixture components.
  - fit(X) - Estimate model parameters from X using the EM algorithm.
  - predict(X) - Like decode, find most likely mixture components for each observation in X.
  - rvs(n=1) - Generate n samples from the model.
  - score(X) - Compute the log likelihood of X under the model.
- __init__(n_states=1, cvtype='diag')
 - covars
- Return covars as a full matrix. 
 - cvtype
- Covariance type of the model. - Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. 
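The four covariance types differ in how many covariance parameters the model has to estimate. The sketch below assumes the labels carry their usual meaning for Gaussian mixtures (one variance per component, one shared full matrix, one variance per feature per component, one full matrix per component); this page does not spell those meanings out, and `covariance_parameter_count` is a hypothetical helper, not part of scikits.learn.

```python
def covariance_parameter_count(cvtype, n_states, n_features):
    """Count free covariance parameters for a mixture (illustrative helper)."""
    d = n_features
    # A symmetric d x d matrix has d * (d + 1) / 2 free entries.
    per_full_matrix = d * (d + 1) // 2
    counts = {
        'spherical': n_states,               # one variance per component
        'tied': per_full_matrix,             # one full matrix shared by all components
        'diag': n_states * d,                # one variance per feature and component
        'full': n_states * per_full_matrix,  # one full matrix per component
    }
    if cvtype not in counts:
        raise ValueError("cvtype must be one of 'spherical', 'tied', 'diag', 'full'")
    return counts[cvtype]


for name in ('spherical', 'tied', 'diag', 'full'):
    print(name, covariance_parameter_count(name, n_states=3, n_features=4))
```

Under these assumptions, the richer types ('tied', 'full') can capture correlations between features at the cost of more parameters to estimate from the same data.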
 - decode(obs)
- Find most likely mixture components for each point in obs.
- Parameters :
  - obs : array_like, shape (n, n_features) - List of n_features-dimensional data points. Each row corresponds to a single data point.
- Returns :
  - logprobs : array_like, shape (n,) - Log probability of each point in obs under the model.
  - components : array_like, shape (n,) - Index of the most likely mixture component for each observation.
 - eval(obs)
- Evaluate the model on data.
- Compute the log probability of obs under the model and return the posterior distribution (responsibilities) of each mixture component for each element of obs.
- Parameters :
  - obs : array_like, shape (n, n_features) - List of n_features-dimensional data points. Each row corresponds to a single data point.
- Returns :
  - logprob : array_like, shape (n,) - Log probability of each data point in obs.
  - posteriors : array_like, shape (n, n_states) - Posterior probability of each mixture component for each observation.
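The two return values can be illustrated with a small standalone sketch for a one-dimensional mixture. It is not the library's implementation: `log_gaussian` and `eval_mixture` are hypothetical helpers, and the weights, means and variances are made-up values near those in the class example. The last line shows how the most likely component per point (what decode and predict report) follows from the posteriors.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def eval_mixture(obs, weights, means, variances):
    """Return (logprob, posteriors) for each point under a 1-D mixture."""
    logprob, posteriors = [], []
    for x in obs:
        # Joint log probability of the point and each component.
        joint = [math.log(w) + log_gaussian(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        # Log-sum-exp gives the marginal log probability without underflow.
        top = max(joint)
        total = top + math.log(sum(math.exp(j - top) for j in joint))
        logprob.append(total)
        # Posterior (responsibility) of each component for this point.
        posteriors.append([math.exp(j - total) for j in joint])
    return logprob, posteriors


# Assumed parameters, chosen for illustration only.
weights, means, variances = [0.25, 0.75], [0.0, 10.0], [1.0, 1.0]
logprob, posteriors = eval_mixture([0.0, 2.0, 9.0, 10.0], weights, means, variances)

# Most likely component per point: the argmax of each posterior row.
components = [row.index(max(row)) for row in posteriors]
print(components)
```

Each row of `posteriors` sums to one, and `logprob` holds one value per input point, matching the shapes listed above.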
 - fit(X, n_iter=10, min_covar=0.001, thresh=0.01, params='wmc', init_params='wmc')
- Estimate model parameters with the expectation-maximization (EM) algorithm.
- An initialization step is performed before entering the EM algorithm. To skip this step, set the keyword argument init_params to the empty string ''. Likewise, to perform only the initialization, call this method with n_iter=0.
- Parameters :
  - X : array_like, shape (n, n_features) - List of n_features-dimensional data points. Each row corresponds to a single data point.
  - n_iter : int, optional - Number of EM iterations to perform.
  - min_covar : float, optional - Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
  - thresh : float, optional - Convergence threshold.
  - params : string, optional - Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
  - init_params : string, optional - Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
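To make the roles of n_iter, min_covar and thresh concrete, here is a minimal EM sketch for a one-dimensional mixture in plain Python. It is an illustration under stated assumptions, not the scikits.learn code: `em_fit` is a hypothetical function, the stopping rule (log-likelihood gain below thresh) is an assumed reading of "convergence threshold", and the starting means are simply the smallest and largest observations.

```python
import math
import random


def em_fit(xs, weights, means, variances, n_iter=10, min_covar=1e-3, thresh=1e-2):
    """Fit a 1-D Gaussian mixture to xs with EM, starting from the given parameters."""
    weights, means, variances = list(weights), list(means), list(variances)
    prev_loglik = None
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp, loglik = [], 0.0
        for x in xs:
            dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        # Assumed stopping rule: stop once the log likelihood gain is below thresh.
        if prev_loglik is not None and abs(loglik - prev_loglik) < thresh:
            break
        prev_loglik = loglik
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(len(weights)):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            # Floor the variance so a component cannot collapse onto a single point.
            variances[j] = max(var, min_covar)
    return weights, means, variances


# Two modes centered on 0 and 10, mirroring the class example (different random stream).
random.seed(0)
xs = ([random.gauss(0.0, 1.0) for _ in range(100)]
      + [random.gauss(10.0, 1.0) for _ in range(300)])
weights, means, variances = em_fit(xs, [0.5, 0.5], [min(xs), max(xs)], [1.0, 1.0])
print([round(w, 2) for w in weights], [round(m, 2) for m in means])
```

With n_iter=0 the loop body never runs and the starting parameters are returned unchanged, which mirrors the "initialization only" usage described above.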
 - means
- Mean parameters for each mixture component. 
 - n_states
- Number of mixture components in the model. 
 - predict(X)
- Predict the label (index of the most likely mixture component) for each sample in X.
- Parameters :
  - X : array-like, shape = [n_samples, n_features]
- Returns :
  - C : array, shape = [n_samples]
 - predict_proba(X)
- Predict posterior probability of data under each Gaussian in the model.
- Parameters :
  - X : array-like, shape = [n_samples, n_features]
- Returns :
  - T : array-like, shape = [n_samples, n_states] - Probability of each sample under each Gaussian (state) in the model.
 - rvs(n=1)
- Generate random samples from the model.
- Parameters :
  - n : int - Number of samples to generate.
- Returns :
  - obs : array_like, shape (n, n_features) - List of samples.
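Sampling from a mixture is commonly done in two stages: pick a component according to the mixing weights, then draw from that component's Gaussian. The sketch below shows this for a one-dimensional mixture using only the standard library. `sample_mixture` is a hypothetical helper with made-up parameters; whether rvs follows exactly this scheme is an assumption.

```python
import random


def sample_mixture(n, weights, means, variances, rng=random):
    """Draw n samples from a 1-D Gaussian mixture (illustrative helper)."""
    samples = []
    for _ in range(n):
        # Stage 1: choose a component index with probability equal to its weight.
        j = rng.choices(range(len(weights)), weights=weights)[0]
        # Stage 2: draw from that component (gauss takes a standard deviation).
        samples.append(rng.gauss(means[j], variances[j] ** 0.5))
    return samples


rng = random.Random(0)
samples = sample_mixture(2000, [0.25, 0.75], [0.0, 10.0], [1.0, 1.0], rng=rng)

# Fraction of samples that landed near the second mode; it should be close to 0.75.
upper_fraction = sum(1 for x in samples if x > 5.0) / len(samples)
print(len(samples), round(upper_fraction, 2))
```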
 - score(obs)
- Compute the log probability under the model.
- Parameters :
  - obs : array_like, shape (n, n_features) - List of n_features-dimensional data points. Each row corresponds to a single data point.
- Returns :
  - logprob : array_like, shape (n,) - Log probability of each data point in obs.
 - weights
- Mixing weights for each mixture component. 
 
