9.5.1. sklearn.mixture.GMM

class sklearn.mixture.GMM(n_components=1, cvtype='diag', random_state=None, thresh=0.01, min_covar=0.001)

Gaussian Mixture Model

Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.

Initializes parameters such that every mixture component has zero mean and identity covariance.

Parameters :

n_components : int, optional

Number of mixture components. Defaults to 1.

cvtype : string (read-only), optional

String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.

random_state : numpy.random object, optional

Must support the full numpy random number generator API.

min_covar : float, optional

Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.

thresh : float, optional

Convergence threshold. Defaults to 0.01.
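To see why a floor such as min_covar matters, consider a component that collapses onto a single training point: its raw variance estimate is zero and the density is unbounded there. The sketch below is pure Python and is not the library's implementation; the helper names `floored_variance` and `gaussian_logpdf` are hypothetical, and clamping with `max` is only one possible way to realize a floor.

```python
import math

def floored_variance(points, min_covar=1e-3):
    """Maximum-likelihood variance of 1-D points, clamped from below.

    Hypothetical helper: illustrates the idea of a variance floor only.
    """
    mean = sum(points) / len(points)
    var = sum((p - mean) ** 2 for p in points) / len(points)
    return max(var, min_covar)

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# A component that has collapsed onto one point has zero raw variance;
# the floor keeps its log density finite.
collapsed = [4.2]
var = floored_variance(collapsed)
peak = gaussian_logpdf(4.2, 4.2, var)
```

Without the clamp, `gaussian_logpdf` would be asked to take the logarithm of zero for the collapsed component.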

See also

DPGMM: Infinite Gaussian mixture model, using the Dirichlet process, fit with a variational algorithm.

VBGMM: Finite Gaussian mixture model fit with a variational algorithm, better suited to situations with little data.

Examples

>>> import numpy as np
>>> from sklearn import mixture
>>> np.random.seed(1)
>>> g = mixture.GMM(n_components=2)

>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.75,  0.25])
>>> np.round(g.means, 2)
array([[ 10.05],
       [  0.06]])
>>> np.round(g.covars, 2) 
array([[[ 1.02]],
       [[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0])
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.19, -4.58, -1.75, -1.21])

>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] +  20 * [[10]])
GMM(cvtype='diag', n_components=2)
>>> np.round(g.weights, 2)
array([ 0.5,  0.5])

Attributes

`cvtype`	Covariance type of the model.
`weights`	Mixing weights for each mixture component.
`means`	Mean parameters for each mixture component.
`covars`	Return covars as a full matrix.

n_features	int	Dimensionality of the Gaussians.
n_states	int (read-only)	Number of mixture components.
converged_	bool	True when convergence was reached in fit(), False otherwise.

Methods

decode(X)	Find most likely mixture components for each point in X.
eval(X)	Compute the log likelihood of X under the model and the posterior distribution over mixture components.
fit(X)	Estimate model parameters from X using the EM algorithm.
predict(X)	Like decode, find the most likely mixture component for each observation in X.
rvs(n_samples=1, random_state=None)	Generate n_samples samples from the model.
score(X)	Compute the log likelihood of X under the model.

__init__(n_components=1, cvtype='diag', random_state=None, thresh=0.01, min_covar=0.001)

covars

Return covars as a full matrix.

cvtype

Covariance type of the model.

Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
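The four covariance types trade flexibility against the number of values that must be estimated. The following sketch makes that concrete by listing a storage shape for each type. The helper names are hypothetical, and the shapes are an assumed, conventional layout rather than something stated on this page.

```python
def covar_param_shape(cvtype, n_components, n_features):
    """Shape of the covariance parameters for each covariance type.

    Hypothetical helper; the layouts are an assumed, conventional choice.
    """
    shapes = {
        # One variance per component, shared by all features.
        "spherical": (n_components,),
        # One variance per component and feature.
        "diag": (n_components, n_features),
        # One full matrix shared by all components.
        "tied": (n_features, n_features),
        # One full matrix per component.
        "full": (n_components, n_features, n_features),
    }
    if cvtype not in shapes:
        raise ValueError("cvtype must be one of %s" % sorted(shapes))
    return shapes[cvtype]

def count_entries(shape):
    """Number of stored values for a given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total

sizes = {
    name: count_entries(covar_param_shape(name, n_components=2, n_features=3))
    for name in ("spherical", "diag", "tied", "full")
}
```

With two components and three features, 'full' stores the most values and 'spherical' the fewest, which is why the simpler types are less prone to overfitting on small data sets.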

decode(obs)

Find most likely mixture components for each point in obs.

Parameters :

obs : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns :

logprobs : array_like, shape (n_samples,)

Log probability of each point in obs under the model.

components : array_like, shape (n_samples,)

Index of the most likely mixture component for each observation.
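The decoding rule can be sketched in pure Python for the one-dimensional case. This is an illustration under stated assumptions, not the library's code: `decode_1d` is a hypothetical helper, and the weights, means and variances are rounded values chosen to resemble the fitted model in the Examples section.

```python
import math

def decode_1d(points, weights, means, variances):
    """Return (logprobs, components) for 1-D points under a Gaussian mixture."""
    logprobs, components = [], []
    for x in points:
        # Joint log density log(w_k) + log N(x | mean_k, var_k) per component.
        joint = [
            math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(joint)
        # Log-sum-exp gives the log probability of x under the whole mixture.
        logprobs.append(top + math.log(sum(math.exp(j - top) for j in joint)))
        # The most likely component maximizes the joint, hence the posterior.
        components.append(joint.index(top))
    return logprobs, components

logprobs, components = decode_1d(
    [0, 2, 9, 10], weights=[0.75, 0.25], means=[10.0, 0.0], variances=[1.0, 1.0]
)
```

Points near 0 are assigned to the second component (index 1) and points near 10 to the first (index 0).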

eval(obs, return_log=False)

Evaluate the model on data

Compute the log probability of obs under the model and return the posterior distribution (responsibilities) of each mixture component for each element of obs.

Parameters :

obs : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

return_log : boolean, optional

If True, the posteriors returned are log-probabilities

Returns :

logprob : array_like, shape (n_samples,)

Log probabilities of each data point in obs

posteriors : array_like, shape (n_samples, n_components)

Posterior probabilities of each mixture component for each observation
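As a rough illustration of the two return values, the sketch below computes the log probability and the responsibilities for a one-dimensional mixture in pure Python. `eval_1d` is a hypothetical helper, not the library's method; its `return_log` flag only mimics the behaviour described above.

```python
import math

def eval_1d(points, weights, means, variances, return_log=False):
    """Return (logprob, posteriors) for 1-D points under a Gaussian mixture."""
    logprob, posteriors = [], []
    for x in points:
        # Joint log density of x and each component.
        joint = [
            math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(joint)
        total = top + math.log(sum(math.exp(j - top) for j in joint))
        logprob.append(total)
        # Bayes' rule in log space: log p(k | x) = log p(x, k) - log p(x).
        log_post = [j - total for j in joint]
        posteriors.append(log_post if return_log else [math.exp(p) for p in log_post])
    return logprob, posteriors

# A point halfway between two equally weighted, equally wide components.
logprob, post = eval_1d([5.0], [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
_, log_post = eval_1d([5.0], [0.5, 0.5], [0.0, 10.0], [1.0, 1.0], return_log=True)
```

Each row of the posteriors sums to one, and by symmetry the midpoint is shared equally between the two components.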

fit(X, n_iter=10, thresh=0.01, params='wmc', init_params='wmc')

Estimate model parameters with the expectation-maximization algorithm.

An initialization step is performed before entering the EM algorithm. To skip this step, set the keyword argument init_params to the empty string ‘’. Likewise, to perform only the initialization, call this method with n_iter=0.

Parameters :

X : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

n_iter : int, optional

Number of EM iterations to perform.

params : string, optional

Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.

init_params : string, optional

Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
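The EM loop itself can be sketched in a few lines for one-dimensional data. This is a minimal pure-Python illustration, not the library's implementation: `em_fit_1d` is a hypothetical helper, the starting values are supplied by the caller (so there is no separate initialization step), and stopping when the log likelihood changes by less than `thresh` is an assumed convergence rule.

```python
import math
import random

def em_fit_1d(data, means, variances, weights, n_iter=10, thresh=1e-2):
    """Run EM on 1-D data; returns (weights, means, variances, converged)."""
    weights, means, variances = list(weights), list(means), list(variances)
    k, n = len(weights), len(data)
    prev_ll, converged = None, False
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        resp, ll = [], 0.0
        for x in data:
            joint = [
                math.log(weights[j])
                - 0.5 * (math.log(2 * math.pi * variances[j])
                         + (x - means[j]) ** 2 / variances[j])
                for j in range(k)
            ]
            top = max(joint)
            total = top + math.log(sum(math.exp(v - top) for v in joint))
            ll += total
            resp.append([math.exp(v - total) for v in joint])
        # Stop once the log likelihood changes by less than thresh.
        if prev_ll is not None and abs(ll - prev_ll) < thresh:
            converged = True
            break
        prev_ll = ll
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-3)  # variance floor, cf. min_covar
    return weights, means, variances, converged

# Two modes centered on 0 and 10, as in the Examples section.
rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(10, 1) for _ in range(300)]
w, m, v, ok = em_fit_1d(data, means=[1.0, 8.0], variances=[1.0, 1.0],
                        weights=[0.5, 0.5], n_iter=50)
```

Leaving one of the three update lines out of the M-step corresponds to dropping the matching letter from params.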

means

Mean parameters for each mixture component.

predict(X)

Predict label for data.

Parameters :

X : array-like, shape = [n_samples, n_features]

Returns :

C : array, shape = (n_samples,)

predict_proba(X)

Predict posterior probability of data under each Gaussian in the model.

Parameters :

X : array-like, shape = [n_samples, n_features]

Returns :

T : array-like, shape = (n_samples, n_components)

Returns the probability of the sample for each Gaussian (state) in the model.

rvs(n_samples=1, random_state=None)

Generate random samples from the model.

Parameters :

n_samples : int, optional

Number of samples to generate. Defaults to 1.

Returns :

obs : array_like, shape (n_samples, n_features)

List of samples
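Sampling from a mixture is a two-stage draw: pick a component according to the mixing weights, then draw from that component's Gaussian. The sketch below shows this for the one-dimensional case using only the standard library. `sample_mixture_1d` and its `seed` argument are hypothetical and stand in for the method and its random_state parameter.

```python
import random

def sample_mixture_1d(weights, means, variances, n_samples=1, seed=None):
    """Draw n_samples values from a 1-D Gaussian mixture."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Choose a component with probability equal to its mixing weight.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Draw from that component; gauss expects a standard deviation.
        samples.append(rng.gauss(means[k], variances[k] ** 0.5))
    return samples

draws = sample_mixture_1d([0.25, 0.75], [0.0, 10.0], [1.0, 1.0],
                          n_samples=2000, seed=0)
sample_mean = sum(draws) / len(draws)
```

The sample mean should land close to the mixture mean, 0.25 * 0 + 0.75 * 10 = 7.5, and passing the same seed reproduces the same draws.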

score(obs)

Compute the log probability under the model.

Parameters :

obs : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns :

logprob : array_like, shape (n_samples,)

Log probabilities of each data point in obs

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.

Returns :

self
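The <component>__<parameter> naming scheme can be illustrated with a toy class. This is a sketch of the idea only: `Toy` is a hypothetical stand-in, not a scikit-learn class, and the real method may validate names and handle deeper nesting differently.

```python
class Toy:
    """Hypothetical stand-in for an estimator; not a scikit-learn class."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for name, value in params.items():
            # "<component>__<parameter>" addresses a parameter of a nested object.
            component, sep, sub_name = name.partition("__")
            if sep:
                self.params[component].set_params(**{sub_name: value})
            else:
                self.params[name] = value
        return self  # returning self allows chained calls

inner = Toy(n_components=1, cvtype="diag")
outer = Toy(model=inner, verbose=False)
outer.set_params(model__n_components=3, verbose=True)
```

After the call, the nested object's n_components is 3 and the outer object's own verbose flag is True.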

weights

Mixing weights for each mixture component.
