This documentation is for scikit-learn version 0.11-git.

8.18.2. sklearn.mixture.DPGMM

class sklearn.mixture.DPGMM(n_components=1, covariance_type='diag', alpha=1.0, random_state=None, thresh=0.01, verbose=False, min_covar=None)

Variational Inference for the Infinite Gaussian Mixture Model.

DPGMM stands for Dirichlet Process Gaussian Mixture Model, and it is an infinite mixture model with the Dirichlet Process as a prior distribution on the number of clusters. In practice the approximate inference algorithm uses a truncated distribution with a fixed maximum number of components, but almost always the number of components actually used depends on the data.

Stick-breaking Representation of a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a Gaussian mixture model with a variable number of components (smaller than the truncation parameter n_components).
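The stick-breaking construction behind this representation can be sketched in a few lines of plain Python (an illustrative sketch, not scikit-learn code): each weight is a Beta(1, alpha) fraction of the stick left over from the previous breaks, truncated at n_components.

```python
import random

def stick_breaking_weights(alpha, n_components, seed=0):
    """Draw truncated stick-breaking weights for a DP with concentration alpha."""
    rng = random.Random(seed)
    remaining = 1.0          # length of the stick still unbroken
    weights = []
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, alpha)   # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= (1.0 - v)
    weights.append(remaining)             # last component takes whatever is left
    return weights

w = stick_breaking_weights(alpha=1.0, n_components=10)
print(sum(w))  # the weights always sum to 1
```

With small alpha most of the stick is consumed by the first few breaks, which is why few components end up with non-negligible weight in practice.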

For proper convergence, the model is initialized with normally distributed means and identity covariances.

Parameters

n_components : int, optional

Number of mixture components. Defaults to 1.

covariance_type : string, optional

String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.

alpha : float, optional

Real number representing the concentration parameter of the Dirichlet Process. Intuitively, the Dirichlet Process is as likely to start a new cluster for a point as it is to add that point to a cluster with alpha elements. A higher alpha means more clusters, as the expected number of clusters is alpha*log(N). Defaults to 1.

thresh : float, optional

Convergence threshold.

See also

GMM
Finite Gaussian mixture model fit with EM
VBGMM
Finite Gaussian mixture model fit with a variational algorithm, better for situations where there might be too little data to get a good estimate of the covariance matrix.

Attributes

covariance_type : string

String describing the type of covariance parameters used by the DP-GMM. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.

n_components : int

Number of mixture components.

weights_ : array, shape (n_components,)

Mixing weights for each mixture component.

means_ : array, shape (n_components, n_features)

Mean parameters for each mixture component.

precisions_ : array

Precision (inverse covariance) parameters for each mixture component. The shape depends on covariance_type:

(`n_components`, `n_features`)                if 'spherical',
(`n_features`, `n_features`)                  if 'tied',
(`n_components`, `n_features`)                if 'diag',
(`n_components`, `n_features`, `n_features`)  if 'full'

converged_ : bool

True when convergence was reached in fit(), False otherwise.
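The shape table above can be encoded as a small lookup (a hypothetical helper for checking array shapes, not part of the class):

```python
def precisions_shape(covariance_type, n_components, n_features):
    """Expected shape of precisions_ for each covariance_type, per the table above."""
    shapes = {
        'spherical': (n_components, n_features),
        'tied':      (n_features, n_features),           # one shared matrix
        'diag':      (n_components, n_features),          # one diagonal per component
        'full':      (n_components, n_features, n_features),
    }
    return shapes[covariance_type]

print(precisions_shape('full', 5, 3))  # (5, 3, 3)
```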

Methods

aic(X) Akaike information criterion for the current model fit
bic(X) Bayesian information criterion for the current model fit
decode(*args, **kwargs) DEPRECATED: will be removed in v0.12;
eval(X) Evaluate the model on data
fit(X[, n_iter, params, init_params]) Estimate model parameters with the variational algorithm.
get_params([deep]) Get parameters for the estimator
lower_bound(X, z) returns a lower bound on model evidence based on X and membership
predict(X) Predict label for data.
predict_proba(X) Predict posterior probability of data under each Gaussian
rvs(*args, **kwargs) DEPRECATED: will be removed in v0.12;
sample([n_samples, random_state]) Generate random samples from the model.
score(X) Compute the log probability under the model.
set_params(**params) Set the parameters of the estimator.
__init__(n_components=1, covariance_type='diag', alpha=1.0, random_state=None, thresh=0.01, verbose=False, min_covar=None)
aic(X)

Akaike information criterion for the current model fit and the proposed data

Parameters

X : array of shape (n_samples, n_dimensions)

Returns

aic : float (the lower the better)
bic(X)

Bayesian information criterion for the current model fit and the proposed data

Parameters

X : array of shape (n_samples, n_dimensions)

Returns

bic : float (the lower the better)
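Both criteria follow the standard definitions, sketched below in plain Python; log_likelihood and n_parameters stand in for the fitted model's total log likelihood and free-parameter count, which the class computes internally:

```python
import math

def aic(log_likelihood, n_parameters):
    """Akaike information criterion: lower is better."""
    return -2.0 * log_likelihood + 2.0 * n_parameters

def bic(log_likelihood, n_parameters, n_samples):
    """Bayesian information criterion: lower is better; penalizes model size
    more heavily than AIC as the sample count grows."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_samples)

print(aic(-120.0, 10))       # 260.0
print(bic(-120.0, 10, 100))  # 240 + 10*log(100)
```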
decode(*args, **kwargs)

DEPRECATED: deprecated in version 0.10; will be removed in v0.12. Use the score or predict method instead, depending on the question.

Find most likely mixture components for each point in X.
Parameters

X : array_like, shape (n, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns

logprobs : array_like, shape (n_samples,)

Log probability of each point in X under the model.

components : array_like, shape (n_samples,)

Index of the most likely mixture component for each observation.

eval(X)

Evaluate the model on data

Compute the bound on log probability of X under the model and return the posterior distribution (responsibilities) of each mixture component for each element of X.

This is done by computing the parameters for the mean-field of z for each observation.

Parameters

X : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns

logprob : array_like, shape (n_samples,)

Log probabilities of each data point in X

responsibilities : array_like, shape (n_samples, n_components)

Posterior probabilities of each mixture component for each observation
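The normalization step behind these responsibilities can be sketched as follows: given the per-component weighted log densities for one point, a log-sum-exp recovers both the point's log probability and the posterior over components (illustrative plain Python, not the class internals):

```python
import math

def responsibilities_from_log_densities(log_dens):
    """Normalize per-component log densities into a log probability and
    posterior responsibilities, using log-sum-exp for numerical stability."""
    m = max(log_dens)                 # subtract the max before exponentiating
    logprob = m + math.log(sum(math.exp(l - m) for l in log_dens))
    return logprob, [math.exp(l - logprob) for l in log_dens]

logprob, resp = responsibilities_from_log_densities([-1.0, -2.0, -5.0])
print(sum(resp))  # responsibilities sum to 1
```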

fit(X, n_iter=10, params='wmc', init_params='wmc')

Estimate model parameters with the variational algorithm.

For a full derivation and description of the algorithm see doc/dp-derivation/dp-derivation.tex

An initialization step is performed before entering the EM algorithm. If you want to avoid this step, set the keyword argument init_params to the empty string ‘’. Likewise, if you would like just to do an initialization, call this method with n_iter=0.

Parameters

X : array_like, shape (n, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

n_iter : int, optional

Maximum number of iterations to perform before convergence.

params : string, optional

Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.

init_params : string, optional

Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
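The ‘wmc’ strings act as simple membership flags; conceptually they gate the update steps like this (a sketch of the convention, not the actual implementation):

```python
def updates_enabled(params):
    """Interpret a 'wmc'-style string as update flags for weights/means/covars."""
    return {
        'weights': 'w' in params,
        'means':   'm' in params,
        'covars':  'c' in params,
    }

print(updates_enabled('wm'))  # covars stay frozen during training
```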

get_params(deep=True)

Get parameters for the estimator

Parameters

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

lower_bound(X, z)

Returns a lower bound on the model evidence based on X and membership z.

predict(X)

Predict label for data.

Parameters

X : array-like, shape = [n_samples, n_features]

Returns

C : array, shape = (n_samples,)
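Conceptually, the predicted label is the argmax over the per-component responsibilities that eval computes (a sketch, not the class internals):

```python
def predict_labels(responsibilities):
    """Pick the index of the most responsible component for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in responsibilities]

print(predict_labels([[0.1, 0.7, 0.2],
                      [0.6, 0.3, 0.1]]))  # [1, 0]
```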
predict_proba(X)

Predict posterior probability of data under each Gaussian in the model.

Parameters

X : array-like, shape = [n_samples, n_features]

Returns

responsibilities : array-like, shape = (n_samples, n_components)

Returns the probability of the sample for each Gaussian (state) in the model.

rvs(*args, **kwargs)

DEPRECATED: deprecated in version 0.11; will be removed in v0.12. Use sample instead.

Generate random samples from the model.
sample(n_samples=1, random_state=None)

Generate random samples from the model.

Parameters

n_samples : int, optional

Number of samples to generate. Defaults to 1.

Returns

X : array_like, shape (n_samples, n_features)

List of samples
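Sampling from a fitted mixture amounts to choosing a component in proportion to its weight and then drawing from that component's Gaussian. A one-dimensional sketch in plain Python (illustrative only; the real method works on the fitted weights_, means_, and precisions_):

```python
import random

def sample_1d_mixture(weights, means, stds, n_samples=1, seed=0):
    """Draw samples from a 1-D Gaussian mixture: pick a component by weight,
    then draw from that component's normal distribution."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        r, acc = rng.random(), 0.0
        for k, wgt in enumerate(weights):   # inverse-CDF choice of component
            acc += wgt
            if r < acc:
                break
        out.append(rng.gauss(means[k], stds[k]))
    return out

samples = sample_1d_mixture([0.3, 0.7], [-5.0, 5.0], [0.5, 0.5], n_samples=200)
```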

score(X)

Compute the log probability under the model.

Parameters

X : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns

logprob : array_like, shape (n_samples,)

Log probabilities of each data point in X

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
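The nested naming convention can be illustrated with a small parser (a hypothetical helper, not scikit-learn code):

```python
def split_param_name(name):
    """Split a 'component__parameter' name into its component path and the
    final parameter, following the double-underscore convention."""
    *path, param = name.split('__')
    return path, param

print(split_param_name('gmm__n_components'))  # (['gmm'], 'n_components')
print(split_param_name('alpha'))              # ([], 'alpha')
```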