8.18.2. sklearn.mixture.DPGMM
- class sklearn.mixture.DPGMM(n_components=1, covariance_type='diag', alpha=1.0, random_state=None, thresh=0.01, verbose=False, min_covar=None)
- Variational Inference for the Infinite Gaussian Mixture Model.
- DPGMM stands for Dirichlet Process Gaussian Mixture Model. It is an infinite mixture model with the Dirichlet process as a prior distribution on the number of clusters. In practice the approximate inference algorithm uses a truncated distribution with a fixed maximum number of components, but the number of components actually used almost always depends on the data.
- Stick-breaking representation of a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a Gaussian mixture model with a variable number of components (smaller than the truncation parameter n_components).
- Initialization is with normally distributed means and identity covariance, for proper convergence.
- Parameters :
  - n_components : int, optional
    - Number of mixture components. This is the truncation level, so it acts as an upper bound on the number of components used. Defaults to 1.
  - covariance_type : string, optional
    - String describing the type of covariance parameters to use. Must be one of 'spherical', 'tied', 'diag', 'full'. Defaults to 'diag'.
  - alpha : float, optional
    - Real number representing the concentration parameter of the Dirichlet process. Intuitively, the Dirichlet process is as likely to start a new cluster for a point as it is to add that point to a cluster with alpha elements. A higher alpha means more clusters, as the expected number of clusters is alpha*log(N), where N is the number of data points. Defaults to 1.
  - thresh : float, optional
    - Convergence threshold. Defaults to 0.01.
- See also
- GMM
- Finite Gaussian mixture model fit with EM
- VBGMM
- Finite Gaussian mixture model fit with a variational algorithm, better suited to situations where there is little data
- Attributes
  - covariance_type : string
    - String describing the type of covariance parameters used by the DP-GMM. Must be one of 'spherical', 'tied', 'diag', 'full'.
  - n_components : int
    - Number of mixture components.
  - weights_ : array, shape (n_components,)
    - Mixing weights for each mixture component.
  - means_ : array, shape (n_components, n_features)
    - Mean parameters for each mixture component.
  - precisions_ : array
    - Precision (inverse covariance) parameters for each mixture component. The shape depends on covariance_type:
      - (n_components, n_features) if 'spherical'
      - (n_features, n_features) if 'tied'
      - (n_components, n_features) if 'diag'
      - (n_components, n_features, n_features) if 'full'
  - converged_ : bool
    - True when convergence was reached in fit(), False otherwise.
- Methods
  - aic(X) : Akaike information criterion for the current model fit
  - bic(X) : Bayesian information criterion for the current model fit
  - decode(*args, **kwargs) : DEPRECATED: will be removed in v0.12; use the score or predict method instead
  - eval(X) : Evaluate the model on data
  - fit(X[, n_iter, params, init_params]) : Estimate model parameters with the variational algorithm.
  - get_params([deep]) : Get parameters for the estimator
  - lower_bound(X, z) : Returns a lower bound on model evidence based on X and membership
  - predict(X) : Predict label for data.
  - predict_proba(X) : Predict posterior probability of data under each Gaussian
  - rvs(*args, **kwargs) : DEPRECATED: will be removed in v0.12; use sample instead
  - sample([n_samples, random_state]) : Generate random samples from the model.
  - score(X) : Compute the log probability under the model.
  - set_params(**params) : Set the parameters of the estimator.
- __init__(n_components=1, covariance_type='diag', alpha=1.0, random_state=None, thresh=0.01, verbose=False, min_covar=None)
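The stick-breaking representation and the role of alpha described above can be illustrated with a small standard-library sketch. It is not the class's implementation: the function names `stick_breaking_weights` and `crp_cluster_count` are hypothetical, and the second function simulates the Chinese restaurant process view of the Dirichlet process, in which point i (counting from 0) opens a new cluster with probability alpha / (alpha + i). The exact expected count is the sum of those probabilities, which grows roughly like alpha*log(N).

```python
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Draw truncated stick-breaking mixture weights (illustrative helper).

    Each Beta(1, alpha) draw breaks off a fraction of the remaining stick.
    The last component takes whatever is left, so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(n_components - 1):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)
    return weights


def crp_cluster_count(alpha, n_points, rng):
    """Simulate the number of clusters after n_points arrivals.

    Point i (0-based) starts a new cluster with probability alpha / (alpha + i),
    so the very first point always starts one.
    """
    n_clusters = 0
    for i in range(n_points):
        if rng.random() < alpha / (alpha + i):
            n_clusters += 1
    return n_clusters


rng = random.Random(0)
weights = stick_breaking_weights(alpha=1.0, n_components=10, rng=rng)

# Average cluster counts over repeated simulations for a small and a large alpha.
n_points, n_runs = 1000, 200
mean_low = sum(crp_cluster_count(0.5, n_points, rng) for _ in range(n_runs)) / n_runs
mean_high = sum(crp_cluster_count(5.0, n_points, rng) for _ in range(n_runs)) / n_runs
print(round(sum(weights), 6), mean_low, mean_high)
```

A larger alpha yields more clusters on average, which matches the intuition given for the alpha parameter.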
 - aic(X)
- Akaike information criterion for the current model fit and the proposed data.
- Parameters :
  - X : array of shape (n_samples, n_dimensions)
- Returns :
  - aic : float (the lower the better)
 - bic(X)
- Bayesian information criterion for the current model fit and the proposed data.
- Parameters :
  - X : array of shape (n_samples, n_dimensions)
- Returns :
  - bic : float (the lower the better)
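As a point of reference for the two criteria above, here is a minimal sketch of the standard textbook definitions, AIC = -2 log L + 2k and BIC = -2 log L + k log(n). The parameter count shown is for a diagonal-covariance mixture and is an assumption about how free parameters would be counted. The log-likelihood value is made up for illustration, and all function names are hypothetical.

```python
import math


def n_free_parameters_diag(n_components, n_features):
    """Count free parameters of a diagonal-covariance Gaussian mixture."""
    mean_params = n_components * n_features
    covariance_params = n_components * n_features
    weight_params = n_components - 1  # the weights must sum to 1
    return mean_params + covariance_params + weight_params


def aic(total_log_likelihood, n_parameters):
    """Akaike information criterion (lower is better)."""
    return -2.0 * total_log_likelihood + 2.0 * n_parameters


def bic(total_log_likelihood, n_parameters, n_samples):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * total_log_likelihood + n_parameters * math.log(n_samples)


k = n_free_parameters_diag(n_components=3, n_features=2)
aic_value = aic(-250.0, k)                 # -250.0 is an invented log-likelihood
bic_value = bic(-250.0, k, n_samples=100)
print(k, aic_value, bic_value)
```

BIC penalizes each extra parameter by log(n) rather than 2, so for more than about 7 samples it favors smaller models than AIC does.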
 - decode(*args, **kwargs)
- DEPRECATED: deprecated in version 0.10 and will be removed in v0.12; use the score or predict method instead, depending on the question.
- Find most likely mixture components for each point in X.
- Parameters :
  - X : array_like, shape (n_samples, n_features)
    - List of n_features-dimensional data points. Each row corresponds to a single data point.
- Returns :
  - logprobs : array_like, shape (n_samples,)
    - Log probability of each point in X under the model.
  - components : array_like, shape (n_samples,)
    - Index of the most likely mixture component for each observation.
 - eval(X)
- Evaluate the model on data.
- Compute the bound on log probability of X under the model and return the posterior distribution (responsibilities) of each mixture component for each element of X.
- This is done by computing the parameters for the mean-field of z for each observation.
- Parameters :
  - X : array_like, shape (n_samples, n_features)
    - List of n_features-dimensional data points. Each row corresponds to a single data point.
- Returns :
  - logprob : array_like, shape (n_samples,)
    - Log probabilities of each data point in X.
  - responsibilities : array_like, shape (n_samples, n_components)
    - Posterior probabilities of each mixture component for each observation.
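To make the two return values concrete, the sketch below computes per-point log probabilities and responsibilities for a plain one-dimensional Gaussian mixture with known parameters. This is a simplification: it uses the exact mixture likelihood rather than the variational bound and mean-field parameters that this class computes. The helper names `gaussian_log_pdf` and `evaluate` are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, variance):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)


def evaluate(xs, weights, means, variances):
    """Return (logprob, responsibilities) for a 1-D mixture with known parameters."""
    logprobs, responsibilities = [], []
    for x in xs:
        # Log of weight times density, one entry per component.
        joint = [math.log(w) + gaussian_log_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        # Log-sum-exp: subtract the maximum before exponentiating for stability.
        peak = max(joint)
        logprob = peak + math.log(sum(math.exp(j - peak) for j in joint))
        logprobs.append(logprob)
        # Normalizing the joint terms gives the posterior over components.
        responsibilities.append([math.exp(j - logprob) for j in joint])
    return logprobs, responsibilities


logprob, resp = evaluate([-2.0, 0.0, 2.1], [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
for lp, row in zip(logprob, resp):
    print(round(lp, 4), [round(r, 4) for r in row])
```

Each row of the responsibilities sums to 1, and a point midway between two equally weighted components is split evenly between them.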
 - fit(X, n_iter=10, params='wmc', init_params='wmc')
- Estimate model parameters with the variational algorithm.
- For a full derivation and description of the algorithm see doc/dp-derivation/dp-derivation.tex
- An initialization step is performed before entering the EM algorithm. To skip this step, set the keyword argument init_params to the empty string ''. Likewise, to perform only the initialization, call this method with n_iter=0.
- Parameters :
  - X : array_like, shape (n_samples, n_features)
    - List of n_features-dimensional data points. Each row corresponds to a single data point.
  - n_iter : int, optional
    - Maximum number of iterations to perform if convergence is not reached first. Defaults to 10.
  - params : string, optional
    - Controls which parameters are updated in the training process. Can contain any combination of 'w' for weights, 'm' for means, and 'c' for covars. Defaults to 'wmc'.
  - init_params : string, optional
    - Controls which parameters are updated in the initialization process. Can contain any combination of 'w' for weights, 'm' for means, and 'c' for covars. Defaults to 'wmc'.
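The interplay of n_iter, a convergence threshold, and the params string can be sketched with a toy loop. The example below swaps in plain EM on a one-dimensional mixture with unit variances in place of the variational updates derived in dp-derivation.tex, so it only illustrates the control flow. The function `fit_1d_mixture` is hypothetical and supports only the 'w' and 'm' flags.

```python
import math


def fit_1d_mixture(xs, means, weights, n_iter=10, thresh=0.01, params="wm"):
    """Plain EM for a 1-D Gaussian mixture with unit variances (toy stand-in)."""
    means, weights = list(means), list(weights)
    previous, converged = None, False
    for _ in range(n_iter):
        # E-step: responsibilities and total log-likelihood.
        total, resp = 0.0, []
        for x in xs:
            joint = [w * math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2.0 * math.pi)
                     for w, m in zip(weights, means)]
            norm = sum(joint)
            total += math.log(norm)
            resp.append([j / norm for j in joint])
        # Stop once the objective changes by less than thresh.
        if previous is not None and abs(total - previous) < thresh:
            converged = True
            break
        previous = total
        # M-step: update only the parameter groups named in params.
        for k in range(len(means)):
            mass = sum(r[k] for r in resp)
            if "m" in params:
                means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / mass
            if "w" in params:
                weights[k] = mass / len(xs)
    return means, weights, converged


xs = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
means, weights, converged = fit_1d_mixture(
    xs, [-1.0, 1.0], [0.5, 0.5], n_iter=50, thresh=1e-6)
print([round(m, 3) for m in means], [round(w, 3) for w in weights], converged)
```

With n_iter=0 the loop body never runs, so the starting values come back unchanged and the converged flag stays False. This mirrors the "initialization only" use described above.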
 - get_params(deep=True)
- Get parameters for the estimator.
- Parameters :
  - deep : boolean, optional
    - If True, will return the parameters for this estimator and contained subobjects that are estimators.
 - lower_bound(X, z)
- Returns a lower bound on model evidence based on the data X and the membership z.
 - predict(X)
- Predict label for data.
- Parameters :
  - X : array-like, shape = [n_samples, n_features]
- Returns :
  - C : array, shape = (n_samples,)
 - predict_proba(X)
- Predict posterior probability of data under each Gaussian in the model.
- Parameters :
  - X : array-like, shape = [n_samples, n_features]
- Returns :
  - responsibilities : array-like, shape = (n_samples, n_components)
    - Returns the probability of the sample for each Gaussian (state) in the model.
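The relationship between the posterior probabilities and a hard label can be sketched as follows, assuming the usual convention that the predicted label is the component with the largest responsibility. The responsibilities matrix is invented for illustration and `predict_from_responsibilities` is a hypothetical helper.

```python
def predict_from_responsibilities(responsibilities):
    """Return, for each row, the index of the largest posterior probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in responsibilities]


# Invented posterior probabilities: 3 samples, 3 components, each row sums to 1.
responsibilities = [
    [0.90, 0.08, 0.02],
    [0.10, 0.15, 0.75],
    [0.30, 0.60, 0.10],
]
labels = predict_from_responsibilities(responsibilities)
print(labels)  # [0, 2, 1]
```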
 - rvs(*args, **kwargs)
- DEPRECATED: deprecated in version 0.11 and will be removed in v0.12; use sample instead.
- Generate random samples from the model.
 - sample(n_samples=1, random_state=None)
- Generate random samples from the model.
- Parameters :
  - n_samples : int, optional
    - Number of samples to generate. Defaults to 1.
- Returns :
  - X : array_like, shape (n_samples, n_features)
    - List of samples.
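Sampling from a fitted mixture usually follows two steps: pick a component according to the mixing weights, then draw from that component's Gaussian. The sketch below shows this for a one-dimensional mixture using only the standard library. The function `sample_mixture` and its parameter values are hypothetical and are not taken from this class.

```python
import random


def sample_mixture(weights, means, std_devs, n_samples=1, random_state=None):
    """Draw n_samples values from a 1-D Gaussian mixture."""
    rng = random.Random(random_state)
    samples = []
    for _ in range(n_samples):
        # Step 1: choose a component index in proportion to its weight.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: draw from the chosen component's Gaussian.
        samples.append(rng.gauss(means[k], std_devs[k]))
    return samples


draws = sample_mixture([0.3, 0.7], [-5.0, 5.0], [0.5, 0.5],
                       n_samples=2000, random_state=0)
share_right = sum(x > 0 for x in draws) / len(draws)
print(len(draws), round(share_right, 3))
```

With well-separated components, the share of draws near each mean approaches the corresponding weight, and reusing the same random_state reproduces the same draws.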
 - score(X)
- Compute the log probability under the model.
- Parameters :
  - X : array_like, shape (n_samples, n_features)
    - List of n_features-dimensional data points. Each row corresponds to a single data point.
- Returns :
  - logprob : array_like, shape (n_samples,)
    - Log probabilities of each data point in X.
 - set_params(**params)
- Set the parameters of the estimator.
- The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Returns :
  - self
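The `<component>__<parameter>` naming convention can be illustrated with a minimal sketch. The `Component` and `Wrapper` classes below are hypothetical stand-ins written for this example. They are not the library's base classes and omit the validation a real estimator would perform.

```python
class Component:
    """Hypothetical simple estimator with one tunable parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Wrapper:
    """Hypothetical nested object that holds another estimator."""

    def __init__(self, model):
        self.model = model

    def set_params(self, **params):
        for name, value in params.items():
            component, separator, parameter = name.partition("__")
            if separator:
                # Route "<component>__<parameter>" to the named sub-object.
                getattr(self, component).set_params(**{parameter: value})
            else:
                setattr(self, name, value)
        return self


wrapper = Wrapper(Component())
result = wrapper.set_params(model__alpha=10.0)
print(wrapper.model.alpha)  # 10.0
```

Returning self from set_params allows calls to be chained, which is consistent with the return value documented above.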
 
