This documentation is for scikit-learn version 0.10.

8.1.6. sklearn.cluster.SpectralClustering

class sklearn.cluster.SpectralClustering(k=8, mode=None, random_state=None, n_init=10)

Apply k-means to a projection of the normalized Laplacian.

In practice Spectral Clustering is very useful when the structure of the individual clusters is highly non-convex, or more generally when a measure of the center and spread of the cluster is not a suitable description of the complete cluster, for instance when clusters are nested circles in the 2D plane.

If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.
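
A minimal, hedged sketch (not part of the original documentation) of this use case: two nested circles are clustered by building a Gaussian affinity matrix by hand and passing it to SpectralClustering with the version 0.10 signature documented here; the ring radii and the kernel width are arbitrary choices.

import numpy as np
from sklearn.cluster import SpectralClustering

# Two concentric rings: a structure where describing each cluster by its
# center and spread fails, but spectral clustering should separate them.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
inner = np.c_[np.cos(theta), np.sin(theta)]
outer = np.c_[3 * np.cos(theta), 3 * np.sin(theta)]
points = np.r_[inner, outer]

# Pairwise Euclidean distances turned into similarities with the heat
# kernel described in the Notes of fit() below; the width 0.5 is arbitrary.
dists = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
affinity = np.exp(-dists ** 2 / (2.0 * 0.5 ** 2))

model = SpectralClustering(k=2, mode='arpack', random_state=0)
model.fit(affinity)        # fit() takes the (n_samples, n_samples) affinity
print(model.labels_[:10])  # one cluster label per sample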

Parameters:

k : integer, optional

The dimension of the projection subspace.

mode : {None, ‘arpack’ or ‘amg’}

The eigenvalue decomposition strategy to use. AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.

random_state : int seed, RandomState instance, or None (default)

A pseudo random number generator used for the initialization of the lobpcg eigenvector decomposition when mode == ‘amg’ and by the k-means initialization.

n_init : int, optional, default: 10

Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
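
For concreteness, a small sketch of how the parameters above map onto a constructor call; the particular values are arbitrary.

from sklearn.cluster import SpectralClustering

model = SpectralClustering(
    k=4,              # dimension of the projection subspace
    mode='arpack',    # or 'amg' (needs pyamg), or None to let the estimator choose
    random_state=42,  # seeds the lobpcg initialization (amg mode) and k-means
    n_init=20,        # k-means restarts; the best run by inertia is kept
)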

Attributes

labels_ : Labels of each point

Methods

fit(X)                 Compute the spectral clustering from the affinity matrix
set_params(**params)   Set the parameters of the estimator.

__init__(k=8, mode=None, random_state=None, n_init=10)

fit(X)

Compute the spectral clustering from the affinity matrix

Parameters:

X : array-like or sparse matrix, shape: (n_samples, n_samples)

An affinity matrix describing the pairwise similarity of the data. It can also be an adjacency matrix of the graph to embed. X must be symmetric and its entries must be positive or zero. Zero means that elements have nothing in common, whereas high values mean that elements are strongly similar.
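
A hedged illustration of such an X (not part of the original docs): the adjacency matrix of a small graph made of two dense blocks joined by one weak edge, so that fit recovers the corresponding normalized graph cut; the block size and edge weights are arbitrary.

import numpy as np
from sklearn.cluster import SpectralClustering

n = 10
adjacency = np.zeros((2 * n, 2 * n))
adjacency[:n, :n] = 1.0                            # first dense block
adjacency[n:, n:] = 1.0                            # second dense block
adjacency[n - 1, n] = adjacency[n, n - 1] = 0.1    # one weak bridge between them
np.fill_diagonal(adjacency, 0.0)                   # no self-loops

model = SpectralClustering(k=2, random_state=0)
model.fit(adjacency)    # symmetric, entries >= 0, as required above
print(model.labels_)    # the two blocks should come out as the two clusters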

Notes

If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements and high values mean very dissimilar elements, it can be transformed into a similarity matrix that is well suited for the algorithm by applying the Gaussian (heat) kernel:

np.exp(- X ** 2 / (2. * delta ** 2))

Another alternative is to take a symmetric version of the k nearest neighbors connectivity matrix of the points.
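
The two alternatives above written out as a rough sketch; the random data, the value of delta, and the number of neighbors are arbitrary, and kneighbors_graph is assumed to be importable from sklearn.neighbors.

import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.RandomState(0)
data = rng.rand(50, 3)

# Pairwise Euclidean distances: 0 on the diagonal means identical elements.
distances = np.sqrt(((data[:, None, :] - data[None, :, :]) ** 2).sum(-1))

# Heat-kernel transform from the note above; delta sets the similarity scale.
delta = 0.5
similarity = np.exp(-distances ** 2 / (2.0 * delta ** 2))

# Alternative: a symmetrized k-nearest-neighbors connectivity matrix.
connectivity = kneighbors_graph(data, 5)            # 5 nearest neighbors
affinity = 0.5 * (connectivity + connectivity.T)    # force symmetry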

If the pyamg package is installed, it is used: this greatly speeds up computation.

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
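
A short sketch of set_params on this estimator; parameter names follow the version 0.10 signature above, and the pipeline step name used in the comment is hypothetical.

from sklearn.cluster import SpectralClustering

model = SpectralClustering(k=8)
model.set_params(k=5, n_init=20)   # returns the estimator itself

# On a nested object such as a pipeline whose clustering step were named
# 'spectral', the nested form would be: pipe.set_params(spectral__k=5)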