scikits.learn.cluster.SpectralClustering¶
- class scikits.learn.cluster.SpectralClustering(k=8, mode=None, random_state=None)¶
Apply k-means to a projection of the normalized graph Laplacian.
In practice, spectral clustering is very useful when the structure of the individual clusters is highly non-convex, or more generally when a measure of the center and spread of a cluster is not a suitable description of the complete cluster, for instance when the clusters are nested circles on the 2D plane.
If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.
Parameters:
- k: integer, optional
  The dimension of the projection subspace.
- mode: {None, ‘arpack’, ‘amg’}
  The eigenvalue decomposition strategy to use. AMG (Algebraic MultiGrid) is much faster, but requires pyamg to be installed.
- random_state: int seed, RandomState instance, or None (default)
  A pseudo-random number generator used to initialize the lobpcg eigenvector decomposition when mode == ‘amg’, and for the k-means initialization.
References
- Normalized cuts and image segmentation, 2000. Jianbo Shi, Jitendra Malik. http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.2324
- A Tutorial on Spectral Clustering, 2007. Ulrike von Luxburg. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.9323
Attributes
- labels_: Labels of each point
Methods
- fit(X): Compute spectral clustering
- __init__(k=8, mode=None, random_state=None)¶
- fit(X, **params)¶
Compute the spectral clustering from the affinity matrix.
Parameters:
- X: array-like or sparse matrix, shape: (n_samples, n_samples)
  An affinity matrix describing the pairwise similarity of the data. It can also be an adjacency matrix of the graph to embed. X must be symmetric and its entries must be non-negative. Zero means that two elements have nothing in common, whereas high values mean that the elements are strongly similar.
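As a small NumPy-only sketch (independent of the library; the helper name is illustrative, not part of the API), the following checks that a candidate matrix satisfies these requirements before it is passed to fit:

```python
import numpy as np

def is_valid_affinity(X, tol=1e-10):
    """Return True if X is square, symmetric, and non-negative
    (the conditions fit requires of its input)."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        return False
    if not np.allclose(X, X.T, atol=tol):
        return False
    return bool((X >= -tol).all())

# A valid 3x3 affinity matrix: symmetric, entries >= 0.
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
print(is_valid_affinity(A))   # → True
print(is_valid_affinity(-A))  # → False (negative entries)
```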
Notes
If you have a matrix, such as a distance matrix, for which 0 means identical elements and high values mean very dissimilar elements, it can be transformed into a similarity matrix that is well suited for the algorithm by applying the Gaussian (heat) kernel:
np.exp(- X ** 2 / (2. * delta ** 2))
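For example, applied to a small distance matrix (here delta is a free bandwidth parameter; the value 1.0 is an illustrative choice, not a recommendation):

```python
import numpy as np

# Pairwise distances between three points: 0 on the diagonal,
# larger values for more dissimilar elements.
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 3.0],
              [4.0, 3.0, 0.0]])

delta = 1.0  # bandwidth of the heat kernel (illustrative choice)
S = np.exp(-D ** 2 / (2.0 * delta ** 2))

# Identical elements (distance 0) map to similarity 1;
# distant elements map to similarities close to 0.
print(np.diag(S))  # → [1. 1. 1.]
```

The resulting S is symmetric with entries in [0, 1], so it meets the requirements fit places on its input.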
Another alternative is to take a symmetric version of the k nearest neighbors connectivity matrix of the points.
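A minimal NumPy sketch of this alternative, symmetrizing the directed k-nearest-neighbor graph with a logical OR (the function name and the OR-style symmetrization are assumptions for illustration; other symmetrizations, such as AND, are also used):

```python
import numpy as np

def symmetric_knn_connectivity(points, k):
    """Binary connectivity matrix: entry (i, j) is 1 if either point
    is among the other's k nearest neighbors (OR-symmetrized)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise squared Euclidean distances via broadcasting.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    C = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbors of point i, excluding i itself.
        order = np.argsort(d2[i])
        neighbors = [j for j in order if j != i][:k]
        C[i, neighbors] = 1
    # Symmetrize: keep an edge if it appears in either direction.
    return np.maximum(C, C.T)

# Two tight pairs of points, far apart from each other.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
C = symmetric_knn_connectivity(points, k=1)
# Nearby points are connected; the two distant pairs are not.
```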
If the pyamg package is installed, it is used: this greatly speeds up computation.