This documentation is for scikit-learn version 0.11-git.

8.1.5. sklearn.cluster.MeanShift

class sklearn.cluster.MeanShift(bandwidth=None, seeds=None, bin_seeding=False, cluster_all=True)

MeanShift clustering

Parameters :

bandwidth: float, optional :

Bandwidth used in the flat kernel. If not set, the bandwidth is estimated. See sklearn.cluster.estimate_bandwidth.

seeds: array [n_samples, n_features], optional :

Seeds used to initialize kernels. If not set, the seeds are calculated by sklearn.cluster.get_bin_seeds with bandwidth as the grid size and default values for other parameters.

cluster_all: boolean, default True :

If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.
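As a rough illustration of how seeds and cluster_all interact, the sketch below fits MeanShift on a small hand-made data set. The data, the bandwidth of 1.0 and the two seed locations are assumptions chosen for this example and do not come from this page. The seeds are passed explicitly and none of them lies near the far-away point, so that point ends up outside every kernel and is treated as an orphan.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Illustrative data: two tight groups of four points and one distant outlier.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, 20.0],
])

# Hypothetical seeds: one kernel is started near each group, none near the outlier.
seeds = np.array([[0.0, 0.0], [5.0, 5.0]])

# cluster_all=False: points that fall outside every kernel receive the label -1.
ms_orphans = MeanShift(bandwidth=1.0, seeds=seeds, cluster_all=False)
ms_orphans.fit(X)

# cluster_all=True (the default): orphans are assigned to the nearest kernel.
ms_all = MeanShift(bandwidth=1.0, seeds=seeds, cluster_all=True)
ms_all.fit(X)

print(ms_orphans.cluster_centers_)  # one row per cluster found
print(ms_orphans.labels_)           # the outlier is labelled -1
print(ms_all.labels_)               # the outlier joins the nearest cluster
```

The integer assigned to each cluster is not fixed in advance, so code that consumes labels_ should compare labels with each other instead of relying on a particular numbering.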

Notes

Scalability:

Because this implementation uses a flat kernel and a Ball Tree to look up members of each kernel, the complexity tends towards O(T*n*log(n)) in lower dimensions, with n the number of samples and T the number of points. In higher dimensions the complexity will tend towards O(T*n^2).

Scalability can be boosted by using fewer seeds, for example by using a higher value of min_bin_freq in the get_bin_seeds function.

Note that the estimate_bandwidth function is much less scalable than the mean shift algorithm and will be the bottleneck if it is used.
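The two scalability remarks above can be sketched as follows. This is a minimal example under stated assumptions: the synthetic data, the quantile and the min_bin_freq values are arbitrary, and estimate_bandwidth is assumed to accept the n_samples and random_state arguments, as it does in recent releases. The bandwidth is estimated on a subsample to limit the cost of that step, and a higher min_bin_freq is used to reduce the number of seeds.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth, get_bin_seeds

# Synthetic data (illustrative only): two Gaussian blobs of 200 points each.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=6.0, scale=0.5, size=(200, 2)),
])

# Estimate the bandwidth on a subsample of 100 points instead of the full data,
# since this step is described above as the likely bottleneck.
bandwidth = float(estimate_bandwidth(X, quantile=0.3, n_samples=100, random_state=0))

# Bin the points on a grid of size `bandwidth`. Raising min_bin_freq discards
# sparsely populated bins, which leaves fewer seeds to iterate from.
seeds_all = get_bin_seeds(X, bandwidth, min_bin_freq=1)
seeds_few = get_bin_seeds(X, bandwidth, min_bin_freq=5)
print(len(seeds_all), len(seeds_few))

# Run mean shift from the reduced set of seeds.
ms = MeanShift(bandwidth=bandwidth, seeds=seeds_few)
ms.fit(X)
print(ms.cluster_centers_.shape)
```

Fewer seeds means fewer kernels to shift, at the price of possibly missing modes that lie in sparse regions of the data.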

References

Dorin Comaniciu and Peter Meer, “Mean Shift: A robust approach toward feature space analysis”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002. pp. 603-619.

Attributes

cluster_centers_ : array, [n_clusters, n_features]   Coordinates of cluster centers
labels_ :   Labels of each point

Methods

fit(X) Compute MeanShift
get_params([deep]) Get parameters for the estimator
set_params(**params) Set the parameters of the estimator.
__init__(bandwidth=None, seeds=None, bin_seeding=False, cluster_all=True)
fit(X)

Compute MeanShift

Parameters :

X : array [n_samples, n_features]

Input points

get_params(deep=True)

Get parameters for the estimator

Parameters :

deep: boolean, optional :

If True, will return the parameters for this estimator and contained subobjects that are estimators.
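A short sketch of what get_params returns for this estimator is given below. The parameter values are arbitrary, and the exact set of keys depends on the installed release, so only the parameters listed in the signature above are inspected.

```python
from sklearn.cluster import MeanShift

# Arbitrary values chosen for illustration.
ms = MeanShift(bandwidth=2.0, cluster_all=False)

# get_params returns a dict that maps constructor argument names to their values.
params = ms.get_params()
print(params["bandwidth"])
print(params["seeds"])
print(params["cluster_all"])
```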

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns :

self :
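The <component>__<parameter> convention can be illustrated with a small pipeline. This is a sketch under assumptions: the step names "scale" and "cluster" are hypothetical, and StandardScaler is the scaler class name used in recent releases (it may be named differently in the version documented here).

```python
from sklearn.cluster import MeanShift
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are addressed by their plain names.
ms = MeanShift().set_params(bandwidth=0.5, cluster_all=False)

# Nested object: the step name, a double underscore, then the parameter name.
pipe = Pipeline([("scale", StandardScaler()), ("cluster", MeanShift())])
returned = pipe.set_params(cluster__bandwidth=1.5, scale__with_mean=False)

# set_params returns the estimator itself, so calls can be chained.
print(returned is pipe)
print(pipe.named_steps["cluster"].bandwidth)
```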