9.7.3. sklearn.cluster.MeanShift

class sklearn.cluster.MeanShift(bandwidth=None, seeds=None, bin_seeding=False, cluster_all=True)

MeanShift clustering

Parameters :

bandwidth: float, optional :

Bandwidth used in the RBF kernel. If not set, the bandwidth is estimated. See clustering.estimate_bandwidth.

seeds: array [n_samples, n_features], optional :

Seeds used to initialize kernels. If not set, the seeds are calculated by clustering.get_bin_seeds with bandwidth as the grid size and default values for other parameters.

cluster_all: boolean, default True :

If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.
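A minimal sketch of the cluster_all behaviour described above, assuming a recent scikit-learn with NumPy installed. The toy data, the explicit seeds and the bandwidth of 3.0 are illustrative assumptions, not values from this page. Explicit seeds are passed so that the distant point does not seed a kernel of its own.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Two tight groups plus one distant point (illustrative data).
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
    [50.0, 50.0],
])

# One seed per group; the distant point is deliberately left unseeded.
seeds = np.array([[0.0, 0.0], [10.0, 10.0]])

strict = MeanShift(bandwidth=3.0, seeds=seeds, cluster_all=False).fit(X)
lenient = MeanShift(bandwidth=3.0, seeds=seeds, cluster_all=True).fit(X)

# With cluster_all=False the distant point is an orphan (label -1);
# with cluster_all=True it takes the label of the nearest kernel.
orphan_label = strict.labels_[-1]
adopted_label = lenient.labels_[-1]
```

Here orphan_label is the -1 marker, while adopted_label matches the label of the group around (10, 10), which is the nearest kernel to the distant point.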

Notes

Reference:

Dorin Comaniciu and Peter Meer, “Mean Shift: A robust approach toward feature space analysis”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002. pp. 603-619.

Scalability:

Because this implementation uses a flat kernel and a Ball Tree to look up members of each kernel, the complexity will tend to O(T*n*log(n)) in lower dimensions, with n the number of samples and T the number of points. In higher dimensions the complexity will tend towards O(T*n^2).
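To make the flat-kernel update concrete, here is a pure-Python sketch of shifting a single seed. It is not the library's implementation: it scans all points on every iteration instead of querying a Ball Tree, and the function name shift_seed, the tolerance and the iteration cap are hypothetical choices made for illustration.

```python
import math


def shift_seed(seed, points, bandwidth, max_iter=300, tol=1e-3):
    """Move one seed towards a local density mode using a flat kernel."""
    mean = tuple(seed)
    for _ in range(max_iter):
        # Flat kernel: every point within `bandwidth` gets equal weight.
        members = [p for p in points if math.dist(p, mean) <= bandwidth]
        if not members:
            return None  # the seed sits in an empty region
        # The new position is the plain average of the kernel members.
        new_mean = tuple(sum(c) / len(members) for c in zip(*members))
        if math.dist(new_mean, mean) <= tol * bandwidth:
            return new_mean  # the shift is negligible: converged
        mean = new_mean
    return mean


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
mode = shift_seed((0.0, 0.0), points, bandwidth=3.0)
```

The brute-force membership scan is the step that a spatial index such as a Ball Tree speeds up in low dimensions.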

Scalability can be boosted by using fewer seeds, for example by using a higher value of min_bin_freq in the get_bin_seeds function.

Note that the estimate_bandwidth function is much less scalable than the mean shift algorithm and will be the bottleneck if it is used.
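The two scalability notes above could be applied as in the following sketch, which assumes a recent scikit-learn where estimate_bandwidth accepts n_samples and get_bin_seeds accepts bin_size and min_bin_freq. The synthetic data, the quantile, the subsample size of 100 and the min_bin_freq value of 5 are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth, get_bin_seeds

rng = np.random.RandomState(0)
# Two synthetic blobs of 200 points each (illustrative data).
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(200, 2)),
])

# Estimate the bandwidth on a subsample to limit the cost of this step.
bandwidth = estimate_bandwidth(X, quantile=0.3, n_samples=100, random_state=0)

# A higher min_bin_freq discards sparsely populated bins, leaving fewer seeds.
seeds_all = get_bin_seeds(X, bin_size=bandwidth, min_bin_freq=1)
seeds_few = get_bin_seeds(X, bin_size=bandwidth, min_bin_freq=5)

# Fit using only the reduced seed set.
model = MeanShift(bandwidth=bandwidth, seeds=seeds_few).fit(X)
```

Fewer seeds means fewer kernels to shift, at the risk of missing modes in sparsely populated regions.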

Attributes

cluster_centers_: array, [n_clusters, n_features] Coordinates of cluster centers
labels_: Labels of each point
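
As a short illustration of the two attributes, assuming a recent scikit-learn and an arbitrary toy dataset with a hand-picked bandwidth, the fitted estimator exposes one row per cluster in cluster_centers_ and one label per input point in labels_:

```python
import numpy as np
from sklearn.cluster import MeanShift

# Eight points forming two well-separated groups (illustrative data).
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
])

ms = MeanShift(bandwidth=3.0).fit(X)

centers = ms.cluster_centers_   # shape: [n_clusters, n_features]
labels = ms.labels_             # one integer label per row of X
n_clusters = len(centers)
```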

Methods

fit(X): Compute MeanShift clustering
__init__(bandwidth=None, seeds=None, bin_seeding=False, cluster_all=True)
fit(X)

Compute MeanShift

Parameters :

X : array [n_samples, n_features]

Input points

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns :

self :
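
A brief sketch of both forms of set_params, assuming a recent scikit-learn. The step names "scale" and "cluster" are arbitrary labels chosen for this example, and StandardScaler is included only to give the pipeline a second component.

```python
from sklearn.cluster import MeanShift
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are addressed directly by name.
ms = MeanShift()
returned = ms.set_params(bandwidth=1.5, cluster_all=False)

# Nested object: address a step's parameter as <component>__<parameter>.
pipe = Pipeline([("scale", StandardScaler()), ("cluster", MeanShift())])
pipe.set_params(cluster__bandwidth=2.0)

nested_bandwidth = pipe.named_steps["cluster"].bandwidth
```

Because set_params returns the estimator itself, calls can be chained, for example ms.set_params(bandwidth=1.5).fit(X).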