9.7.2. sklearn.cluster.MiniBatchKMeans

class sklearn.cluster.MiniBatchKMeans(k=8, init='random', max_iter=100, chunk_size=1000, tol=0.0001, verbose=0, random_state=None)

Mini-Batch K-Means clustering
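The paper listed under References replaces the full-data k-means update with a stochastic one. Each center is moved toward the samples of a small batch that are assigned to it, using a per-center learning rate of 1/count. The pure-Python sketch below shows one such update step for illustration only. The name `minibatch_step` is hypothetical, and this is not the scikit-learn implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def minibatch_step(batch, centers, counts):
    """Apply one mini-batch update in place and return (centers, counts).

    centers: list of k centers (lists of floats)
    counts:  list of k ints, number of samples assigned to each center so far
    """
    # 1. Cache the closest center for every sample, using the current centers.
    assigned = [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j])) for x in batch]
    # 2. Pull each assigned center toward its sample with rate eta = 1 / count.
    for x, c in zip(batch, assigned):
        counts[c] += 1
        eta = 1.0 / counts[c]
        centers[c] = [(1 - eta) * ci + eta * xi for ci, xi in zip(centers[c], x)]
    return centers, counts


centers, counts = [[0.0, 0.0], [10.0, 10.0]], [0, 0]
minibatch_step([[1.0, 1.0], [9.0, 9.0], [3.0, 3.0]], centers, counts)
print(centers, counts)  # [[2.0, 2.0], [9.0, 9.0]] [2, 1]
```

With eta = 1/count, each center ends up as the running mean of all the samples ever assigned to it. The first sample it receives simply overwrites the initial position.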

Parameters :

k : int, optional, default: 8

The number of clusters to form as well as the number of centroids to generate.

max_iter : int, optional, default: 100

Maximum number of iterations of the k-means algorithm for a single run.

chunk_size : int, optional, default: 1000

Size of the mini-batches.

init : {‘k-means++’, ‘random’ or an ndarray}

Method for initialization, defaults to ‘random’:

‘k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See the Notes section in k_init for more details. Only for dense X.

‘random’: choose k observations (rows) at random from data for the initial centroids.

If init is a 2D array of shape (k, n_features), it is used as the initial centroids.

tol : float, optional, default: 1e-4

Relative tolerance with respect to inertia to declare convergence.
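The two named init strategies can be sketched in pure Python as follows. 'random' picks k distinct rows. 'k-means++' uses basic D² sampling: after a uniformly chosen first center, each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function names are hypothetical, and the library's own k_init may differ in details (for example, by trying several candidates per step).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_random(X, k, rng):
    """'random': use k distinct rows of X, chosen uniformly, as centers."""
    return [list(X[i]) for i in rng.sample(range(len(X)), k)]


def init_kmeans_pp(X, k, rng):
    """'k-means++' (D^2 sampling). Assumes X has at least k distinct rows,
    otherwise all sampling weights can become zero."""
    centers = [list(X[rng.randrange(len(X))])]  # first center: uniform
    while len(centers) < k:
        # Squared distance of every row to its nearest chosen center.
        d2 = [min(sq_dist(x, c) for c in centers) for x in X]
        # Draw the next center with probability proportional to d2.
        idx = rng.choices(range(len(X)), weights=d2, k=1)[0]
        centers.append(list(X[idx]))
    return centers


rng = random.Random(0)
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(init_random(X, 2, rng))
print(init_kmeans_pp(X, 2, rng))
```

Rows that coincide with an existing center get weight 0, so k-means++ never picks the same point twice. Distant rows are strongly favoured.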

References

http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf

Attributes

cluster_centers_ : array, [k, n_features]

Coordinates of cluster centers.

labels_ : array, [n_samples]

Labels of each point.

inertia_ : float

The value of the inertia criterion associated with the chosen partition.
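The following assumes the usual k-means criterion, where inertia is the sum of squared distances from each sample to its closest center. Under that assumption, labels_ and inertia_ can be recomputed from any set of centers as shown in this short sketch. The function name `labels_and_inertia` is hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def labels_and_inertia(X, centers):
    """Return (labels, inertia): the index of the closest center for each
    sample, and the sum of squared distances to those closest centers."""
    labels, inertia = [], 0.0
    for x in X:
        d = [sq_dist(x, c) for c in centers]
        j = min(range(len(centers)), key=d.__getitem__)
        labels.append(j)
        inertia += d[j]
    return labels, inertia


X = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
centers = [[0.0, 1.0], [10.0, 10.0]]
print(labels_and_inertia(X, centers))  # ([0, 0, 1], 2.0)
```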

Methods

fit(X): Compute K-Means clustering
partial_fit(X): Compute a partial K-Means clustering
__init__(k=8, init='random', max_iter=100, chunk_size=1000, tol=0.0001, verbose=0, random_state=None)

fit(X, y=None)

Compute the centroids on X by chunking it into mini-batches.

Parameters :

X : array-like, shape = [n_samples, n_features]

Coordinates of the data points to cluster.
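Here is one way to picture fit chunking X into mini-batches. Each pass shuffles the rows, cuts them into chunks of chunk_size, and applies one mini-batch update per chunk. Two details are assumptions made for this sketch, not documented behaviour of this class: the stopping rule (stop when the relative drop in full-data inertia falls below tol) and the reading of max_iter as a number of passes. `fit_minibatch` is a hypothetical name.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(x, centers):
    """Index of the center nearest to x."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def inertia(X, centers):
    """Sum of squared distances from each sample to its closest center."""
    return sum(sq_dist(x, centers[closest(x, centers)]) for x in X)


def fit_minibatch(X, k, chunk_size=1000, max_iter=100, tol=1e-4, seed=None):
    """Hypothetical fit loop over shuffled chunks of X."""
    rng = random.Random(seed)
    centers = [list(X[i]) for i in rng.sample(range(len(X)), k)]  # 'random' init
    counts = [0] * k
    prev = inertia(X, centers)
    for _ in range(max_iter):
        order = list(range(len(X)))
        rng.shuffle(order)
        for start in range(0, len(order), chunk_size):
            batch = [X[i] for i in order[start:start + chunk_size]]
            assigned = [closest(x, centers) for x in batch]
            for x, c in zip(batch, assigned):
                counts[c] += 1
                eta = 1.0 / counts[c]
                centers[c] = [(1 - eta) * ci + eta * xi for ci, xi in zip(centers[c], x)]
        cur = inertia(X, centers)
        # Assumed stopping rule: relative improvement in inertia below tol.
        if prev == 0 or (prev - cur) / prev < tol:
            break
        prev = cur
    return centers


X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(fit_minibatch(X, k=1, chunk_size=2, seed=0))  # close to [[1.0, 1.0]], the mean of X
```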

partial_fit(X, y=None)

Update the k-means estimate on a single mini-batch X.

Parameters :

X : array-like, shape = [n_samples, n_features]

Coordinates of the data points to cluster.
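Unlike fit, partial_fit is meant to be called repeatedly, for example on chunks of a stream that does not fit in memory. The state (centers and per-center counts) therefore has to persist between calls. The toy class below is hypothetical and only mimics that calling pattern. Seeding from the first k rows of the first batch is an assumption of the sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class StreamingKMeans:
    """Hypothetical toy class: centers and counts survive between
    partial_fit calls, so each call refines the previous estimate."""

    def __init__(self, k):
        self.k = k
        self.cluster_centers_ = None
        self.counts_ = None

    def partial_fit(self, X):
        if self.cluster_centers_ is None:
            # Assumption for this sketch: seed with the first k rows seen.
            if len(X) < self.k:
                raise ValueError("first mini-batch must contain at least k rows")
            self.cluster_centers_ = [list(x) for x in X[: self.k]]
            self.counts_ = [0] * self.k
        centers = self.cluster_centers_
        # Assign the whole mini-batch first, then apply the 1/count updates.
        assigned = [min(range(self.k), key=lambda j: sq_dist(x, centers[j])) for x in X]
        for x, c in zip(X, assigned):
            self.counts_[c] += 1
            eta = 1.0 / self.counts_[c]
            centers[c] = [(1 - eta) * ci + eta * xi for ci, xi in zip(centers[c], x)]
        return self


model = StreamingKMeans(k=2)
model.partial_fit([[0.0, 0.0], [10.0, 10.0]])   # first chunk of a stream
model.partial_fit([[2.0, 0.0], [12.0, 10.0]])   # next chunk
print(model.cluster_centers_)  # [[1.0, 0.0], [11.0, 10.0]]
```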

predict(X)

Predict the closest cluster each sample in X belongs to.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters :

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

New data to predict.

Returns :

Y : array, shape [n_samples,]

Index of the closest center each sample belongs to.
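In vector-quantization terms, predict encodes each sample as the index of its nearest code, and looking the index back up in the code book decodes it. This is a minimal pure-Python sketch, not the library code. The name `predict` here is a standalone hypothetical function, and `codebook` stands in for cluster_centers_.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(X, cluster_centers):
    """Index of the closest center ("code") for each sample in X."""
    return [
        min(range(len(cluster_centers)), key=lambda j: sq_dist(x, cluster_centers[j]))
        for x in X
    ]


codebook = [[0.0, 0.0], [5.0, 5.0]]  # plays the role of cluster_centers_
codes = predict([[1.0, 0.5], [4.0, 6.0], [0.0, 0.2]], codebook)
print(codes)  # [0, 1, 0]
# Decoding: replace each sample by its code vector.
print([codebook[c] for c in codes])  # [[0.0, 0.0], [5.0, 5.0], [0.0, 0.0]]
```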

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns :

self

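The <component>__<parameter> convention can be illustrated with a hypothetical toy class (not scikit-learn's BaseEstimator, and without its parameter validation). The key is split at the first "__". The left part names the nested object, and the remainder is forwarded to that object's own set_params, which also handles deeper nesting such as a__b__c.

```python
class ToyEstimator:
    """Hypothetical estimator whose parameters may themselves be estimators."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            component, sep, sub_key = key.partition("__")
            if sep:
                # '<component>__<parameter>': delegate to the nested object.
                self.params[component].set_params(**{sub_key: value})
            else:
                self.params[key] = value
        return self  # returning self allows call chaining


pipeline = ToyEstimator(clusterer=ToyEstimator(k=8, chunk_size=1000))
pipeline.set_params(clusterer__k=3).set_params(clusterer__chunk_size=200)
print(pipeline.params["clusterer"].params)  # {'k': 3, 'chunk_size': 200}
```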
transform(X, y=None)

Transform the data to a cluster-distance space.

In the new space, each dimension is the distance to the cluster centers. Note that even if X is sparse, the array returned by transform will typically be dense.

Parameters :

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

New data to transform.

Returns :

X_new : array, shape [n_samples, k]

X transformed in the new space.
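This sketch assumes plain (unsquared) Euclidean distances and builds the cluster-distance space as an n_samples by k list of lists. Each column corresponds to one center. The result is dense regardless of the input, and the predicted label is the column with the smallest entry in each row. The name `transform` here is a standalone hypothetical function.

```python
import math


def transform(X, cluster_centers):
    """Row i holds the Euclidean distance from X[i] to every center."""
    return [[math.dist(x, c) for c in cluster_centers] for x in X]


centers = [[0.0, 0.0], [3.0, 4.0]]
D = transform([[0.0, 0.0], [3.0, 0.0]], centers)
print(D)  # [[0.0, 5.0], [3.0, 4.0]]
# The closest center (what predict returns) is the argmin of each row.
print([row.index(min(row)) for row in D])  # [0, 0]
```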