9.7.2. sklearn.cluster.MiniBatchKMeans
- class sklearn.cluster.MiniBatchKMeans(k=8, init='random', max_iter=100, chunk_size=1000, tol=0.0001, verbose=0, random_state=None)
Mini-Batch K-Means clustering
Parameters : k : int, optional, default: 8
The number of clusters to form as well as the number of centroids to generate.
max_iter : int, optional, default: 100
Maximum number of iterations of the k-means algorithm for a single run.
chunk_size : int, optional, default: 1000
Size of the mini-batches.
init : {‘k-means++’, ‘random’ or an ndarray}, optional, default: ‘random’
Method for initialization:
‘k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details. Only for dense X.
‘random’ : chooses k observations (rows) at random from the data as the initial centroids.
If init is a 2-D array, it is used as the seed for the centroids.
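The ‘random’ strategy described above can be sketched in a few lines of plain Python. This is an illustration only, not the library's code: the helper name `init_random` and its `seed` argument are hypothetical, and the data is a list of rows rather than an array.

```python
import random

def init_random(X, k, seed=None):
    """Pick k distinct rows of X as starting centroids ('random' strategy)."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no row index is chosen twice
    return [list(row) for row in rng.sample(X, k)]

X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]]
centers = init_random(X, k=2, seed=0)

# Passing explicit seeds instead corresponds to the "2-D array" case:
explicit_centers = [[0.0, 0.0], [5.0, 5.0]]
```

Copying each sampled row keeps later centroid updates from modifying the data in place.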
tol : float, optional, default: 1e-4
Relative tolerance w.r.t. inertia to declare convergence.
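One plausible reading of a tolerance that is relative to inertia is sketched below. The functions `inertia` and `has_converged` are hypothetical names, and the stopping test is an assumption based only on the wording above; the library's actual criterion may differ.

```python
def inertia(X, centers):
    """Sum of squared distances from each sample to its closest center."""
    total = 0.0
    for x in X:
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
    return total

def has_converged(prev_inertia, new_inertia, tol=1e-4):
    """Assumed test: inertia changed by no more than tol times its previous value."""
    return abs(prev_inertia - new_inertia) <= tol * prev_inertia

value = inertia([[0.0, 0.0], [2.0, 0.0]], [[1.0, 0.0]])
small_change = has_converged(100.0, 99.995)
large_change = has_converged(100.0, 99.0)
```

Scaling the threshold by the previous inertia makes the test independent of the units of the data.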
References
http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
Attributes
cluster_centers_ : array, [k, n_features]
Coordinates of cluster centers.
labels_ :
Labels of each point.
inertia_ : float
The value of the inertia criterion associated with the chosen partition.
Methods
fit(X) : Compute K-Means clustering.
partial_fit(X) : Compute a partial K-Means clustering.
- __init__(k=8, init='random', max_iter=100, chunk_size=1000, tol=0.0001, verbose=0, random_state=None)
- fit(X, y=None)
Compute the centroids on X by chunking it into mini-batches.
Parameters : X : array-like, shape = [n_samples, n_features]
Coordinates of the data points to cluster.
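Chunking can be pictured as slicing the rows of X into consecutive pieces of at most chunk_size samples. The generator below is a minimal sketch with a hypothetical name, `iter_chunks`; whether the library shuffles or samples the rows before batching is not stated here, so no shuffling is shown.

```python
def iter_chunks(X, chunk_size):
    """Yield consecutive slices of X holding at most chunk_size rows each."""
    for start in range(0, len(X), chunk_size):
        yield X[start:start + chunk_size]

X = [[float(i), float(i % 3)] for i in range(10)]
batches = list(iter_chunks(X, chunk_size=4))
sizes = [len(b) for b in batches]  # the last chunk may be shorter
```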
- partial_fit(X, y=None)
Update the k-means estimate on a single mini-batch X.
Parameters : X : array-like, shape = [n_samples, n_features]
Coordinates of the data points to cluster.
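A single mini-batch update can be sketched with the per-center learning-rate rule commonly attributed to the paper linked under References. This is an illustration under that assumption, not the library's implementation; `closest`, `partial_update`, and the `counts` list are hypothetical names.

```python
def closest(x, centers):
    """Index of the center nearest to x (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))

def partial_update(batch, centers, counts):
    """Apply one mini-batch update in place and return the centers."""
    # Assign every sample using the centers as they stood before this batch
    assigned = [closest(x, centers) for x in batch]
    for x, j in zip(batch, assigned):
        counts[j] += 1
        eta = 1.0 / counts[j]  # per-center learning rate shrinks as counts grow
        centers[j] = [(1.0 - eta) * c + eta * v for c, v in zip(centers[j], x)]
    return centers

centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [0, 0]  # samples seen so far by each center
partial_update([[1.0, 1.0], [9.0, 9.0]], centers, counts)
partial_update([[3.0, 3.0]], centers, counts)
```

With a learning rate of 1 / count, each center ends up as the running mean of the samples assigned to it, so repeated calls on successive batches refine the same estimate.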
- predict(X)
Predict the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
Parameters : X : {array-like, sparse matrix}, shape = [n_samples, n_features]
New data to predict.
Returns : Y : array, shape [n_samples,]
Index of the closest center each sample belongs to.
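The code-book view can be made concrete with a small nearest-center lookup. The sketch assumes squared Euclidean distance and dense rows stored as Python lists; the function name `predict` mirrors the method here but is a standalone illustration.

```python
def predict(X, code_book):
    """Return, for each sample, the index of the closest code (center)."""
    labels = []
    for x in X:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in code_book]
        labels.append(dists.index(min(dists)))  # ties go to the lowest index
    return labels

code_book = [[0.0, 0.0], [5.0, 5.0]]
labels = predict([[0.2, -0.1], [4.0, 6.0], [2.0, 2.0]], code_book)
```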
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns : self
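The <component>__<parameter> naming can be illustrated with a toy object that forwards double-underscore names to a sub-object. The class `Toy` is hypothetical and is not part of scikit-learn; it only mimics the routing idea.

```python
class Toy:
    """Hypothetical stand-in for an estimator; not a scikit-learn class."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for name, value in params.items():
            if "__" in name:
                # '<component>__<parameter>': forward to the named sub-object
                component, sub_name = name.split("__", 1)
                getattr(self, component).set_params(**{sub_name: value})
            else:
                setattr(self, name, value)
        return self  # returning self allows chained calls

pipeline = Toy(cluster=Toy(k=8, tol=1e-4))
pipeline.set_params(cluster__k=3).set_params(cluster__tol=1e-3)
```

Splitting only on the first double underscore lets the same rule apply recursively to deeper nesting.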
- transform(X, y=None)
Transform the data to a cluster-distance space.
In the new space, each dimension is the distance to one of the cluster centers. Note that even if X is sparse, the array returned by transform will typically be dense.
Parameters : X : {array-like, sparse matrix}, shape = [n_samples, n_features]
New data to transform.
Returns : X_new : array, shape [n_samples, k]
X transformed in the new space.
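The cluster-distance space can be sketched as one distance per center for every sample. The example assumes Euclidean distance and dense rows stored as Python lists; the standalone function `transform` is an illustration, not the library's code.

```python
import math

def transform(X, centers):
    """Map each sample to its distances from every center (one column per center)."""
    rows = []
    for x in X:
        rows.append([math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))
                     for c in centers])
    return rows

centers = [[0.0, 0.0], [3.0, 4.0]]
X_new = transform([[0.0, 0.0], [3.0, 0.0]], centers)
```

Every sample has a nonzero distance to most centers, which is why the result is typically dense even when the input is sparse.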