8.1.2. sklearn.cluster.DBSCAN
- class sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric='euclidean', verbose=False, random_state=None)
Perform DBSCAN clustering from vector array or distance matrix.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core samples of high density and expands clusters from them. It works well for data that contains clusters of similar density.
Parameters : eps : float, optional
The maximum distance between two samples for them to be considered part of the same neighborhood.
min_samples : int, optional
The number of samples required in a point's neighborhood for that point to be considered a core point.
metric : string, or callable
The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by metrics.pairwise.calculate_distance for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix and must be square.
random_state : numpy.RandomState, optional
The generator used to shuffle the order in which samples are processed (DBSCAN has no centers to initialize). Defaults to numpy.random.
verbose : boolean, optional
The verbosity level.
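As an illustration of how eps and min_samples interact, here is a minimal sketch on a hand-made dataset. It assumes a scikit-learn installation where DBSCAN accepts eps, min_samples and metric as keyword arguments; verbose and random_state are simply not passed, since not every release accepts them. The data and the variable names are made up for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of four points each, plus one isolated point (toy data).
X = np.array([
    [0.0, 0.0], [0.0, 0.3], [0.3, 0.0], [0.3, 0.3],
    [5.0, 5.0], [5.0, 5.3], [5.3, 5.0], [5.3, 5.3],
    [10.0, 0.0],
])

# Within each group every pairwise distance is below eps=0.5, so each
# grouped point has at least min_samples=3 neighbors and becomes a core point.
model = DBSCAN(eps=0.5, min_samples=3, metric='euclidean')
model.fit(X)

labels = model.labels_                  # one label per sample, -1 marks noise
core_idx = model.core_sample_indices_   # indices of the core samples

n_clusters = len(set(labels.tolist()) - {-1})
print("clusters found:", n_clusters)
print("noise samples:", int(np.sum(labels == -1)))
```

Raising min_samples or lowering eps makes the density requirement stricter, so more points end up labeled as noise.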
Notes
See examples/plot_dbscan.py for an example.
References: Ester, M., H. P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226–231. 1996
Attributes
core_sample_indices_ : array, shape = [n_core_samples]
Indices of core samples.
components_ : array, shape = [n_core_samples, n_features]
Copy of each core sample found by training.
labels_ : array, shape = [n_samples]
Cluster labels for each point in the dataset given to fit(). Noisy samples are given the label -1.
Methods
fit(X, **params) : Perform DBSCAN clustering from vector array or distance matrix.
set_params(**params) : Set the parameters of the estimator.
- __init__(eps=0.5, min_samples=5, metric='euclidean', verbose=False, random_state=None)
- fit(X, **params)
Perform DBSCAN clustering from vector array or distance matrix.
Parameters : X : array [n_samples, n_samples] or [n_samples, n_features]
Array of distances between samples, or a feature array. The array is treated as a feature array unless the metric is given as ‘precomputed’.
params : dict
Keyword arguments that override the values passed to __init__.
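The two input forms accepted by fit can be compared side by side. The sketch below assumes that sklearn.metrics.pairwise_distances is available to build the square distance matrix (the helper named in this page may differ between releases) and uses toy data invented for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Toy feature array: two tight groups and one isolated point.
X = np.array([
    [0.0, 0.0], [0.0, 0.3], [0.3, 0.0], [0.3, 0.3],
    [5.0, 5.0], [5.0, 5.3], [5.3, 5.0], [5.3, 5.3],
    [10.0, 0.0],
])

# Case 1: X is a feature array, distances are computed with the given metric.
labels_features = DBSCAN(eps=0.5, min_samples=3, metric='euclidean').fit(X).labels_

# Case 2: X is a square [n_samples, n_samples] matrix of precomputed distances.
D = pairwise_distances(X, metric='euclidean')
labels_precomputed = DBSCAN(eps=0.5, min_samples=3, metric='precomputed').fit(D).labels_

# Both routes see the same neighborhoods, so the labelings should agree.
print(np.array_equal(labels_features, labels_precomputed))
```

Passing a precomputed matrix is mainly useful when the distance is expensive to compute or is not available as a built-in metric.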
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns : self
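A short sketch of both uses of set_params follows. The pipeline, its step names ('scale', 'cluster') and the chosen values are hypothetical, picked only to show the <component>__<parameter> form; it assumes sklearn.pipeline.Pipeline and sklearn.preprocessing.StandardScaler are available.

```python
from sklearn.cluster import DBSCAN
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are addressed by their plain names.
model = DBSCAN(eps=0.5, min_samples=5)
returned = model.set_params(eps=0.3)   # set_params returns the estimator itself

# Nested object: prefix the parameter with the step name and two underscores.
pipe = Pipeline([
    ('scale', StandardScaler()),   # hypothetical preprocessing step
    ('cluster', DBSCAN()),         # DBSCAN as the final step
])
pipe.set_params(cluster__eps=0.8, cluster__min_samples=10)

print(model.eps)
print(pipe.named_steps['cluster'].min_samples)
```

Because set_params returns self, calls can be chained, for example model.set_params(eps=0.3).fit(X).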