8.1.2. sklearn.cluster.DBSCAN
- class sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric='euclidean', verbose=False, random_state=None)
- Perform DBSCAN clustering from a vector array or distance matrix.

  DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core samples of high density and expands clusters from them. It works well for data that contains clusters of similar density.

  Parameters :
  - eps : float, optional
    The maximum distance between two samples for them to be considered as in the same neighborhood.
  - min_samples : int, optional
    The number of samples required in a neighborhood for a point to be considered a core point.
  - metric : string, or callable
    The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by metrics.pairwise.pairwise_distances for its metric parameter. If metric is 'precomputed', X is assumed to be a distance matrix and must be square.
  - verbose : boolean, optional
    The verbosity level.
  - random_state : numpy.RandomState, optional
    The generator used to randomize the order in which samples are visited. Defaults to numpy.random.

  Notes
  See examples/plot_dbscan.py for an example.

  References
  Ester, M., H. P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise". In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226–231. 1996

  Attributes
  - core_sample_indices_ : array, shape = [n_core_samples]
    Indices of core samples.
  - components_ : array, shape = [n_core_samples, n_features]
    Copy of each core sample found by training.
  - labels_ : array, shape = [n_samples]
    Cluster labels for each point in the dataset given to fit(). Noisy samples are given the label -1.

  Methods
  - fit(X, **params) : Perform DBSCAN clustering from a vector array or distance matrix.
  - get_params([deep]) : Get the parameters of the estimator.
  - set_params(**params) : Set the parameters of the estimator.
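The parameters and attributes described above can be illustrated with a minimal sketch. The toy data, the chosen eps and min_samples values, and the variable names are assumptions made for illustration; only eps, min_samples and metric are passed, so the snippet does not depend on the verbose or random_state arguments.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (illustrative assumption): two tight groups of four points
# each, plus one isolated point far away from both groups.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, 20.0],
])

# A point is a core sample if at least min_samples points lie within eps.
db = DBSCAN(eps=0.5, min_samples=3, metric='euclidean').fit(X)

labels = db.labels_                      # one label per sample, -1 means noise
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print("clusters:", n_clusters)
print("noise points:", n_noise)
print("core sample indices:", db.core_sample_indices_)
print("components_ shape:", db.components_.shape)
```

The cluster count is derived from labels_ by discarding the noise label -1, since the estimator does not expose a separate attribute for it on this page.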
- __init__(eps=0.5, min_samples=5, metric='euclidean', verbose=False, random_state=None)
 - fit(X, **params)
- Perform DBSCAN clustering from a vector array or distance matrix.

  Parameters :
  - X : array, shape = [n_samples, n_samples] or [n_samples, n_features]
    Array of distances between samples, or a feature array. The array is treated as a feature array unless the metric is given as 'precomputed'.
  - params : dict
    Keyword arguments that overwrite the keywords from __init__.
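The two accepted shapes of X can be compared with a short sketch. It assumes one-dimensional toy data, for which the Euclidean distance is simply the absolute difference, and builds the square distance matrix by hand with NumPy; the data and variable names are illustrative, not part of the library.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Feature array of shape (n_samples, n_features) = (7, 1); toy values.
X = np.array([[0.0], [0.2], [0.4], [10.0], [10.2], [10.4], [50.0]])

# Square distance matrix of shape (n_samples, n_samples), built by
# broadcasting a column vector against its transpose.
D = np.abs(X - X.T)

# Default metric: X is treated as a feature array.
labels_features = DBSCAN(eps=0.5, min_samples=3).fit(X).labels_

# metric='precomputed': X is treated as a distance matrix.
labels_precomputed = DBSCAN(
    eps=0.5, min_samples=3, metric='precomputed'
).fit(D).labels_

# Both runs should flag the same samples as noise (label -1).
noise_features = labels_features == -1
noise_precomputed = labels_precomputed == -1
print(noise_features)
print(noise_precomputed)
```

Passing a non-square array together with metric='precomputed' is an error, because each row must hold the distances from one sample to every sample.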
 - get_params(deep=True)
- Get the parameters of the estimator.

  Parameters :
  - deep : boolean, optional
    If True, return the parameters of this estimator and of contained subobjects that are estimators.
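As a brief illustration, get_params returns a dictionary that maps constructor argument names to their current values. The parameter values below are arbitrary choices for the example.

```python
from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.3, min_samples=10)

# deep=True is the default; DBSCAN contains no sub-estimators, so the
# result holds only its own constructor parameters.
params = db.get_params()

print(params['eps'], params['min_samples'], params['metric'])
# 0.3 10 euclidean
```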
 - set_params(**params)
- Set the parameters of the estimator.

  The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.

  Returns :
  - self
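The <component>__<parameter> form can be sketched with a small pipeline. The use of sklearn.pipeline.Pipeline and sklearn.preprocessing.StandardScaler, as well as the step names 'scale' and 'cluster', are assumptions chosen for illustration and do not come from this page.

```python
from sklearn.cluster import DBSCAN
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set directly by name.
db = DBSCAN()
returned = db.set_params(eps=0.3, min_samples=10)
print(returned is db)          # set_params returns the estimator itself
print(db.eps, db.min_samples)

# Nested object: 'scale' and 'cluster' are hypothetical step names.
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('cluster', DBSCAN()),
])

# <component>__<parameter> addresses a parameter of one step.
pipe.set_params(cluster__eps=0.2)
print(pipe.named_steps['cluster'].eps)
```

Because set_params returns self, calls can be chained, for example set_params(...).fit(X).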
 
