8.17.4.7. sklearn.metrics.pairwise.pairwise_distances¶
- sklearn.metrics.pairwise.pairwise_distances(X, Y=None, metric='euclidean', n_jobs=1, **kwds)¶
Compute the distance matrix from a vector array X and optional Y.
This method takes either a vector array or a distance matrix, and returns a distance matrix. If the input is a vector array, the distances are computed. If the input is a distances matrix, it is returned instead.
This method provides a safe way to take a distance matrix as input, while preserving compatability with many other algorithms that take a vector array.
If Y is given (default is None), then the returned matrix is the pairwise distance between the arrays from both X and Y.
Please note that support for sparse matrices is currently limited to those metrics listed in pairwise.pairwise_distance_functions.
Valid values for metric are:
- from scikit-learn: [‘euclidean’, ‘l2’, ‘l1’, ‘manhattan’, ‘cityblock’]
- from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘cosine’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeucludean’, ‘yule’] See the documentation for scipy.spatial.distance for details on these metrics.
Note in the case of ‘euclidean’ and ‘cityblock’ (which are valid scipy.spatial.distance metrics), the values will use the scikit-learn implementation, which is faster and has support for sparse matrices. For a verbose description of the metrics from scikit-learn, see the __doc__ of the sklearn.pairwise.distance_metrics function.
Parameters : X : array [n_samples_a, n_samples_a] if metric == “precomputed”, or, [n_samples_a, n_features] otherwise
Array of pairwise distances between samples, or a feature array.
Y : array [n_samples_b, n_features]
A second feature array only if X has shape [n_samples_a, n_features].
metric : string, or callable
The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter, or a metric listed in pairwise.pairwise_distance_functions. If metric is “precomputed”, X is assumed to be a distance matrix and must be square. Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays from X as input and return a value indicating the distance between them.
n_jobs : int
The number of jobs to use for the computation. This works by breaking down the pairwise matrix into n_jobs even slices and computing them in parallel.
If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debuging. For n_jobs below -1, (n_cpus + 1 - n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
`**kwds` : optional keyword parameters
Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.
Returns : D : array [n_samples_a, n_samples_a] or [n_samples_a, n_samples_b]
A distance matrix D such that D_{i, j} is the distance between the ith and jth vectors of the given matrix X, if Y is None. If Y is not None, then D_{i, j} is the distance between the ith array from X and the jth array from Y.