This documentation is for scikit-learn version 0.10.

8.16.4.7. sklearn.metrics.pairwise.pairwise_distances

sklearn.metrics.pairwise.pairwise_distances(X, Y=None, metric='euclidean', **kwds)

Compute the distance matrix from a vector array X and optional Y.

This function takes either a vector array or a distance matrix and returns a distance matrix. If the input is a vector array, the distances are computed. If the input is a distance matrix (metric == “precomputed”), it is returned unchanged.

This provides a safe way to take a distance matrix as input while preserving compatibility with the many algorithms that take a vector array.

If Y is given (default is None), the returned matrix contains the pairwise distances between the rows of X and the rows of Y.
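The difference between calling the function with and without Y can be sketched as follows. This is a minimal illustration, assuming a recent scikit-learn release in which the function is importable from sklearn.metrics.pairwise; the sample arrays are made up for the example.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# Illustrative data (not from the original documentation).
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # 3 samples, 2 features
Y = np.array([[0.0, 0.0], [0.0, 1.0]])              # 2 samples, 2 features

# Without Y: distances among the rows of X.
D_xx = pairwise_distances(X)

# With Y: distances from each row of X to each row of Y.
D_xy = pairwise_distances(X, Y)

print(D_xx.shape)  # (3, 3)
print(D_xy.shape)  # (3, 2)
```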

Please note that support for sparse matrices is currently limited to those metrics listed in pairwise.pairwise_distance_functions.

Valid values for metric are:

  • from scikit-learn: [‘euclidean’, ‘l2’, ‘l1’, ‘manhattan’, ‘cityblock’]
  • from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘cosine’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]. See the documentation for scipy.spatial.distance for details on these metrics.

Note that ‘euclidean’ and ‘cityblock’ are also valid scipy.spatial.distance metrics, but for these the scikit-learn implementation is used, which is faster and supports sparse matrices. For a verbose description of the scikit-learn metrics, see the __doc__ of the sklearn.metrics.pairwise.distance_metrics function.
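As a rough illustration of the metric families above, the sketch below passes a SciPy sparse matrix with ‘cityblock’ (handled by scikit-learn) and a dense array with ‘chebyshev’ (handled by scipy.spatial.distance). It assumes a recent scikit-learn and SciPy; the data are invented for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import pairwise_distances

# Illustrative data (not from the original documentation).
X_dense = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])
X_sparse = csr_matrix(X_dense)

# 'cityblock' uses the scikit-learn implementation, so sparse input is accepted.
D_dense = pairwise_distances(X_dense, metric='cityblock')
D_sparse = pairwise_distances(X_sparse, metric='cityblock')

# 'chebyshev' comes from scipy.spatial.distance and needs dense input.
D_cheb = pairwise_distances(X_dense, metric='chebyshev')

print(D_dense[0, 1])  # |1-0| + |0-3| + |2-0| = 6.0
print(D_cheb[0, 1])   # max(1, 3, 2) = 3.0
```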

Parameters :

X : array [n_samples_a, n_samples_a] if metric == “precomputed”, or, [n_samples_a, n_features] otherwise

Array of pairwise distances between samples, or a feature array.

Y : array [n_samples_b, n_features], optional

A second feature array. Only used if X has shape [n_samples_a, n_features], that is, if metric is not “precomputed”.

metric : string, or callable

The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter, or a metric listed in pairwise.pairwise_distance_functions. If metric is “precomputed”, X is assumed to be a distance matrix and must be square. Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays from X as input and return a value indicating the distance between them.
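The callable and “precomputed” options can be sketched as below. The helper max_abs_diff is a hypothetical function written for this example, not part of scikit-learn, and the snippet assumes a recent scikit-learn release.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# Illustrative data (not from the original documentation).
X = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 6.0]])

def max_abs_diff(u, v):
    """Hypothetical metric: largest absolute coordinate difference."""
    return np.max(np.abs(u - v))

# A callable is applied to each pair of rows of X.
D_callable = pairwise_distances(X, metric=max_abs_diff)

# With metric='precomputed', a square distance matrix is passed through as is.
D_same = pairwise_distances(D_callable, metric='precomputed')

print(D_callable)
```

A callable is evaluated pair by pair in Python, so it is likely to be much slower than a built-in string metric on large inputs.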

`**kwds` : optional keyword parameters

Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.
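For instance, the ‘minkowski’ metric from scipy.spatial.distance accepts an order parameter p, which can be forwarded as a keyword argument. This is a small sketch assuming recent scikit-learn and SciPy releases; the data are invented.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# Illustrative data (not from the original documentation).
X = np.array([[0.0, 0.0], [3.0, 4.0]])

# Extra keyword arguments (here p) are forwarded to the distance function.
D_p1 = pairwise_distances(X, metric='minkowski', p=1)  # same as city-block
D_p2 = pairwise_distances(X, metric='minkowski', p=2)  # same as Euclidean

print(D_p1[0, 1])  # 3 + 4 = 7.0
print(D_p2[0, 1])  # sqrt(9 + 16) = 5.0
```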

Returns :

D : array [n_samples_a, n_samples_a] or [n_samples_a, n_samples_b]

A distance matrix D such that D_{i, j} is the distance between the ith and jth rows of X, if Y is None. If Y is not None, then D_{i, j} is the distance between the ith row of X and the jth row of Y.
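The meaning of each entry D_{i, j} can be checked against a plain double loop. This is only a sketch, assuming the default ‘euclidean’ metric and a recent scikit-learn release; D_loop is a name introduced for the comparison.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# Illustrative data (not from the original documentation).
X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
Y = np.array([[1.0, 1.0], [0.0, 0.0]])

D = pairwise_distances(X, Y)

# Entry (i, j) is the Euclidean distance between row i of X and row j of Y.
D_loop = np.array([[np.linalg.norm(x - y) for y in Y] for x in X])

print(np.allclose(D, D_loop))  # True
```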