Contents

6.5.1. scikits.learn.neighbors.Neighbors

class scikits.learn.neighbors.Neighbors(n_neighbors=5, window_size=1)

Classifier implementing the k-nearest neighbors algorithm.

Parameters :

data : array-like, shape (n, k)

The data points to be indexed. This array is not copied; modifying it after fitting will yield incorrect results.

labels : array

An array representing labels for the data (only arrays of integers are supported).

n_neighbors : int

Default number of neighbors.

window_size : int

Window size passed to BallTree.

Notes

http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
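The algorithm itself is simple enough to sketch in a few lines of plain Python. The following is an illustrative brute-force version (no BallTree and no library calls, so it is not the implementation used by this class): compute the distance from the query to every stored sample, take the n_neighbors closest, and return the majority label.

```python
from collections import Counter
import math

def knn_predict(samples, labels, query, n_neighbors=5):
    """Brute-force k-nearest-neighbor classification of one query point."""
    # Euclidean distance from the query to every stored sample.
    dists = [math.dist(query, s) for s in samples]
    # Indices of the n_neighbors closest samples.
    nearest = sorted(range(len(samples)), key=dists.__getitem__)[:n_neighbors]
    # Majority vote among the neighbors' labels.
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

samples = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]]
labels = [0, 0, 1, 1]
print(knn_predict(samples, labels, [0., 0., 0.], n_neighbors=3))  # -> 0
```

A ball tree replaces the exhaustive distance scan with a spatial index, which is what makes the class practical on large, high-dimensional datasets.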

Examples

>>> samples = [[0.,0.,1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> labels = [0,0,1,1]
>>> from scikits.learn.neighbors import Neighbors
>>> neigh = Neighbors(n_neighbors=3)
>>> neigh.fit(samples, labels)
Neighbors(n_neighbors=3, window_size=1)
>>> print neigh.predict([[0,0,0]])
[ 0.]

Methods

fit
kneighbors
predict
score
__init__(n_neighbors=5, window_size=1)

Internally uses the ball tree data structure for fast neighbor lookups on high-dimensional datasets.

kneighbors(data, n_neighbors=None)

Finds the K-neighbors of a point.

Parameters :

data : array-like

The query point or points.

n_neighbors : int

Number of neighbors to get (default is the value passed to the constructor).

Returns :

dist : array

Array representing the distances to the query point(s).

ind : array

Array representing the indices of the nearest points in the population matrix.

Examples

In the following example, we construct a Neighbors instance from an array representing our data set and ask which point is closest to [1, 1, 1]:

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> labels = [0, 0, 1]
>>> from scikits.learn.neighbors import Neighbors
>>> neigh = Neighbors(n_neighbors=1)
>>> neigh.fit(samples, labels)
Neighbors(n_neighbors=1, window_size=1)
>>> print neigh.kneighbors([1., 1., 1.])
(array(0.5), array(2))

As you can see, it returns [0.5] and [2]: the closest point is at distance 0.5 and is the third element of samples (indices start at 0). You can also query for multiple points:

>>> print neigh.kneighbors([[0., 1., 0.], [1., 0., 1.]])
(array([ 0.5       ,  1.11803399]), array([1, 2]))
predict(T, n_neighbors=None)

Predict the class labels for the provided data.

Parameters :

T : array

A 2-D array representing the test points.

n_neighbors : int

Number of neighbors to get (default is the value passed to the constructor).

Returns :

labels: array :

List of class labels (one for each data sample).

Examples

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> labels = [0, 0, 1]
>>> from scikits.learn.neighbors import Neighbors
>>> neigh = Neighbors(n_neighbors=1)
>>> neigh.fit(samples, labels)
Neighbors(n_neighbors=1, window_size=1)
>>> print neigh.predict([.2, .1, .2])
0
>>> print neigh.predict([[0., -1., 0.], [3., 2., 0.]])
[0 1]
score(X, y)

Returns the mean error rate on the given test data and labels.

Parameters :

X : array-like, shape = [n_samples, n_features]

Test samples.

y : array-like, shape = [n_samples]

Labels for X.

Returns :

z : float

Mean error rate of self.predict(X) with respect to y.
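No example is given for score. As documented above, the returned value is the mean error rate, i.e. the fraction of misclassified samples. The sketch below shows that computation in isolation; mean_error_rate is a hypothetical helper standing in for self.predict(X) compared against y, not part of the library.

```python
def mean_error_rate(predicted, y):
    """Fraction of samples where the predicted label differs from the true label."""
    return sum(p != t for p, t in zip(predicted, y)) / len(y)

# Three correct predictions out of four gives an error rate of 0.25.
print(mean_error_rate([0, 0, 1, 1], [0, 0, 1, 0]))  # -> 0.25
```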