This documentation is for scikit-learn version 0.10.

8.3.6. sklearn.cross_validation.LeavePLabelOut

class sklearn.cross_validation.LeavePLabelOut(labels, p, indices=True)

Leave-P-Label-Out cross-validation iterator

Provides train/test indices to split data according to a label supplied by a third party. This label information can be used to encode arbitrary domain-specific stratifications of the samples as integers.

For instance, the labels could be the year in which each sample was collected, which allows cross-validation against time-based splits.

The difference between LeavePLabelOut and LeaveOneLabelOut is that the former builds each test set from all the samples assigned to p different label values, while the latter builds each test set from the samples that share a single label value.
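The splitting rule described here can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the helper name `leave_p_labels_out` and the sample years are hypothetical, and the order in which scikit-learn yields its splits may differ.

```python
from itertools import combinations


def leave_p_labels_out(labels, p):
    """Yield (train_indices, test_indices) pairs.

    Hypothetical helper for illustration only: each test set holds every
    sample whose label is one of p held-out label values.
    """
    unique_labels = sorted(set(labels))
    for held_out in combinations(unique_labels, p):
        held_out = set(held_out)
        test = [i for i, label in enumerate(labels) if label in held_out]
        train = [i for i, label in enumerate(labels) if label not in held_out]
        yield train, test


# Assumed example data: year of collection for six samples.
years = [2009, 2009, 2010, 2010, 2011, 2011]

splits_p1 = list(leave_p_labels_out(years, p=1))  # one label value per test set
splits_p2 = list(leave_p_labels_out(years, p=2))  # two label values per test set

for train, test in splits_p2:
    print("TRAIN:", train, "TEST:", test)
```

With p=1 the sketch holds out one year at a time, which corresponds to the behaviour described for LeaveOneLabelOut; with p=2 every test set contains all samples from two years.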

Parameters :

labels : array-like of int with shape (n_samples,)

Arbitrary domain-specific stratification of the data to be used to draw the splits.

p : int

Number of distinct label values to leave out in the test split. All samples assigned to those p label values go into the test set.

indices : boolean, optional (default True)

Return train/test split as arrays of indices, rather than a boolean mask array. Integer indices are required when dealing with sparse matrices, since those cannot be indexed by boolean masks.
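To make the two return formats concrete, the sketch below builds one test split both ways using plain Python lists. The label values and the held-out set are assumed for illustration; the iterator itself works with NumPy arrays rather than lists.

```python
# Assumed example: five samples, with label values 1 and 3 held out for testing.
labels = [1, 2, 3, 1, 2]
held_out = {1, 3}

# Boolean mask form: one True/False entry per sample.
test_mask = [label in held_out for label in labels]

# Integer index form: positions of the True entries in the mask.
test_indices = [i for i, in_test in enumerate(test_mask) if in_test]

print(test_mask)
print(test_indices)
```

Both forms select the same samples. The index form is the one to use with containers that do not accept boolean masks.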

Examples

>>> import numpy as np
>>> from sklearn import cross_validation
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> y = np.array([1, 2, 1])
>>> labels = np.array([1, 2, 3])
>>> lpl = cross_validation.LeavePLabelOut(labels, p=2)
>>> len(lpl)
3
>>> print(lpl)
sklearn.cross_validation.LeavePLabelOut(labels=[1 2 3], p=2)
>>> for train_index, test_index in lpl:
...    print("TRAIN: %s TEST: %s" % (train_index, test_index))
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
...    print("%s %s %s %s" % (X_train, X_test, y_train, y_test))
TRAIN: [2] TEST: [0 1]
[[5 6]] [[1 2]
 [3 4]] [1] [1 2]
TRAIN: [1] TEST: [0 2]
[[3 4]] [[1 2]
 [5 6]] [2] [1 1]
TRAIN: [0] TEST: [1 2]
[[1 2]] [[3 4]
 [5 6]] [1] [2 1]
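The value of `len(lpl)` in the example is consistent with counting the ways to choose p label values out of the distinct ones. The sketch below checks that arithmetic with the standard library only; it is an assumption about how the count arises, not a quotation of the library's code.

```python
from itertools import combinations
from math import factorial

labels = [1, 2, 3]  # same label values as in the example
p = 2

distinct = sorted(set(labels))
n_unique = len(distinct)

# Every possible set of p held-out label values.
held_out_sets = list(combinations(distinct, p))

# Binomial coefficient "n_unique choose p".
n_splits = factorial(n_unique) // (factorial(p) * factorial(n_unique - p))

print(n_splits, held_out_sets)
```

Because the count grows combinatorially with the number of distinct labels, a large p on a data set with many label values can produce a very large number of splits.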
__init__(labels, p, indices=True)