9.13.4. sklearn.cross_validation.StratifiedKFold

class sklearn.cross_validation.StratifiedKFold(y, k, indices=False)

Stratified K-Folds cross-validation iterator.

Provides train/test indices to split data into train and test sets.

This cross-validation object is a variation of KFold that returns stratified folds: the folds are made by preserving the percentage of samples of each class.
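A minimal sketch of one way to build stratified folds, assuming a simple round-robin scheme: stably sort the sample positions by label so that each class forms a contiguous run, then deal the positions out to the k folds in turn. The helper names `stratified_fold_ids` and `stratified_kfold` are hypothetical (not part of scikit-learn), and this is not claimed to be the library's exact implementation. Masks are returned as plain Python lists of booleans.

```python
def stratified_fold_ids(y, k):
    """Return a fold id in range(k) for every sample (hypothetical helper).

    Sample positions are stably sorted by label, so each class forms one
    contiguous run; dealing that run round-robin over the k folds means the
    per-fold count of any class differs by at most one.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])  # sorted() is stable
    fold_of = [0] * len(y)
    for pos, i in enumerate(order):
        fold_of[i] = pos % k
    return fold_of


def stratified_kfold(y, k):
    """Yield (train_mask, test_mask) pairs of boolean lists, one per fold."""
    fold_of = stratified_fold_ids(y, k)
    for f in range(k):
        test_mask = [fid == f for fid in fold_of]
        train_mask = [not t for t in test_mask]
        yield train_mask, test_mask


# Same labels as the doctest example: two samples of class 0, two of class 1.
for train_mask, test_mask in stratified_kfold([0, 0, 1, 1], 2):
    print("TRAIN:", train_mask, "TEST:", test_mask)
# TRAIN: [False, True, False, True] TEST: [True, False, True, False]
# TRAIN: [True, False, True, False] TEST: [False, True, False, True]
```

For `y = [0, 0, 1, 1]` and `k = 2` this sketch yields the same masks as the doctest in the Examples section of this page. With six samples of class 0 and three of class 1 split into three folds, every test fold receives two samples of class 0 and one of class 1, keeping the 2:1 class ratio.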

Parameters :

y: array, [n_samples] :

Class labels of the samples to split into K folds.

k: int :

Number of folds

indices: boolean, optional (default False) :

Return the train/test splits as integer indices (True) or as boolean masks (False). Integer indices are useful when dealing with sparse matrices that cannot be indexed by boolean masks.
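The two output forms select the same samples; converting a boolean mask into integer indices only requires keeping the positions of the True entries. The sketch below shows this in plain Python; `mask_to_indices` is a hypothetical helper, not a scikit-learn function.

```python
def mask_to_indices(mask):
    """Convert a boolean mask into the equivalent list of integer positions."""
    # Keep the position of every True entry; the result selects the same
    # samples as the mask, but as integers.
    return [i for i, keep in enumerate(mask) if keep]


test_mask = [True, False, True, False]  # first test fold of the example below
test_index = mask_to_indices(test_mask)
print(test_index)  # [0, 2]
```

With NumPy arrays, `np.flatnonzero(mask)` performs the same conversion.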

Notes

All the folds have size trunc(n_samples / k); the last one receives the remaining samples.

Examples

>>> import numpy as np
>>> from sklearn import cross_validation
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> skf = cross_validation.StratifiedKFold(y, k=2)
>>> len(skf)
2
>>> print skf
sklearn.cross_validation.StratifiedKFold(labels=[0 0 1 1], k=2)
>>> for train_index, test_index in skf:
...    print "TRAIN:", train_index, "TEST:", test_index
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [False  True False  True] TEST: [ True False  True False]
TRAIN: [ True False  True False] TEST: [False  True False  True]
__init__(y, k, indices=False)