This documentation is for scikit-learn version 0.10.

8.3.7. sklearn.cross_validation.Bootstrap

class sklearn.cross_validation.Bootstrap(n, n_bootstraps=3, n_train=0.5, n_test=None, random_state=None)

Random sampling with replacement cross-validation iterator

Provides train/test indices to split data into train and test sets while resampling the input n_bootstraps times. In each iteration, a new random split of the data is performed, and then samples are drawn (with replacement) on each side of the split to build the training and test sets.
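The two-step procedure (random partition, then resampling with replacement inside each side) can be sketched as follows. This is a minimal illustration, not the library's implementation: `bootstrap_splits` is a hypothetical helper, it assumes n_train and n_test are already integer counts, and it uses Python's `random` module in place of NumPy, so its indices will not match the example output on this page.

```python
import random


def bootstrap_splits(n, n_train, n_test, n_bootstraps=3, seed=None):
    """Yield one (train, test) pair of index lists per bootstrap iteration.

    Sketch only: n_train and n_test are already-resolved integer counts,
    and Python's random module stands in for NumPy.
    """
    if n_train < 1 or n_test < 1 or n_train + n_test > n:
        raise ValueError("need 1 <= n_train, 1 <= n_test, n_train + n_test <= n")
    rng = random.Random(seed)
    for _ in range(n_bootstraps):
        # Step 1: a fresh random partition into two disjoint sides.
        perm = list(range(n))
        rng.shuffle(perm)
        train_side = perm[:n_train]
        test_side = perm[n_train:n_train + n_test]
        # Step 2: resample each side independently, with replacement, so an
        # index can repeat within a side but never cross to the other side.
        train = [rng.choice(train_side) for _ in range(n_train)]
        test = [rng.choice(test_side) for _ in range(n_test)]
        yield train, test


for train, test in bootstrap_splits(9, n_train=5, n_test=4, seed=0):
    print("TRAIN:", train, "TEST:", test)
```

Because the partition is redrawn on every iteration, a sample that is on the test side in one iteration may be on the train side in the next.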

Note: unlike other cross-validation strategies, bootstrapping allows a sample to occur several times within each split. However, a sample that occurs in the train split never occurs in the test split, and vice versa.
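The property stated in this note can be checked mechanically. A minimal sketch: the hypothetical helper `check_bootstrap_split` raises if the two sides share an index and otherwise reports whether each side contains repeated indices. The index lists below are copied from the example output at the end of this page.

```python
def check_bootstrap_split(train, test):
    """Return (train_has_repeats, test_has_repeats); raise if the sides overlap."""
    if set(train) & set(test):
        raise ValueError("a sample appears in both the train and the test split")
    # Repeats within a side are allowed: that is what sampling with
    # replacement produces.
    return len(set(train)) < len(train), len(set(test)) < len(test)


# (train, test) pairs taken from the example output on this page.
rows = [
    ([1, 8, 7, 7, 8], [0, 3, 0, 5]),
    ([5, 4, 2, 4, 2], [6, 7, 1, 0]),
    ([4, 7, 0, 1, 1], [5, 3, 6, 5]),
]
for train, test in rows:
    print(check_bootstrap_split(train, test))
```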

If you want each sample to occur at most once, use ShuffleSplit cross-validation instead.
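For contrast, the without-replacement idea behind ShuffleSplit can be sketched as one random permutation sliced into disjoint train and test parts. `shuffle_split_once` is a hypothetical helper illustrating that idea; it is not the ShuffleSplit API, and it uses Python's `random` module rather than NumPy.

```python
import random


def shuffle_split_once(n, n_train, n_test, seed=None):
    """Return one (train, test) split drawn without replacement."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    # Both sides are slices of a single permutation, so no index repeats
    # within a side and no index is shared between the sides.
    return perm[:n_train], perm[n_train:n_train + n_test]


train, test = shuffle_split_once(9, n_train=5, n_test=4, seed=0)
print("TRAIN:", train, "TEST:", test)
```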

Parameters :

n : int

Total number of elements in the dataset.

n_bootstraps : int (default is 3)

Number of bootstrapping iterations.

n_train : int or float (default is 0.5)

If int, number of samples to include in the train split (should be smaller than the total number of samples in the dataset).

If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split.

n_test : int or float or None (default is None)

If int, number of samples to include in the test split (should be smaller than the total number of samples in the dataset).

If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split.

If None, n_test is set to the complement of n_train, i.e. all samples not assigned to the train split.
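The sizing rules for n_train and n_test can be sketched as a small helper. `resolve_sizes` is hypothetical, and rounding fractional sizes up with `math.ceil` is an assumption; it is at least consistent with the example on this page, where n=9 with the default n_train=0.5 is reported as n_train=5, n_test=4.

```python
import math


def resolve_sizes(n, n_train=0.5, n_test=None):
    """Turn n_train / n_test arguments into absolute sample counts.

    Assumption: float proportions are rounded up with math.ceil.
    """
    def to_count(value, name):
        if isinstance(value, float):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} as a float must be in [0.0, 1.0]")
            return math.ceil(value * n)
        if isinstance(value, int):
            return value
        raise TypeError(f"{name} must be an int or a float")

    train = to_count(n_train, "n_train")
    # None means "every sample not assigned to the train split".
    test = n - train if n_test is None else to_count(n_test, "n_test")
    if train + test > n:
        raise ValueError("n_train + n_test exceeds the number of samples n")
    return train, test


print(resolve_sizes(9))  # 0.5 * 9 = 4.5, rounded up: (5, 4)
```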

random_state : int or RandomState

Pseudo-random number generator state used for random sampling.
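An int seed yields reproducible splits, while passing a generator object lets the caller share one random stream. The sketch below shows one common way to normalize such an argument. `as_rng` is a hypothetical helper, `random.Random` stands in for NumPy's RandomState, and creating a fresh unseeded generator for None is an assumption, since this page does not say how None is handled.

```python
import random


def as_rng(random_state=None):
    """Normalize a random_state argument into a generator object.

    Stand-in: random.Random plays the role of numpy.random.RandomState.
    """
    if random_state is None:
        # Assumption: a fresh, unseeded generator (not reproducible).
        return random.Random()
    if isinstance(random_state, int):
        # Seeded generator: the same seed gives the same draws.
        return random.Random(random_state)
    if isinstance(random_state, random.Random):
        # Reuse the caller's generator as-is.
        return random_state
    raise TypeError("random_state must be None, an int, or a random.Random")


print(as_rng(0).random() == as_rng(0).random())
```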

See also

ShuffleSplit
Cross-validation using random permutations.

Examples

>>> from sklearn import cross_validation
>>> bs = cross_validation.Bootstrap(9, random_state=0)
>>> len(bs)
3
>>> print bs
Bootstrap(9, n_bootstraps=3, n_train=5, n_test=4, random_state=0)
>>> for train_index, test_index in bs:
...    print "TRAIN:", train_index, "TEST:", test_index
...
TRAIN: [1 8 7 7 8] TEST: [0 3 0 5]
TRAIN: [5 4 2 4 2] TEST: [6 7 1 0]
TRAIN: [4 7 0 1 1] TEST: [5 3 6 5]
__init__(n, n_bootstraps=3, n_train=0.5, n_test=None, random_state=None)