6.13.2. scikits.learn.feature_selection.rfe.RFECV

class scikits.learn.feature_selection.rfe.RFECV(estimator=None, n_features=None, percentage=0.1, loss_func=None)

Feature ranking with recursive feature elimination and cross-validation.

Parameters :

estimator : object

A supervised learning estimator with a fit method that updates a coef_ attribute holding the fitted parameters. The first dimension of the coef_ array must equal n_features, and important features must yield high absolute values in the coef_ array.

For instance, this is the case for most supervised learning algorithms, such as the Support Vector Classifiers and Generalized Linear Models from the svm and linear_model packages.

n_features : int

Number of features to select

percentage : float

The percentage of features to remove at each iteration, expressed as a fraction in the interval (0, 1]. Defaults to 0.1.
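
To make the effect of percentage concrete, the sketch below lists the feature counts an elimination run could pass through. The rounding rule is an assumption made for illustration and is not taken from the library: each iteration removes int(percentage * n_total) features, at least one, and never goes below the target. The function name elimination_schedule is hypothetical.

```python
def elimination_schedule(n_total, n_target, percentage=0.1):
    """Return the feature counts visited while shrinking n_total to n_target.

    Assumed rule (not taken from the library): each iteration removes
    int(percentage * n_total) features, at least one, and never
    overshoots n_target.
    """
    if not 0 < percentage <= 1:
        raise ValueError("percentage must lie in (0, 1]")
    if not 0 < n_target <= n_total:
        raise ValueError("n_target must lie in [1, n_total]")
    step = max(1, int(percentage * n_total))
    sizes = [n_total]
    remaining = n_total
    while remaining > n_target:
        # Clip the last step so the run stops exactly at n_target.
        remaining -= min(step, remaining - n_target)
        sizes.append(remaining)
    return sizes


print(elimination_schedule(20, 5, percentage=0.25))  # [20, 15, 10, 5]
print(elimination_schedule(10, 3, percentage=1.0))   # [10, 3]
```

A larger percentage means fewer refits of the estimator but a coarser ranking, since several features are discarded on the strength of a single fit.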

References

Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Mach. Learn., 46(1-3), 389–422.

Examples

>>> # TODO!

Attributes

support_ : array-like, shape = [n_features]

Mask of estimated support

ranking_ : array-like, shape = [n_features]

Ranking of the features
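
The following is a minimal sketch of how a support mask and a ranking can come out of recursive elimination. It is not the library's implementation. UnivariateSlope is a hypothetical toy estimator that satisfies the contract described for the estimator parameter (a fit method that sets coef_, one value per feature). The convention that rank 1 marks the selected features, the step argument, and the use of plain lists in place of arrays are all assumptions made for illustration.

```python
class UnivariateSlope:
    """Toy estimator (hypothetical): coef_[j] is the least-squares slope
    of y on feature j alone."""

    def fit(self, X, y):
        n = len(X)
        y_mean = sum(y) / n
        self.coef_ = []
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            x_mean = sum(col) / n
            cov = sum((x - x_mean) * (t - y_mean) for x, t in zip(col, y))
            var = sum((x - x_mean) ** 2 for x in col)
            self.coef_.append(cov / var if var > 0 else 0.0)
        return self


def recursive_elimination(estimator, X, y, n_features, step=1):
    """Return (support, ranking) after repeatedly dropping the weakest features."""
    n_total = len(X[0])
    support = [True] * n_total
    ranking = [1] * n_total
    while sum(support) > n_features:
        active = [j for j in range(n_total) if support[j]]
        estimator.fit([[row[j] for j in active] for row in X], y)
        # Positions within `active`, weakest absolute coefficient first.
        weakest_first = sorted(range(len(active)),
                               key=lambda k: abs(estimator.coef_[k]))
        n_drop = min(step, len(active) - n_features)
        for k in weakest_first[:n_drop]:
            support[active[k]] = False
        # Assumed convention: every eliminated feature slips one rank per
        # round, so rank 1 marks the surviving support.
        for j in range(n_total):
            if not support[j]:
                ranking[j] += 1
    return support, ranking


# Made-up toy data: feature 0 tracks y closely, feature 1 not at all.
X = [[1.0, 5.0, 2.0], [2.0, 3.0, 0.0], [3.0, 6.0, 2.0], [4.0, 4.0, 4.0]]
y = [2.0, 4.0, 6.0, 8.0]
support, ranking = recursive_elimination(UnivariateSlope(), X, y, n_features=1)
print(support)  # [True, False, False]
print(ranking)  # [1, 3, 2]
```

Under this convention the support can be recovered from the ranking as the features with rank 1, and features eliminated earliest carry the highest rank.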

Methods

fit(X, y) : returns self. Fit the model.
transform(X) : returns an array. Reduce X to support.

__init__(estimator=None, n_features=None, percentage=0.1, loss_func=None)

fit(X, y, cv=None)

Fit the RFE model according to the given training data and parameters.

The final size of the support is tuned by cross validation.

Parameters :

X : array-like, shape = [n_samples, n_features]

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array, shape = [n_samples]

Target values (integers in classification, real numbers in regression)

cv : cross-validation instance
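
As one plausible reading of "the final size of the support is tuned by cross validation", the sketch below tries each rank cutoff and keeps the mask with the lowest summed held-out error. It is an illustration under stated assumptions, not the library's procedure: the ranking is taken as given with rank 1 meaning "kept longest", a 1-nearest-neighbour error count stands in for the estimator and its loss, and folds is a hand-written list of (train, test) index pairs standing in for a cross-validation instance. All function names and data values are made up for the example.

```python
def masked_sq_dist(a, b, mask):
    """Squared Euclidean distance restricted to the features kept by mask."""
    return sum((u - v) ** 2 for u, v, keep in zip(a, b, mask) if keep)


def held_out_errors(X, y, mask, train, test):
    """Count 1-nearest-neighbour mistakes on `test` using the masked features."""
    errors = 0
    for i in test:
        nearest = min(train, key=lambda t: masked_sq_dist(X[i], X[t], mask))
        errors += int(y[nearest] != y[i])
    return errors


def choose_support(X, y, ranking, folds):
    """Return (mask, loss) for the cutoff `rank <= cutoff` with the lowest loss."""
    best_mask, best_loss = None, float("inf")
    for cutoff in range(1, max(ranking) + 1):
        mask = [r <= cutoff for r in ranking]
        loss = sum(held_out_errors(X, y, mask, train, test)
                   for train, test in folds)
        if loss < best_loss:  # strict comparison: ties keep the smaller support
            best_mask, best_loss = mask, loss
    return best_mask, best_loss


# Made-up toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 9.0], [0.2, 1.0], [0.1, 5.0], [1.0, 1.2], [1.1, 8.8], [0.9, 5.1]]
y = [0, 0, 0, 1, 1, 1]
ranking = [1, 2]
folds = [([1, 2, 4, 5], [0, 3]), ([0, 2, 3, 5], [1, 4]), ([0, 1, 3, 4], [2, 5])]
mask, loss = choose_support(X, y, ranking, folds)
print(mask, loss)  # [True, False] 0
```

Keeping the noisy second feature causes at least one held-out mistake, so the cross-validated choice falls back to the single informative feature.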

transform(X, copy=True)

Reduce X to the features selected during the fit

Parameters :

X : array-like, shape = [n_samples, n_features]

Vector, where n_samples is the number of samples and n_features is the number of features.
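
Reducing X to the selected features amounts to keeping the columns flagged by a boolean mask such as support_. The sketch below shows this with plain lists so that it has no dependencies. The helper name reduce_to_support is hypothetical, and the handling of the copy argument is not modelled.

```python
def reduce_to_support(X, support):
    """Keep only the columns of X whose entry in the boolean mask is True."""
    for row in X:
        if len(row) != len(support):
            raise ValueError("each row of X needs one value per entry of support")
    # New lists are built, so the input X is left untouched.
    return [[value for value, keep in zip(row, support) if keep] for row in X]


X = [[1.0, 5.0, 2.0], [2.0, 3.0, 0.0]]
reduced = reduce_to_support(X, [True, False, True])
print(reduced)  # [[1.0, 2.0], [2.0, 0.0]]
```

The result has shape [n_samples, n_selected], where n_selected is the number of True entries in the mask.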