This documentation is for scikit-learn version 0.10Other versions

Citing

If you use the software, please consider citing scikit-learn.

This page

3.11. Feature selection

The classes in the sklearn.feature_selection module can be used for feature selection/dimensionality reduction on sample sets, either to improve estimators’ accuracy scores or to boost their performance on very high-dimensional datasets.

3.11.1. Univariate feature selection

Univariate feature selection works by selecting the best features based on univariate statistical tests. It can seen as a preprocessing step to an estimator. Scikit-Learn exposes feature selection routines a objects that implement the transform method:

These objects take as input a scoring function that returns univariate p-values:

Feature selection with sparse data

If you use sparse data (i.e. data represented as sparse matrices), only chi2 will deal with the data without making it dense.

Warning

Beware not to use a regression scoring function with a classification problem, you will get useless results.

3.11.2. Recursive feature elimination

Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and weights are assigned to each one of them. Then, features whose absolute weights are the smallest are pruned from the current set features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.

Examples:

3.11.3. L1-based feature selection

Linear models penalized with the L1 norm have sparse solutions. When the goal is to reduce the dimensionality of the data to use with another classifier, the transform method of LogisticRegression and LinearSVC can be used:

>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> X.shape
(150, 4)
>>> X_new = LinearSVC(C=1, penalty="l1", dual=False).fit_transform(X, y)
>>> X_new.shape
(150, 2)

The parameter C controls the sparsity: the smaller the fewer features.

Examples:

3.11.4. Tree-based feature selection

Tree-based estimators (see the sklearn.tree module and forest of trees in the sklearn.ensemble module) can be used to compute feature importances, which in turn can be used to discard irrelevant features:

>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> X.shape
(150, 4)
>>> clf = ExtraTreesClassifier(compute_importances=True, random_state=0)
>>> X_new = clf.fit(X, y).transform(X)
>>> X_new.shape
(150, 2)

Examples: