This documentation is for scikit-learn version 0.11-git.

8.8.8. sklearn.feature_selection.chi2

sklearn.feature_selection.chi2(X, y)

Compute χ² (chi-squared) statistic for each class/feature combination.

This score function can be used to select the n_features features with the highest values of the χ² (chi-squared) statistic from either boolean or multinomially distributed data (e.g., term counts in document classification) relative to the classes.

Recall that the χ² statistic measures dependence between stochastic variables, so a transformer based on this function “weeds out” the features that are the most likely to be independent of class and therefore irrelevant for classification.
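As a rough illustration of why independent features score low, the statistic can be worked out by hand on a tiny dataset. The sketch below is plain Python and is not the library's implementation; the helper name `chi2_scores` and the toy data are hypothetical. It assumes non-negative counts and that every feature has a non-zero total.

```python
def chi2_scores(X, y):
    """Chi-squared statistic of each feature column against the class labels.

    X is a list of rows of non-negative counts, y a list of class labels.
    Assumes every feature column has a non-zero total (hypothetical helper).
    """
    classes = sorted(set(y))
    n_samples = len(y)
    n_features = len(X[0])
    # Fraction of samples that belong to each class.
    class_prob = {c: sum(1 for label in y if label == c) / n_samples
                  for c in classes}
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)  # total count of feature j
        stat = 0.0
        for c in classes:
            # Count of feature j actually seen in class c.
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            # Count expected if feature j were independent of the class.
            expected = class_prob[c] * total
            stat += (observed - expected) ** 2 / expected
        scores.append(stat)
    return scores


# Feature 0 is spread evenly over both classes; feature 1 occurs only in class 0.
X = [[2, 3], [2, 0], [2, 3], [2, 0]]
y = [0, 1, 0, 1]
scores = chi2_scores(X, y)
print(scores)  # [0.0, 6.0]
```

The evenly spread feature gets a score of zero, so a selector that keeps the highest-scoring features would discard it first.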

Parameters :

X : {array-like, sparse matrix}, shape = [n_samples, n_features_in]

Sample vectors.

y : array-like, shape = [n_samples]

Target vector (class labels).
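A minimal usage sketch follows. It assumes a scikit-learn release in which `chi2` returns a pair of arrays (scores and p-values) and in which `SelectKBest` takes a score function and a `k` argument; the return value is not described on this page, so check the installed version. The toy term counts are made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy term counts (made up): 4 documents (rows) x 3 terms (columns).
X = np.array([[3, 0, 1],
              [2, 0, 1],
              [0, 4, 1],
              [0, 3, 1]])
y = np.array([0, 0, 1, 1])  # class label of each document

# Assumption: chi2 returns (scores, p-values), one entry per feature.
scores, pvalues = chi2(X, y)

# Keep the two features with the highest chi-squared scores.
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)
print(X_new.shape)
```

The third term occurs equally often in both classes, so it is the one dropped here.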

Notes

The complexity of this algorithm is O(n_classes * n_features).
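One way to read this bound: once per-class feature counts are available, the statistic is a single pass over a table with one cell per class/feature pair. The NumPy sketch below builds such a table; it is an illustration under that assumption, not the library's code, and the name `chi2_table` is hypothetical.

```python
import numpy as np


def chi2_table(X, y):
    """Return observed and expected count tables, each (n_classes, n_features).

    Hypothetical helper for illustration; assumes non-negative X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    # One-hot class indicators, shape (n_samples, n_classes).
    Y = (y[:, None] == classes[None, :]).astype(float)
    observed = Y.T @ X                    # per-class feature counts
    feature_count = X.sum(axis=0)         # total count of each feature
    class_prob = Y.mean(axis=0)           # fraction of samples per class
    expected = np.outer(class_prob, feature_count)
    return observed, expected


observed, expected = chi2_table([[3, 0, 1], [2, 0, 1], [0, 4, 1], [0, 3, 1]],
                                [0, 0, 1, 1])
# Summing over the class axis touches each of the n_classes * n_features cells once.
scores = ((observed - expected) ** 2 / expected).sum(axis=0)
print(observed.shape)  # (2, 3)
```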