7. Class reference

7.1. Support Vector Machines

Support Vector Machine algorithms.

svm.SVC([C, kernel, degree, gamma, coef0, ...]) C-Support Vector Classification.
svm.LinearSVC([penalty, loss, dual, tol, C, ...]) Linear Support Vector Classification.
svm.NuSVC([nu, kernel, degree, gamma, ...]) Nu-Support Vector Classification.
svm.SVR([kernel, degree, gamma, coef0, ...]) epsilon-Support Vector Regression.
svm.NuSVR([nu, C, kernel, degree, gamma, ...]) Nu Support Vector Regression.
svm.OneClassSVM([kernel, degree, gamma, ...]) Unsupervised Outliers Detection.
svm.l1_min_c(X, y[, loss, fit_intercept, ...]) Return the maximum value of C that yields a model with all coefficients set to zero
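
A minimal usage sketch (not part of the original listing; it assumes the scikits.learn-era import path and the standard fit/predict convention):

>>> import numpy as np
>>> from scikits.learn import svm
>>> X = np.array([[-1., -1.], [-2., -1.], [1., 1.], [2., 1.]])
>>> y = np.array([1, 1, 2, 2])
>>> clf = svm.SVC(kernel='linear').fit(X, y)  # C-SVC with a linear kernel
>>> clf.predict(np.array([[-0.8, -1.]]))      # classify a new point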

7.1.1. For sparse data

Support Vector Machine algorithms for sparse matrices.

This module provides the same API as scikits.learn.svm, except that input matrices are expected to be in a sparse format supported by scipy.sparse.

Note

Some attributes, such as dual_coef_, are not strictly speaking sparse matrices. They are nevertheless stored as sparse matrices, for consistency and for efficiency when multiplying them with other sparse matrices.

svm.sparse.SVC([C, kernel, degree, gamma, ...]) SVC for sparse matrices (csr).
svm.sparse.NuSVC([nu, kernel, degree, ...]) NuSVC for sparse matrices (csr).
svm.sparse.SVR([kernel, degree, gamma, ...]) SVR for sparse matrices (csr).
svm.sparse.NuSVR([nu, C, kernel, degree, ...]) NuSVR for sparse matrices (csr).
svm.sparse.OneClassSVM([kernel, degree, ...]) OneClassSVM for sparse matrices (csr).
svm.sparse.LinearSVC([penalty, loss, dual, ...]) Linear Support Vector Classification, sparse version.
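
The same estimators accept CSR input; a sketch, assuming svm.sparse mirrors the dense API as stated above:

>>> import numpy as np
>>> import scipy.sparse as sp
>>> from scikits.learn import svm
>>> X = sp.csr_matrix(np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]]))
>>> y = np.array([0, 0, 1, 1])
>>> clf = svm.sparse.SVC().fit(X, y)   # same API, sparse input
>>> clf.predict(sp.csr_matrix(np.array([[2.5, 2.5]])))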

7.1.2. Low-level methods

svm.libsvm.fit Train the model using libsvm (low-level method)
svm.libsvm.decision_function Predict margin (libsvm name for this is predict_values)
svm.libsvm.predict Predict target values of X given a model (low-level method)
svm.libsvm.predict_proba Predict class probabilities of X given a model (low-level method)
svm.libsvm.cross_validation Binding of the cross-validation routine (low-level routine)

7.2. Generalized Linear Models

scikits.learn.linear_model is a module to fit generalized linear models. It includes Ridge regression, Bayesian regression, and Lasso and Elastic Net estimators computed with Least Angle Regression and coordinate descent.

It also implements algorithms based on Stochastic Gradient Descent (SGD).

linear_model.LinearRegression([fit_intercept]) Ordinary least squares Linear Regression.
linear_model.Ridge([alpha, fit_intercept]) Ridge regression.
linear_model.RidgeCV([alphas, ...]) Ridge regression with built-in cross-validation.
linear_model.Lasso([alpha, fit_intercept]) Linear Model trained with L1 prior as regularizer (aka the Lasso)
linear_model.LassoCV([eps, n_alphas, ...]) Lasso linear model with iterative fitting along a regularization path
linear_model.ElasticNet([alpha, rho, ...]) Linear Model trained with L1 and L2 prior as regularizer
linear_model.ElasticNetCV([rho, eps, ...]) Elastic Net model with iterative fitting along a regularization path
linear_model.LARS([fit_intercept, verbose]) Least Angle Regression model a.k.a. LAR
linear_model.LassoLARS([alpha, ...]) Lasso model fit with Least Angle Regression a.k.a. LARS
linear_model.LogisticRegression([penalty, ...]) Logistic Regression.
linear_model.SGDClassifier([loss, penalty, ...]) Linear model fitted by minimizing a regularized empirical loss with SGD.
linear_model.SGDRegressor([loss, penalty, ...]) Linear model fitted by minimizing a regularized empirical loss with SGD
linear_model.BayesianRidge([n_iter, eps, ...]) Bayesian ridge regression
linear_model.ARDRegression([n_iter, eps, ...]) Bayesian ARD regression.
linear_model.lasso_path(X, y, **fit_params) Compute Lasso path with coordinate descent
linear_model.lars_path(X, y[, Xy, Gram, ...]) Compute Least Angle Regression and LASSO path
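
For instance, a Lasso fit (a sketch; the alpha value is arbitrary):

>>> from scikits.learn import linear_model
>>> X = [[0., 0.], [1., 1.], [2., 2.]]
>>> y = [0., 1., 2.]
>>> clf = linear_model.Lasso(alpha=0.1).fit(X, y)
>>> clf.coef_, clf.intercept_   # fitted coefficients and intercept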

7.2.1. For sparse data

scikits.learn.linear_model.sparse is the sparse counterpart of scikits.learn.linear_model.

linear_model.sparse.Lasso([alpha, fit_intercept]) Linear Model trained with L1 prior as regularizer
linear_model.sparse.ElasticNet([alpha, rho, ...]) Linear Model trained with L1 and L2 prior as regularizer
linear_model.sparse.SGDClassifier([loss, ...]) Linear model fitted by minimizing a regularized empirical loss with SGD
linear_model.sparse.SGDRegressor([loss, ...]) Linear model fitted by minimizing a regularized empirical loss with SGD

7.3. Naive Bayes

Naive Bayes classifiers.

naive_bayes.GNB Gaussian Naive Bayes (GNB)
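
A quick sketch of the fit/predict cycle, assuming GNB follows the standard estimator API:

>>> import numpy as np
>>> from scikits.learn.naive_bayes import GNB
>>> X = np.array([[-1., -1.], [-2., -1.], [1., 1.], [2., 1.]])
>>> y = np.array([1, 1, 2, 2])
>>> clf = GNB().fit(X, y)       # one Gaussian per class and feature
>>> clf.predict(np.array([[-0.5, -1.]]))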

7.4. Nearest Neighbors

Nearest Neighbor related algorithms

neighbors.NeighborsClassifier([n_neighbors, ...]) Classifier implementing the k-nearest neighbors algorithm.
neighbors.NeighborsRegressor([n_neighbors, ...]) Regression based on the k-nearest neighbors algorithm.
ball_tree.BallTree Ball tree data structure for fast nearest-neighbors searches.
neighbors.kneighbors_graph(X, n_neighbors[, ...]) Computes the (weighted) graph of k-Neighbors for points in X
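
A sketch using the n_neighbors parameter listed above:

>>> import numpy as np
>>> from scikits.learn import neighbors
>>> X = np.array([[0.], [1.], [2.], [3.]])
>>> y = np.array([0, 0, 1, 1])
>>> clf = neighbors.NeighborsClassifier(n_neighbors=3).fit(X, y)
>>> clf.predict(np.array([[1.2]]))   # majority vote among the 3 nearest points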

7.5. Gaussian Mixture Models

Gaussian Mixture Models

mixture.GMM([n_states, cvtype]) Gaussian Mixture Model
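
A sketch; n_states is the constructor parameter listed above, and the fit/predict methods are assumed to follow the standard estimator API:

>>> import numpy as np
>>> from scikits.learn import mixture
>>> np.random.seed(0)
>>> X = np.r_[np.random.randn(100, 2), 5. + np.random.randn(100, 2)]
>>> g = mixture.GMM(n_states=2)   # two Gaussian components
>>> g.fit(X)                      # EM estimation of weights, means, covariances
>>> g.predict(X)                  # hard component assignment per sample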

7.6. Hidden Markov Models

hmm.GaussianHMM([n_states, cvtype, ...]) Hidden Markov Model with Gaussian emissions
hmm.MultinomialHMM([n_states, startprob, ...]) Hidden Markov Model with multinomial (discrete) emissions
hmm.GMMHMM([n_states, n_mix, startprob, ...]) Hidden Markov Model with Gaussian mixture emissions

7.7. Clustering

Clustering algorithms

cluster.KMeans([k, init, n_init, max_iter, ...]) K-Means clustering
cluster.MeanShift([bandwidth]) MeanShift clustering
cluster.SpectralClustering([k, mode, ...]) Apply k-means to a projection of the normalized Laplacian
cluster.AffinityPropagation([damping, ...]) Perform Affinity Propagation Clustering of data
cluster.Ward([n_clusters, memory, ...]) Ward hierarchical clustering: constructs a tree and cuts it.
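
For example, K-Means (a sketch; k is the parameter name listed above):

>>> import numpy as np
>>> from scikits.learn import cluster
>>> X = np.array([[1., 2.], [1., 4.], [1., 0.], [4., 2.], [4., 4.], [4., 0.]])
>>> km = cluster.KMeans(k=2).fit(X)
>>> km.labels_            # cluster index assigned to each sample
>>> km.cluster_centers_   # coordinates of the two centroids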

7.8. Metrics

Metrics module with score functions, performance metrics, and pairwise metric or distance computations.

metrics.euclidean_distances(X, Y[, ...]) Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors
metrics.confusion_matrix(y_true, y_pred[, ...]) Compute confusion matrix to evaluate the accuracy of a classification
metrics.roc_curve(y_true, y_score) Compute Receiver Operating Characteristic (ROC)
metrics.auc(x, y) Compute Area Under the Curve (AUC) using the trapezoidal rule
metrics.precision_score(y_true, y_pred[, ...]) Compute the precision
metrics.recall_score(y_true, y_pred[, pos_label]) Compute the recall
metrics.fbeta_score(y_true, y_pred, beta[, ...]) Compute the F-beta score
metrics.f1_score(y_true, y_pred[, pos_label]) Compute the F1 score
metrics.precision_recall_fscore_support(...) Compute precisions, recalls, f-measures and support for each class
metrics.classification_report(y_true, y_pred) Build a text report showing the main classification metrics
metrics.precision_recall_curve(y_true, ...) Compute precision-recall pairs for different probability thresholds
metrics.r2_score(y_true, y_pred) R^2 (coefficient of determination) regression score function
metrics.zero_one_score(y_true, y_pred) Zero-One classification score
metrics.zero_one(y_true, y_pred) Zero-One classification loss
metrics.mean_square_error(y_true, y_pred) Mean square error regression loss
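
Typical use, sketched on hand-made labels:

>>> from scikits.learn import metrics
>>> y_true = [0, 1, 1, 0, 1, 1]
>>> y_pred = [0, 1, 0, 0, 1, 1]
>>> metrics.confusion_matrix(y_true, y_pred)
>>> metrics.precision_score(y_true, y_pred)
>>> metrics.recall_score(y_true, y_pred)
>>> metrics.f1_score(y_true, y_pred)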

7.8.1. Pairwise metrics

Utilities to evaluate pairwise distances or metrics between 2 sets of points.

metrics.pairwise.euclidean_distances(X, Y[, ...]) Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors
metrics.pairwise.linear_kernel(X, Y) Compute the linear kernel between X and Y.
metrics.pairwise.polynomial_kernel(X, Y[, ...]) Compute the polynomial kernel between X and Y.
metrics.pairwise.rbf_kernel(X, Y[, sigma]) Compute the RBF (Gaussian) kernel between X and Y.
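
A sketch; the sigma keyword follows the signature listed above:

>>> from scikits.learn.metrics.pairwise import euclidean_distances, rbf_kernel
>>> X = [[0., 1.], [1., 1.]]
>>> euclidean_distances(X, X)     # 2 x 2 matrix of pairwise distances
>>> rbf_kernel(X, X, sigma=1.0)   # Gaussian kernel on the same pairs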

7.9. Covariance Estimators

scikits.learn.covariance is a module for robustly estimating the covariance of features given a set of points. The precision matrix, defined as the inverse of the covariance, is also estimated. Covariance estimation is closely related to the theory of Gaussian graphical models.

covariance.Covariance Basic covariance estimator
covariance.ShrunkCovariance([...]) Covariance estimator with shrinkage
covariance.LedoitWolf([store_precision]) LedoitWolf Estimator
covariance.ledoit_wolf(X[, assume_centered]) Estimates the shrunk Ledoit-Wolf covariance matrix.
covariance.shrunk_covariance(emp_cov[, ...]) Calculates a covariance matrix shrunk on the diagonal
covariance.oas(X[, assume_centered]) Estimate covariance with the Oracle Approximating Shrinkage algorithm.
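
A sketch, assuming the fitted estimator exposes covariance_ and precision_ attributes:

>>> import numpy as np
>>> from scikits.learn.covariance import LedoitWolf
>>> np.random.seed(0)
>>> X = np.random.randn(50, 5)   # 50 samples, 5 features
>>> lw = LedoitWolf().fit(X)
>>> lw.covariance_               # shrunk covariance estimate
>>> lw.precision_                # estimated inverse covariance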

7.10. Signal Decomposition

Matrix decomposition algorithms

decomposition.PCA([n_components, copy, whiten]) Principal component analysis (PCA)
decomposition.ProbabilisticPCA([...]) Additional layer on top of PCA that adds a probabilistic evaluation
decomposition.RandomizedPCA(n_components[, ...]) Principal component analysis (PCA) using randomized SVD
decomposition.KernelPCA([n_components, ...]) Kernel Principal component analysis (KPCA)
decomposition.FastICA([n_components, ...]) FastICA: a fast algorithm for Independent Component Analysis
decomposition.NMF([n_components, init, ...]) Non-Negative matrix factorization by Projected Gradient (NMF)
decomposition.fastica(X[, n_components, ...]) Perform Fast Independent Component Analysis.
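
For example, PCA (a sketch of the fit/transform cycle):

>>> import numpy as np
>>> from scikits.learn.decomposition import PCA
>>> np.random.seed(0)
>>> X = np.random.randn(20, 5)
>>> pca = PCA(n_components=2).fit(X)
>>> X_reduced = pca.transform(X)   # project onto the first two components
>>> pca.components_                # the principal axes themselves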

7.11. Linear Discriminant Analysis

lda.LDA([n_components, priors]) Linear Discriminant Analysis (LDA)
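
A sketch, again following the standard fit/predict convention:

>>> import numpy as np
>>> from scikits.learn.lda import LDA
>>> X = np.array([[-1., -1.], [-2., -1.], [1., 1.], [2., 1.]])
>>> y = np.array([1, 1, 2, 2])
>>> clf = LDA().fit(X, y)
>>> clf.predict(np.array([[-0.8, -1.]]))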

7.12. Cross Validation

Utilities for cross validation and performance evaluation

cross_val.LeaveOneOut(n[, indices]) Leave-One-Out cross validation iterator
cross_val.LeavePOut(n, p[, indices]) Leave-P-Out cross validation iterator
cross_val.KFold(n, k[, indices]) K-Folds cross validation iterator
cross_val.StratifiedKFold(y, k[, indices]) Stratified K-Folds cross validation iterator
cross_val.LeaveOneLabelOut(labels[, indices]) Leave-One-Label-Out cross-validation iterator
cross_val.LeavePLabelOut(labels, p[, indices]) Leave-P-Label-Out cross-validation iterator
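
These iterators yield train/test splits directly. A K-Fold sketch (Python 2 print, matching the library's era):

>>> from scikits.learn import cross_val
>>> for train, test in cross_val.KFold(n=6, k=3):
...     print train, test   # masks/indices selecting each fold's samples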

7.14. Feature Selection

Feature selection module for Python.

feature_selection.rfe.RFE([estimator, ...]) Feature ranking with Recursive feature elimination
feature_selection.rfe.RFECV([estimator, ...]) Feature ranking with Recursive feature elimination and cross validation
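
A sketch only: the n_features keyword and the support_ attribute are assumptions, not taken from the listing above:

>>> import numpy as np
>>> from scikits.learn.feature_selection.rfe import RFE
>>> from scikits.learn.svm import SVC
>>> np.random.seed(0)
>>> X = np.random.randn(30, 5)
>>> y = np.array([0, 1] * 15)
>>> rfe = RFE(estimator=SVC(kernel='linear'), n_features=2)  # n_features assumed
>>> rfe = rfe.fit(X, y)
>>> rfe.support_   # boolean mask of the surviving features (assumed attribute)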

7.15. Feature Extraction

Package for modules that deal with feature extraction from raw data

7.15.1. From images

Utilities to extract features from images.

feature_extraction.image.img_to_graph(img[, ...]) Graph of the pixel-to-pixel gradient connections
feature_extraction.image.grid_to_graph(n_x, n_y) Graph of the pixel-to-pixel connections
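
A sketch on a tiny synthetic image:

>>> import numpy as np
>>> from scikits.learn.feature_extraction import image
>>> img = np.arange(16, dtype=float).reshape(4, 4)   # a tiny 4x4 "image"
>>> graph = image.img_to_graph(img)   # sparse adjacency weighted by gradients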

7.15.2. From text

Utilities to build dense feature vectors from text documents

feature_extraction.text.RomanPreprocessor Fast preprocessor suitable for Roman languages.
feature_extraction.text.WordNGramAnalyzer([...]) Simple analyzer: transform a text document into a sequence of word tokens
feature_extraction.text.CharNGramAnalyzer([...]) Compute character n-grams features of a text document
feature_extraction.text.CountVectorizer([...]) Convert a collection of raw documents to a matrix of token counts
feature_extraction.text.TfidfTransformer([...]) Transform a count matrix to a TF or TF-IDF representation
feature_extraction.text.Vectorizer([...]) Convert a collection of raw documents to a matrix of TF-IDF features
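
A sketch, assuming CountVectorizer follows the usual fit_transform transformer API:

>>> from scikits.learn.feature_extraction.text import CountVectorizer
>>> docs = ["the cat sat", "the cat sat on the mat"]
>>> counts = CountVectorizer().fit_transform(docs)   # documents x tokens count matrix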

7.16. Pipeline

Pipeline: chain transforms and estimators to build a composite estimator.

pipeline.Pipeline(steps) Pipeline of transforms with a final estimator
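
For example, chaining PCA into an SVM (a sketch; the step names are arbitrary):

>>> import numpy as np
>>> from scikits.learn.pipeline import Pipeline
>>> from scikits.learn.decomposition import PCA
>>> from scikits.learn.svm import SVC
>>> np.random.seed(0)
>>> X = np.random.randn(20, 5)
>>> y = np.array([0, 1] * 10)
>>> clf = Pipeline([('reduce', PCA(n_components=2)), ('svm', SVC())])
>>> clf = clf.fit(X, y)    # fits PCA, transforms X, then fits the SVC
>>> clf.predict(X)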

7.17. Partial Least Squares

Partial Least Squares.

pls.PLSRegression([n_components, scale, ...]) PLS regression (also known as PLS2, or PLS1 in the case of a one-dimensional response)
pls.PLSCanonical([n_components, scale, ...]) PLS canonical. PLSCanonical inherits from PLS with mode="A" and deflation_mode="canonical"
pls.CCA([n_components, scale, algorithm, ...]) CCA Canonical Correlation Analysis. CCA inherits from PLS with mode="B" and deflation_mode="canonical"
pls.PLSSVD([n_components, scale, copy]) Partial Least Square SVD
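
A regression sketch with a two-column response:

>>> import numpy as np
>>> from scikits.learn.pls import PLSRegression
>>> np.random.seed(0)
>>> X = np.random.randn(20, 5)
>>> Y = np.random.randn(20, 2)
>>> pls = PLSRegression(n_components=2)
>>> pls.fit(X, Y)
>>> pls.predict(X)   # predictions for both response columns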