4.3. Decomposing signals in components (matrix factorization problems)

4.3.1. Principal component analysis (PCA)

PCA decomposes a multivariate dataset into a set of successive orthogonal components, each of which explains the maximum amount of the remaining variance. In scikit-learn, PCA is implemented as a transformer object that learns n components in its fit method and can then be used to project new data onto these components.
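The fit-then-project workflow can be sketched as follows. This is a minimal illustration, assuming a recent scikit-learn where the class is `sklearn.decomposition.PCA`; the synthetic data, the `mixing` matrix, and all variable names are made up for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Hypothetical data: 5 correlated features built from a random mixing matrix.
mixing = rng.randn(5, 5)
X_train = rng.randn(200, 5) @ mixing
X_new = rng.randn(10, 5) @ mixing

# Learn 2 components on the training data only.
pca = PCA(n_components=2)
pca.fit(X_train)

# Project previously unseen data onto the learned components.
X_new_proj = pca.transform(X_new)

# The rows of components_ are orthonormal, so this is close to the identity.
gram = pca.components_ @ pca.components_.T

# Fraction of the training variance explained by each component, in decreasing order.
ratios = pca.explained_variance_ratio_
print(X_new_proj.shape, ratios)
```

Fitting on one array and transforming another is what makes the object usable inside a pipeline or cross-validation loop: the components are estimated once and reused unchanged on new samples.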

In addition, the ProbabilisticPCA object provides a probabilistic interpretation of PCA that assigns a likelihood to data based on the amount of variance it explains. As such, it implements a score method that can be used in cross-validation.
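As a rough sketch of likelihood-based scoring in cross-validation: the example below assumes a recent scikit-learn release, where `ProbabilisticPCA` may not be available as a separate class and the `PCA` object itself exposes a `score` method returning the average log-likelihood. That substitution, the choice of the iris data, and the candidate component counts are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score

X = load_iris().data

# Shuffle the folds because the iris samples are ordered by class.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for n in (1, 2, 3):
    # With no scoring argument, cross_val_score calls the estimator's own
    # score method, here assumed to be the average log-likelihood.
    scores = cross_val_score(PCA(n_components=n), X, cv=cv)
    mean_scores[n] = scores.mean()

# Number of components with the highest held-out log-likelihood.
best_n = max(mean_scores, key=mean_scores.get)
print(mean_scores, best_n)
```

Because the score is computed on held-out data, it can be compared across different numbers of components without simply rewarding the largest model.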

Below is an example of the iris dataset, which comprises 4 features, projected onto the 2 dimensions that explain the most variance:
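A minimal sketch of this projection, assuming a recent scikit-learn with `load_iris` and `PCA` importable as shown; the variable names are illustrative, and plotting is left out to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data    # 150 samples, 4 features
y = iris.target  # class labels, useful for colouring a scatter plot

# Keep the 2 directions of largest variance and project the data onto them.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Share of the total variance retained by the 2-dimensional projection.
retained = pca.explained_variance_ratio_.sum()
print(X_2d.shape, retained)
```

The two columns of `X_2d` can be passed directly to a scatter plot, with `y` as the colour, to visualise how the classes separate in the reduced space.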

4.3.2. Independent component analysis (ICA)

ICA finds components that are maximally independent. It is classically used to separate mixed signals (a problem known as blind source separation), as in the example below:
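A minimal sketch of blind source separation, assuming the `FastICA` estimator from `sklearn.decomposition` (one ICA algorithm among several). The two source signals, the noise level, and the mixing matrix `A` are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
t = np.linspace(0, 8, 2000)

# Two hypothetical independent sources: a sine wave and a square wave.
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
S = np.c_[s1, s2]
S += 0.05 * rng.randn(*S.shape)  # small additive noise

# Hypothetical mixing matrix: each observed channel is a blend of both sources.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
X = S @ A.T

# Estimate the sources from the mixed observations alone.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)

# ICA recovers sources only up to order, sign, and scale, so compare
# with absolute correlations rather than raw values.
corr = np.abs(np.corrcoef(S.T, S_est.T)[:2, 2:])
print(corr.round(2))
```

Each row of `corr` corresponds to a true source and each column to an estimated one; a value near 1 in every row suggests that the corresponding source was recovered, whatever its order or sign.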