This documentation is for scikit-learn version 0.10.

8.5.4. sklearn.decomposition.RandomizedPCA

class sklearn.decomposition.RandomizedPCA(n_components, copy=True, iterated_power=3, whiten=False, random_state=None)

Principal component analysis (PCA) using randomized SVD

Linear dimensionality reduction using an approximate Singular Value Decomposition (SVD) of the data, keeping only the most significant singular vectors to project the data to a lower-dimensional space.

This implementation relies on randomized SVD and accepts both scipy.sparse matrices and dense numpy arrays as input.
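
The randomized SVD idea from [Halko2009] can be sketched in plain NumPy as follows. This is an illustrative sketch for dense arrays, not scikit-learn's implementation: the function name randomized_svd_sketch and the n_oversamples parameter are hypothetical, and n_iter plays the role that iterated_power plays in this class.

```python
import numpy as np


def randomized_svd_sketch(X, n_components, n_oversamples=10, n_iter=3, seed=0):
    """Hypothetical helper: approximate truncated SVD of a dense 2-D array."""
    rng = np.random.RandomState(seed)
    n_features = X.shape[1]
    n_random = n_components + n_oversamples

    # 1. Sample the range of X with a Gaussian random test matrix.
    Q, _ = np.linalg.qr(X @ rng.normal(size=(n_features, n_random)))

    # 2. Power iterations pull the basis towards the leading singular vectors;
    #    re-orthonormalizing at each step keeps the iteration numerically stable.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)

    # 3. Exact SVD of the small projected matrix B = Q^T X.
    B = Q.T @ X
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :n_components], s[:n_components], Vt[:n_components]


# Usage: the singular values of an exactly rank-5 matrix are recovered.
rng = np.random.RandomState(42)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 40))
U, s, Vt = randomized_svd_sketch(X, n_components=5)
s_exact = np.linalg.svd(X, compute_uv=False)[:5]
print(np.allclose(s, s_exact))
```

The expensive work happens on matrices with only n_components + n_oversamples columns, which is why the approach pays off when n_components is much smaller than the data dimensions.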

Parameters :

n_components : int

Maximum number of components to keep. This parameter is required: the signature above gives it no default value.

copy : bool

If False, the data passed to fit are overwritten in place instead of being copied.

iterated_power : int, optional

Number of iterations for the power method. 3 by default.

whiten : bool, optional

When True (False by default), the components_ vectors are divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
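
A minimal NumPy sketch of what whitening does to the projected data, using an exact SVD in place of the randomized one. The extra sqrt(n_samples) factor is an assumption of this sketch, chosen so that each whitened component has exactly unit variance; the text above only states that the vectors are divided by the singular values.

```python
import numpy as np

rng = np.random.RandomState(0)
# Correlated toy data: 200 samples, 3 features with very different scales.
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.2]])
n_samples = X.shape[0]
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Plain projection: component variances keep their relative scales.
X_proj = Xc @ Vt.T

# Whitened projection: each component vector is divided by its singular value
# (and rescaled by sqrt(n_samples), an assumption so variances come out as 1).
components_white = Vt / s[:, np.newaxis] * np.sqrt(n_samples)
X_white = Xc @ components_white.T

print(np.round(X_proj.var(axis=0), 2))   # decreasing variances
print(np.round(X_white.var(axis=0), 2))  # all ones
```

The whitened output has an identity covariance matrix, which is exactly the loss of "relative variance scales" mentioned above.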

random_state : int or RandomState instance or None (default)

Seed control for the pseudo-random number generator. If None, the numpy.random singleton is used.
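
The three accepted kinds of random_state value can be illustrated with a small helper. The function as_random_state is hypothetical and only sketches the behaviour described above; it is not part of this class's API.

```python
import numbers

import numpy as np


def as_random_state(seed):
    """Hypothetical helper: turn a random_state value into a RandomState."""
    if seed is None:
        # NumPy's global RandomState singleton (a private attribute, but the
        # one that the np.random.* functions draw from).
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # A fixed integer seed gives reproducible draws.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # An existing instance is used as-is, so its state advances.
        return seed
    raise ValueError("%r cannot be used to seed a RandomState" % (seed,))


# The same integer seed yields the same random test matrix, hence the same fit.
a = as_random_state(0).normal(size=(3, 2))
b = as_random_state(0).normal(size=(3, 2))
print(np.array_equal(a, b))
```

Passing an int is the simplest way to make repeated fits reproducible; passing None makes results depend on the global NumPy state.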

See also

PCA, ProbabilisticPCA

Notes

References:

[Halko2009] Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions. Halko, et al., 2009 (arXiv:909)
[MRT] A randomized algorithm for the decomposition of matrices. Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert

Examples

>>> import numpy as np
>>> from sklearn.decomposition import RandomizedPCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = RandomizedPCA(n_components=2)
>>> pca.fit(X)                 
RandomizedPCA(copy=True, iterated_power=3, n_components=2,
       random_state=<mtrand.RandomState object at 0x...>, whiten=False)
>>> print(pca.explained_variance_ratio_)
[ 0.99244...  0.00755...]
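
Because n_components equals the number of features in this example, the randomized result should agree with an exact SVD. The NumPy cross-check below is a sketch that assumes the ratio is computed from the squared singular values of the centered data, normalized to sum to one over the kept components.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
Xc = X - X.mean(axis=0)  # center the data (the mean is already zero here)

s = np.linalg.svd(Xc, compute_uv=False)
explained_variance = s ** 2 / X.shape[0]
ratio = explained_variance / explained_variance.sum()
print(np.round(ratio, 5))
```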

Attributes

components_ : array, [n_components, n_features]

Components with maximum variance.

explained_variance_ratio_ : array, [n_components]

Fraction of the variance explained by each of the selected components. If all components are kept, the ratios sum to 1.0.

Methods

fit(X[, y]) Fit the model to the data X.
fit_transform(X[, y]) Fit to data, then transform it.
inverse_transform(X) Transform data back to its original space.
set_params(**params) Set the parameters of the estimator.
transform(X) Apply the dimensionality reduction on X.
__init__(n_components, copy=True, iterated_power=3, whiten=False, random_state=None)
fit(X, y=None)

Fit the model to the data X.

Parameters :

X : array-like or scipy.sparse matrix, shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

Returns :

self : object

Returns the instance itself.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters :

X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns :

X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

Notes

This method just calls fit and transform consecutively, i.e., it is not an optimized implementation of fit_transform, unlike other transformers such as PCA.
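
The equivalence described in this note can be sketched with a tiny exact-SVD stand-in. The class TinyPCA is hypothetical and only mimics the shape of the API (fit returns self, fit_transform calls fit and then transform); it is not the RandomizedPCA code.

```python
import numpy as np


class TinyPCA(object):
    """Hypothetical exact-SVD stand-in used only to illustrate the API shape."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[:self.n_components]
        return self  # returning self is what allows fit(X).transform(X)

    def transform(self, X):
        # Project the centered data onto the kept components.
        return (np.asarray(X, dtype=float) - self.mean_) @ self.components_.T

    def fit_transform(self, X, y=None):
        # Unoptimized: simply fit, then transform.
        return self.fit(X, y).transform(X)


rng = np.random.RandomState(0)
X = rng.normal(size=(50, 4))
Z1 = TinyPCA(n_components=2).fit_transform(X)
Z2 = TinyPCA(n_components=2).fit(X).transform(X)
print(Z1.shape, np.allclose(Z1, Z2))
```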

inverse_transform(X)

Transform data back to its original space, i.e., return an input X_original whose transform would be X.

Parameters :

X : array-like or scipy.sparse matrix, shape (n_samples, n_components)

New data, where n_samples is the number of samples and n_components is the number of components.

Returns :

X_original : array-like, shape (n_samples, n_features)

Data mapped back to the original feature space.

Notes

If whitening is enabled, inverse_transform does not compute the exact inverse of transform.
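
The whitening caveat can be illustrated with a small NumPy sketch. It assumes, hypothetically, that transform projects centered data onto components_ and that inverse_transform multiplies back by components_ and re-adds the mean; the helper functions and the sqrt(n_samples) scaling are assumptions of this sketch, not the documented implementation.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3)) * [5.0, 2.0, 0.5]
n_samples = X.shape[0]
mean = X.mean(axis=0)
_, s, Vt = np.linalg.svd(X - mean, full_matrices=False)


def transform(X, components):
    # Hypothetical transform: project centered data onto the components.
    return (X - mean) @ components.T


def inverse_transform(Z, components):
    # Hypothetical inverse: map back with the same components, re-add the mean.
    return Z @ components + mean


# Without whitening, keeping all components gives an exact round trip.
plain = Vt
X_back = inverse_transform(transform(X, plain), plain)

# With whitening, the components are rescaled, so the same formula no longer
# undoes transform.
white = Vt / s[:, np.newaxis] * np.sqrt(n_samples)
X_back_white = inverse_transform(transform(X, white), white)

print(np.allclose(X_back, X), np.allclose(X_back_white, X))  # True False
```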

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Returns :

self : object
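
A pure-Python sketch of how a <component>__<parameter> key can be routed to a nested object. The Step class and its attribute-based lookup are hypothetical and only illustrate the naming convention; this is not scikit-learn's actual implementation.

```python
class Step(object):
    """Hypothetical minimal estimator with a set_params method."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            if "__" in key:
                # Nested parameter: split "<component>__<parameter>" once and
                # forward the remainder to the named component.
                component, sub_key = key.split("__", 1)
                getattr(self, component).set_params(**{sub_key: value})
            else:
                setattr(self, key, value)
        return self  # returning self allows chained calls


pipeline = Step(pca=Step(n_components=2, whiten=False))
pipeline.set_params(pca__n_components=5, pca__whiten=True)
print(pipeline.pca.n_components, pipeline.pca.whiten)  # 5 True
```

Splitting only on the first "__" lets deeper nesting, such as a__b__c, be resolved recursively one level at a time.
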
transform(X)

Apply the dimensionality reduction on X.

Parameters :

X : array-like or scipy.sparse matrix, shape (n_samples, n_features)

New data, where n_samples is the number of samples and n_features is the number of features.

Returns :

X_new : array-like, shape (n_samples, n_components)