This documentation is for scikit-learn version 0.11-git.

8.23.3. sklearn.preprocessing.Binarizer

class sklearn.preprocessing.Binarizer(threshold=0.0, copy=True)

Binarize data (set feature values to 0 or 1) according to a threshold.

The default threshold is 0.0, so any positive value is set to 1.0, while zeros and negative values are set to 0.0.
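The thresholding rule can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's own code: `binarize_dense` is a hypothetical helper that assumes dense input and applies the strict `>` comparison, so a negative value ends up as 0.0 under the default threshold.

```python
import numpy as np


def binarize_dense(X, threshold=0.0):
    """Hypothetical helper sketching the Binarizer rule on dense input.

    Values strictly greater than `threshold` become 1.0; every other value
    (zeros and, with threshold=0.0, negatives too) becomes 0.0.
    """
    X = np.asarray(X, dtype=float)
    return np.where(X > threshold, 1.0, 0.0)


X = np.array([[1.5, 0.0, -2.0],
              [0.0, 3.0, 0.25]])
print(binarize_dense(X))
# [[1. 0. 0.]
#  [0. 1. 1.]]
```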

Binarization is a common operation on text count data, where the analyst may decide to consider only the presence or absence of a feature rather than, for instance, the number of times it occurs.
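As a sketch of the text-count use case, assuming NumPy and scikit-learn are installed: the document-term count matrix below is made up for illustration, and the default threshold turns every positive count into a presence flag.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Toy document-term count matrix (rows: documents, columns: terms).
# The counts are invented for this example.
counts = np.array([[3, 0, 1],
                   [0, 2, 0],
                   [5, 1, 0]])

# With the default threshold (0.0), any positive count becomes 1, so only
# the presence or absence of each term is kept.
presence = Binarizer().fit_transform(counts)
print(presence)
```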

It can also be used as a pre-processing step for estimators that consider boolean random variables (e.g. modeled using the Bernoulli distribution in a Bayesian setting).

Parameters :

threshold : float, optional (0.0 by default)

Feature values strictly greater than the threshold are replaced by 1.0; values less than or equal to it are replaced by 0.0.
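A short sketch of a custom threshold, assuming scikit-learn is available and that `threshold` is passed by keyword. The input values are arbitrary; the point is that a value exactly equal to the threshold maps to 0.0 because the comparison is strict.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[1.0, 2.0, 2.5, 4.0]])

# Only values strictly greater than 2.0 become 1.0; the value equal
# to the threshold is mapped to 0.0.
binarizer = Binarizer(threshold=2.0)
print(binarizer.fit_transform(X))
# [[0. 0. 1. 1.]]
```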

copy : boolean, optional, default is True

Set to False to perform in-place binarization and avoid a copy (if the input is already a NumPy array or a scipy.sparse CSR matrix).
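The effect of `copy` can be checked directly. This sketch assumes a writeable float64 NumPy array as input, for which no conversion copy is needed; with other dtypes or layouts a copy may still be made even when `copy=False`, so treat the in-place result as version- and input-dependent.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[0.5, -1.0],
              [0.0, 2.0]])
X_before = X.copy()

# copy=True (the default): a new array is returned and X is not modified.
X_bin = Binarizer(copy=True).fit_transform(X)
unchanged = np.array_equal(X, X_before)  # True

# copy=False: the writeable float64 array X is binarized in place,
# so X itself now holds the 0/1 values.
Binarizer(copy=False).fit(X).transform(X)
changed_in_place = np.array_equal(X, X_bin)  # True
print(unchanged, changed_in_place)
```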

Notes

If the input is a sparse matrix, only its explicitly stored (non-zero) values are updated by the Binarizer class; implicit zeros are left untouched.
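A sketch of the sparse behavior, assuming SciPy and scikit-learn are installed. Whether an entry that becomes 0.0 stays explicitly stored may vary between releases, so the example only inspects the dense view of the result.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import Binarizer

# CSR matrix with two explicitly stored values: 3.0 and -1.0.
X = sp.csr_matrix(np.array([[0.0, 3.0, 0.0],
                            [-1.0, 0.0, 0.0]]))

X_bin = Binarizer().fit_transform(X)

# Only the stored entries are compared with the threshold:
# 3.0 -> 1.0 and -1.0 -> 0.0. The implicit zeros are never visited.
print(X_bin.toarray())
# [[0. 1. 0.]
#  [0. 0. 0.]]
```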

This estimator is stateless (besides its constructor parameters): the fit method does nothing, but it is useful when the estimator is used in a pipeline.
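A sketch of Binarizer inside a pipeline, tying back to the Bernoulli use case above. It assumes scikit-learn's `Pipeline` and `BernoulliNB`; the training data is made up. `BernoulliNB(binarize=None)` tells the classifier its input is already binary, so the thresholding is done only by the Binarizer step.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer

# Toy count features and labels, invented for illustration.
X_train = np.array([[0, 3, 0],
                    [2, 0, 0],
                    [0, 4, 0],
                    [1, 0, 0]])
y_train = np.array([1, 0, 1, 0])

# Binarizer.fit learns nothing, but exposing fit/transform lets it sit in
# front of an estimator that models boolean features.
pipe = Pipeline([
    ("binarize", Binarizer()),
    ("nb", BernoulliNB(binarize=None)),  # input is already 0/1
])
pipe.fit(X_train, y_train)

predictions = pipe.predict(np.array([[0, 5, 0],
                                     [7, 0, 0]]))
print(predictions)  # [1 0]
```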

Methods

fit(X[, y]) Do nothing and return the estimator unchanged
fit_transform(X[, y]) Fit to data, then transform it
set_params(**params) Set the parameters of the estimator.
transform(X[, y, copy]) Binarize each element of X

__init__(threshold=0.0, copy=True)

fit(X, y=None)

Do nothing and return the estimator unchanged

This method exists only to implement the usual estimator API, so that the Binarizer can be used in pipelines.
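A quick check of the fit contract, assuming scikit-learn is installed: fit returns the estimator itself, which is what allows chained calls such as `fit(X).transform(X)`.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[0.2, -0.3]])
binarizer = Binarizer(threshold=0.0)

# fit only exists for API compatibility: it returns the same object,
# so calls can be chained.
fitted = binarizer.fit(X)
print(fitted is binarizer)  # True
print(fitted.transform(X))
```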

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters :

X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns :

X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

Notes

This method just calls fit and transform consecutively, i.e., it is not an optimized implementation of fit_transform, unlike other transformers such as PCA.
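Because fit_transform simply chains fit and transform, both routes should give the same result. This sketch assumes scikit-learn is installed; the threshold of 0.5 is an arbitrary choice that also shows a value equal to the threshold mapping to 0.0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[0.0, 1.0, -2.0],
              [4.0, 0.5, 0.0]])

# fit_transform(X) calls fit(X) and then transform(X); for Binarizer the
# shortcut and the explicit two-step route agree.
via_shortcut = Binarizer(threshold=0.5).fit_transform(X)
via_two_steps = Binarizer(threshold=0.5).fit(X).transform(X)
print(np.array_equal(via_shortcut, via_two_steps))  # True
```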

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
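A sketch of both forms, assuming scikit-learn's `Pipeline` and `BernoulliNB`. The step name `"binarize"` is a choice made in this example, not a fixed name; the nested key is built from whatever name the step is given.

```python
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer

# Simple estimator: parameters are set by their plain names.
binarizer = Binarizer()
binarizer.set_params(threshold=1.0)

# Nested object: <component>__<parameter> reaches into a named step.
# "binarize" is the step name chosen in this sketch.
pipe = Pipeline([("binarize", Binarizer()), ("nb", BernoulliNB(binarize=None))])
pipe.set_params(binarize__threshold=2.0)
print(pipe.named_steps["binarize"].threshold)  # 2.0
```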

Returns :

self

transform(X, y=None, copy=None)

Binarize each element of X

Parameters :

X : array or scipy.sparse matrix with shape [n_samples, n_features]

The data to binarize, element by element. scipy.sparse matrices should be in CSR format to avoid an unnecessary copy.
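A sketch of passing sparse input in CSR format, assuming SciPy and scikit-learn are installed. The matrix is built in COO format purely for convenience and converted with `tocsr()` before transforming; the result stays sparse.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import Binarizer

# Build the matrix in COO format (convenient for construction), then
# convert it to CSR before handing it to the Binarizer.
rows = np.array([0, 1, 1])
cols = np.array([2, 0, 1])
vals = np.array([0.7, 5.0, -0.1])
X_csr = sp.coo_matrix((vals, (rows, cols)), shape=(2, 3)).tocsr()

binarizer = Binarizer().fit(X_csr)
X_bin = binarizer.transform(X_csr)

print(sp.issparse(X_bin), X_bin.format)  # True csr
print(X_bin.toarray())
```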