9.17.3. sklearn.preprocessing.Binarizer

class sklearn.preprocessing.Binarizer(threshold=0.0, copy=True)

Binarize data (set feature values to 0 or 1) according to a threshold.

Values strictly greater than the threshold are set to 1.0; all other values are set to 0.0. With the default threshold of 0.0, positive values become 1.0, while zeros and negative values become 0.0.
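As a minimal sketch of the default behaviour, the example below binarizes a small, made-up array that mixes positive, zero, and negative entries. The data is illustrative only; the comparison is assumed to be a strict "greater than", as in current scikit-learn releases.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Illustrative data with positive, zero, and negative entries.
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

binarizer = Binarizer()  # default threshold=0.0
X_bin = binarizer.fit_transform(X)

# Entries greater than 0.0 become 1.0; zeros and negatives become 0.0.
print(X_bin)
```

Because copy=True by default, the input array X is left unmodified.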

Binarization is a common operation on text count data, where the analyst may decide to consider only the presence or absence of a feature rather than its number of occurrences.
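The presence/absence use case can be sketched as follows. The count matrix is hypothetical (rows stand for documents, columns for vocabulary terms); in practice it would come from a vectorizer or a similar counting step.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical term counts: rows are documents, columns are vocabulary terms.
counts = np.array([[3, 0, 1, 0],
                   [0, 2, 0, 0],
                   [1, 1, 0, 5]])

# With the default threshold of 0.0, any positive count becomes 1.
presence = Binarizer().fit_transform(counts)
print(presence)
```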

It can also be used as a pre-processing step for estimators that treat features as boolean random variables (e.g. modeled using the Bernoulli distribution in a Bayesian setting).
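As one possible illustration of this use, the sketch below binarizes a toy data set and passes the result to sklearn.naive_bayes.BernoulliNB. The data and labels are made up, and BernoulliNB is only one example of such an estimator; binarize=None is passed so that the classifier assumes its input is already binary.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import Binarizer

# Toy data (assumed for illustration): class 1 uses features 0 and 2,
# class 0 uses feature 1.
X = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [4.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])
y = np.array([1, 0, 1, 0])

# Binarize explicitly, then tell BernoulliNB not to binarize again.
X_bin = Binarizer().fit_transform(X)
clf = BernoulliNB(binarize=None)
clf.fit(X_bin, y)

pred = clf.predict(X_bin)
print(pred)
```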

Parameters :

threshold : float, optional (0.0 by default)

Feature values strictly greater than this threshold are replaced by 1.0; values less than or equal to it are replaced by 0.0.

copy : boolean, optional, default is True

Set to False to perform in-place binarization and avoid a copy (only possible if the input is already a numpy array or a scipy.sparse CSR matrix).
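The two constructor parameters can be exercised as in the sketch below. The data and the threshold of 1.1 are arbitrary choices for illustration. The in-place part assumes the input is a writeable float numpy array, so that no conversion forces a copy.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# A custom threshold: only values greater than 1.1 become 1.0.
X_high = Binarizer(threshold=1.1).fit_transform(X)
print(X_high)

# copy=False: the array passed in is overwritten with the binarized values.
X_inplace = X.copy()
Binarizer(threshold=1.1, copy=False).fit_transform(X_inplace)
print(X_inplace)
```

With copy=True (the default), X keeps its original values, as the first call shows.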

Notes

If the input is a sparse matrix, only the non-zero values are subject to update by the Binarizer class.
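A small sketch of the sparse case, using a made-up CSR matrix and an arbitrary threshold of 1.0. Only the stored entries are compared against the threshold; implicit zeros stay zero.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import Binarizer

# Illustrative CSR matrix with three stored (non-zero) entries.
X = sparse.csr_matrix(np.array([[0.0, 2.0, 0.5],
                                [3.0, 0.0, 0.0]]))

X_bin = Binarizer(threshold=1.0).fit_transform(X)

# The result is still sparse; convert to dense only for display.
print(X_bin.toarray())
```

Recent scikit-learn releases appear to reject a negative threshold for sparse input, since the implicit zeros would then have to become 1.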

This estimator is stateless (besides constructor parameters). The fit method does nothing, but it is useful when the estimator is used in a pipeline.
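The sketch below illustrates both points: fit returns the estimator itself, and the Binarizer can serve as a pipeline step. The step names "binarize" and "clf", the toy data, and the choice of BernoulliNB as the final estimator are assumptions made for the example.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer

X = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [4.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])
y = np.array([1, 0, 1, 0])

# fit learns nothing from the data and returns the same object.
binarizer = Binarizer()
returned = binarizer.fit(X)

# Used as the first step of a pipeline (step names are arbitrary).
pipe = Pipeline([("binarize", Binarizer()), ("clf", BernoulliNB())])
pipe.fit(X, y)
pred = pipe.predict(X)
print(pred)
```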

Methods

`fit`(X[, y])	Do nothing and return the estimator unchanged
`fit_transform`(X[, y])	Fit to data, then transform it
`set_params`(**params)	Set the parameters of the estimator.
`transform`(X[, y, copy])	Binarize each element of X

__init__(threshold=0.0, copy=True)

fit(X, y=None)

Do nothing and return the estimator unchanged.

This method exists only to implement the usual API, so that the estimator works in pipelines.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters :

X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns :

X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
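A brief sketch of both forms follows. The pipeline and its step names ("binarize", "clf") are hypothetical and chosen only to show the <component>__<parameter> syntax.

```python
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer

# Simple estimator: parameters are addressed by their own names.
binarizer = Binarizer()
same = binarizer.set_params(threshold=0.5)

# Nested object: address a step's parameter as <component>__<parameter>.
pipe = Pipeline([("binarize", Binarizer()), ("clf", BernoulliNB())])
pipe.set_params(binarize__threshold=2.0)

print(binarizer.threshold)
print(pipe.named_steps["binarize"].threshold)
```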

Returns :

self : estimator instance

transform(X, y=None, copy=None)

Binarize each element of X.

Parameters :

X : array or scipy.sparse matrix with shape [n_samples, n_features]

The data to binarize, element by element. scipy.sparse matrices should be in CSR format to avoid an unnecessary copy.
