8.23.3. sklearn.preprocessing.Binarizer
- class sklearn.preprocessing.Binarizer(threshold=0.0, copy=True)
Binarize data (set feature values to 0 or 1) according to a threshold.
The default threshold is 0.0, so only strictly positive values are set to 1.0; zeros and negative values are set to 0.0.
Binarization is a common operation on text count data, where the analyst may choose to consider only the presence or absence of a feature rather than, for instance, its number of occurrences.
It can also be used as a pre-processing step for estimators that consider boolean random variables (e.g. modeled using the Bernoulli distribution in a Bayesian setting).
Parameters : threshold : float, optional (0.0 by default)
Feature values strictly greater than this threshold are replaced by 1.0; values less than or equal to it are replaced by 0.0.
copy : boolean, optional, default is True
Set to False to perform in-place binarization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).
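As a minimal sketch of how the threshold parameter behaves, the example below binarizes a small made-up array with the default threshold and with a higher one. The array values and variable names are illustrative assumptions, and keyword arguments are used because recent scikit-learn releases may require them.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Illustrative data only: a mix of positive, zero and negative values.
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# Default threshold of 0.0: only strictly positive values become 1.
default_result = Binarizer().fit_transform(X)

# Higher threshold: only values strictly greater than 1.1 become 1.
high_result = Binarizer(threshold=1.1).fit_transform(X)

print(default_result)
print(high_result)
```

Note that the negative entries map to 0 in both cases, since the comparison is against the threshold and not against zero magnitude.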
Notes
If the input is a sparse matrix, only the non-zero values are subject to update by the Binarizer class.
This estimator is stateless (besides constructor parameters). The fit method does nothing, but it allows the estimator to be used in a pipeline.
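The sparse case can be sketched as follows, assuming a small made-up count matrix stored in CSR format. Only the explicitly stored entries are compared against the threshold; the implicit zeros stay zero.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.preprocessing import Binarizer

# Illustrative count data: most entries are zero and are not stored.
counts = csr_matrix(np.array([[0.0, 3.0, 0.0],
                              [1.0, 0.0, 5.0]]))

# Stored values greater than 2.0 become 1; the stored value 1.0 becomes 0.
binary = Binarizer(threshold=2.0).fit_transform(counts)

print(issparse(binary))   # the result is still a sparse matrix
print(binary.toarray())
```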
Methods
fit(X[, y])              Do nothing and return the estimator unchanged.
fit_transform(X[, y])    Fit to data, then transform it.
set_params(**params)     Set the parameters of the estimator.
transform(X[, y, copy])  Binarize each element of X.
- __init__(threshold=0.0, copy=True)
- fit(X, y=None)
Do nothing and return the estimator unchanged.
This method is just there to implement the usual API and hence work in pipelines.
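To illustrate why a do-nothing fit is still useful, here is a hedged sketch that places a Binarizer in front of a Bernoulli naive Bayes classifier in a pipeline. The toy word counts, labels and step names are assumptions made up for this example.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer

# Toy word-count data (invented): 4 documents, 3 terms.
X = np.array([[3.0, 0.0, 1.0],
              [2.0, 0.0, 0.0],
              [0.0, 4.0, 1.0],
              [0.0, 2.0, 0.0]])
y = np.array([0, 0, 1, 1])

# fit returns the estimator itself, which is what the pipeline API expects.
binarizer = Binarizer(threshold=0.0)
returned = binarizer.fit(X)
print(returned is binarizer)

# The pipeline calls fit and transform on the Binarizer before the classifier.
model = Pipeline([
    ("binarizer", Binarizer(threshold=0.0)),
    ("classifier", BernoulliNB()),
])
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```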
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters : X : numpy array of shape [n_samples, n_features]
Training set.
y : numpy array of shape [n_samples]
Target values.
Returns : X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
Notes
This method just calls fit and transform consecutively, i.e., it is not an optimized implementation of fit_transform, unlike other transformers such as PCA.
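The equivalence described in this note can be checked with a short sketch; the input values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Illustrative data only.
X = np.array([[0.2, 1.5],
              [3.0, -0.7]])

# Calling fit_transform in one step ...
one_step = Binarizer(threshold=1.0).fit_transform(X)

# ... gives the same result as calling fit and then transform.
two_step = Binarizer(threshold=1.0).fit(X).transform(X)

print(np.array_equal(one_step, two_step))
```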
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns : self : the estimator instance
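A brief sketch of both forms of parameter name follows. The step name "binarizer" is an arbitrary label chosen for this example, not something the library prescribes.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer

# Simple estimator: parameters are addressed by their plain names.
binarizer = Binarizer()
returned = binarizer.set_params(threshold=0.5)
print(returned is binarizer, binarizer.threshold)

# Nested object: use <component>__<parameter>, where the component is the
# step name given when the pipeline was built.
pipeline = Pipeline([("binarizer", Binarizer())])
pipeline.set_params(binarizer__threshold=2.0)
nested_threshold = pipeline.named_steps["binarizer"].threshold
print(nested_threshold)
```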
- transform(X, y=None, copy=None)
Binarize each element of X.
Parameters : X : array or scipy.sparse matrix with shape [n_samples, n_features]
The data to binarize, element by element. scipy.sparse matrices should be in CSR format to avoid an unnecessary copy.
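The effect of the copy argument can be sketched as below, assuming a float numpy array as input so that no conversion (and hence no implicit copy) is needed. The data and variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Illustrative data only.
X = np.array([[0.5, -2.0],
              [0.0, 3.0]])
original = X.copy()

binarizer = Binarizer(threshold=0.0).fit(X)

# Default behaviour (copy=True on the estimator): X is left untouched.
X_binary = binarizer.transform(X)
input_preserved = np.array_equal(X, original)

# copy=False asks transform to overwrite the input array in place.
binarizer.transform(X, copy=False)
input_overwritten = np.array_equal(X, X_binary)

print(input_preserved, input_overwritten)
```

In-place operation is worth considering mainly for large arrays, where the extra copy is costly and the original values are no longer needed.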