9.17.4. sklearn.preprocessing.LabelBinarizer
- class sklearn.preprocessing.LabelBinarizer
Binarize labels in a one-vs-all fashion.
Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class case is the so-called one-vs-all scheme.
At learning time, this consists of learning one regressor or binary classifier per class. To do so, the multi-class labels must be converted to binary labels (the sample belongs or does not belong to the class). LabelBinarizer makes this easy with the transform method.
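To make the learning-time conversion concrete, here is a minimal pure-Python sketch of the binary targets each one-vs-all model would be trained on. The helper `one_vs_all_targets` is hypothetical and not part of scikit-learn. It assumes the classes are the sorted distinct labels, which matches the `classes_` shown in the examples.

```python
def one_vs_all_targets(y):
    """Return {class: binary target list}, one binary problem per class.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    classes = sorted(set(y))  # one binary problem per distinct label
    # 1 where the sample belongs to the class, 0 where it does not
    return {c: [1 if label == c else 0 for label in y] for c in classes}


targets = one_vs_all_targets([1, 2, 6, 4, 2])
for c, t in targets.items():
    print(c, t)
```

Each list is the target vector for one regressor or binary classifier; read column-wise, every sample is positive for exactly one class.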
At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
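The prediction-time rule can be sketched the same way. In this hypothetical helper, `scores[i][j]` stands for the confidence of the model trained for `classes[j]` on sample i, and each sample is assigned the class with the largest score. This illustrates the scheme only; it is not LabelBinarizer's code.

```python
def predict_one_vs_all(scores, classes):
    """Pick, per sample, the class whose binary model is most confident.

    Hypothetical helper: scores[i][j] is the confidence of the model for
    classes[j] on sample i.
    """
    predictions = []
    for row in scores:
        # index of the largest confidence; max() keeps the first on ties
        best = max(range(len(row)), key=row.__getitem__)
        predictions.append(classes[best])
    return predictions


classes = [1, 2, 4, 6]
scores = [
    [0.9, -0.2, 0.1, -1.0],  # sample 0: model for class 1 is most confident
    [-0.5, 0.3, 0.2, 1.4],   # sample 1: model for class 6 is most confident
]
print(predict_one_vs_all(scores, classes))  # [1, 6]
```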
Examples
>>> from sklearn import preprocessing
>>> clf = preprocessing.LabelBinarizer()
>>> clf.fit([1, 2, 6, 4, 2])
LabelBinarizer()
>>> clf.classes_
array([1, 2, 4, 6])
>>> clf.transform([1, 6])
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

>>> clf.fit_transform([(1, 2), (3,)])
array([[ 1.,  1.,  0.],
       [ 0.,  0.,  1.]])
>>> clf.classes_
array([1, 2, 3])
Attributes
classes_ : array of shape [n_classes]
Holds the label for each class.

Methods
fit(y)                               Fit label binarizer.
fit_transform(X[, y])                Fit to data, then transform it.
inverse_transform(Y[, threshold])    Transform binary labels back to multi-class labels.
set_params(**params)                 Set the parameters of the estimator.
transform(y)                         Transform multi-class labels to binary labels.

- __init__()
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
- fit(y)
Fit label binarizer.
Parameters : y : numpy array of shape [n_samples] or sequence of sequences
Target values. In the multilabel case the nested sequences can have variable lengths.
Returns : self : returns an instance of self.
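As a rough illustration of what `fit` records, the sketch below collects the sorted distinct labels, flattening the nested sequences in the multilabel case. `fit_classes` is a hypothetical helper, and treating tuples or lists as the multilabel signal is an assumption of this sketch, not a description of the library's internals.

```python
def fit_classes(y):
    """Collect the sorted distinct labels, like the classes_ attribute.

    Assumption: if any item is a tuple or list, y is treated as the
    multilabel case (sequence of sequences); otherwise as one label per sample.
    """
    multilabel = any(isinstance(item, (list, tuple)) for item in y)
    if multilabel:
        # flatten the variable-length label sequences
        labels = {label for labels_i in y for label in labels_i}
    else:
        labels = set(y)
    return sorted(labels)


print(fit_classes([1, 2, 6, 4, 2]))  # [1, 2, 4, 6]
print(fit_classes([(1, 2), (3,)]))   # [1, 2, 3]
```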
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters : X : numpy array of shape [n_samples, n_features]
Training set.
y : numpy array of shape [n_samples]
Target values.
Returns : X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
- inverse_transform(Y, threshold=0)
Transform binary labels back to multi-class labels.
Parameters : Y : numpy array of shape [n_samples, n_classes]
Target values.
threshold : float, default 0
Threshold used to decide whether to assign the positive class or the negative class in the binary case. The default of 0 suits signed scores such as decision-function outputs; use 0.5 when Y contains probabilities.
Returns : y : numpy array of shape [n_samples] or sequence of sequences
Target values. In the multilabel case the nested sequences can have variable lengths.
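The decision rule described above can be sketched in plain Python. Assumptions for this hypothetical `inverse_binarize` helper: in the binary case each row holds a single score that is compared with `threshold`, and in the multi-class case the column with the largest value wins. The multilabel case is not covered.

```python
def inverse_binarize(Y, classes, threshold=0):
    """Map binary-label rows back to class labels (illustrative sketch).

    Assumption: a one-column row is the binary case (score vs. threshold);
    a wider row is the multi-class case (largest column wins).
    """
    labels = []
    for row in Y:
        if len(row) == 1:
            # binary case: positive class only if the score exceeds threshold
            labels.append(classes[1] if row[0] > threshold else classes[0])
        else:
            best = max(range(len(row)), key=row.__getitem__)
            labels.append(classes[best])
    return labels


# Multi-class: argmax over columns
print(inverse_binarize([[1, 0, 0, 0], [0, 0, 0, 1]], [1, 2, 4, 6]))  # [1, 6]
# Binary, signed scores with the default threshold of 0
print(inverse_binarize([[-0.3], [2.1]], ["neg", "pos"]))  # ['neg', 'pos']
# Binary, probabilities: use threshold=0.5
print(inverse_binarize([[0.3], [0.8]], ["neg", "pos"], threshold=0.5))  # ['neg', 'pos']
```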
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns : self : returns an instance of self.
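To illustrate the `<component>__<parameter>` convention, the sketch below routes each key to the named component and lets a nested component split the remainder again. `Leaf` and `Nested` are hypothetical stand-ins for a simple estimator and a pipeline, and the parameter names `with_mean` and `C` are illustrative only; this is not scikit-learn's implementation.

```python
class Leaf:
    """Hypothetical simple estimator whose parameters are plain attributes."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class Nested:
    """Hypothetical nested object holding named components, like a pipeline."""

    def __init__(self, **components):
        self.components = components

    def set_params(self, **params):
        for key, value in params.items():
            # Split "<component>__<parameter>" at the first double underscore.
            # (This sketch assumes every key names a component.)
            name, sub_key = key.split("__", 1)
            # Delegate to the component; a nested component splits again.
            self.components[name].set_params(**{sub_key: value})
        return self


pipe = Nested(scaler=Leaf(with_mean=True), clf=Leaf(C=1.0))
pipe.set_params(clf__C=10.0)
print(pipe.components["clf"].C)  # 10.0

outer = Nested(inner=Nested(clf=Leaf(C=1.0)))
outer.set_params(inner__clf__C=5.0)
print(outer.components["inner"].components["clf"].C)  # 5.0
```

Delegating rather than parsing the full key up front is what lets the same convention reach arbitrarily deep components.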
- transform(y)
Transform multi-class labels to binary labels.
The output of transform is sometimes referred to as the 1-of-K coding scheme.
Parameters : y : numpy array of shape [n_samples] or sequence of sequences
Target values. In the multilabel case the nested sequences can have variable lengths.
Returns : Y : numpy array of shape [n_samples, n_classes]
Binarized target values.
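A minimal sketch of the 1-of-K coding, assuming the classes have already been fitted: each sample becomes a row with a 1 in the column of each label it carries, so multilabel rows may contain several ones. `binarize` is a hypothetical helper; it reproduces the outputs shown in the examples but does not model any special handling of the two-class case.

```python
def binarize(y, classes):
    """1-of-K coding: one row per sample, one column per class (sketch).

    Assumption: tuples/lists in y mean the multilabel case, where a row
    may contain several ones. Floats mirror the documented example output.
    """
    index = {c: j for j, c in enumerate(classes)}  # label -> column
    Y = []
    for item in y:
        row = [0.0] * len(classes)
        labels = item if isinstance(item, (list, tuple)) else [item]
        for label in labels:
            row[index[label]] = 1.0
        Y.append(row)
    return Y


print(binarize([1, 6], [1, 2, 4, 6]))
# [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(binarize([(1, 2), (3,)], [1, 2, 3]))
# [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```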