8.15.1.7. sklearn.linear_model.ElasticNet
- class sklearn.linear_model.ElasticNet(alpha=1.0, rho=0.5, fit_intercept=True, normalize=False, precompute='auto', max_iter=1000, copy_X=True, tol=0.0001, warm_start=False)
Linear Model trained with L1 and L2 prior as regularizer
Minimizes the objective function:
1 / (2 * n_samples) * ||y - Xw||^2_2 + alpha * rho * ||w||_1 + 0.5 * alpha * (1 - rho) * ||w||^2_2
If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:
a * L1 + b * L2
where:
alpha = a + b and rho = a / (a + b)
The parameter rho corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. Specifically, rho = 1 is the lasso penalty. Currently, rho <= 0.01 is not reliable, unless you supply your own sequence of alpha.
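The objective and the (a, b) reparametrization above can be checked numerically. The following is a minimal sketch, not the library's implementation: the helper names `elastic_net_objective` and `to_alpha_rho` are hypothetical, the intercept is ignored, and it assumes that "L2" in `a * L1 + b * L2` stands for `0.5 * ||w||^2_2`, which is the reading under which `alpha = a + b` and `rho = a / (a + b)` reproduce the objective term by term.

```python
import numpy as np


def elastic_net_objective(X, y, w, alpha=1.0, rho=0.5):
    """Evaluate the documented objective for coefficients w (no intercept)."""
    n_samples = X.shape[0]
    residual = y - np.dot(X, w)
    data_fit = np.dot(residual, residual) / (2.0 * n_samples)
    l1 = np.abs(w).sum()   # ||w||_1
    l2 = np.dot(w, w)      # ||w||^2_2
    return data_fit + alpha * rho * l1 + 0.5 * alpha * (1.0 - rho) * l2


def to_alpha_rho(a, b):
    """Map weights a (on ||w||_1) and b (on 0.5 * ||w||^2_2) to (alpha, rho)."""
    return a + b, a / (a + b)


# Tiny hand-made problem in which w fits y exactly, so only the penalty remains.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 2.0])

a, b = 0.3, 0.1
alpha, rho = to_alpha_rho(a, b)
objective = elastic_net_objective(X, y, w, alpha=alpha, rho=rho)

# The same penalty written with separate weights a and b.
separate_form = a * np.abs(w).sum() + b * 0.5 * np.dot(w, w)
print(alpha, rho, objective, separate_form)
```

Both forms should agree up to floating-point rounding, which is what the equivalence stated above amounts to.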
Parameters : alpha : float
Constant that multiplies the penalty terms. Defaults to 1.0. See the notes for the exact mathematical meaning of this parameter.
rho : float
The ElasticNet mixing parameter, with 0 < rho <= 1. For rho = 1 the penalty is an L1 penalty (the lasso). As rho approaches 0 the penalty approaches a pure L2 penalty. For 0 < rho < 1, the penalty is a combination of L1 and L2.
fit_intercept : bool
Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
normalize : boolean, optional
If True, the regressors X are normalized
precompute : True | False | ‘auto’ | array-like
Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’, the estimator decides. The Gram matrix can also be passed as argument.
max_iter : int, optional
The maximum number of iterations.
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
tol : float, optional
The tolerance for the optimization: if the updates are smaller than ‘tol’, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
Notes
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a fortran contiguous numpy array.
Methods
decision_function(X)    Decision function of the linear model
fit(X, y[, Xy, coef_init])    Fit Elastic Net model with coordinate descent
get_params([deep])    Get parameters for the estimator
predict(X)    Predict using the linear model
score(X, y)    Returns the coefficient of determination R^2 of the prediction.
set_params(**params)    Set the parameters of the estimator.
- __init__(alpha=1.0, rho=0.5, fit_intercept=True, normalize=False, precompute='auto', max_iter=1000, copy_X=True, tol=0.0001, warm_start=False)
- decision_function(X)
Decision function of the linear model
Parameters : X : numpy array of shape [n_samples, n_features]
Returns : C : array, shape = [n_samples]
Returns predicted values.
- fit(X, y, Xy=None, coef_init=None)
Fit Elastic Net model with coordinate descent
Parameters : X : ndarray, (n_samples, n_features)
Data
y : ndarray, (n_samples,)
Target
Xy : array-like, optional
Xy = np.dot(X.T, y) that can be precomputed. It is useful only when the Gram matrix is precomputed.
coef_init : ndarray of shape n_features
The initial coefficients to warm-start the optimization
Notes
Coordinate descent is an algorithm that considers one column of the data at a time; hence it will automatically convert the X input to a Fortran-contiguous numpy array if necessary.
To avoid memory re-allocation it is advised to allocate the initial data in memory directly using that format.
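As an illustration of the advice above, the sketch below allocates the data directly in Fortran (column-major) order and precomputes the Gram matrix and `Xy` with plain NumPy. The array sizes and the random data are arbitrary choices made for the example; how these arrays are then handed to the estimator depends on the installed version and is not shown.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 20, 5

# Allocate X in column-major order from the start, then fill it in place.
X = np.empty((n_samples, n_features), order='F')
X[:] = rng.randn(n_samples, n_features)
y = rng.randn(n_samples)

# An existing C-ordered array can be converted, at the cost of one copy.
X_c = np.ascontiguousarray(X)
X_f = np.asfortranarray(X_c)

print(X.flags['F_CONTIGUOUS'], X_c.flags['F_CONTIGUOUS'], X_f.flags['F_CONTIGUOUS'])

# Quantities that can be precomputed once and reused across fits.
gram = np.dot(X.T, X)   # Gram matrix, shape (n_features, n_features)
Xy = np.dot(X.T, y)     # shape (n_features,)
```

Checking `X.flags['F_CONTIGUOUS']` before fitting is a cheap way to confirm that no conversion copy will be needed.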
- get_params(deep=True)
Get parameters for the estimator
Parameters : deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- predict(X)
Predict using the linear model
Parameters : X : numpy array of shape [n_samples, n_features]
Returns : C : array, shape = [n_samples]
Returns predicted values.
- score(X, y)
Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.
Parameters : X : array-like, shape = [n_samples, n_features]
Training set.
y : array-like, shape = [n_samples]
Returns : z : float
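The definition of R^2 given for this method can be reproduced with a few lines of NumPy. This is a minimal sketch of the formula only, with a hypothetical helper name `r2_score_sketch`; it is not the estimator's own code and does no input validation.

```python
import numpy as np


def r2_score_sketch(y_true, y_pred):
    """Compute 1 - u/v with u and v as defined in the text."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()
    v = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - u / v


perfect = r2_score_sketch([1, 2, 3], [1, 2, 3])    # predictions match exactly
mean_only = r2_score_sketch([1, 2, 3], [2, 2, 2])  # always predicts the mean
reversed_ = r2_score_sketch([1, 2, 3], [3, 2, 1])  # worse than the mean
print(perfect, mean_only, reversed_)
```

A model that always predicts the mean of the targets scores 0, and a model that does worse than that scores below 0, so the score is not bounded from below.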
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns : self
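To make the `<component>__<parameter>` convention concrete, here is a toy illustration of how such a key could be routed to a named component. The classes `ToyEstimator` and `ToyPipeline` are hypothetical and written only for this example; they are not the library's classes and do not reflect how the library implements `set_params`.

```python
class ToyEstimator(object):
    """Hypothetical simple estimator holding two parameters."""

    def __init__(self, alpha=1.0, rho=0.5):
        self.alpha = alpha
        self.rho = rho

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class ToyPipeline(object):
    """Hypothetical nested object holding named components."""

    def __init__(self, **components):
        self.components = components

    def set_params(self, **params):
        for key, value in params.items():
            # Split "enet__alpha" into the component name and its parameter.
            component, _, parameter = key.partition('__')
            self.components[component].set_params(**{parameter: value})
        return self


pipe = ToyPipeline(enet=ToyEstimator())
result = pipe.set_params(enet__alpha=0.1, enet__rho=0.7)
print(pipe.components['enet'].alpha, pipe.components['enet'].rho)
```

Returning `self` from `set_params`, as documented above, is what allows calls to be chained.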