The idea of regularization is to modify the loss function $L$. In particular, we add a regularization term that penalizes some specified properties of the model parameters:

$$ L_{reg}(\beta)= L(\beta) + \lambda R(\beta) $$

where $\lambda$ is a scalar that gives the weight (or importance) of the regularization term.

Fitting the model using the modified loss function $L_{reg}$ yields model parameters with desirable properties (specified by $R$).
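As a concrete illustration, here is a minimal NumPy sketch of this setup; the names `mse_loss`, `regularized_loss`, and `l1_penalty` are illustrative, not from any particular library:

```python
import numpy as np

def mse_loss(beta, X, y):
    """Base loss L(beta): mean squared error of a linear model."""
    return np.mean((y - X @ beta) ** 2)

def regularized_loss(beta, X, y, lam, R):
    """L_reg(beta) = L(beta) + lambda * R(beta)."""
    return mse_loss(beta, X, y) + lam * R(beta)

# Example penalty R: the l1 norm (used by LASSO, below)
l1_penalty = lambda beta: np.sum(np.abs(beta))

# Toy data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=100)
print(regularized_loss(np.array([1.9, 0.1, -0.9]), X, y, lam=0.1, R=l1_penalty))
```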

LASSO Regression

Since we wish to discourage extreme values in the model parameters, we need to choose a regularization term that penalizes parameter magnitudes. For our base loss function, we will again use the MSE:

$$ L_{LASSO}(\beta) = \frac{1}{n}\sum_{i=1}^{n} |y_i - \beta^\top x_i|^2 + \lambda\sum_{j=1}^{J}|\beta_j| $$

where $\sum_{j=1}^J|\beta_j| = \lVert\beta\rVert_1$ is the $l_1$ norm of the vector $\beta$ (the sum of the absolute values of its entries).
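As a sketch, this loss can be computed directly in NumPy, or the model can be fit with scikit-learn's `Lasso`. Note that scikit-learn's objective puts a $\frac{1}{2n}$ factor on the squared-error term, so its `alpha` corresponds to the $\lambda$ above only up to scaling; the synthetic data here is purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_loss(beta, X, y, lam):
    """MSE plus lambda times the l1 norm of beta."""
    return np.mean((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))

# Synthetic data with a sparse true coefficient vector
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # the truly irrelevant coefficients are typically driven exactly to zero
```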

Ridge Regression

Here we instead choose a regularization term that penalizes the squares of the parameter magnitudes:

$$ L_{Ridge}(\beta) = \frac{1}{n}\sum_{i=1}^{n} |y_i - \beta^\top x_i|^2 + \lambda\sum_{j=1}^{J}\beta_j^2 $$

where $\sum_{j=1}^J\beta_j^2 = \lVert\beta\rVert_2^2$ is the square of the $l_2$ norm of the vector $\beta$.
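Unlike LASSO, this objective has a closed-form minimizer: for the sum-of-squares form $\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert^2$ (the MSE form above only rescales $\lambda$ by $n$), the solution is $\hat\beta = (X^\top X + \lambda I)^{-1} X^\top y$. A minimal sketch, assuming no intercept term:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution for ||y - X beta||^2 + lam * ||beta||^2."""
    J = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(J), X.T @ y)

# Same style of synthetic data as in the LASSO example
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.5, size=200)

print(ridge_fit(X, y, lam=10.0))  # coefficients shrink toward zero as lam grows
```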

Choosing $\lambda$

In both ridge and LASSO regression, the larger $\lambda$ is, the more heavily we penalize large values in $\beta$.

Since $\lambda$ is a hyperparameter that controls the trade-off between fitting the training data and keeping the parameters small, it cannot be tuned by minimizing the training loss itself; thus, we want to choose $\lambda$ using cross-validation.
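A sketch using scikit-learn's built-in cross-validated estimators `LassoCV` and `RidgeCV` (again, their `alpha` plays the role of $\lambda$ up to scaling; the candidate grid below is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.5, size=200)

alphas = np.logspace(-3, 2, 50)  # candidate values of lambda
lasso = LassoCV(alphas=alphas, cv=5).fit(X, y)
ridge = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print(lasso.alpha_, ridge.alpha_)  # lambda chosen by 5-fold cross-validation
```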

Ridge regularization tends to make all coefficients small but nonzero, whereas LASSO tends to drive some coefficients exactly to zero, producing sparse models.
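To see the contrast concretely, a quick sketch fitting both penalties on the same synthetic sparse data (the `alpha` values are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print("ridge:", np.round(ridge.coef_, 3))  # all coefficients shrunk, but nonzero
print("lasso:", np.round(lasso.coef_, 3))  # some coefficients exactly zero (sparse)
```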