The idea of regularization is to modify the loss function $L$. In particular, we add a regularization term that penalizes specified properties of the model parameters:
$$ L_{reg}(\beta)= L(\beta) + \lambda R(\beta) $$
where $\lambda$ is a scalar that controls the weight (or importance) of the regularization term.
Fitting the model by minimizing the modified loss function $L_{reg}$ yields model parameters with the desirable properties specified by $R$.
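As a minimal sketch (the data and function names here are illustrative, not from the text), the penalized loss can be computed directly, assuming MSE as the base loss $L$ and an arbitrary penalty $R$ passed in as a function:

```python
import numpy as np

def regularized_loss(beta, X, y, lam, R):
    """L_reg(beta) = MSE(beta) + lam * R(beta)."""
    residuals = y - X @ beta
    mse = np.mean(residuals ** 2)   # base loss L(beta)
    return mse + lam * R(beta)      # weighted regularization term

# Example: penalize the squared l2 norm of the parameters
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])
loss = regularized_loss(beta, X, y, lam=0.1, R=lambda b: np.sum(b ** 2))
```

Here the residuals are all zero, so the entire loss comes from the penalty term: $0.1 \cdot (1^2 + 2^2) = 0.5$.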
Since we wish to discourage extreme values in the model parameters, we need to choose a regularization term that penalizes parameter magnitudes. For our loss function, we will again use MSE. This gives LASSO regression:
$$ L_{LASSO}(\beta) = \frac{1}{n}\sum_{i=1}^n \left(y_i - \beta^\top x_i\right)^2 + \lambda \sum_{j=1}^J|\beta_j| $$
where $\sum_{j=1}^J|\beta_j| = \lVert\beta\rVert_1$ is the $\ell_1$ norm of the vector $\beta$ (the sum of absolute values)
Alternatively, for ridge regression we choose a regularization term that penalizes the squares of the parameter magnitudes:
$$ L_{Ridge}(\beta) = \frac{1}{n}\sum_{i=1}^n \left(y_i - \beta^\top x_i\right)^2 + \lambda \sum_{j=1}^J \beta_j^2 $$
where $\sum_{j=1}^J \beta_j^2 = \lVert\beta\rVert_2^2$ is the squared $\ell_2$ norm of $\beta$
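Unlike LASSO, the ridge loss has a closed-form minimizer. A sketch with NumPy (using the unnormalized sum-of-squares loss $\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_2^2$ for simplicity; the data here are synthetic, and in practice the intercept is usually left unpenalized):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam I)^{-1} X^T y."""
    J = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(J), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=50)

beta_small = ridge_fit(X, y, lam=0.01)   # close to ordinary least squares
beta_large = ridge_fit(X, y, lam=100.0)  # heavily shrunk toward zero
```

Increasing $\lambda$ shrinks the fitted coefficient vector toward zero.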
In both ridge and LASSO regression, the larger $\lambda$ is, the more heavily we penalize large values in $\beta$.
Since $\lambda$ is a hyperparameter rather than a fitted parameter, we want to choose it using cross-validation
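A minimal sketch of choosing $\lambda$ by $k$-fold cross-validation, assuming the closed-form ridge solution and synthetic data (all names and the candidate grid are illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam I)^{-1} X^T y."""
    J = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(J), X.T @ y)

def cv_mse(X, y, lam, k=5):
    """Average validation MSE over k folds for a given lambda."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # all points not in this fold
        beta = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, 0.5, 0.0, -2.0]) + 0.3 * rng.normal(size=60)

# Pick the lambda with the lowest cross-validated MSE from a candidate grid
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(lambdas, key=lambda lam: cv_mse(X, y, lam))
```

The same loop works for LASSO if `ridge_fit` is replaced by a LASSO solver.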
Ridge regularization tends to make all coefficients small but rarely exactly zero, whereas LASSO tends to drive some coefficients exactly to zero, producing sparse models