Estimate of the regression coefficient

Formula of line: $\hat{Y} = \hat{f}(X) = \hat{\beta_1}X + \hat{\beta_0}$

The goal is to find the regression coefficients. In fact, training simply amounts to finding these coefficients. The main approach is to find the coefficients that result in the smallest MSE.

We can replace $\hat y$ with the linear model and use the MSE as the loss function:

$$ L(\beta_0,\beta_1)= \frac{1}{n}\sum_{i=1}^n(y_i-\hat{y_i})^2 = \frac{1}{n}\sum_{i=1}^n[y_i-({\beta_1}x_i + {\beta_0})]^2 $$

Then the optimal values for $\beta_0, \beta_1$ are $\underset{\beta_0,\beta_1}{\text{argmin}}\,L(\beta_0,\beta_1)$.
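For simple linear regression, this minimization has a closed-form solution: $\hat\beta_1 = \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. A minimal sketch in NumPy (the data points here are made up for illustration):

```python
import numpy as np

# Hypothetical 1-D data (illustrative values, not from the notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates:
# beta1 = cov(x, y) / var(x),  beta0 = mean(y) - beta1 * mean(x)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# MSE of the fitted line at the optimum
mse = np.mean((y - (beta1 * x + beta0)) ** 2)
print(beta1, beta0, mse)
```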

Optimization methods
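When a closed-form solution is unavailable or impractical, $L(\beta_0,\beta_1)$ can be minimized iteratively. One common choice (an assumption here, since the notes do not specify a method) is gradient descent on the MSE loss above:

```python
import numpy as np

# Same kind of hypothetical data as before (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b0, b1 = 0.0, 0.0   # initial guesses
lr = 0.02           # learning rate (assumed value)
n = len(x)

for _ in range(20000):
    resid = y - (b1 * x + b0)
    # Gradients of L = (1/n) * sum(resid^2) with respect to b0 and b1
    grad_b0 = -2.0 / n * np.sum(resid)
    grad_b1 = -2.0 / n * np.sum(resid * x)
    b0 -= lr * grad_b0
    b1 -= lr * grad_b1

print(b0, b1)
```

With a small enough learning rate, the iterates converge toward the same coefficients the closed-form least-squares solution gives.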

Multilinear Models

Often, we use multiple predictors

$Y = (y_1, \dots, y_n)$, with predictors $X_1, \dots, X_j$ for each observation.


The model takes a simple algebraic form: $\boldsymbol Y = \boldsymbol X\beta + \epsilon$ where $\boldsymbol Y$ and $\beta$ are vectors and $\boldsymbol X$ is a matrix

Thus, the MSE can be expressed in vector notation as $MSE(\beta) = \frac{1}{n}\lVert\boldsymbol Y - \boldsymbol X\beta \rVert^2$

Minimizing the MSE using vector calculus yields $\boldsymbol{\hat \beta} = (\boldsymbol{X^TX})^{-1}\boldsymbol{X}^T \boldsymbol Y = \underset{\beta}{\text{argmin}}\,MSE(\boldsymbol\beta)$, provided $\boldsymbol{X^TX}$ is invertible (i.e., the columns of $\boldsymbol X$ are linearly independent).
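The normal-equation solution above can be checked numerically. A minimal sketch, using a made-up design matrix with an intercept column and noiseless responses so the fit recovers the true coefficients exactly:

```python
import numpy as np

# Hypothetical design: n = 6 samples, intercept column plus two predictors
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6), rng.normal(size=6)])
true_beta = np.array([1.0, 2.0, -0.5])
Y = X @ true_beta   # noiseless, so beta_hat should equal true_beta

# Normal equation: beta_hat = (X^T X)^{-1} X^T Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
print(beta_hat)
```

In practice, `np.linalg.lstsq` (or a QR decomposition) is preferred over forming the explicit inverse, which is numerically unstable when $\boldsymbol{X^TX}$ is ill-conditioned.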