Formula of the line: $\hat{Y} = \hat{f}(X) = \hat{\beta}_1 X + \hat{\beta}_0$
The goal is to find the regression coefficients. In fact, training amounts to finding these coefficients; the main approach is to find the coefficients that result in the smallest MSE.
We can replace $\hat y$ with the linear model and use the MSE as the loss function:
$$ L(\beta_0,\beta_1)= \frac{1}{n}\sum_{i=1}^n(y_i-\hat{y}_i)^2 = \frac{1}{n}\sum_{i=1}^n\left[y_i-(\beta_1 x_i + \beta_0)\right]^2 $$
Then the optimal values for $\beta_0, \beta_1$ are $\underset{\beta_0,\beta_1}{\text{argmin}}\, L(\beta_0,\beta_1)$. There are several ways to find them.
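As a concrete reference point for the approaches listed next, here is a minimal Python sketch of this loss for a single predictor; the helper name `mse_loss` and the toy data are assumptions made purely for illustration.

```python
import numpy as np

def mse_loss(beta0, beta1, x, y):
    """Mean squared error of the line y_hat = beta1 * x + beta0."""
    y_hat = beta1 * x + beta0
    return np.mean((y - y_hat) ** 2)

# hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# loss for one candidate pair of coefficients (beta0 = 0, beta1 = 2)
print(mse_loss(0.0, 2.0, x, y))
```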
Brute force - try many candidate combinations of coefficients and keep the best
Exact analytical solution - solve a system of equations
$\hat{\beta_1} = \frac{\sum_i(x_i-\bar x)(y_i-\bar y)}{\sum_i(x_i-\bar x)^2}$, $\hat\beta_0=\bar y - \hat\beta_1\bar x$
Gradient descent - use the gradient of the loss to iteratively step toward the minimum (both the analytical solution and gradient descent are sketched below)
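Below is a minimal Python sketch of the last two approaches on the same hypothetical toy data: the closed-form estimates from the formula above, and a plain gradient-descent loop on the MSE. The function names and hyperparameters (`lr`, `n_iter`) are illustrative choices, not prescribed by these notes.

```python
import numpy as np

def fit_closed_form(x, y):
    """Exact least-squares solution for simple linear regression."""
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def fit_gradient_descent(x, y, lr=0.01, n_iter=5000):
    """Minimize the MSE by repeatedly stepping against the gradient."""
    beta0, beta1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        resid = y - (beta1 * x + beta0)
        # partial derivatives of (1/n) * sum(resid^2)
        grad0 = -2.0 / n * np.sum(resid)
        grad1 = -2.0 / n * np.sum(resid * x)
        beta0 -= lr * grad0
        beta1 -= lr * grad1
    return beta0, beta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(fit_closed_form(x, y))
print(fit_gradient_descent(x, y))  # should approach the closed-form values
```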
Often, we use multiple predictors
$Y = y_1, \dots, y_n, \; X = X_1, \dots, X_j$
The model takes a simple algebraic form: $\boldsymbol Y = \boldsymbol X\boldsymbol\beta + \boldsymbol\epsilon$, where $\boldsymbol Y$, $\boldsymbol\beta$, and $\boldsymbol\epsilon$ are vectors and $\boldsymbol X$ is a matrix
Thus, the MSE can be expressed in vector notation as $MSE(\boldsymbol\beta) = \frac{1}{n}\lVert\boldsymbol Y - \boldsymbol X\boldsymbol\beta \rVert^2$
Minimizing the MSE using vector calculus yields $\boldsymbol{\hat \beta} = \underset{\boldsymbol\beta}{\text{argmin}}\,MSE(\boldsymbol\beta) = (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T \boldsymbol Y$ (assuming $\boldsymbol{X}^T\boldsymbol{X}$ is invertible)
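A short numpy sketch of this multiple-regression solution on hypothetical simulated data; it solves the normal equations directly and also shows `np.linalg.lstsq`, which is the more numerically stable route in practice.

```python
import numpy as np

# hypothetical design matrix with an intercept column
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
true_beta = np.array([1.0, 2.0, -0.5])
Y = X @ true_beta + rng.normal(scale=0.1, size=100)

# normal equations: beta_hat = (X^T X)^{-1} X^T Y
# (solving the linear system avoids forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)

# equivalent least-squares solution, more stable numerically
beta_hat_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat_lstsq)
```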