We can normalize the error as follows:
$$ R^2 = 1 - \frac{\sum_{i}(\hat{y}_i-y_i)^2}{\sum_{i}(\bar y-y_i)^2} $$
where $\hat{y}_i$ is the model's prediction for the $i$-th data point and $\bar y$ is the mean of the observed values $y_i$.
Interpretation:
Ratio of the model's squared error to the error of the simplest baseline, predicting the mean. Quantifies how much of the variability in the data is explained by the model.
$R^2$ can’t be greater than 1: $R^2 = 1$ is the best possible result, obtained from a perfect fit. $R^2 = 0$ when the model's prediction at every data point equals the mean. $R^2$ can be negative when the model is worse than predicting the mean.
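As a sanity check, here is a minimal numpy sketch of the formula above; `r_squared` is our own helper (not a library function), and the example data is made up:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, matching the formula above."""
    ss_res = np.sum((y_pred - y_true) ** 2)          # model's squared error
    ss_tot = np.sum((y_true.mean() - y_true) ** 2)   # mean predictor's squared error
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                      # perfect fit -> 1.0
print(r_squared(y, np.full(4, y.mean())))   # predicting the mean -> 0.0
```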
Collinearity: when features are correlated with each other ← should be avoided, since it makes the fitted coefficients unstable and hard to interpret
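One quick way to spot collinearity is the correlation matrix of the features. The snippet below is a small made-up example where `x2` is constructed to be almost a linear function of `x1`:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.05, size=200)  # x2 nearly a linear function of x1
x3 = rng.normal(size=200)

# Rows are variables; off-diagonal entries near +/-1 flag collinear feature pairs.
print(np.corrcoef(np.stack([x1, x2, x3])))
```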
Whenever we are dealing with a set of data, we split the available data into two parts: a training set used to fit each candidate model and a validation set used to compare them.
Then, once the model is confirmed, we can use the test data, held apart from the train/validation data, to test it.
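A sketch of this splitting pipeline using scikit-learn's `train_test_split`; the 60/20/20 ratios and the random stand-in data are just placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # stand-in feature matrix
y = rng.normal(size=100)        # stand-in targets

# Carve off the held-out test set first, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% -> 60% train / 20% validation / 20% test overall.
```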
Model selection is the application of a principled method to determine the complexity of the model, e.g. choosing a subset of predictors, choosing the degree of a polynomial model, etc.
A strong motivation for performing model selection is to avoid overfitting, which can happen when the model is complex enough to fit the noise in the training data rather than the underlying pattern.
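For the polynomial-degree example, here is a minimal sketch of validation-based model selection; the data-generating function, noise level, candidate degrees, and split are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=60)   # noisy nonlinear data

train, val = np.arange(40), np.arange(40, 60)        # simple index split

def val_mse(degree):
    coefs = np.polyfit(x[train], y[train], degree)   # fit on training points only
    pred = np.polyval(coefs, x[val])
    return np.mean((pred - y[val]) ** 2)

# Pick the candidate degree with the lowest validation error.
best = min(range(1, 11), key=val_mse)
print(best)
```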
Generalization error (the out-of-sample error, or the risk) is a measure of how accurately an algorithm is able to predict outcome values for previously unseen data.
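Since the true risk is unknown, it is commonly estimated by the average error on held-out data; a small sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
x_train = rng.uniform(-1, 1, size=50)
y_train = np.sin(3 * x_train) + rng.normal(scale=0.2, size=50)
x_test = rng.uniform(-1, 1, size=50)                 # "previously unseen" data
y_test = np.sin(3 * x_test) + rng.normal(scale=0.2, size=50)

coefs = np.polyfit(x_train, y_train, 3)              # fit on training data only
test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
print(test_mse)   # empirical estimate of the generalization error (risk)
```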