Recall that:
Supervised Learning: at training time, we have access to both the inputs and their labeled outputs.
Unsupervised Learning: at training time, we have access only to the input data, not their labeled outputs.
Classification and regression are the two main examples of supervised learning.
In regression, we predict/assign a real value to the test point, based on the input-output relationship in the training data.
In classification, we predict a value from the finite set of categories that appeared in the training set.
KNN uses the available observations that are most similar (nearest) to the observation we are trying to predict (classify into a group), based on the predictors at hand.
The category that appears most often among the K nearest neighbors is the predicted category (majority vote).
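The majority-vote rule above can be sketched as follows; this is a minimal illustration, and the function and variable names are my own, not from the notes:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, z, k=3):
    """Predict the category of query point z by majority vote
    among its k nearest training points (Euclidean distance)."""
    # Sort training points by distance to the query point z.
    dists = sorted((math.dist(x, z), y) for x, y in zip(train_X, train_y))
    # Count the labels of the k nearest neighbors and take the mode.
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

For example, with training points clustered near the origin labeled one way and points far away labeled another, a query near the origin takes the label of the nearby cluster.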
Notice that KNN is a non-parametric method: it does not assume a fixed functional form for the input-output relationship.
In KNN with K=1, we find the closest neighbor in the training set and assign the label of that point as the label of the test point. However, we need to choose a method to measure the distance.
Suppose that the data is $d$-dimensional. At test time, given a query point $z$, we find
$x^* = \operatorname{argmin}_{x_j \in \text{training set}} d(x_j, z)$
and assign the label of $x^*$ to $z$.
To avoid computing the square root in the Euclidean distance, we can compare squared distances instead; since the square root is monotone, the argmin is unchanged.
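The K=1 case with the squared-distance trick can be sketched as below; this is an illustrative implementation, with names of my choosing:

```python
def nn_predict(train_X, train_y, z):
    """1-NN: assign the label of the single closest training point.
    Compares squared Euclidean distances to avoid the square root;
    the argmin is unchanged because sqrt is monotone."""
    def sq_dist(x):
        # Squared Euclidean distance between x and the query z.
        return sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    best = min(range(len(train_X)), key=lambda j: sq_dist(train_X[j]))
    return train_y[best]
```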
Choices of the distance measure
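Standard choices (given here as common examples, not an exhaustive list from the notes) include the Euclidean ($\ell_2$), Manhattan ($\ell_1$), and Chebyshev ($\ell_\infty$) distances:

```python
def euclidean(x, z):
    # l2 distance: square root of the sum of squared differences.
    return sum((a - b) ** 2 for a, b in zip(x, z)) ** 0.5

def manhattan(x, z):
    # l1 distance: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, z))

def chebyshev(x, z):
    # l-infinity distance: largest absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(x, z))
```

Any of these can be plugged in as $d(\cdot,\cdot)$; the choice can change which neighbors count as "nearest" and hence the prediction.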