In logistic regression, we typically learn an $l_1$- or $l_2$-regularized model to prevent overfitting.

When choosing a decision boundary, we also want to avoid overfitting by imposing additional constraints, e.g., finding the boundary that does not “favor” any class.

Geometry of Decision Boundaries

The decision boundary is defined by an equation in terms of the predictors; for a linear model it is the set of points where $y(x) = w^Tx + b = 0$

$w$ is the normal vector to the decision surface
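
As a quick sanity check, here is a minimal numpy sketch (the values of $w$ and $b$ are made up for illustration) showing that any direction lying within the surface $w^Tx + b = 0$ is orthogonal to $w$:

```python
import numpy as np

w = np.array([2.0, -1.0])   # assumed weight vector
b = 0.5                     # assumed intercept

def y(x):
    """Linear discriminant y(x) = w^T x + b; the boundary is y(x) = 0."""
    return w @ x + b

# Two points on the boundary (solving 2*x1 - x2 + 0.5 = 0 for x2)
p1 = np.array([0.0, 0.5])
p2 = np.array([1.0, 2.5])
print(y(p1), y(p2))      # both 0.0 -> both lie on the decision surface

# A direction lying within the surface is orthogonal to w
print(w @ (p2 - p1))     # 0.0 -> w is normal to the decision surface
```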

SVM Set-up

Consider a point $x$ and decompose it into a component lying on the decision surface plus a component along the normal direction,

i.e. $x = x_\perp + r \frac{w}{\lVert w\rVert}$

where $x_\perp$ lies on the decision surface, the second term points in the direction of $w$, and $r = \frac{y(x)}{\lVert w\rVert}$ is the signed distance from $x$ to the decision boundary; it can be either positive or negative depending on which side of the boundary $x$ falls on
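
A small sketch of this decomposition, reusing the same assumed $w$ and $b$: compute the signed distance $r = y(x)/\lVert w\rVert$ for an arbitrary point and verify that $x_\perp = x - r\,w/\lVert w\rVert$ lands back on the decision surface.

```python
import numpy as np

w = np.array([2.0, -1.0])   # assumed weight vector
b = 0.5                     # assumed intercept

x = np.array([3.0, 1.0])                  # arbitrary point, chosen for illustration
r = (w @ x + b) / np.linalg.norm(w)       # signed distance r = y(x) / ||w||
x_perp = x - r * w / np.linalg.norm(w)    # component lying on the decision surface

print(r)                 # ~2.46: x sits on the side that w points toward
print(w @ x_perp + b)    # ~0.0: x_perp is indeed on the decision surface
```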

To get the unsigned distance (for a correctly classified point), we simply multiply the signed distance by the target $t_n$, which is either $+1$ or $-1$
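
A short continuation of the sketch with assumed toy labels: multiplying the signed distance by $t_n$ makes it positive for every correctly classified point, regardless of which side of the boundary it lies on.

```python
import numpy as np

w = np.array([2.0, -1.0])   # assumed weight vector
b = 0.5                     # assumed intercept

X = np.array([[3.0, 1.0],    # point on the positive side of the boundary
              [-2.0, 1.0]])  # point on the negative side
t = np.array([1, -1])        # labels t_n in {-1, +1}

signed = (X @ w + b) / np.linalg.norm(w)   # signed distances
unsigned = t * signed                      # multiply by the label
print(signed)     # approx [ 2.46, -2.01]
print(unsigned)   # approx [ 2.46,  2.01] -> positive for correctly classified points
```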

Optimization Problem

We want to choose the weight vector $w$ and the intercept term $b$ to maximize the unsigned distance

$$ \frac{t_n(w^Tx_n+b)}{\lVert w\rVert} $$

of point $x_n$

The margin is the smallest such distance to the decision boundary over all $N$ points, and we want to maximize it,

so the optimization problem is

$$ \underset{w,b}{\operatorname{arg\,max}}\;\frac{1}{\lVert w\rVert}\,\underset{n}{\min}\,\big(t_n(w^Tx_n+b)\big) $$
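
To make the objective concrete, here is a sketch (with made-up toy data) that simply evaluates the quantity being maximized, $\min_n t_n(w^Tx_n+b)/\lVert w\rVert$, for two hand-picked candidates; an actual SVM solver maximizes it over all $(w,b)$, typically via the equivalent constrained quadratic program.

```python
import numpy as np

# Made-up, linearly separable toy data: two points per class
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]])
t = np.array([1, 1, -1, -1])

def margin(w, b):
    """Objective value min_n t_n (w^T x_n + b) / ||w|| for a candidate (w, b)."""
    return np.min(t * (X @ w + b)) / np.linalg.norm(w)

print(margin(np.array([1.0, 1.0]), 0.0))   # ~2.83
print(margin(np.array([1.0, 0.0]), 0.0))   # 1.0 -> the first candidate has the larger margin
```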