Goal: find a linear boundary that separates the different classes, similar in spirit to an SVM.
Theorem: if the data are linearly separable, the algorithm converges to a weight vector that separates the data.
The perceptron learning algorithm is an iterative method that processes one misclassified point at a time, until there are no misclassified points.
For linearly separable data, the perceptron eventually finds a separating decision boundary.
Interpretation: whenever a misclassified point remains, tilt $w^{(k)}$ toward classifying that point correctly (see the worked update after the pseudocode below).
Initialize:
set $k = 1$
set the vector $w^{(k)}$ to be the all-zeros vector
While there exists a misclassified point, i.e. an index $j$ such that
$y_j w^{(k)T} x_j \le 0$
Update: $w^{(k+1)} = w^{(k)} + y_j x_j$, then set $k \leftarrow k + 1$
return $w^{(k)}$
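As a concrete illustration of one update (the numbers are made up for this sketch): suppose $w^{(k)} = (0, 0)^T$ and the point $x_j = (1, 2)^T$ with label $y_j = +1$ is misclassified, since $y_j w^{(k)T} x_j = 0 \le 0$. The update gives
$$ w^{(k+1)} = w^{(k)} + y_j x_j = (1, 2)^T, \qquad y_j w^{(k+1)T} x_j = 1 \cdot 1 + 2 \cdot 2 = 5 > 0, $$
so this point is now on the correct side. In general a single update need not fix the point immediately, but it always tilts $w$ toward it.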
def inner_product(weight, data):
    # Dot product between the weight vector and a data point.
    return sum(w * x for w, x in zip(weight, data))

def train_perceptron(
    dataset: list[tuple[list[float], int]],  # (data point, label), label in {-1, +1}
    max_iterations: int,
    dimension: int,
):
    weight = [0.0] * dimension
    bias = 0.0
    for _ in range(max_iterations):
        for data, label in dataset:
            activation = inner_product(weight, data) + bias
            if label * activation <= 0:           # point is misclassified
                for i in range(dimension):
                    weight[i] += label * data[i]  # tilt w toward this point
                bias += label                     # update the bias the same way
    return weight, bias
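As a usage sketch, the corrected train_perceptron can be called on a tiny hand-made dataset. The points and labels below are hypothetical (separable by the line $x_1 + x_2 = 3$), not from the original notes:

# Hypothetical toy dataset: classes separable by the line x1 + x2 = 3.
dataset = [
    ([1.0, 1.0], -1),
    ([2.0, 0.5], -1),
    ([3.0, 2.0], +1),
    ([2.5, 3.0], +1),
]
weight, bias = train_perceptron(dataset, max_iterations=100, dimension=2)
for data, label in dataset:
    prediction = 1 if inner_product(weight, data) + bias > 0 else -1
    print(data, label, prediction)  # once training converges, predictions match labels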
This algorithm can also be cast as an optimization problem where we seek to minimize a particular loss function
$$ L = -\sum_i w^T x_i y_i $$
where $y_i \in \{\pm1\}$
If all points are correctly classified, this function is negative
If all points are incorrectly classified, this function is positive
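A small sketch of this sign behavior, computing the loss exactly as written above (summing over all points; the vectors below are made up for illustration):

def perceptron_loss(weight, dataset):
    # L = -sum_i w^T x_i y_i, summed over every point as in the formula above.
    return -sum(label * sum(w * x for w, x in zip(weight, data))
                for data, label in dataset)

points = [([2.0, 0.0], +1), ([-1.0, -1.0], -1)]
w_good = [1.0, 1.0]                      # classifies both points correctly
w_bad = [-1.0, -1.0]                     # classifies both points incorrectly
print(perceptron_loss(w_good, points))   # -4.0, negative: all points correct
print(perceptron_loss(w_bad, points))    # 4.0, positive: all points wrong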
To minimize this function, we can compute its gradient and run (stochastic) gradient descent.
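In fact, the gradient of the per-point term $-y_i w^T x_i$ recovers the perceptron update:
$$ \frac{\partial}{\partial w}\left(-y_i w^T x_i\right) = -y_i x_i, \qquad w \leftarrow w - \eta \left(-y_i x_i\right) = w + \eta\, y_i x_i, $$
so a stochastic gradient step with learning rate $\eta = 1$ on a single misclassified point is exactly the update $w^{(k+1)} = w^{(k)} + y_j x_j$ used above.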
Since perceptron learning only updates on a mistake, the following scenario might happen: