Goal: find a linear boundary that separates the different classes, similar in spirit to an SVM.
Theorem: if the data are linearly separable, the algorithm converges to a weight vector that separates the data.
The perceptron learning algorithm is an iterative method that processes one misclassified point at a time, until there are no misclassified points.
For linearly separable data, the perceptron eventually finds a separating decision boundary.
Interpretation: whenever a misclassified point remains, tilt $w^{(k)}$ toward classifying that point correctly (see the worked update after the pseudocode below).
Initialize:
set $k = 1$
set the vector $w^{(k)}$ to be the all-zeros vector
While there exists a misclassified point, i.e. an index $j$ such that
$y_j w^{(k)T} x_j \le 0$
Update: $w^{(k+1)} = w^{(k)} + y_j x_j$, then set $k \leftarrow k + 1$
return $w^{(k)}$
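As a concrete illustration of one update (the numbers are made up for this sketch): suppose $w^{(k)} = (0, 0)^T$ and the point $x_j = (1, 2)^T$ with label $y_j = +1$ is misclassified, since $y_j w^{(k)T} x_j = 0 \le 0$. The update gives
$$ w^{(k+1)} = w^{(k)} + y_j x_j = (1, 2)^T, \qquad y_j w^{(k+1)T} x_j = 1 \cdot 1 + 2 \cdot 2 = 5 > 0, $$
so this point is now on the correct side. In general a single update need not fix the point immediately, but it always tilts $w$ toward it.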
def inner_product(weight, data):
    # Dot product between the weight vector and a data point.
    return sum(w * x for w, x in zip(weight, data))

def train_perceptron(
    dataset: list[tuple[list[float], int]],  # (data point, label), label in {-1, +1}
    max_iterations: int,
    dimension: int,
):
    weight = [0.0] * dimension
    bias = 0.0
    for _ in range(max_iterations):
        for data, label in dataset:
            activation = inner_product(weight, data) + bias
            if label * activation <= 0:           # point is misclassified
                for i in range(dimension):
                    weight[i] += label * data[i]  # tilt w toward this point
                bias += label                     # update the bias the same way
    return weight, bias
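As a usage sketch, the corrected train_perceptron can be called on a tiny hand-made dataset. The points and labels below are hypothetical (separable by the line $x_1 + x_2 = 3$), not from the original notes:

# Hypothetical toy dataset: classes separable by the line x1 + x2 = 3.
dataset = [
    ([1.0, 1.0], -1),
    ([2.0, 0.5], -1),
    ([3.0, 2.0], +1),
    ([2.5, 3.0], +1),
]
weight, bias = train_perceptron(dataset, max_iterations=100, dimension=2)
for data, label in dataset:
    prediction = 1 if inner_product(weight, data) + bias > 0 else -1
    print(data, label, prediction)  # once training converges, predictions match labels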
This algorithm can also be cast as an optimization problem where we seek to minimize a particular loss function
$$ L = -\sum_i w^T x_i y_i $$
where $y_i \in \{\pm1\}$
If all points are correctly classified, this function is negative
If all points are incorrectly classified, this function is positive
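A small sketch of this sign behavior, computing the loss exactly as written above (summing over all points; the vectors below are made up for illustration):

def perceptron_loss(weight, dataset):
    # L = -sum_i w^T x_i y_i, summed over every point as in the formula above.
    return -sum(label * sum(w * x for w, x in zip(weight, data))
                for data, label in dataset)

points = [([2.0, 0.0], +1), ([-1.0, -1.0], -1)]
w_good = [1.0, 1.0]                      # classifies both points correctly
w_bad = [-1.0, -1.0]                     # classifies both points incorrectly
print(perceptron_loss(w_good, points))   # -4.0, negative: all points correct
print(perceptron_loss(w_bad, points))    # 4.0, positive: all points wrong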
To minimize this function, we can compute its gradient and run (stochastic) gradient descent.
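In fact, the gradient of the per-point term $-y_i w^T x_i$ recovers the perceptron update:
$$ \frac{\partial}{\partial w}\left(-y_i w^T x_i\right) = -y_i x_i, \qquad w \leftarrow w - \eta \left(-y_i x_i\right) = w + \eta\, y_i x_i, $$
so a stochastic gradient step with learning rate $\eta = 1$ on a single misclassified point is exactly the update $w^{(k+1)} = w^{(k)} + y_j x_j$ used above.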
Since perceptron learning only updates on a mistake, the following scenario might happen: