Pattern Classification And Single Layer Networks: Chapter 2

Intro

We have just seen how a network can be trained to perform linear regression. That is, given a set of inputs (x) and output/target values (y), the network finds the best linear mapping from x to y.

Given an x value that we have not seen, our trained network can predict what the most likely y value will be. The ability to (correctly) predict the output for an input the network has not seen is called generalization.

This style of learning is referred to as supervised learning (or learning with a teacher) because we are given the target values. Later we will see examples of unsupervised learning, which is used for finding patterns in the data rather than for modeling input/output mappings.

We now step away from linear regression for a moment and look at another type of supervised learning problem called pattern classification. We start by considering only single layer networks.

Pattern classification

A classic example of pattern classification is letter recognition. We are given, for example, a set of pixel values associated with an image of a letter, and we want the computer to determine which letter it is. The pixel values are referred to as the inputs or the decision variables, and the letter categories are referred to as classes.

Now, a given letter such as "A" can look quite different depending on the type of font that is used or, in the case of handwritten letters, different people's handwriting. Thus, there will be a range of values for the decision variables that map to the same class. That is, if we plot the values of the decision variables, different regions will correspond to different classes.

Example 1:

Two Classes (class 0 and class 1), Two Inputs (x1 and x2).

Example 2:

Another example (see data description, data, Maple plots):
class = types of iris
decision variables = sepal and petal sizes

Example 3:

ZIP code digits (example in Maple)

Single layer Networks for Pattern Classification

We can apply an approach similar to the one used for linear regression, where the targets are now the classes. Note that the outputs are no longer continuous; instead they take on discrete values.

Two Classes:

What does the network look like? If there are just 2 classes, we only need 1 output node. The target is 1 if the example is in, say, class 1, and 0 (or -1) if the example is in class 0. It seems reasonable to use a binary step function as the activation, since it guarantees that the output is one of these two values.
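A minimal sketch of such an output activation. The function name `step`, the threshold at 0, and the choice to send net = 0 to class 1 are assumptions for illustration; the `low` parameter switches between the 0/1 and -1/1 target encodings.

```python
def step(net, low=0):
    """Binary step activation with threshold 0.

    Returns 1 when net >= 0 (class 1) and `low` otherwise (class 0).
    Use low=0 for 0/1 targets or low=-1 for -1/1 targets.
    Mapping net == 0 to class 1 is a convention, not a requirement.
    """
    return 1 if net >= 0 else low


print([step(v) for v in (-2.0, 0.0, 3.5)])          # [0, 1, 1]
print([step(v, low=-1) for v in (-2.0, 0.0, 3.5)])  # [-1, 1, 1]
```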

Training Methods:

We will discuss two kinds of methods for training single-layer networks that do pattern classification:

Perceptron - guaranteed to find the right weights if they exist
The Adaline (uses Delta Rule) - can easily be generalized to multi-layer nets (nonlinear problems)
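The two update rules can be sketched as follows. This assumes NumPy, a bias handled as an extra input fixed at 1, and the logical AND function as a small illustrative data set; none of these choices come from the notes, and the names `augment`, `train_perceptron` and `train_adaline` are hypothetical. The perceptron rule updates only on misclassified examples and uses the step output y; the delta rule always updates and uses the linear net input, i.e. it is gradient descent on the squared error of the linear output, which is the property that carries over to multi-layer nets.

```python
import numpy as np

def augment(X):
    """Append a constant input of 1 to each example so the bias is just another weight."""
    return np.hstack([X, np.ones((len(X), 1))])

def train_perceptron(X, t, eta=1.0, max_epochs=100):
    """Perceptron rule with 0/1 targets: w <- w + eta * (t - y) * x, where y = step(net).

    Stops after the first epoch with no errors; returns (weights, converged).
    """
    Xa = augment(X)
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(Xa, t):
            y = 1 if x @ w >= 0 else 0   # binary step output
            if y != target:
                w += eta * (target - y) * x
                errors += 1
        if errors == 0:
            return w, True
    return w, False

def train_adaline(X, t, eta=0.05, epochs=500):
    """Delta (LMS) rule with -1/+1 targets: w <- w + eta * (t - net) * x.

    The error uses the linear net input, not the thresholded output.
    """
    Xa = augment(X)
    w = np.zeros(Xa.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xa, t):
            w += eta * (target - x @ w) * x
    return w

# Illustrative data: logical AND, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t01 = np.array([0, 0, 0, 1])
w_p, converged = train_perceptron(X, t01)
print(converged, (augment(X) @ w_p >= 0).astype(int))  # True [0 0 0 1]

w_a = train_adaline(X, 2 * t01 - 1)                     # targets -1/+1
print(np.where(augment(X) @ w_a >= 0, 1, -1))           # [-1 -1 -1  1]
```

After training, the Adaline is used as a classifier by thresholding its net input, just like the perceptron.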

But how do we know whether the right weights exist at all?

Let's look at what a single layer architecture can do.

Single Layer with a Binary Step Function

Consider a network with 2 inputs and 1 output node (2 classes).

The net input to the output node is a linear function of the weights and the inputs:

net = W X = x1 w1 + x2 w2
y = f(net)

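Assuming the step function switches at net = 0, the output changes class exactly where x1 w1 + x2 w2 = 0. This equation defines the decision boundary, a straight line through the input space.

x2 = -(w1/w2) x1 <- a line through the origin with slope -w1/w2 (assuming w2 is nonzero)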
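A small numerical check of this geometry. The weights w1 = 1, w2 = 2 are illustrative (not from the notes), and the step function outputs 1 when net >= 0, an assumed convention. Points on the line give net = 0, and points on opposite sides of it land in different classes.

```python
def net(x, w):
    """Net input x1*w1 + x2*w2 (no bias)."""
    return x[0] * w[0] + x[1] * w[1]

def step(v):
    """Binary step: class 1 if v >= 0, else class 0."""
    return 1 if v >= 0 else 0

def boundary_x2(x1, w):
    """x2 on the line x1*w1 + x2*w2 = 0, i.e. x2 = -(w1/w2)*x1. Needs w2 != 0."""
    return -(w[0] / w[1]) * x1

w = (1.0, 2.0)              # illustrative weights; slope -w1/w2 = -0.5
print(boundary_x2(4.0, w))  # -2.0, so (4, -2) lies on the line
print(net((4.0, -2.0), w))  # 0.0
print(step(net((0.0, 1.0), w)), step(net((0.0, -1.0), w)))  # 1 0: opposite sides
```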

Bias

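What if the line dividing the 2 classes does not go through the origin? We add a bias term b to the net input, net = x1 w1 + x2 w2 + b, so that the decision boundary x1 w1 + x2 w2 + b = 0 no longer has to pass through the origin.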
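A sketch of how a bias shifts the boundary, assuming it enters the net input as net = x1 w1 + x2 w2 + b. Solving for x2 gives x2 = -(w1/w2) x1 - b/w2: the slope is unchanged, but the line crosses the x2 axis at -b/w2 instead of at the origin. The weights and bias below are illustrative, and `net_augmented` shows the common equivalent trick of treating b as a weight on an extra input fixed at 1.

```python
def net_with_bias(x, w, b):
    """Net input x1*w1 + x2*w2 + b."""
    return x[0] * w[0] + x[1] * w[1] + b

def boundary_x2(x1, w, b):
    """x1*w1 + x2*w2 + b = 0  =>  x2 = -(w1/w2)*x1 - b/w2  (needs w2 != 0)."""
    return -(w[0] / w[1]) * x1 - b / w[1]

def net_augmented(x, w, b):
    """Same net input, with b treated as the weight on a constant input of 1."""
    xa = (x[0], x[1], 1.0)
    wa = (w[0], w[1], b)
    return sum(xi * wi for xi, wi in zip(xa, wa))

w, b = (1.0, 2.0), -4.0                 # illustrative weights and bias
print(boundary_x2(0.0, w, b))           # 2.0: crosses the x2 axis at -b/w2
print(net_with_bias((0.0, 2.0), w, b))  # 0.0: (0, 2) is on the boundary
print(net_augmented((3.0, 1.0), w, b) == net_with_bias((3.0, 1.0), w, b))  # True
```

The augmented form is convenient because a learning rule can then update the bias exactly like any other weight.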

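Other interesting geometric points to note:

The weight vector (w1, w2) is normal to the decision boundary.
Proof: Suppose z1 and z2 are points on the decision boundary. Then W z1 + b = 0 and W z2 + b = 0, where b is the bias (b = 0 if there is none). Subtracting the two equations gives W (z1 - z2) = 0. Since z1 - z2 is a vector lying along the boundary, W is perpendicular to the boundary.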
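The claim can be checked numerically. The values W = (3, 4) and b = -12 are illustrative assumptions, chosen so that the boundary 3 x1 + 4 x2 - 12 = 0 passes through the easy points (4, 0) and (0, 3).

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

W, b = (3.0, 4.0), -12.0   # illustrative weights and bias (not from the notes)

# Two points on the boundary 3*x1 + 4*x2 - 12 = 0
z1, z2 = (4.0, 0.0), (0.0, 3.0)
print(dot(W, z1) + b, dot(W, z2) + b)  # 0.0 0.0 -> both lie on the boundary

d = tuple(a - c for a, c in zip(z1, z2))  # (4.0, -3.0), a direction along the boundary
print(dot(W, d))                          # 0.0 -> W is perpendicular to the boundary
```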

Linear Separability

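Classification problems for which there is a line (in higher dimensions, a hyperplane) that exactly separates the classes are called linearly separable. Because the decision boundary of a single layer network with a step output is always such a line or hyperplane, single layer networks can only solve linearly separable problems exactly. Most real-world problems are not linearly separable.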
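A brute-force illustration using two standard toy problems, logical AND and XOR, which are assumptions here rather than examples from the notes. The sketch tries every (w1, w2, b) on a coarse grid and counts the settings for which a step unit classifies all four points correctly. A grid search is not a proof, but for XOR a short argument confirms that no line works at all (see the comment in the code).

```python
import itertools

def separates(w1, w2, b, data):
    """True if the unit (1 if x1*w1 + x2*w2 + b >= 0 else 0) labels every point correctly."""
    return all((1 if x1 * w1 + x2 * w2 + b >= 0 else 0) == t for (x1, x2), t in data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

def count_separators(data):
    """Number of grid points (w1, w2, b) that separate the data."""
    return sum(separates(w1, w2, b, data) for w1, w2, b in itertools.product(grid, repeat=3))

# XOR would need b < 0, w2 + b >= 0, w1 + b >= 0 and w1 + w2 + b < 0.
# Adding the middle two gives w1 + w2 + b >= -b > 0, a contradiction,
# so no line anywhere separates XOR, not just none on this grid.
print(separates(1.0, 1.0, -1.5, AND))  # True
print(count_separators(AND) > 0)       # True
print(count_separators(XOR))           # 0
```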