# Perceptron


## Perceptron

• Inner-product scalar

• Perceptron

• Perceptron learning rule

• XOR problem

• Linearly separable patterns

• Gradient descent

• Stochastic approximation to gradient descent

• LMS (Adaline)

### Inner-product

• A measure of the projection of one vector onto another:

$\mathbf{w} \cdot \mathbf{x} = \sum_{i=1}^{n} w_i x_i = \|\mathbf{w}\|\,\|\mathbf{x}\|\cos\theta$

• A common choice of activation function is the sigmoid:

$\sigma(u) = \frac{1}{1 + e^{-u}}$
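As a concrete illustration, here is a minimal Python sketch of both operations (the function names `inner_product` and `sigmoid` are mine, not from the slides):

```python
import math

def inner_product(w, x):
    """Inner product w . x = sum_i w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(u):
    """Sigmoid activation: 1 / (1 + e^(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

# Example: project x onto w, then squash with the sigmoid.
w = [0.5, -0.2, 0.1]
x = [1.0, 2.0, 3.0]
print(inner_product(w, x))           # 0.4
print(sigmoid(inner_product(w, x)))  # ~0.5987
```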

### Similarity to real neurons...

• ≈ 10^10 neurons

• 10^4–10^5 connections per neuron

[Figure: a neuron receiving inputs x1, x2, …, xn]

### Perceptron

• Linear threshold unit (LTU)

[Figure: LTU diagram with bias input x0 = 1 weighted by w0, inputs x1, …, xn weighted by w1, …, wn, summed and thresholded to produce the output o]

$o = \operatorname{sgn}\left(\sum_{i=0}^{n} w_i x_i\right), \qquad \operatorname{sgn}(u) = \begin{cases} 1 & \text{if } u > 0 \\ -1 & \text{otherwise} \end{cases}$

McCulloch-Pitts model of a neuron

• The goal of a perceptron is to correctly classify the set of patterns D = {x1, x2, …, xm} into one of the classes C1 and C2

• The output for class C1 is o = 1 and for C2 it is o = −1

• For n = 2 the decision boundary is the line w0 + w1 x1 + w2 x2 = 0 (see the sketch below)
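A minimal Python sketch of this classification step (the helper `predict` and the example weights are illustrative assumptions, not from the slides):

```python
def predict(w, x):
    """LTU output: +1 (class C1) if w . x > 0, else -1 (class C2).
    w includes the bias weight w0; x is prepended with x0 = 1."""
    activation = sum(wi * xi for wi, xi in zip(w, [1.0] + list(x)))
    return 1 if activation > 0 else -1

# For n = 2 the boundary w0 + w1*x1 + w2*x2 = 0 is a line, e.g. x1 + x2 = 1:
w = [-1.0, 1.0, 1.0]           # w0, w1, w2
print(predict(w, [1.0, 1.0]))  # +1: above the line
print(predict(w, [0.0, 0.0]))  # -1: below the line
```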

### Perceptron learning rule

• Consider linearly separable problems

• How do we find appropriate weights?

• Check whether the output o for a pattern belongs to the desired class, i.e. has the desired value d; if not, adjust the weights:

$w_i \leftarrow w_i + \eta\,(d - o)\,x_i$

• η is called the learning rate

• 0 < η ≤ 1

• In supervised learning the network has its output compared with known correct answers

• Supervised learning

• Learning with a teacher

• (d-o) plays the role of the error signal
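Putting the rule together, a hedged Python sketch of the full training loop (the function name and the AND example are illustrative; `eta` stands for η):

```python
def train_perceptron(examples, n, eta=0.1, epochs=100):
    """examples: list of (x, d) with x an n-vector and d in {+1, -1}.
    Returns weights [w0, w1, ..., wn] (w0 is the bias weight)."""
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        errors = 0
        for x, d in examples:
            xs = [1.0] + list(x)                 # prepend x0 = 1
            o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else -1
            if o != d:                           # (d - o) is the error signal
                w = [wi + eta * (d - o) * xi for wi, xi in zip(w, xs)]
                errors += 1
        if errors == 0:                          # converged: all patterns correct
            break
    return w

# AND is linearly separable, so the rule converges:
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(train_perceptron(data, n=2))
```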

### Perceptron

• The algorithm converges to the correct classification

• if the training data is linearly separable

• and  is sufficiently small

• When assigning a value to  we must keep in mind two conflicting requirements

• Averaging of past inputs to provide stable weights estimates, which requires small 

• Fast adaptation with respect to real changes in the underlying distribution of the process responsible for the generation of the input vector x, which requires large 

### Frank Rosenblatt

• 1928-1971

• Rosenblatt's bitter rival and professional nemesis was Marvin Minsky of MIT

• Minsky despised Rosenblatt, hated the concept of the perceptron, and wrote several polemics against him

• For years Minsky crusaded against Rosenblatt on a very nasty and personal level, including contacting every group who funded Rosenblatt's research to denounce him as a charlatan, hoping to ruin Rosenblatt professionally and to cut off all funding for his research in neural nets

### XOR problem and Perceptron

• Raised by Minsky and Papert in the mid-1960s and developed in their 1969 book Perceptrons

• A single LTU cannot represent XOR: no line separates {(0,0), (1,1)} from {(0,1), (1,0)} (see the short argument after this list)

• To understand the training behaviour, consider the simpler linear unit, where o = w0 + w1 x1 + … + wn xn

• Let's learn the wi that minimize the squared error over D = {(x1,t1), (x2,t2), …, (xd,td), …, (xm,tm)} (t for target):

$E(\mathbf{w}) = \frac{1}{2}\sum_{d \in D} (t_d - o_d)^2$
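To make the inseparability concrete, here is the standard argument (well known, not from the slides). An LTU outputting o = 1 exactly when w0 + w1 x1 + w2 x2 > 0 would need, for the XOR targets:

$w_0 \le 0, \qquad w_0 + w_1 + w_2 \le 0, \qquad w_0 + w_1 > 0, \qquad w_0 + w_2 > 0$

Adding the two strict inequalities gives $2w_0 + w_1 + w_2 > 0$, while adding the two non-strict ones gives $2w_0 + w_1 + w_2 \le 0$: a contradiction, so no such weight vector exists.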

### Error for different hypotheses, as a function of w0 and w1 (2 dimensions)

• We want to move the weight vector in the direction that decreases E

wi=wi+wiw=w+w

### Update rule for gradient descent

Each training example is a pair of the form ⟨(x1,…,xn), t⟩, where (x1,…,xn) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1).

• Initialize each wi to some small random value

• Until the termination condition is met, Do

  • Initialize each Δwi to zero

  • For each ⟨(x1,…,xn), t⟩ in training_examples, Do

    • Input the instance (x1,…,xn) to the linear unit and compute the output o

    • For each linear unit weight wi, Do

      • Δwi ← Δwi + η (t − o) xi

  • For each linear unit weight wi, Do

    • wi ← wi + Δwi

### Stochastic Approximation to gradient descent

• The gradient descent training rule updates the weights after summing the error over all the training examples in D

• The stochastic approximation instead calculates the error and updates the weights for each example individually: Δwi = η (t − o) xi

• Known as the delta rule or LMS (least mean squares) weight update
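The incremental variant differs from the batch sketch above only in where the weight update happens; a minimal sketch under the same assumptions:

```python
import random

def lms(training_examples, n, eta=0.05, epochs=1000):
    """Delta rule / LMS: update w after every example, not once per epoch."""
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, t in training_examples:
            xs = [1.0] + list(x)
            o = sum(wi * xi for wi, xi in zip(w, xs))
            for i in range(n + 1):
                w[i] += eta * (t - o) * xs[i]   # wi <- wi + eta*(t-o)*xi, per example
    return w

data = [((x,), 2 * x + 1) for x in [0.0, 0.25, 0.5, 0.75, 1.0]]
print(lms(data, n=1))  # ~[1.0, 2.0]
```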

### LMS

• Produces an estimate of the weight vector from individual examples

• Not true steepest descent

• No well-defined trajectory in weight space

• Converges only asymptotically toward the minimum error

• Can approximate gradient descent arbitrarily closely if η is made small enough

### Summary

• The perceptron training rule is guaranteed to succeed if

• the training examples are linearly separable

• and the learning rate η is sufficiently small

• The linear-unit training rule, using gradient descent or LMS, is guaranteed to converge to the hypothesis with minimum squared error

• given a sufficiently small learning rate η

• even when the training data contains noise

• even when the training data is not separable by H
