
Perceptron


  • Inner-product scalar

  • Perceptron

    • Perceptron learning rule

  • XOR problem

    • Linearly separable patterns

  • Gradient descent

  • Stochastic Approximation to gradient descent

    • LMS Adaline


Inner-product

  • A measure of the projection of one vector onto another
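
A sketch of the standard definition this slide refers to, reconstructed in the deck's own notation:

    w · x = Σi wi xi = |w| |x| cos θ

so the signed length of the projection of x onto w is (w · x) / |w|.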


Example
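
An illustrative worked example (the numbers are chosen here, not taken from the slide): for w = (2, 1) and x = (1, 3), w · x = 2·1 + 1·3 = 5, |w| = √5, so the projection of x onto w has length 5/√5 = √5 ≈ 2.24.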


Activation function


sigmoid function
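
The two activation functions these slides refer to, written out in standard form (a reconstruction; net denotes the weighted sum Σi wi xi):

    Threshold (sign) activation:  f(net) = +1 if net > 0, −1 otherwise
    Sigmoid activation:           σ(net) = 1 / (1 + e^(−net))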


Graphical representation


Similarity to real neurons...


  • ~10^10 neurons

  • 10^4 to 10^5 connections per neuron


Perceptron

  • Linear threshold unit (LTU)

[Diagram: McCulloch-Pitts model of a neuron, with inputs x1, x2, …, xn (plus fixed input x0 = 1), weights w0, w1, …, wn, and output o]
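
A reconstruction of the standard LTU output rule the diagram describes:

    o(x1, …, xn) = +1  if  w0 + w1 x1 + … + wn xn > 0
                   −1  otherwise

equivalently o = sgn(w · x), with the bias w0 absorbed via the fixed input x0 = 1.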


  • The goal of the perceptron is to correctly classify the set of patterns D = {x1, x2, …, xm} into one of the two classes C1 and C2

  • The output for class C1 is o = 1 and for C2 it is o = −1

  • For n = 2 the decision boundary is the line w0 + w1 x1 + w2 x2 = 0


Linearly separable patterns


Perceptron learning rule

  • Consider linearly separable problems

  • How do we find appropriate weights?

  • Check whether the output o belongs to the desired class, i.e. has the desired value d

  • η is called the learning rate

  • 0 < η ≤ 1
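
The update formula itself is not written out above; a minimal runnable sketch of the standard perceptron rule the slide describes, wi ← wi + η (d − o) xi (function and variable names are illustrative):

    def perceptron_update(w, x, d, eta=0.1):
        """One perceptron learning step; x includes the fixed bias input x0 = 1."""
        o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1   # LTU output
        return [wi + eta * (d - o) * xi for wi, xi in zip(w, x)]    # wi <- wi + eta*(d - o)*xi

    # Example: one update step on a labelled pattern, starting from zero weights.
    w = [0.0, 0.0, 0.0]
    w = perceptron_update(w, [1.0, 0.5, -0.3], d=1)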


  • In supervised learning the network has its output compared with known correct answers

    • Supervised learning

    • Learning with a teacher

  • (d-o) plays the role of the error signal


Perceptron

  • The algorithm converges to the correct classification

    • if the training data is linearly separable

    • and η is sufficiently small

  • When assigning a value to η we must keep in mind two conflicting requirements

    • Averaging of past inputs to provide stable weight estimates, which requires a small η

    • Fast adaptation with respect to real changes in the underlying distribution of the process responsible for the generation of the input vector x, which requires a large η


Several nodes


Constructions


Frank Rosenblatt

  • 1928-1971


  • Rosenblatt's bitter rival and professional nemesis was Marvin Minsky of MIT

  • Minsky despised Rosenblatt, hated the concept of the perceptron, and wrote several polemics against him

  • For years Minsky crusaded against Rosenblatt on a very nasty and personal level, including contacting every group who funded Rosenblatt's research to denounce him as a charlatan, hoping to ruin Rosenblatt professionally and to cut off all funding for his research in neural nets


XOR problem and Perceptron

  • Analyzed by Minsky and Papert in the mid-1960s and published in their book Perceptrons (1969)
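
A short reconstruction of the standard argument behind this slide: a single LTU would need

    (0,0) → −1:  w0 ≤ 0
    (1,1) → −1:  w0 + w1 + w2 ≤ 0
    (1,0) → +1:  w0 + w1 > 0
    (0,1) → +1:  w0 + w2 > 0

Adding the last two inequalities gives 2w0 + w1 + w2 > 0, while adding the first two gives 2w0 + w1 + w2 ≤ 0, a contradiction; so XOR is not linearly separable.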


Gradient Descent

  • To understand gradient descent, consider a simpler linear unit whose output is o = w0 + w1 x1 + … + wn xn (no threshold)

  • Let's learn the wi that minimize the squared error over the training set D = {(x1,t1), (x2,t2), …, (xd,td), …, (xm,tm)}

    • (t for target)
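
The error measure these bullets refer to, reconstructed in its standard form:

    E(w) = ½ Σd∈D (td − od)²,   where od = w · xd is the linear unit's output on example d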


Error surface for different hypotheses, as a function of w0 and w1 (2 dimensions)


  • We want to move the weight vector in the direction that decreases E:

    wi ← wi + Δwi,   i.e.   w ← w + Δw


  • The gradient
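
The definition this bullet points to, in standard form:

    ∇E(w) = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]

and the steepest-descent step is Δw = −η ∇E(w).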


Differentiating E
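
Reconstructing the derivation: differentiating the squared error with respect to a single weight gives

    ∂E/∂wi = ∂/∂wi ½ Σd (td − od)² = Σd (td − od) (−xid)

where xid is the i-th component of input xd.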


Update rule for gradient descent
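
Putting the two previous results together gives the batch update rule (consistent with the pseudocode that follows):

    Δwi = η Σd (td − od) xid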


Gradient Descent

Gradient-Descent(training_examples, η)

Each training example is a pair of the form <(x1,…,xn), t>, where (x1,…,xn) is the vector of input values and t is the target output value; η is the learning rate (e.g. 0.1)

  • Initialize each wi to some small random value

  • Until the termination condition is met,

  • Do

    • Initialize each wi to zero

    • For each <(x1,…xn),t> in training_examples

    • Do

      • Input the instance (x1,…,xn) to the linear unit and compute the output o

      • For each linear unit weight wi

      • Do

        • wi= wi +  (t-o) xi

    • For each linear unit weight wi

    • Do

      • wi ← wi + Δwi
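
A minimal runnable sketch of the pseudocode above (the function name, initialization range, and fixed-epoch stopping rule are illustrative choices, not from the slides):

    import random

    def gradient_descent(training_examples, eta=0.1, epochs=100):
        """training_examples: list of (x, t) pairs; each x already includes x0 = 1."""
        n = len(training_examples[0][0])
        w = [random.uniform(-0.05, 0.05) for _ in range(n)]   # small random initial weights
        for _ in range(epochs):                               # termination condition: fixed epoch count
            delta_w = [0.0] * n                               # initialize each Δwi to zero
            for x, t in training_examples:
                o = sum(wi * xi for wi, xi in zip(w, x))      # linear unit output
                for i in range(n):
                    delta_w[i] += eta * (t - o) * x[i]        # Δwi ← Δwi + η(t − o)xi
            w = [wi + dwi for wi, dwi in zip(w, delta_w)]     # wi ← wi + Δwi
        return w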


Stochastic Approximation to gradient descent

  • The gradient descent training rule updates the weights after summing the error over all the training examples in D

  • Stochastic gradient descent approximates gradient descent by updating the weights incrementally

  • The error is calculated, and the weights are updated, for each individual example

  • Known as the delta rule or LMS (least mean-square) weight update

    • Adaline rule, used for adaptive filters; Widrow and Hoff (1960)
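
A sketch of the incremental (delta rule / LMS) step, assuming the same linear unit as above; compared with the batch version, the weights change after every single example:

    def lms_update(w, x, t, eta=0.1):
        """One incremental step: wi <- wi + eta * (t - o) * xi, with o = w · x."""
        o = sum(wi * xi for wi, xi in zip(w, x))
        return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]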


LMS

  • Produces an estimate of the weight vector

  • Not true steepest descent

  • No well-defined trajectory in weight space

  • Instead a random trajectory (stochastic gradient descent)

  • Converges only asymptotically toward the minimum error

  • Can approximate gradient descent arbitrarily closely if the learning rate η is made small enough


Summary

  • Perceptron training rule guaranteed to succeed if

    • Training examples are linearly separable

    • Sufficiently small learning rate η

  • The linear unit training rule, using gradient descent or LMS, is guaranteed to converge to the hypothesis with minimum squared error

    • Given a sufficiently small learning rate η

    • Even when the training data contain noise

    • Even when the training data are not separable by H




XOR? Multi-Layer Networks

[Diagram: multi-layer network with an input layer, a hidden layer, and an output layer]
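
As a concrete illustration of this final slide, a minimal sketch of a two-layer threshold network that computes XOR; the weights are hand-picked for illustration, not taken from the deck:

    def step(z):
        """Hard-threshold unit."""
        return 1 if z > 0 else 0

    def xor_net(x1, x2):
        # Hidden layer: h1 computes OR(x1, x2), h2 computes NAND(x1, x2).
        h1 = step(x1 + x2 - 0.5)
        h2 = step(-x1 - x2 + 1.5)
        # Output layer: AND of the two hidden units yields XOR.
        return step(h1 + h2 - 1.5)

    # Truth-table check: prints 0, 1, 1, 0.
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, xor_net(a, b))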

