Perceptron


Perceptron

  • Inner-product scalar

  • Perceptron

    • Perceptron learning rule

  • XOR problem

    • Linearly separable patterns

  • Gradient descent

  • Stochastic Approximation to gradient descent

    • LMS Adaline



Inner-product

  • A measure of the projection of one vector onto another



Example
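
A minimal numeric sketch (values chosen purely for illustration) of computing an inner product in Python:

    # Inner product (dot product) of a weight vector w and an input vector x
    w = [0.5, -1.0, 2.0]   # illustrative weights
    x = [1.0,  2.0, 0.5]   # illustrative input
    dot = sum(wi * xi for wi, xi in zip(w, x))
    print(dot)             # 0.5*1.0 + (-1.0)*2.0 + 2.0*0.5 = -0.5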



Activation function


Perceptron

sigmoid function
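
A small sketch of the two activations named here, the perceptron's hard threshold and the sigmoid (a minimal illustration, not the slide's original figure):

    import math

    def step(net):
        # Perceptron activation: hard threshold on the net input
        return 1 if net >= 0 else -1

    def sigmoid(net):
        # Smooth, differentiable alternative: 1 / (1 + e^(-net))
        return 1.0 / (1.0 + math.exp(-net))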



Graphical representation



Similarity to real neurons...


Perceptron

  • ~10^10 neurons

  • ~10^4–10^5 connections per neuron


Perceptron

  • Linear threshold unit (LTU)

[Figure: inputs x0 = 1, x1, x2, …, xn with weights w0, w1, w2, …, wn feeding a weighted sum and threshold that produce the output o]

McCulloch-Pitts model of a neuron
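
A minimal Python sketch of the LTU computation, assuming the usual convention that x0 = 1 carries the bias weight w0:

    def ltu_output(w, x):
        # net = w0*x0 + w1*x1 + ... + wn*xn, with x[0] == 1 as the bias input
        net = sum(wi * xi for wi, xi in zip(w, x))
        return 1 if net >= 0 else -1          # hard threshold: +1 for class C1, -1 for class C2

    print(ltu_output([-0.5, 1.0, 1.0], [1, 0.2, 0.7]))   # net = 0.4, so the output is +1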


Perceptron

  • The goal of a perceptron is to correctly classify the set of patterns D = {x1, x2, …, xm} into one of the classes C1 and C2

  • The output for class C1 is o = 1 and for C2 is o = -1

  • For n = 2, the decision boundary is the line w0 + w1x1 + w2x2 = 0
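
For example (illustrative weights only), with w0 = -1, w1 = 1, w2 = 1 the boundary is the line x1 + x2 = 1, and a pattern is classified by which side of that line it falls on:

    w0, w1, w2 = -1.0, 1.0, 1.0                      # illustrative weights
    for x1, x2 in [(0.2, 0.3), (0.8, 0.9)]:
        o = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1
        print((x1, x2), "class C1" if o == 1 else "class C2")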


Linearly separable patterns



Perceptron learning rule

  • Consider linearly separable problems

  • How to find appropriate weights

  • Check whether the output o has the desired value d, i.e. whether the pattern is assigned to the desired class

  • η is called the learning rate

  • 0 < η ≤ 1
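
A minimal sketch of the rule described here, using the standard update wi ← wi + η (d - o) xi (the data and η below are illustrative):

    def perceptron_update(w, x, d, eta=0.1):
        # x includes the bias input x[0] = 1; d is the desired output (+1 or -1)
        o = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        return [wi + eta * (d - o) * xi for wi, xi in zip(w, x)]

    w = [0.0, 0.0, 0.0]
    w = perceptron_update(w, [1, 0.5, -0.3], d=-1)
    print(w)   # [-0.2, -0.1, 0.06]: the net input for this pattern is pushed toward the desired sign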


Perceptron

  • In supervised learning the network has its output compared with known correct answers

    • Supervised learning

    • Learning with a teacher

  • (d-o) plays the role of the error signal



Perceptron

  • The algorithm converges to the correct classification

    • if the training data is linearly separable

    • and η is sufficiently small

  • When assigning a value to η we must keep in mind two conflicting requirements

    • Averaging of past inputs to provide stable weight estimates, which requires a small η

    • Fast adaptation with respect to real changes in the underlying distribution of the process generating the input vector x, which requires a large η



Several nodes



Constructions



Frank Rosenblatt

  • 1928-1971


Perceptron

  • Rosenblatt's bitter rival and professional nemesis was Marvin Minsky of MIT

  • Minsky despised Rosenblatt, hated the concept of the perceptron, and wrote several polemics against him

  • For years Minsky crusaded against Rosenblatt on a very nasty and personal level, including contacting every group who funded Rosenblatt's research to denounce him as a charlatan, hoping to ruin Rosenblatt professionally and to cut off all funding for his research in neural nets



XOR problem and Perceptron

  • By Minsky and Papert in the mid-1960s
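
A brute-force sketch of the problem: no single linear threshold unit separates XOR (the weight grid below is only illustrative, but the failure holds for any weights):

    # XOR truth table, with a bias input of 1 and targets in {+1, -1}
    data = [((1, 0, 0), -1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), -1)]

    def classify(w, x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

    grid = [i / 2.0 for i in range(-8, 9)]            # candidate weights from -4.0 to 4.0
    found = any(all(classify((w0, w1, w2), x) == t for x, t in data)
                for w0 in grid for w1 in grid for w2 in grid)
    print(found)                                       # False: XOR is not linearly separable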



Gradient Descent

  • To understand, consider a simpler linear unit, whose output is o = w0 + w1x1 + … + wnxn

  • Let's learn the wi that minimize the squared error over D = {(x1,t1), (x2,t2), …, (xd,td), …, (xm,tm)}

    • (t for target)
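
A sketch of the squared-error criterion in Python (standard form; each x is assumed to include the bias input 1):

    def squared_error(w, D):
        # E(w) = 1/2 * sum over (x, t) in D of (t - o)^2, with the linear unit o = w . x
        total = 0.0
        for x, t in D:
            o = sum(wi * xi for wi, xi in zip(w, x))
            total += (t - o) ** 2
        return 0.5 * total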


Error for different hypotheses, for w0 and w1 (dim 2)


Perceptron

  • We want to move the weight vector in the direction that decreases E

    wi = wi + Δwi        w = w + Δw


Perceptron

  • The gradient
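
A sketch of the standard definition of the gradient and the corresponding training rule:

    \nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right]

    \Delta \vec{w} = -\eta \, \nabla E(\vec{w})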



Differentiating E
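
A sketch of the standard derivation, assuming the linear unit o_d = w · x_d and the squared error E defined above:

    \frac{\partial E}{\partial w_i}
      = \frac{\partial}{\partial w_i} \, \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
      = \sum_{d \in D} (t_d - o_d) \, \frac{\partial}{\partial w_i} (t_d - \vec{w} \cdot \vec{x}_d)
      = - \sum_{d \in D} (t_d - o_d) \, x_{id}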


Update rule for gradient descent
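
Combining Δwi = -η ∂E/∂wi with the derivative above gives the usual componentwise rule:

    \Delta w_i = \eta \sum_{d \in D} (t_d - o_d) \, x_{id}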



Gradient Descent

Gradient-Descent(training_examples, η)

Each training example is a pair of the form <(x1,…,xn), t>, where (x1,…,xn) is the vector of input values and t is the target output value; η is the learning rate (e.g. 0.1)

  • Initialize each wi to some small random value

  • Until the termination condition is met,

  • Do

    • Initialize each wi to zero

    • For each <(x1,…xn),t> in training_examples

    • Do

      • Input the instance (x1,…,xn) to the linear unit and compute the output o

      • For each linear unit weight wi

      • Do

        • wi= wi +  (t-o) xi

    • For each linear unit weight wi

    • Do

      • wi=wi+wi



Stochastic Approximation to gradient descent

  • The gradient descent training rule updates the weights after summing over all the training examples in D

  • Stochastic gradient approximates gradient descent by updating the weights incrementally

  • Calculate the error for each example

  • Known as the delta rule or LMS (least mean square) weight update

    • Adaline rule, used for adaptive filters, Widrow and Hoff (1960)
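
A sketch of the incremental (per-example) version of the update, in contrast to the batch loop above (η illustrative):

    def delta_rule_epoch(w, training_examples, eta=0.05):
        # Delta rule / LMS: adjust the weights after every single example
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
        return w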


Perceptron

LMS

  • Estimate of the weight vector

  • No steepest descent

  • No well-defined trajectory in the weight space

  • Instead a random trajectory (stochastic gradient descent)

  • Converge only asymptotically toward the minimum error

  • Can approximate gradient descent arbitrarily closely if η is made small enough



Summary

  • Perceptron training rule guaranteed to succeed if

    • Training examples are linearly separable

    • Sufficiently small learning rate η

  • Linear unit training rule using gradient descent or LMS is guaranteed to converge to the hypothesis with minimum squared error

    • Given a sufficiently small learning rate η

    • Even when the training data contains noise

    • Even when the training data is not separable by H


Perceptron

  • Inner-product scalar

  • Perceptron

    • Perceptron learning rule

  • XOR problem

    • Linearly separable patterns

  • Gradient descent

  • Stochastic Approximation to gradient descent

    • LMS Adaline



XOR? Multi-Layer Networks

[Figure: a multi-layer network with an input layer, a hidden layer, and an output layer]
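
A minimal sketch (hand-picked weights, not learned) of how a two-layer network of threshold units computes XOR: one hidden unit computes OR, the other NAND, and the output unit ANDs them:

    def step(net):
        return 1 if net >= 0 else 0

    def xor_net(x1, x2):
        h_or = step(-0.5 + x1 + x2)            # hidden unit 1: x1 OR x2
        h_nand = step(1.5 - x1 - x2)           # hidden unit 2: NOT (x1 AND x2)
        return step(-1.5 + h_or + h_nand)      # output unit: h_or AND h_nand

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, xor_net(a, b))             # prints 0 0 0, 0 1 1, 1 0 1, 1 1 0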

