- By **nuri**

## PowerPoint Slideshow about ' Perceptron' - nuri


Presentation Transcript

Inner-product scalar

- Perceptron
- Perceptron learning rule
- XOR problem
- Linearly separable patterns
- Gradient descent
- Stochastic Approximation to gradient descent
- LMS Adaline

Inner-product

- A measure of the projection of one vector onto another
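As a small illustration of the projection interpretation, the sketch below computes the inner product of two vectors and the signed length of the projection of one onto the other. The helper names `inner` and `projection_length` and the example vectors are illustrative choices, not taken from the slides.

```python
import math

def inner(u, v):
    """Inner product <u, v> = sum_i u_i * v_i."""
    return sum(a * b for a, b in zip(u, v))

def projection_length(u, v):
    """Signed length of the projection of u onto v: <u, v> / |v|."""
    return inner(u, v) / math.sqrt(inner(v, v))

u = [3.0, 4.0]
v = [1.0, 0.0]
print(inner(u, v))              # 3.0
print(projection_length(u, v))  # 3.0
```

When `v` is a unit vector, as here, the inner product and the projection length coincide.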

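The goal of a perceptron is to correctly classify each pattern in the set $D=\{x_1, x_2, \dots, x_m\}$ into one of the two classes $C_1$ and $C_2$

- The output for class $C_1$ is $o=1$ and for $C_2$ is $o=-1$
- For $n=2$, the decision boundary is a straight line in the input plane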
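A minimal sketch of the classification step for n=2, assuming a sign threshold on the weighted sum and a separate bias weight `w0`. The bias convention and the example weights are assumptions, since the slide's own formula did not survive extraction.

```python
def perceptron_output(w0, w, x):
    """Return 1 (class C1) if w0 + <w, x> > 0, else -1 (class C2)."""
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1

# Hypothetical weights: the decision line is x1 + x2 = 1.5
w0, w = -1.5, [1.0, 1.0]
print(perceptron_output(w0, w, [1.0, 1.0]))  # 1
print(perceptron_output(w0, w, [0.0, 1.0]))  # -1
```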

Perceptron learning rule

- Consider linearly separable problems
- How to find appropriate weights
- Check whether the output o for a pattern equals the desired value d of its class
- If it does not, update the weights: $w_{new} = w_{old} + \eta\,(d - o)\,x$
- $\eta$ is called the learning rate
- $0 < \eta \le 1$
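The learning rule can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: it assumes the update w ← w + η(d − o)x, a constant bias input `x[0] = 1.0`, zero initial weights, and the logical AND function as a hypothetical linearly separable training set.

```python
def predict(w, x):
    """Perceptron output: 1 if <w, x> > 0, else -1 (x[0] is a constant bias input)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

def train_perceptron(data, eta=0.5, max_epochs=100):
    """Apply w <- w + eta * (d - o) * x until one full epoch makes no mistakes."""
    w = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in data:
            o = predict(w, x)
            if o != d:  # (d - o) is the error signal; it is zero when o == d
                mistakes += 1
                w = [wi + eta * (d - o) * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return w, True
    return w, False

# Logical AND with targets in {-1, 1}; the leading 1.0 is the bias input.
and_data = [([1.0, 0.0, 0.0], -1), ([1.0, 0.0, 1.0], -1),
            ([1.0, 1.0, 0.0], -1), ([1.0, 1.0, 1.0], 1)]
w, converged = train_perceptron(and_data)
print(converged)  # True
```

The learning rate 0.5 is chosen so that every update step is exactly ±1.0, which keeps the arithmetic free of rounding effects in this toy run.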

In supervised learning the network has its output compared with known correct answers

- Supervised learning
- Learning with a teacher
- (d-o) plays the role of the error signal

Perceptron

- The algorithm converges to the correct classification
- if the training data is linearly separable
- and $\eta$ is sufficiently small
- When assigning a value to $\eta$ we must keep in mind two conflicting requirements
- Averaging of past inputs to provide stable weight estimates, which requires small $\eta$
- Fast adaptation with respect to real changes in the underlying distribution of the process responsible for the generation of the input vector x, which requires large $\eta$

Frank Rosenblatt

- 1928-1969

Rosenblatt's bitter rival and professional nemesis was Marvin Minsky of the Massachusetts Institute of Technology

- Minsky despised Rosenblatt, hated the concept of the perceptron, and wrote several polemics against him
- For years Minsky crusaded against Rosenblatt on a very nasty and personal level, including contacting every group who funded Rosenblatt's research to denounce him as a charlatan, hoping to ruin Rosenblatt professionally and to cut off all funding for his research in neural nets

XOR problem and Perceptron

- Raised by Minsky and Papert in the mid-1960s
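The XOR patterns are not linearly separable, so no single weight vector classifies all four of them correctly. The sketch below illustrates this by running the perceptron rule on XOR and reporting whether any epoch finishes without a mistake. The function name `epochs_until_error_free`, the bias-input encoding, and the epoch limit are assumptions made for the example.

```python
def predict(w, x):
    """Perceptron output: 1 if <w, x> > 0, else -1 (x[0] is a constant bias input)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

def epochs_until_error_free(data, eta=0.5, max_epochs=1000):
    """Run the perceptron rule; return the first error-free epoch, or None."""
    w = [0.0] * len(data[0][0])
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, d in data:
            o = predict(w, x)
            if o != d:
                mistakes += 1
                w = [wi + eta * (d - o) * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return epoch
    return None

# XOR with targets in {-1, 1}; the leading 1.0 is the bias input.
xor_data = [([1.0, 0.0, 0.0], -1), ([1.0, 0.0, 1.0], 1),
            ([1.0, 1.0, 0.0], 1), ([1.0, 1.0, 1.0], -1)]
print(epochs_until_error_free(xor_data))  # None
```

An error-free epoch would require one weight vector that separates all four patterns, which cannot exist for XOR, so raising `max_epochs` does not change the result.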

Gradient Descent

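- To understand, consider a simpler linear unit, where $o = w_0 + w_1 x_1 + \dots + w_n x_n$
- Let's learn the $w_i$ that minimize the squared error $E[w] = \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2$ over the training set $D=\{(x_1,t_1),(x_2,t_2),\dots,(x_d,t_d),\dots,(x_m,t_m)\}$
- (t for target)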

We want to move the weight vector in the direction that decreases E

$w_i = w_i + \Delta w_i$, or in vector form $w = w + \Delta w$
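The size and direction of the step can be made explicit. The derivation below assumes the linear unit $o_d = \sum_i w_i x_{i,d}$ and the squared error $E = \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2$, which is the standard formulation and is reconstructed here because the slide formulas were lost. The notation $x_{i,d}$ for the i-th input of example d is introduced for this derivation.

$$
\begin{aligned}
\frac{\partial E}{\partial w_i}
  &= \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{d \in D}(t_d - o_d)^2
   = \sum_{d \in D}(t_d - o_d)\,\frac{\partial}{\partial w_i}(t_d - o_d) \\
  &= -\sum_{d \in D}(t_d - o_d)\,x_{i,d}, \\
\Delta w_i &= -\eta\,\frac{\partial E}{\partial w_i}
   = \eta \sum_{d \in D}(t_d - o_d)\,x_{i,d}.
\end{aligned}
$$

The minus sign in $\Delta w_i = -\eta\,\partial E / \partial w_i$ is what makes the step move against the gradient, that is, toward smaller E.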

Gradient Descent

Gradient-Descent(training_examples, $\eta$)

Each training example is a pair of the form <(x1,…,xn),t>, where (x1,…,xn) is the vector of input values and t is the target output value; $\eta$ is the learning rate (e.g. 0.1)

- Initialize each $w_i$ to some small random value
- Until the termination condition is met, do
  - Initialize each $\Delta w_i$ to zero
  - For each <(x1,…,xn),t> in training_examples, do
    - Input the instance (x1,…,xn) to the linear unit and compute the output o
    - For each linear unit weight $w_i$, do
      - $\Delta w_i = \Delta w_i + \eta\,(t - o)\,x_i$
  - For each linear unit weight $w_i$, do
    - $w_i = w_i + \Delta w_i$
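The procedure above translates fairly directly into Python. This is a sketch under stated assumptions: a fixed number of epochs stands in for the unspecified termination condition, the weights start at zero rather than at small random values so that the run is reproducible, and the training data is a hypothetical noise-free line t = 1 + 2x with a constant bias input.

```python
def gradient_descent(training_examples, eta=0.1, epochs=500):
    """Batch gradient descent for a linear unit o = sum_i w_i * x_i."""
    n = len(training_examples[0][0])
    w = [0.0] * n  # assumption: zeros instead of small random values
    for _ in range(epochs):  # assumption: fixed epoch count as termination condition
        delta_w = [0.0] * n  # accumulate the update over the whole training set
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n):
                delta_w[i] += eta * (t - o) * x[i]
        # weights change only once per pass over the data
        w = [wi + dwi for wi, dwi in zip(w, delta_w)]
    return w

# Hypothetical data generated by t = 1 + 2*x; the leading 1.0 is the bias input.
examples = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
w = gradient_descent(examples)
print([round(wi, 3) for wi in w])  # [1.0, 2.0]
```

With this data a learning rate of 0.1 is small enough for the iteration to be stable; a much larger value would make the weights diverge.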

Stochastic Approximation to gradient descent

- The gradient descent training rule updates the weights only after summing over all the training examples in D
- Stochastic gradient descent approximates gradient descent by updating the weights incrementally
- Calculate the error for each example
- Known as the delta rule or LMS (least mean-square) weight update
- Adaline rule, used for adaptive filters, Widrow and Hoff (1960)
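For comparison with the batch version, here is a minimal sketch of the incremental (delta rule / LMS) update, in which the weights change after every single example. The function name `lms`, the shuffling of examples each epoch, the fixed seed, and the toy data t = 1 + 2x are assumptions made for the illustration.

```python
import random

def lms(training_examples, eta=0.1, epochs=500, seed=0):
    """Delta rule: w_i <- w_i + eta * (t - o) * x_i after each example."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(training_examples[0][0])
    w = [0.0] * n
    examples = list(training_examples)
    for _ in range(epochs):
        rng.shuffle(examples)  # visit the examples in random order
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical noise-free data from t = 1 + 2*x; the leading 1.0 is the bias input.
examples = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
w = lms(examples)
print([round(wi, 3) for wi in w])
```

On noise-free data like this the weights settle near (1, 2); with noisy targets they would instead keep fluctuating around the minimum-error solution unless the learning rate is made small.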

LMS

- Estimate of the weight vector
- No steepest descent
- No well defined trajectory in the weight space
- Instead a random trajectory (stochastic gradient descent)
- Converge only asymptotically toward the minimum error
- Can approximate gradient descent arbitrarily closely if $\eta$ is made small enough

Summary

- The perceptron training rule is guaranteed to succeed if
  - the training examples are linearly separable
  - the learning rate is sufficiently small
- The linear unit training rule, using gradient descent or LMS, is guaranteed to converge to the hypothesis with minimum squared error
  - given a sufficiently small learning rate
  - even when the training data contains noise
  - even when the training data is not separable by H
