CHAPTER 7 Supervised Hebbian Learning
Objectives • The Hebb rule, proposed by Donald Hebb in 1949, was one of the first neural network learning laws. • It was proposed as a possible mechanism for synaptic modification in the brain. • Use linear algebra concepts to explain why Hebbian learning works. • The Hebb rule can be used to train neural networks for pattern recognition.
Hebb’s Postulate • Hebbian learning (The Organization of Behavior): “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
Linear Associator • a = Wp, where p is an R×1 input vector, W is the S×R weight matrix, and a is the S×1 output vector. • The linear associator is an example of a type of neural network called an associative memory. • The task of an associative memory is to learn Q pairs of prototype input/output vectors: {p1,t1}, {p2,t2},…, {pQ,tQ}. • If p = pq, then a = tq, for q = 1,2,…,Q. • If the input changes slightly (p = pq + δ), the output should change only slightly (a = tq + ε).
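A minimal NumPy sketch of the linear associator a = Wp (the dimensions and toy vectors are illustrative, not from the chapter):

```python
import numpy as np

# Linear associator: a = W p, with p an R x 1 input, W an S x R weight
# matrix, and a an S x 1 output (R = 4, S = 2 chosen arbitrarily here).
W = np.array([[0.50, -0.50,  0.50, -0.50],
              [0.25,  0.25, -0.25, -0.25]])   # example weights (how to choose W is the topic below)
p = np.array([1.0, -1.0, 1.0, -1.0])

a = W @ p
print(a)   # -> [2. 0.]
```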
Hebb Learning Rule • If two neurons on either side of a synapse are activated simultaneously, the strength of the synapse will increase. • The connection (synapse) between input pj and output ai is the weight wij. • Unsupervised learning rule: wij_new = wij_old + α aiq pjq (uses the actual output). • Supervised learning rule: wij_new = wij_old + tiq pjq (uses the target output in place of the actual output, with the learning rate set to 1). • Not only do we increase the weight when pj and ai are both positive, but we also increase the weight when they are both negative.
Supervised Hebb Rule • Assume that the weight matrix is initialized to zero and each of the Q input/output pairs is applied once to the supervised Hebb rule (batch operation): W = t1 p1^T + t2 p2^T + … + tQ pQ^T = T P^T, where P = [p1 p2 … pQ] and T = [t1 t2 … tQ].
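A small sketch of the batch supervised Hebb rule W = T P^T; the prototype matrices P and T below are made-up orthonormal examples, not the chapter's data:

```python
import numpy as np

# Prototype inputs as columns of P (R x Q) and targets as columns of T (S x Q).
P = np.array([[ 0.5,  0.5],
              [-0.5,  0.5],
              [ 0.5, -0.5],
              [-0.5, -0.5]])      # two orthonormal 4-dimensional prototypes
T = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])      # desired outputs

W = T @ P.T                       # supervised Hebb rule, batch form: W = T P^T
print(W @ P)                      # reproduces T because the columns of P are orthonormal
```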
Performance Analysis • Assume that the pq vectors are orthonormal (orthogonal and unit length). If pk is input to the network, the output is a = W pk = (Σq tq pq^T) pk = Σq tq (pq^T pk) = tk, since pq^T pk = 1 for q = k and 0 otherwise. • If the input prototype vectors are orthonormal, the Hebb rule produces the correct output for each input.
Performance Analysis • Assume that each pq vector is unit length, but the vectors are not orthogonal. Then a = W pk = tk + Σ(q≠k) tq (pq^T pk), where the second term is the error. • The magnitude of the error depends on the amount of correlation between the prototype input patterns.
Orthonormal Case (example): the network output matches the target output exactly. Success!!
Not Orthogonal Case (example): the outputs are close, but do not quite match the target outputs (see the sketch below).
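A sketch contrasting the two cases with made-up prototypes (not the chapter's examples): when the unit-length prototype inputs are orthogonal, the outputs equal the targets exactly; when they are merely unit length, the outputs come close but carry an error proportional to the correlation between prototypes.

```python
import numpy as np

def hebb(P, T):
    """Batch supervised Hebb rule: W = T P^T."""
    return T @ P.T

# Orthonormal prototypes: exact recall.
P1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])
T1 = np.array([[1.0, -1.0]])
print(hebb(P1, T1) @ P1)        # [[ 1. -1.]] -> matches T1 exactly

# Unit-length but not orthogonal prototypes: recall contains an error term.
P2 = np.array([[1.0, 0.2],
               [0.0, np.sqrt(0.96)],
               [0.0, 0.0]])     # inner product between prototypes is 0.2
T2 = np.array([[1.0, -1.0]])
print(hebb(P2, T2) @ P2)        # ~[[ 0.8 -0.8]] -> close to T2, off by the correlation
```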
Solved Problem P7.2 • i. The prototype vectors are orthogonal, but not orthonormal. • ii. (weight matrix computed from the prototypes)
Solutions of Problem P7.2 • iii. Hamming distance = 1 to one prototype, Hamming distance = 2 to the other.
Pseudoinverse Rule • Performance index: F(W) = Σq || tq − W pq ||². • Goal: choose the weight matrix W to minimize F(W). • When the input vectors are not orthogonal and we use the Hebb rule, F(W) will not be zero, and it is not clear that F(W) will be minimized. • F(W) = 0 exactly when W P = T; if the P matrix has an inverse, the solution is W = T P^(-1).
Pseudoinverse Rule • P has an inverse only if P is a square matrix. Normally the pq vectors (the columns of P) will be independent, but R (the dimension of pq, i.e. the number of rows) will be larger than Q (the number of pq vectors, i.e. the number of columns), so P has no inverse. • The weight matrix W that minimizes the performance index is given by the pseudoinverse rule: W = T P^+, where P^+ is the Moore–Penrose pseudoinverse.
Moore–Penrose Pseudoinverse • The pseudoinverse of a real matrix P is the unique matrix P^+ that satisfies P P^+ P = P, P^+ P P^+ = P^+, (P P^+)^T = P P^+, and (P^+ P)^T = P^+ P. • When R (the number of rows of P) > Q (the number of columns of P) and the columns of P are independent, the pseudoinverse can be computed by P^+ = (P^T P)^(-1) P^T. • Note that we do NOT need to normalize the input vectors when using the pseudoinverse rule.
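A sketch of the pseudoinverse rule on made-up, non-orthogonal prototypes; np.linalg.pinv computes the Moore–Penrose pseudoinverse, which here equals (P^T P)^(-1) P^T because R > Q and the columns of P are independent:

```python
import numpy as np

# Non-orthogonal (and unnormalized) prototype inputs and targets.
P = np.array([[ 1.0, 1.0],
              [-1.0, 1.0],
              [ 1.0, 1.0]])            # R = 3 rows, Q = 2 independent columns
T = np.array([[ 1.0, -1.0]])

W_hebb = T @ P.T                       # Hebb rule: output only approximates T here
W_pinv = T @ np.linalg.pinv(P)         # pseudoinverse rule: W = T P^+

print(W_hebb @ P)                      # [[ 2. -2.]] -- approximate
print(W_pinv @ P)                      # [[ 1. -1.]] -- exact (up to round-off)
```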
Autoassociative Memory • The linear associator using the Hebb rule is a type of associative memory (tq ≠ pq). In an autoassociative memory the desired output vector is equal to the input vector (tq = pq). • An autoassociative memory can be used to store a set of patterns and then recall them, even when corrupted patterns are provided as input. • In the example, p and a are 30×1 vectors (30-pixel patterns) and W is a 30×30 weight matrix.
Figures: recovery of 50% occluded patterns; recovery of 67% occluded patterns; recovery of noisy patterns (corrupted and noisy versions of the stored patterns as input).
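A sketch of autoassociative recall from a partially occluded pattern, using a small made-up bipolar pattern instead of the chapter's 30-pixel digits; a symmetric hard-limit output is assumed here to binarize the recalled pattern:

```python
import numpy as np

def hardlims(n):
    """Symmetrical hard limit: +1 for n >= 0, -1 otherwise."""
    return np.where(n >= 0, 1.0, -1.0)

# Two made-up 6-pixel bipolar patterns stored autoassociatively (t_q = p_q).
P = np.array([[ 1,  1],
              [ 1, -1],
              [-1,  1],
              [-1, -1],
              [ 1,  1],
              [-1, -1]], dtype=float)
W = P @ P.T                        # autoassociative Hebb rule: W = sum_q p_q p_q^T

occluded = P[:, 0].copy()
occluded[3:] = -1                  # lower half of the first pattern is occluded
recalled = hardlims(W @ occluded)
print(np.array_equal(recalled, P[:, 0]))   # True here; recall can fail for highly correlated patterns
```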
Variations of Hebbian Learning • Many of the learning rules have some relationship to the Hebb rule. • The weight matrix produced by the basic Hebb rule can have very large elements if there are many prototype patterns in the training set. • Basic Hebb rule: W_new = W_old + tq pq^T. • Filtered learning: add a decay term, W_new = (1 − γ) W_old + α tq pq^T, so that the learning rule behaves like a smoothing filter, remembering the most recent inputs more clearly.
Variations of Hebbian Learning • Basic Hebb rule: W_new = W_old + α tq pq^T. • Delta rule: replace the desired output with the difference between the desired output and the actual output, W_new = W_old + α (tq − aq) pq^T. It adjusts the weights so as to minimize the mean square error and can update the weights after each new input pattern is presented. • Unsupervised Hebb rule: W_new = W_old + α aq pq^T (uses the actual output instead of the target).
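A sketch of the delta rule's incremental updates on made-up data (the learning rate, inputs, and targets are illustrative); each presentation nudges W so as to reduce the squared error between target and actual output:

```python
import numpy as np

# Made-up training pairs: columns of P are inputs, columns of T are targets.
P = np.array([[1.0, 0.5],
              [0.5, 1.0]])
T = np.array([[1.0, -1.0]])

W = np.zeros((1, 2))
alpha = 0.2                               # learning rate

for epoch in range(100):
    for q in range(P.shape[1]):
        p, t = P[:, q], T[:, q]
        a = W @ p                         # actual output of the linear network
        W += alpha * np.outer(t - a, p)   # delta rule: W_new = W_old + alpha (t - a) p^T

print(W @ P)                              # approaches T as training proceeds
```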
Solved Problem P7.6 • i. Why is a bias required to solve this problem? • The decision boundary for the perceptron network is Wp + b = 0. If there is no bias, the boundary becomes Wp = 0, a line that must pass through the origin. No decision boundary that passes through the origin can separate the two prototype vectors in this problem.
Solved Problem P7.6 • ii. Use the pseudoinverse rule to design a network with bias to solve this problem. • Treat the bias as another weight, with a constant input of 1, so the boundary Wp + b = 0 can be placed away from the origin (see the sketch below).
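A sketch of the bias-as-extra-weight trick with hypothetical prototype vectors (the problem's actual vectors are not reproduced in this text): the inputs are augmented with a constant 1, the pseudoinverse rule gives an augmented weight row, and its last entry is the bias.

```python
import numpy as np

# Hypothetical prototypes lying on one ray from the origin, so no boundary
# through the origin separates them; targets encoded as +1 / -1.
P = np.array([[1.0, 2.0],
              [1.0, 2.0]])
T = np.array([[1.0, -1.0]])

# Augment each input with a constant 1 so the bias becomes an extra weight.
P_aug = np.vstack([P, np.ones((1, P.shape[1]))])
Wb = T @ np.linalg.pinv(P_aug)         # pseudoinverse rule on the augmented inputs
W, b = Wb[:, :-1], Wb[:, -1]
print(W @ P + b)                       # net inputs ~[[ 1. -1.]]: desired signs achieved
```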
Solved Problem P7.7 • Up to now, we have represented patterns as vectors by using “1” and “–1” to represent dark and light pixels, respectively. What if we were to use “1” and “0” instead? How should the Hebb rule be changed? • Bipolar {–1,1} representation: p. Binary {0,1} representation: p' = (p + 1)/2, i.e. p = 2p' − 1. • Then Wp = W(2p' − 1) = 2W p' − W 1, so the binary network should use W' = 2W and b' = −W 1, where 1 is a vector of ones.
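A quick numerical check of the conversion above: with W' = 2W and b' = −W·1, the binary-input network produces the same net input as the bipolar-input network (the weight matrix and pattern below are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 5))        # any weight matrix designed for bipolar inputs
p = rng.choice([-1.0, 1.0], size=5)    # a bipolar pattern
p_bin = (p + 1) / 2                    # the same pattern in {0, 1}

W_bin = 2 * W                          # W' = 2W
b_bin = -W @ np.ones(5)                # b' = -W 1, with 1 a vector of ones

print(np.allclose(W @ p, W_bin @ p_bin + b_bin))   # True: net inputs are identical
```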
Binary Associative Network • a = hardlim(Wp + b), where p is an R×1 binary input vector, W is the S×R weight matrix, b is an S×1 bias vector, and a is the S×1 output vector.