

  1. CHAPTER 11 Back-Propagation

  2. Objectives • A generalization of the LMS algorithm, called backpropagation, can be used to train multilayer networks. • Backpropagation is an approximate steepest descent algorithm, in which the performance index is mean square error. • In order to calculate the derivatives, we need to use the chain rule of calculus.

  3. Motivation • The perceptron learning rule and the LMS algorithm were designed to train single-layer perceptron-like networks. • They are only able to solve linearly separable classification problems. • Backpropagation was popularized by the Parallel Distributed Processing (PDP) group. • The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely used neural network.

  4. Three-Layer Network • The number of neurons in layer m is denoted S^m (here S^1, S^2, and S^3).

  5. Pattern Classification: XOR gate • The limitations of the single-layer perceptron (Minsky & Papert, 1969)

  6. Two-Layer XOR Network • A two-layer, 2-2-1 network: each first-layer neuron makes an individual decision, and the output neuron ANDs those decisions together.
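As a concrete sketch of this idea (the specific weights and biases below are illustrative assumptions, not values taken from the slide), a 2-2-1 network of hard-limit neurons can realize XOR by ANDing two individual first-layer decisions:

```python
import numpy as np

def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return (n >= 0).astype(float)

# Layer 1: two "individual decisions" (illustrative weights).
# Neuron 1 fires when at least one input is 1 (an OR decision);
# neuron 2 fires unless both inputs are 1 (a NAND decision).
W1 = np.array([[ 2.0,  2.0],
               [-2.0, -2.0]])
b1 = np.array([[-1.0], [ 3.0]])

# Layer 2: AND of the two first-layer decisions, which yields XOR.
W2 = np.array([[1.0, 1.0]])
b2 = np.array([[-1.5]])

for p in ([0, 0], [0, 1], [1, 0], [1, 1]):
    p = np.array(p, dtype=float).reshape(2, 1)
    a1 = hardlim(W1 @ p + b1)           # individual decisions
    a2 = hardlim(W2 @ a1 + b2)          # AND of the decisions
    print(p.ravel(), "->", a2.item())   # prints 0, 1, 1, 0
```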

  7. Solved Problem P11.1 • Design a multilayer network to distinguish two categories, Class I and Class II. There is no single hyperplane that can separate these two categories.

  8. Solution of Problem P11.1 • The solution combines first-layer decision boundaries with AND and OR operations in the later layers.

  9. Function Approximation • Two-layer, 1-2-1 network

  10. Function Approximation • The centers of the steps occur where the net input to a neuron in the first layer is zero; for a first-layer neuron with weight w and bias b, the step is centered at p = -b/w. • The steepness of each step can be adjusted by changing the network weights.

  11.–14. Effect of Parameter Changes (a sequence of figures showing how the network response shifts and reshapes as individual weights and biases are varied)

  15. Function Approximation • Two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, can approximate virtually any function of interest to any degree of accuracy, provided sufficiently many hidden units are available.

  16. Backpropagation Algorithm • For multilayer networks, the output of one layer becomes the input to the following layer.
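In the usual notation for this material, the layer-to-layer relation is $\mathbf{a}^{m+1} = \mathbf{f}^{m+1}(\mathbf{W}^{m+1}\mathbf{a}^m + \mathbf{b}^{m+1})$, with $\mathbf{a}^0 = \mathbf{p}$ and the network output $\mathbf{a} = \mathbf{a}^M$. A minimal sketch of this forward pass (the helper name and argument layout are assumptions for illustration):

```python
import numpy as np

def forward(p, weights, biases, transfer_fns):
    """Propagate the input p through the layers:
    a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), with a^0 = p."""
    a = p
    activations = [a]                      # keep a^0 ... a^M for later use in backprop
    for W, b, f in zip(weights, biases, transfer_fns):
        a = f(W @ a + b)
        activations.append(a)
    return activations                     # activations[-1] is the network output a^M
```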

  17. Performance Index • Training Set: • Mean Square Error: • Vector Case: • Approximate Mean Square Error: • Approximate Steepest Descent Algorithm
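The formulas these bullets refer to were shown as images; the following LaTeX sketch gives the standard formulation of the performance index and the approximate steepest descent update (treat the exact notation as an assumption about what the slide displayed):

```latex
% Training set of input/target pairs
\{\mathbf{p}_1,\mathbf{t}_1\},\ \{\mathbf{p}_2,\mathbf{t}_2\},\ \dots,\ \{\mathbf{p}_Q,\mathbf{t}_Q\}

% Mean square error (scalar case and vector case)
F(\mathbf{x}) = E[e^2] = E[(t-a)^2], \qquad
F(\mathbf{x}) = E[\mathbf{e}^T\mathbf{e}] = E[(\mathbf{t}-\mathbf{a})^T(\mathbf{t}-\mathbf{a})]

% Approximate (single-sample) mean square error
\hat{F}(\mathbf{x}) = (\mathbf{t}(k)-\mathbf{a}(k))^T(\mathbf{t}(k)-\mathbf{a}(k)) = \mathbf{e}^T(k)\,\mathbf{e}(k)

% Approximate steepest descent updates
w_{i,j}^m(k+1) = w_{i,j}^m(k) - \alpha\,\frac{\partial \hat{F}}{\partial w_{i,j}^m}, \qquad
b_i^m(k+1) = b_i^m(k) - \alpha\,\frac{\partial \hat{F}}{\partial b_i^m}
```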

  18. Chain Rule • If $f(n) = e^n$ and $n = 2w$, so that $f(n(w)) = e^{2w}$, then $df/dw = (df/dn)(dn/dw) = e^n \cdot 2 = 2e^{2w}$. • The same chain rule is used on the approximate mean square error, since the weights and biases affect $\hat{F}$ only through the net inputs: $\partial \hat{F}/\partial w_{i,j}^m = (\partial \hat{F}/\partial n_i^m)(\partial n_i^m/\partial w_{i,j}^m)$.

  19. Sensitivity & Gradient • The net input to the ith neuron of layer m: • The sensitivity of $\hat{F}$ to changes in the ith element of the net input at layer m: • Gradient:
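A reconstruction of the definitions this slide refers to, in the standard notation (an assumption about the missing equations):

```latex
% Net input to the i-th neuron of layer m
n_i^m = \sum_{j=1}^{S^{m-1}} w_{i,j}^m\, a_j^{m-1} + b_i^m

% Sensitivity: how \hat{F} changes with the i-th net input of layer m
s_i^m \equiv \frac{\partial \hat{F}}{\partial n_i^m}

% Gradient elements via the chain rule
\frac{\partial \hat{F}}{\partial w_{i,j}^m} = s_i^m\, a_j^{m-1}, \qquad
\frac{\partial \hat{F}}{\partial b_i^m} = s_i^m
```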

  20. Steepest Descent Algorithm • The steepest descent algorithm for the approximate mean square error: $w_{i,j}^m(k+1) = w_{i,j}^m(k) - \alpha\, s_i^m a_j^{m-1}$, $\quad b_i^m(k+1) = b_i^m(k) - \alpha\, s_i^m$ • Sensitivity vector: $\mathbf{s}^m \equiv \dfrac{\partial \hat{F}}{\partial \mathbf{n}^m} = \begin{bmatrix} \partial \hat{F}/\partial n_1^m \\ \partial \hat{F}/\partial n_2^m \\ \vdots \\ \partial \hat{F}/\partial n_{S^m}^m \end{bmatrix}$ • Matrix form: $\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\, \mathbf{s}^m (\mathbf{a}^{m-1})^T$, $\quad \mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\, \mathbf{s}^m$

  21. Backpropagating the Sensitivity • Backpropagation: a recurrence relationship in which the sensitivity at layer m is computed from the sensitivity at layer m+1. • Jacobian matrix: $\partial \mathbf{n}^{m+1} / \partial \mathbf{n}^m$

  22. Matrix Representation • The i, j element of the Jacobian matrix: $\partial n_i^{m+1}/\partial n_j^m = w_{i,j}^{m+1}\,\dot{f}^m(n_j^m)$

  23. Recurrence Relation • The recurrence relation for the sensitivity • The sensitivities are propagated backward through the network from the last layer to the first layer.
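A sketch of the recurrence this slide states, in the usual notation (an assumption about the missing equation):

```latex
% Sensitivities propagate backward from layer m+1 to layer m
\mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)\,(\mathbf{W}^{m+1})^T\,\mathbf{s}^{m+1},
\qquad m = M-1,\ \dots,\ 2,\ 1

% \dot{F}^m(n^m) is the diagonal matrix of transfer-function derivatives
\dot{\mathbf{F}}^m(\mathbf{n}^m) =
\mathrm{diag}\!\left(\dot{f}^m(n_1^m),\ \dots,\ \dot{f}^m(n_{S^m}^m)\right)
```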

  24. Backpropagation Algorithm • At the final layer:
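The starting point of the recurrence, again in the standard form for a squared-error performance index (assumed to match the slide):

```latex
% Sensitivity at the final layer M, from \hat{F} = (t - a)^T (t - a)
\mathbf{s}^M = -2\,\dot{\mathbf{F}}^M(\mathbf{n}^M)\,(\mathbf{t}-\mathbf{a})
```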

  25. Summary • The first step is to propagate the input forward through the network: • The second step is to propagate the sensitivities backward through the network: • Output layer: • Hidden layer: • The final step is to update the weights and biases:
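A compact sketch of one backpropagation iteration for a two-layer network, following the three steps above. The function name, the log-sigmoid hidden / linear output structure, and the default learning rate are illustrative assumptions:

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def dlogsig(a):
    return a * (1.0 - a)   # derivative of logsig, written in terms of its output a

def backprop_step(p, t, W1, b1, W2, b2, alpha=0.1):
    """One backpropagation iteration for a two-layer network
    (log-sigmoid hidden layer, linear output layer)."""
    # 1) Propagate the input forward
    a0 = p
    a1 = logsig(W1 @ a0 + b1)
    a2 = W2 @ a1 + b2                                 # linear output layer
    e = t - a2

    # 2) Propagate the sensitivities backward
    s2 = -2.0 * e                                     # output layer: purelin derivative is 1
    s1 = np.diag(dlogsig(a1).ravel()) @ W2.T @ s2     # hidden layer recurrence

    # 3) Update the weights and biases
    W2 = W2 - alpha * s2 @ a1.T
    b2 = b2 - alpha * s2
    W1 = W1 - alpha * s1 @ a0.T
    b1 = b1 - alpha * s1
    return W1, b1, W2, b2, e
```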

  26. BP Neural Network

  27. Ex: Function Approximation • A 1-2-1 network is trained to approximate a target function; the figure shows the input p, the target t, and the error e = t − a that drives training.

  28. Network Architecture • The 1-2-1 network maps the input p to the output a.

  29. Initial Values • Initial network response:

  30. Forward Propagation • Initial input: • Output of the 1st layer: • Output of the 2nd layer: • Error:

  31. Transfer Function Derivatives
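Assuming the 1-2-1 example uses a log-sigmoid hidden layer and a linear output layer (the typical setup for this example), the derivatives reduce to simple functions of the layer outputs:

```latex
f^1(n) = \frac{1}{1+e^{-n}} \;\Rightarrow\;
\dot{f}^1(n) = \frac{e^{-n}}{(1+e^{-n})^2} = (1-a^1)\,a^1,
\qquad
f^2(n) = n \;\Rightarrow\; \dot{f}^2(n) = 1
```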

  32. Backpropagation • The second layer sensitivity: • The first layer sensitivity:

  33. Weight Update • The weight and bias updates use a learning rate α.
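To tie the worked example together, here is a brief usage sketch of the backprop_step helper defined after the summary slide. The target function g(p) = 1 + sin(πp/4), the random initial weights, and α = 0.1 are assumptions chosen for illustration, not values read off the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.uniform(-0.5, 0.5, (2, 1)), rng.uniform(-0.5, 0.5, (2, 1))
W2, b2 = rng.uniform(-0.5, 0.5, (1, 2)), rng.uniform(-0.5, 0.5, (1, 1))

g = lambda p: 1.0 + np.sin(np.pi * p / 4.0)   # assumed target function

for k in range(2000):
    p = rng.uniform(-2.0, 2.0, (1, 1))        # sample a training input
    W1, b1, W2, b2, e = backprop_step(p, g(p), W1, b1, W2, b2, alpha=0.1)

print("squared error on the last sample:", (e ** 2).item())
```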

  34. Choice of Network Structure • Multilayer networks can be used to approximate almost any function, if we have enough neurons in the hidden layers. • We cannot say, in general, how many layers or how many neurons are necessary for adequate performance.

  35. Illustrated Example 1 1-3-1 Network

  36. Illustrated Example 2 • Responses of 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks.

  37. Convergence • Convergence to the global minimum vs. convergence to a local minimum. The numbers along each curve indicate the sequence of iterations.

  38. Generalization • In most cases the multilayer network is trained with a finite number of examples of proper network behavior: • This training set is normally representative of a much larger class of possible input/output pairs. • Can the network successfully generalize what it has learned to the total population?

  39. Generalization Example • 1-2-1 and 1-9-1 networks fit to the same training points: the 1-2-1 network generalizes well, while the 1-9-1 network does not. For a network to be able to generalize, it should have fewer parameters than there are data points in the training set.
