
Improved BP algorithms (first-order gradient methods)








  1. Improved BP algorithms (first-order gradient methods) • BP with momentum • Delta-bar-delta • Decoupled momentum • RProp • Adaptive BP • Trinary BP • BP with adaptive gain • Extended BP

  2. BP with momentum (BPM) • The basic improvement to BP (Rumelhart 1986) • The momentum factor alpha is selected between zero and one • Adding momentum improves the convergence speed and helps keep the network from being trapped in a local minimum (see the update rule below).
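
A standard form of the momentum update (the original slide's equation is not reproduced in this transcript; the symbols η for the learning rate, α for the momentum factor, and E for the error function are assumed) is:

\[
\Delta w_{ij}(t) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha\, \Delta w_{ij}(t-1), \qquad 0 < \alpha < 1
\]

The α term adds a fraction of the previous weight change, which damps oscillations across the error surface and accelerates movement along directions where successive gradients agree.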

  3. Modification form: • Proposed by Nagata (1990) • Beta is a constant value chosen by the user • Nagata claimed that the beta term reduces the possibility of the network being trapped in a local minimum • This appears to make the beta term repeat the role of the alpha (momentum) rule, though the distinction is not made clear.

  4. Delta-bar-delta (DBD) • Uses an adaptive learning rate to speed up convergence • The rate is adapted based on local optimization • Uses gradient descent for the search direction, with an individual step size for each weight (see the sketch below).
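
A minimal per-weight sketch of the delta-bar-delta rule, assuming the standard hyperparameters κ (additive increase), φ (multiplicative decrease), and θ (averaging factor); the function and variable names are illustrative, not taken from the slides:

```python
import numpy as np

def dbd_update(w, grad, lr, delta_bar, kappa=0.01, phi=0.5, theta=0.7):
    """One delta-bar-delta step: each weight keeps its own learning rate.

    w, grad, lr and delta_bar are arrays of the same shape.
    """
    # Increase the rate additively where the current gradient agrees in
    # sign with the running average of past gradients (delta-bar);
    # decrease it multiplicatively where they disagree.
    agree = delta_bar * grad > 0
    disagree = delta_bar * grad < 0
    lr = lr + kappa * agree
    lr = lr * np.where(disagree, phi, 1.0)

    # Plain gradient-descent step with the per-weight rates.
    w = w - lr * grad

    # Update the exponentially weighted average of past gradients.
    delta_bar = (1 - theta) * grad + theta * delta_bar
    return w, lr, delta_bar
```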

  5. RProp • Jervis and Fitzgerald (1993) • Uses only the sign of the gradient for the update direction and limits the size of the step (see the sketch below).
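
A compact sketch of the RProp step-size adaptation, assuming the commonly used constants η+ = 1.2 and η− = 0.5 and explicit step-size limits; this is a simplified variant that omits the backtracking step of the original algorithm, and the names are illustrative:

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One RProp step: only the sign of the gradient is used, and the
    per-weight step size is grown or shrunk and then clipped."""
    sign_change = grad * prev_grad
    # Grow the step where the gradient kept its sign, shrink it where
    # the sign flipped, and limit the step size in both directions.
    step = np.where(sign_change > 0, step * eta_plus, step)
    step = np.where(sign_change < 0, step * eta_minus, step)
    step = np.clip(step, step_min, step_max)

    # Move each weight by its own step size, opposite to the gradient sign.
    w = w - np.sign(grad) * step
    return w, step
```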

  6. Second order gradient methods • Newton • Gauss-Newton • Levenberg-Marquardt • Quickprop • Conjugate gradient descent • Broyden-Fletcher-Goldfarb-Shanno (BFGS)

  7. Performance Surfaces • Taylor Series Expansion
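
The scalar Taylor expansion the slide refers to, in its standard form about a point x*:

\[
F(x) = F(x^*) + F'(x^*)\,(x - x^*) + \tfrac{1}{2} F''(x^*)\,(x - x^*)^2 + \cdots
\]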

  8. Example

  9. Plot of Approximations

  10. Vector Case

  11. Matrix Form
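
For the vector case, the corresponding matrix form uses the gradient and the Hessian (standard notation assumed here):

\[
F(\mathbf{x}) = F(\mathbf{x}^*) + \nabla F(\mathbf{x}^*)^{T}(\mathbf{x} - \mathbf{x}^*)
 + \tfrac{1}{2}\,(\mathbf{x} - \mathbf{x}^*)^{T}\,\nabla^2 F(\mathbf{x}^*)\,(\mathbf{x} - \mathbf{x}^*) + \cdots
\]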

  12. Performance Optimization • Steepest Descent • Basic Optimization Algorithm
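
The basic steepest-descent iteration (standard form; g_k is the gradient at the current point and α_k the learning rate):

\[
\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\,\mathbf{g}_k, \qquad
\mathbf{g}_k = \nabla F(\mathbf{x})\big|_{\mathbf{x} = \mathbf{x}_k}
\]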

  13. Examples

  14. Minimizing Along a Line
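
For a quadratic performance surface with Hessian A_k, minimizing along the search direction p_k gives a closed-form step length (standard result, symbols assumed):

\[
\alpha_k = -\frac{\nabla F(\mathbf{x}_k)^{T}\,\mathbf{p}_k}{\mathbf{p}_k^{T}\,\mathbf{A}_k\,\mathbf{p}_k}
\]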

  15. Example

  16. Plot

  17. Newton’s Method
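
Newton's method replaces the fixed learning rate with the inverse Hessian (standard form):

\[
\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{A}_k^{-1}\,\mathbf{g}_k, \qquad
\mathbf{A}_k = \nabla^2 F(\mathbf{x}_k)
\]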

  18. Example

  19. Plot

  20. Non-Quadratic Example

  21. Different Initial Conditions

  22. Difficulties • Inverting a possibly singular Hessian matrix • Computational complexity

  23. Newton’s Method

  24. Matrix Form

  25. Hessian
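
For a sum-of-squares performance index F(x) = v(x)ᵀv(x), the gradient and Hessian take the standard forms below; the Gauss-Newton method of the next slide drops the second-order term S(x):

\[
\nabla F(\mathbf{x}) = 2\,\mathbf{J}^{T}(\mathbf{x})\,\mathbf{v}(\mathbf{x}), \qquad
\nabla^2 F(\mathbf{x}) = 2\,\mathbf{J}^{T}(\mathbf{x})\,\mathbf{J}(\mathbf{x}) + 2\,\mathbf{S}(\mathbf{x}), \qquad
\mathbf{S}(\mathbf{x}) = \sum_{h} v_h(\mathbf{x})\,\nabla^2 v_h(\mathbf{x})
\]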

  26. Gauss-Newton Method
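
With the approximation S(x) ≈ 0, the Gauss-Newton step uses only the Jacobian of the error vector (standard form):

\[
\Delta \mathbf{x}_k = -\left[\mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{J}(\mathbf{x}_k)\right]^{-1}\mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{v}(\mathbf{x}_k)
\]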

  27. Levenberg-Marquardt
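
Levenberg-Marquardt adds μk I so the approximate Hessian is always invertible, blending between Gauss-Newton (small μk) and steepest descent (large μk):

\[
\Delta \mathbf{x}_k = -\left[\mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{J}(\mathbf{x}_k) + \mu_k \mathbf{I}\right]^{-1}\mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{v}(\mathbf{x}_k)
\]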

  28. Adjustment of μk

  29. Application to Multilayer Network

  30. Jacobian Matrix
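
For a network with error vector v (one entry per output and training pattern) and parameter vector x (all weights and biases), the Jacobian collects the first derivatives of the errors:

\[
[\mathbf{J}]_{h,l} = \frac{\partial v_h}{\partial x_l}
\]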

  31. Computing the Jacobian

  32. Marquardt Sensitivity

  33. Computing the Sensitivities

  34. LMBP • Present all inputs to the network and compute the corresponding network outputs and the errors. Compute the sum of squared errors over all inputs. • Compute the Jacobian matrix. Calculate the sensitivities with the backpropagation algorithm, after initializing. Augment the individual matrices into the Marquardt sensitivities. Compute the elements of the Jacobian matrix. • Solve to obtain the change in the weights. • Recompute the sum of squared errors with the new weights. If this new sum of squares is smaller than that computed in step 1, then divide μk by the factor u, update the weights, and go back to step 1. If the sum of squares is not reduced, then multiply μk by u and go back to step 3. (A minimal sketch of this loop is given below.)
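
A minimal sketch of the μ-adjustment loop described in the four steps above, assuming helper functions compute_errors(w) and compute_jacobian(w) are available; these helpers and the factor value are illustrative placeholders, not from the slides:

```python
import numpy as np

def lm_step(w, mu, compute_errors, compute_jacobian, factor=10.0, mu_max=1e10):
    """One Levenberg-Marquardt iteration over the flat weight vector w."""
    v = compute_errors(w)                 # error vector over all inputs
    sse = float(v @ v)                    # sum of squared errors (step 1)
    J = compute_jacobian(w)               # Jacobian of the errors (step 2)

    while mu < mu_max:
        # Solve (J^T J + mu I) dw = -J^T v for the weight change (step 3).
        A = J.T @ J + mu * np.eye(w.size)
        dw = np.linalg.solve(A, -J.T @ v)

        # Accept the step only if the sum of squared errors decreases (step 4).
        v_new = compute_errors(w + dw)
        if float(v_new @ v_new) < sse:
            return w + dw, mu / factor    # success: shrink mu, weights updated
        mu *= factor                      # failure: grow mu and re-solve

    return w, mu                          # give up if mu grows too large
```

The caller repeats lm_step until the error or the weight change falls below a chosen tolerance.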

  35. Example LMBP Step

  36. LMBP Trajectory

  37. Conjugate Gradient
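
The conjugate gradient search directions are built from successive gradients; the Fletcher-Reeves choice of βk is shown here, though other forms exist:

\[
\mathbf{p}_0 = -\mathbf{g}_0, \qquad
\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\,\mathbf{p}_k, \qquad
\mathbf{p}_k = -\mathbf{g}_k + \beta_k\,\mathbf{p}_{k-1}, \qquad
\beta_k = \frac{\mathbf{g}_k^{T}\mathbf{g}_k}{\mathbf{g}_{k-1}^{T}\mathbf{g}_{k-1}}
\]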

  38. LRLS method • A recursive least squares based method; it does not require gradient descent.

  39. LRLS method cont.

  40. LRLS method cont.
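
The slides' LRLS equations are not reproduced in this transcript; for reference, the standard recursive least squares update that such layer-wise methods build on is shown below (w: weight vector, x_t: input, d_t: target, P_t: inverse correlation matrix, λ: forgetting factor; the notation is assumed, not taken from the slides):

\[
\mathbf{k}_t = \frac{\mathbf{P}_{t-1}\,\mathbf{x}_t}{\lambda + \mathbf{x}_t^{T}\mathbf{P}_{t-1}\mathbf{x}_t}, \qquad
\mathbf{w}_t = \mathbf{w}_{t-1} + \mathbf{k}_t\left(d_t - \mathbf{w}_{t-1}^{T}\mathbf{x}_t\right), \qquad
\mathbf{P}_t = \frac{1}{\lambda}\left(\mathbf{P}_{t-1} - \mathbf{k}_t\,\mathbf{x}_t^{T}\mathbf{P}_{t-1}\right)
\]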

  41. Example for compare methods

  42. IGLS (Integration of the Gradient and Least Square) • The LRLS algorithm has faster convergence than the BPM algorithm, but it is computationally more complex and is not ideal for very large networks. The learning strategy described here combines ideas from both algorithms, with the aim of keeping the complexity manageable for large networks while retaining reasonably fast convergence. • The output layer weights are updated using the LRLS method • The hidden layer weights are updated using the BPM method
