
Natural Gradient Works Efficiently in Learning (S. Amari)


Presentation Transcript


  1. Natural Gradient Works Efficiently in Learning S. Amari 11.03.18. (Fri) Computational Modeling of Intelligence Summarized by Joon Shik Kim

  2. Abstract • The ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. • The dynamical behavior of natural gradient online learning is analyzed and proved to be Fisher efficient. • The plateau phenomenon, which appears in the backpropagation learning algorithm of the multilayer perceptron, may disappear or become much less severe when the natural gradient is used.

  3. Introduction (1/2) • The stochastic gradient method is a popular learning method in the general nonlinear optimization framework. • In many cases, however, the parameter space is not Euclidean but has a Riemannian metric structure. • In these cases, the ordinary gradient does not give the steepest direction of the target function.

  4. Introduction (2/2) • Barkai, Seung, and Sompolinsky (1995) proposed an adaptive method of adjusting the learning rate. We generalize their idea and evaluate its performance based on the Riemannian metric of errors.

  5. Natural Gradient (1/5) • The squared length of a small incremental vector dw in a Euclidean orthonormal coordinate system is |dw|^2 = Σ_i (dw_i)^2. • When the coordinate system is nonorthogonal, the squared length is given by the quadratic form |dw|^2 = Σ_{i,j} g_ij(w) dw_i dw_j, where G = (g_ij) is the Riemannian metric tensor.
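
The equations on this slide were images and are not reproduced in the transcript; as a concrete illustration (not from the slides), polar coordinates in the plane already give a non-Euclidean parameter metric:

```latex
% Illustrative example (assumption, not slide content): polar coordinates (r, theta).
% A small displacement has squared length ds^2 = dr^2 + r^2 dtheta^2, so the metric
% tensor is G = diag(1, r^2) rather than the identity, and the general quadratic form
% applies with g_11 = 1, g_22 = r^2.
\[
  |dw|^2 \;=\; \sum_{i,j} g_{ij}(w)\, dw_i\, dw_j ,
  \qquad
  G(r,\theta) \;=\; \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}.
\]
```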

  6. Natural Gradient (2/5) • The steepest descent direction of a function L(w) at w is defined by the vector dw that minimizes L(w + dw) where |dw| has a fixed length, that is, under the constraint |dw|^2 = ε^2 for a sufficiently small constant ε.

  7. Natural Gradient (3/5) • The steepest descent direction of L(w) in a Riemannian space is given by −G^{-1}(w) ∇L(w), where G^{-1} is the inverse of the metric tensor G = (g_ij); the vector ~∇L(w) = G^{-1}(w) ∇L(w) is called the natural gradient.

  8. Natural Gradient (4/5)

  9. Natural Gradient (5/5)
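
Slides 8 and 9 consisted of equations that the transcript does not reproduce; the following is a hedged reconstruction of the constrained-minimization argument in the paper (Theorem 1), not the verbatim slide content:

```latex
% Minimize the first-order expansion of L(w + dw) under a fixed Riemannian step length:
\[
  \min_{dw}\; \nabla L(w)^{\mathsf T} dw
  \quad \text{subject to} \quad
  dw^{\mathsf T} G(w)\, dw = \varepsilon^{2}.
\]
% Introducing a Lagrange multiplier \lambda and setting the derivative with respect
% to dw to zero gives \nabla L(w) = 2\lambda\, G(w)\, dw, hence
\[
  dw \;\propto\; -\,G^{-1}(w)\,\nabla L(w) \;=\; -\,\tilde{\nabla} L(w),
\]
% which is the natural gradient direction stated on slide 7.
```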

  10. Natural Gradient Learning • Risk function or average loss: L(w) = E[l(x, y; w)]. • Learning is a procedure to search for the optimal w* that minimizes L(w). • Stochastic gradient descent learning updates the parameter by w_{t+1} = w_t − η_t ∇l(x_t, y_t; w_t); natural gradient learning replaces ∇l with the natural gradient, w_{t+1} = w_t − η_t G^{-1}(w_t) ∇l(x_t, y_t; w_t).
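
A minimal runnable sketch of this update rule, assuming a toy quadratic loss whose metric G is taken to be the loss curvature (the model, names, and the choice G = A are illustrative assumptions, not from the paper):

```python
import numpy as np

def natural_gradient_step(w, grad_l, G, lr):
    """One online natural-gradient update: w <- w - lr * G^{-1} grad_l."""
    nat_grad = np.linalg.solve(G, grad_l)  # G^{-1} grad_l without forming the inverse
    return w - lr * nat_grad

# Toy quadratic loss l(w) = 0.5 * (w - w_star)^T A (w - w_star); for the sketch we
# simply take the metric to be G = A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
w_star = np.array([1.0, -1.0])
w = np.zeros(2)
for t in range(1, 101):
    grad = A @ (w - w_star)                            # ordinary gradient of the toy loss
    w = natural_gradient_step(w, grad, A, lr=1.0 / t)  # eta_t = 1/t as in the paper
print(w)  # converges to w_star
```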

  11. Statistical Estimation of Probability Density Function (1/2) • In the case of statistical estimation, we assume a statistical model {p(z, w)}, and the problem is to obtain the probability distribution that approximates the unknown density function q(z) in the best way. • The loss function is the negative log-likelihood, l(z, w) = −log p(z, w).

  12. Statistical Estimation of Probability Density Function (2/2) • The expected loss is then given by L(w) = E_q[−log p(z, w)] = H_Z + D(q ∥ p_w), where H_Z is the entropy of q(z) and does not depend on w. • The Riemannian metric of the statistical model is the Fisher information matrix, g_ij(w) = E[∂_i log p(z, w) ∂_j log p(z, w)].
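
As a numerical illustration of the last bullet (not from the slides), the empirical average of outer products of score functions approximates the Fisher information; a Gaussian model with assumed parameters (mu, sigma) makes this easy to check against the closed form diag(1/sigma^2, 2/sigma^2):

```python
import numpy as np

# Monte Carlo estimate of g_ij(w) = E[ d_i log p(z, w) * d_j log p(z, w) ]
# for a Gaussian model p(z; mu, sigma); illustrative sketch only.
rng = np.random.default_rng(0)
mu, sigma = 0.5, 2.0
z = rng.normal(mu, sigma, size=200_000)            # here q(z) = p(z, w)

score_mu = (z - mu) / sigma**2                     # d log p / d mu
score_sigma = ((z - mu)**2 - sigma**2) / sigma**3  # d log p / d sigma
scores = np.stack([score_mu, score_sigma])

G_mc = scores @ scores.T / z.size                  # empirical Fisher information
G_exact = np.diag([1 / sigma**2, 2 / sigma**2])    # known closed form
print(G_mc)
print(G_exact)                                     # the two should roughly agree
```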

  13. Fisher Information as the Metric of Kullback-Leibler Divergence (1/2) • Consider the divergence between two nearby distributions of the model, q = q(z, θ) and p = q(z, θ + h).

  14. Fisher Information as the Metric of Kullback-Leibler Divergence (2/2) • I: Fisher information
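
The expansion these two slides carry was not captured by the transcript; a hedged reconstruction of the standard second-order argument is:

```latex
% Expanding the Kullback--Leibler divergence between q(\theta) and p = q(\theta + h)
% to second order in h: the first-order term vanishes and the quadratic form is the
% Fisher information I(\theta).
\[
  D\!\left[\,q(z,\theta)\,\|\,q(z,\theta+h)\,\right]
  \;=\; \int q(z,\theta)\,\log\frac{q(z,\theta)}{q(z,\theta+h)}\,dz
  \;\approx\; \tfrac{1}{2}\, h^{\mathsf T} I(\theta)\, h ,
  \qquad
  I_{ij}(\theta) \;=\; \mathbb{E}\!\left[\,\partial_i \log q \,\partial_j \log q\,\right].
\]
```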

  15. Multilayer Neural Network (1/2)

  16. Multilayer Neural Network (2/2) • The output is modeled as the network function plus Gaussian noise, y = f(x, w) + n, so that p(y | x, w) = c exp(−½ ‖y − f(x, w)‖^2), where c is a normalizing constant.
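
A small sketch of such a stochastic multilayer perceptron, assuming one tanh hidden layer and unit-variance Gaussian output noise (shapes and names are illustrative assumptions, not from the slides):

```python
import numpy as np

def f(x, W, v):
    """Network function f(x, w): tanh hidden layer with weights W, linear read-out v."""
    return v @ np.tanh(W @ x)

def log_p(y, x, W, v):
    """Log-density of the Gaussian output model y = f(x, w) + n, up to the constant log c."""
    return -0.5 * (y - f(x, W, v)) ** 2

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 2))    # hidden weights: 3 hidden units, 2 inputs
v = rng.normal(size=3)         # output weights
x = np.array([0.2, -1.0])
y = f(x, W, v) + rng.normal()  # teacher-style sample: network output plus unit noise
print(log_p(y, x, W, v))
```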

  17. Natural Gradient Gives Fisher-Efficient Online Learning Algorithms (1/4) • D_T = {(x_1, y_1), …, (x_T, y_T)} is a set of T independent input-output examples generated by the teacher network having parameter w*. • Minimizing the log loss over the training data D_T amounts to obtaining the estimator ŵ_T that minimizes the training error E_train(w) = (1/T) Σ_t l(x_t, y_t; w).

  18. Natural Gradient Gives Fisher-Efficient Online Learning Algorithms (2/4) • The Cramér-Rao theorem states that the expected squared error of an unbiased estimator satisfies the bound shown below. • An estimator is said to be efficient, or Fisher efficient, when it attains this bound asymptotically.
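
The bound referred to above, reconstructed from the paper's statement (G is the Fisher information matrix of a single example):

```latex
% Cramér--Rao bound for an unbiased estimator \hat w_T built from T examples:
\[
  \mathbb{E}\!\left[(\hat w_T - w^{*})(\hat w_T - w^{*})^{\mathsf T}\right]
  \;\succeq\; \frac{1}{T}\, G^{-1}(w^{*}).
\]
% Fisher efficiency means attaining this bound asymptotically as T \to \infty.
```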

  19. Natural Gradient Gives Fisher-Efficient Online Learning Algorithms (3/4) • Theorem 2. The natural gradient online estimator is Fisher efficient. • Proof

  20. Natural Gradient Gives Fisher-Efficient Online Learning Algorithms (4/4)
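
Slides 19 and 20 gave the proof in equations that the transcript omits; the following is only a rough, hedged sketch of the argument as it appears in the paper:

```latex
% Take the learning rate \eta_t = 1/t in the natural-gradient update:
\[
  \tilde w_{t+1} \;=\; \tilde w_t \;-\; \frac{1}{t}\, G^{-1}(\tilde w_t)\,
  \nabla l(x_t, y_t; \tilde w_t).
\]
% Summing these updates and expanding \nabla l around w^{*} shows that \tilde w_T is
% asymptotically equivalent to the maximum-likelihood estimator computed from
% (x_1, y_1), \dots, (x_T, y_T); since the MLE attains the Cramér--Rao bound
% (1/T) G^{-1}(w^{*}) asymptotically, the online natural-gradient estimator is
% Fisher efficient.
```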
