
Neural Networks: Hessians





Presentation Transcript


  1. Neural Networks: Hessians - Shubham Shukla

  2. Hessians? Like I Care!

  3. Hessians! Any use? Uses of Hessians: Determining the kinetic constants of decomposition reactions. Edge detection in digital image processing (DIP), which relies on abrupt changes in gray levels. Object recognition in robot vision.

  4. Hessians – Machine Learning? Nonlinear optimization algorithms for training use second-order derivatives of the error function. Hessians are also useful for retraining a feed-forward neural network (FFNN) after a slight change in the training data, for Laplace approximations in Bayesian neural networks, and in network 'pruning' algorithms.

  5. Why approximate Hessians? Number of parameters: W (weights and biases). Evaluating the exact Hessian costs O(W²) per pattern. Approximations provide an easy way to reduce this complexity to O(W), while still giving a reasonably good estimate of H for a particular domain.
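
For a sense of scale (the numbers here are mine, not from the slide): the Hessian is symmetric, so it has W(W+1)/2 independent elements, each of which must be evaluated.

    \[
      W = 10^{4} \;\Rightarrow\; \frac{W(W+1)}{2} \approx 5 \times 10^{7}
      \text{ elements at } O(W^{2}) \text{ cost, versus } 10^{4} \text{ quantities at } O(W).
    \]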

  6. Diagonal Approximation. Some applications of Hessians require the inverse H⁻¹. A convenient approximation: set the off-diagonal elements to zero, which makes inversion trivial. The diagonal elements, and the second derivatives appearing on their right-hand side, can then be found recursively, as sketched below.
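
These slides track Bishop's PRML §5.4.1; the equation images did not survive the transcript, so the standard forms are reconstructed here (a_j is the pre-activation of unit j, z_i the input it receives, h the activation function). The diagonal elements for pattern n are

    \[
      \frac{\partial^2 E_n}{\partial w_{ji}^2}
        = \frac{\partial^2 E_n}{\partial a_j^2}\, z_i^2
    \]

and the second derivatives on the right-hand side follow from the backpropagation-style recursion

    \[
      \frac{\partial^2 E_n}{\partial a_j^2}
        = h'(a_j)^2 \sum_{k}\sum_{k'} w_{kj}\, w_{k'j}\,
          \frac{\partial^2 E_n}{\partial a_k\, \partial a_{k'}}
        + h''(a_j) \sum_{k} w_{kj}\, \frac{\partial E_n}{\partial a_k}
    \]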

  7. Diagonal Approximation (2). Neglecting the off-diagonal elements in the recursion gives the expression below. It is of order O(W), compared with O(W²) for the full Hessian. Problem: in practice, Hessians are typically strongly non-diagonal, so this approximation should be used with care.
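
Dropping the off-diagonal terms of the recursion (again following PRML §5.4.1) gives the O(W) form the slide refers to:

    \[
      \frac{\partial^2 E_n}{\partial a_j^2}
        \approx h'(a_j)^2 \sum_{k} w_{kj}^2\,
          \frac{\partial^2 E_n}{\partial a_k^2}
        + h''(a_j) \sum_{k} w_{kj}\, \frac{\partial E_n}{\partial a_k}
    \]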

  8. Outer Product Approximation. Well suited to regression problems, which use a sum-of-squares error function. The Hessian matrix then takes the form shown below.
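
The slide's equation did not survive extraction; for a single-output network with a sum-of-squares error, the form the deck appears to use (PRML §5.4.2) is

    \[
      E = \frac{1}{2} \sum_{n=1}^{N} (y_n - t_n)^2,
      \qquad
      \mathbf{H} = \nabla\nabla E
        = \sum_{n=1}^{N} \nabla y_n\, (\nabla y_n)^{\mathsf{T}}
        + \sum_{n=1}^{N} (y_n - t_n)\, \nabla\nabla y_n
    \]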

  9. Outer Product Approximation (2). Eliminate the second-order derivative term on the RHS: for a well-trained network y_n ≈ t_n, so the term involving ∇∇y_n vanishes. More generally, the optimal output is the conditional average, y_opt(x) = E[t|x] (from §1.5.5), so the residuals average out and the second-derivative term is eliminated either way. This yields the Levenberg–Marquardt (outer product) approximation, sketched below.
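
In symbols (reconstructed from PRML §5.4.2): H ≈ Σ_n b_n b_nᵀ, where b_n = ∇y_n = ∇a_n. A minimal numerical sketch, with a toy tanh model and finite-difference gradients (the model and all names here are illustrative assumptions, not from the slides):

    import numpy as np

    def outer_product_hessian(grad_fn, X, w):
        # H ~= sum_n b_n b_n^T, where b_n = grad of the output y(x_n; w).
        H = np.zeros((w.size, w.size))
        for x in X:
            b = grad_fn(x, w)       # b_n = grad_w y(x_n; w)
            H += np.outer(b, b)     # rank-one contribution per pattern
        return H

    def grad_y(x, w, eps=1e-6):
        # Central finite differences for the toy model y(x; w) = tanh(w . x).
        g = np.empty_like(w)
        for i in range(w.size):
            dw = np.zeros_like(w)
            dw[i] = eps
            g[i] = (np.tanh((w + dw) @ x) - np.tanh((w - dw) @ x)) / (2 * eps)
        return g

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))   # 100 patterns, 3 inputs
    w = rng.normal(size=3)
    H = outer_product_hessian(grad_y, X, w)
    print(H.shape)                  # (3, 3); symmetric positive semidefinite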

  10. Inverse Hessians. The outer product approximation admits a sequential approach to building up the Hessian, one data point at a time, combined with the Woodbury identity (both reconstructed below).
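
The two missing formulas, as reconstructed from PRML §5.4.3: the Hessian is built up one data point at a time,

    \[
      \mathbf{H}_{L+1} = \mathbf{H}_{L}
        + \mathbf{b}_{L+1}\, \mathbf{b}_{L+1}^{\mathsf{T}},
      \qquad \mathbf{b}_n = \nabla a_n,
    \]

and the Woodbury (Sherman–Morrison) identity handles the rank-one update:

    \[
      \left(\mathbf{M} + \mathbf{v}\mathbf{v}^{\mathsf{T}}\right)^{-1}
        = \mathbf{M}^{-1}
        - \frac{\mathbf{M}^{-1}\mathbf{v}\, \mathbf{v}^{\mathsf{T}}\mathbf{M}^{-1}}
               {1 + \mathbf{v}^{\mathsf{T}}\mathbf{M}^{-1}\mathbf{v}}
    \]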

  11. Inverse Hessians (2). Putting H_L = M and v = b_{L+1} in the Woodbury identity gives the update below. The sequential procedure continues until L + 1 = N, i.e. the whole data set has been processed. Initialize with H_0 = αI, so the procedure actually computes the inverse of H + αI.
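
A minimal sketch of the sequential procedure, assuming the per-pattern gradient vectors b_n are already stacked as rows of an array B (the names here are mine):

    import numpy as np

    def sequential_inverse_hessian(B, alpha=1e-3):
        # Builds inv(H + alpha*I) one pattern at a time via the
        # Sherman-Morrison update for H_{L+1} = H_L + b b^T.
        N, W = B.shape
        H_inv = np.eye(W) / alpha          # H_0 = alpha*I  =>  inv = I/alpha
        for b in B:
            Hb = H_inv @ b
            H_inv -= np.outer(Hb, Hb) / (1.0 + b @ Hb)
        return H_inv

    rng = np.random.default_rng(1)
    B = rng.normal(size=(50, 4))
    H_inv = sequential_inverse_hessian(B)
    # Cross-check against a direct inverse of the outer-product Hessian:
    H = 1e-3 * np.eye(4) + B.T @ B
    print(np.allclose(H_inv, np.linalg.inv(H)))   # True (up to round-off)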

  12. When Perfection Matters! Exact evaluation of the Hessian by extending the backpropagation approach used to evaluate first-order derivatives. Consider a network with two layers of weights. We define the quantities below.
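
The definitions did not survive extraction; following PRML §5.4.5, which this slide tracks, they are presumably (with i indexing inputs x_i, j the hidden units with pre-activations a_j and outputs z_j = h(a_j), and k the output units with pre-activations a_k):

    \[
      \delta_k = \frac{\partial E_n}{\partial a_k},
      \qquad
      M_{kk'} = \frac{\partial^2 E_n}{\partial a_k\, \partial a_{k'}}
    \]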

  13. When Perfection Matters! (2) The two same-layer cases, both weights in the second layer and both weights in the first layer, are given below.
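
Reconstructed from PRML §5.4.5, the two same-layer blocks are

    \[
      \frac{\partial^2 E_n}{\partial w_{kj}^{(2)}\, \partial w_{k'j'}^{(2)}}
        = z_j\, z_{j'}\, M_{kk'}
    \]

    \[
      \frac{\partial^2 E_n}{\partial w_{ji}^{(1)}\, \partial w_{j'i'}^{(1)}}
        = x_i\, x_{i'}\, h''(a_{j'})\, I_{jj'} \sum_{k} w_{kj'}^{(2)} \delta_k
        + x_i\, x_{i'}\, h'(a_j)\, h'(a_{j'})
          \sum_{k}\sum_{k'} w_{kj}^{(2)}\, w_{k'j'}^{(2)}\, M_{kk'}
    \]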

  14. When Perfection Matters! (3) The mixed case, with one weight in each layer, is given below.
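
And the mixed block, again as reconstructed from PRML §5.4.5 (I_{jj'} is the Kronecker delta):

    \[
      \frac{\partial^2 E_n}{\partial w_{ji}^{(1)}\, \partial w_{kj'}^{(2)}}
        = x_i\, h'(a_j)
          \left\{ \delta_k\, I_{jj'}
            + z_{j'} \sum_{k'} w_{k'j}^{(2)}\, M_{kk'} \right\}
    \]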

  15. Over to Mamatha…
