
CS B553: Algorithms for Optimization and Learning



Presentation Transcript


  1. CS B553: Algorithms for Optimization and Learning. Gradient descent

  2. Key Concepts • Gradient descent • Line search • Convergence rates depend on scaling • Variants: discrete analogues, coordinate descent • Random restarts

  3.–4. The gradient direction is orthogonal to the level sets (contours) of f and points in the direction of steepest increase.
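
A quick numerical check of this fact, using an assumed quadratic f(x) = x1² + 4·x2² (an example of mine, not from the slides): moving a small distance orthogonally to the gradient barely changes f (you stay on the contour), while moving the same distance along the gradient increases f the fastest.

```python
import numpy as np

def f(x):
    return x[0]**2 + 4.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 8.0 * x[1]])

x = np.array([1.0, 0.5])
g = grad_f(x)
tangent = np.array([-g[1], g[0]])   # orthogonal to the gradient in 2D

eps = 1e-4
print(f(x + eps * tangent / np.linalg.norm(tangent)) - f(x))  # ~0: stays on the contour
print(f(x + eps * g / np.linalg.norm(g)) - f(x))              # > 0: steepest increase
```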

  5.–11. Gradient descent: iteratively move in the direction of the negative gradient, xt+1 = xt − αt∇f(xt), where the step size αt > 0 is chosen by line search.
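
A minimal sketch of this iteration with a fixed step size, on the same assumed quadratic as above (in practice the step size comes from the line search on the next slides):

```python
import numpy as np

def grad_f(x):
    return np.array([2.0 * x[0], 8.0 * x[1]])   # gradient of f(x) = x1^2 + 4*x2^2

x = np.array([2.0, 1.0])
alpha = 0.1                        # fixed step size: too large diverges, too small is slow
for t in range(50):
    x = x - alpha * grad_f(x)      # move in the direction of the negative gradient

print(x)                           # approaches the minimizer (0, 0)
```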

  12.–13. Line search: pick the step size α to give a decrease in the function value, i.e. minimize the univariate function g(α) = f(x − α∇f(x)) over α ≥ 0; its minimizer α* is the exact line-search step. (Use your favorite univariate optimization method)

  14. Gradient Descent Pseudocode • Input: f, starting value x1, termination tolerances εg, εx • For t=1,2,…,maxIters: • Compute the search direction dt = −∇f(xt) • If ||dt|| < εg then: return “Converged to critical point”, output xt • Find αt so that f(xt + αtdt) < f(xt) using line search • If ||αtdt|| < εx then: return “Converged in x”, output xt • Let xt+1 = xt + αtdt • Return “Max number of iterations reached”, output xmaxIters
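
A runnable Python rendering of this pseudocode, as a sketch: the default tolerances and the simple backtracking loop used for the line search are assumptions, not values given on the slide.

```python
import numpy as np

def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-10, max_iters=1000):
    x = np.asarray(x1, dtype=float)
    for t in range(max_iters):
        d = -grad_f(x)                          # search direction dt = -grad f(xt)
        if np.linalg.norm(d) < eps_g:
            return x, "Converged to critical point"
        alpha, fx = 1.0, f(x)                   # line search: shrink alpha until decrease
        while f(x + alpha * d) >= fx and alpha > 1e-12:
            alpha *= 0.5
        if np.linalg.norm(alpha * d) < eps_x:
            return x, "Converged in x"
        x = x + alpha * d                       # xt+1 = xt + alpha*dt
    return x, "Max number of iterations reached"

# Example on f(x) = x1^2 + 4*x2^2:
x_star, status = gradient_descent(lambda x: x[0]**2 + 4 * x[1]**2,
                                  lambda x: np.array([2.0 * x[0], 8.0 * x[1]]),
                                  x1=[2.0, 1.0])
print(x_star, status)
```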

  15. Related Methods • Steepest descent (discrete) • Coordinate descent
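
A minimal coordinate descent sketch, assuming scipy's minimize_scalar as the univariate solver: cycle through the coordinates, minimizing f along one axis at a time while the others are held fixed.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_descent(f, x0, sweeps=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(x.size):
            def f_i(xi, i=i):               # 1-D slice of f along coordinate i
                y = x.copy()
                y[i] = xi
                return f(y)
            x[i] = minimize_scalar(f_i).x   # univariate minimization along axis i
    return x

print(coordinate_descent(lambda x: x[0]**2 + 4.0 * x[1]**2 + x[0] * x[1], [2.0, 1.0]))
```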

  16. When f has many local minima, gradient descent only finds a local one: use a good initialization, or random restarts.
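
Random restarts can be wrapped around any local minimizer: run it from several random starting points and keep the best result. A sketch; the sampling box, restart count, and the use of scipy's BFGS as the local method are placeholders of mine.

```python
import numpy as np
from scipy.optimize import minimize

def random_restarts(local_minimize, f, dim, n_restarts=20, low=-5.0, high=5.0, seed=0):
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(n_restarts):
        x0 = rng.uniform(low, high, size=dim)   # random initialization
        x = local_minimize(f, x0)               # any local method, e.g. gradient descent
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

# Example: a smooth function with several local minima
f = lambda x: float(np.sin(3 * x[0]) + np.cos(2 * x[1]) + 0.1 * (x[0]**2 + x[1]**2))
local = lambda f, x0: minimize(f, x0, method="BFGS").x
print(random_restarts(local, f, dim=2))
```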
