
CS B553: Algorithms for Optimization and Learning



Presentation Transcript


  1. CS B553: Algorithms for Optimization and Learning. Gradient descent

  2. Key Concepts • Gradient descent • Line search • Convergence rates depend on scaling • Variants: discrete analogues, coordinate descent • Random restarts

  3.–4. The gradient direction is orthogonal to the level sets (contours) of f and points in the direction of steepest increase.
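
A quick numerical check of this fact, using an assumed quadratic f(x) = x1² + 4·x2² (an example of mine, not from the slides): moving a small distance orthogonally to the gradient barely changes f (you stay on the contour), while moving the same distance along the gradient increases f the fastest.

```python
import numpy as np

def f(x):
    return x[0]**2 + 4.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 8.0 * x[1]])

x = np.array([1.0, 0.5])
g = grad_f(x)
tangent = np.array([-g[1], g[0]])   # orthogonal to the gradient in 2D

eps = 1e-4
print(f(x + eps * tangent / np.linalg.norm(tangent)) - f(x))  # ~0: stays on the contour
print(f(x + eps * g / np.linalg.norm(g)) - f(x))              # > 0: steepest increase
```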

  5.–11. Gradient descent: iteratively move in the direction of the negative gradient, xt+1 = xt − αt∇f(xt), where the step size αt > 0 is chosen by line search.
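
A minimal sketch of this iteration with a fixed step size, on the same assumed quadratic as above (in practice the step size comes from the line search on the next slides):

```python
import numpy as np

def grad_f(x):
    return np.array([2.0 * x[0], 8.0 * x[1]])   # gradient of f(x) = x1^2 + 4*x2^2

x = np.array([2.0, 1.0])
alpha = 0.1                        # fixed step size: too large diverges, too small is slow
for t in range(50):
    x = x - alpha * grad_f(x)      # move in the direction of the negative gradient

print(x)                           # approaches the minimizer (0, 0)
```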

  12.–13. Line search: pick the step size α to give a decrease in the function value, i.e. minimize the univariate function g(α) = f(x − α∇f(x)) over α ≥ 0; its minimizer α* is the exact line-search step. (Use your favorite univariate optimization method)

  14. Gradient Descent Pseudocode • Input: f, starting value x1, termination tolerances εg, εx • For t=1,2,…,maxIters: • Compute the search direction dt = −∇f(xt) • If ||dt|| < εg then: return “Converged to critical point”, output xt • Find αt so that f(xt + αtdt) < f(xt) using line search • If ||αtdt|| < εx then: return “Converged in x”, output xt • Let xt+1 = xt + αtdt • Return “Max number of iterations reached”, output xmaxIters
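
A runnable Python rendering of this pseudocode, as a sketch: the default tolerances and the simple backtracking loop used for the line search are assumptions, not values given on the slide.

```python
import numpy as np

def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-10, max_iters=1000):
    x = np.asarray(x1, dtype=float)
    for t in range(max_iters):
        d = -grad_f(x)                          # search direction dt = -grad f(xt)
        if np.linalg.norm(d) < eps_g:
            return x, "Converged to critical point"
        alpha, fx = 1.0, f(x)                   # line search: shrink alpha until decrease
        while f(x + alpha * d) >= fx and alpha > 1e-12:
            alpha *= 0.5
        if np.linalg.norm(alpha * d) < eps_x:
            return x, "Converged in x"
        x = x + alpha * d                       # xt+1 = xt + alpha*dt
    return x, "Max number of iterations reached"

# Example on f(x) = x1^2 + 4*x2^2:
x_star, status = gradient_descent(lambda x: x[0]**2 + 4 * x[1]**2,
                                  lambda x: np.array([2.0 * x[0], 8.0 * x[1]]),
                                  x1=[2.0, 1.0])
print(x_star, status)
```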

  15. Related Methods • Steepest descent (discrete) • Coordinate descent
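
A minimal coordinate descent sketch, assuming scipy's minimize_scalar as the univariate solver: cycle through the coordinates, minimizing f along one axis at a time while the others are held fixed.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_descent(f, x0, sweeps=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(x.size):
            def f_i(xi, i=i):               # 1-D slice of f along coordinate i
                y = x.copy()
                y[i] = xi
                return f(y)
            x[i] = minimize_scalar(f_i).x   # univariate minimization along axis i
    return x

print(coordinate_descent(lambda x: x[0]**2 + 4.0 * x[1]**2 + x[0] * x[1], [2.0, 1.0]))
```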

  16. When f has many local minima, gradient descent only finds a local one: use a good initialization, or random restarts.
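
Random restarts can be wrapped around any local minimizer: run it from several random starting points and keep the best result. A sketch; the sampling box, restart count, and the use of scipy's BFGS as the local method are placeholders of mine.

```python
import numpy as np
from scipy.optimize import minimize

def random_restarts(local_minimize, f, dim, n_restarts=20, low=-5.0, high=5.0, seed=0):
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(n_restarts):
        x0 = rng.uniform(low, high, size=dim)   # random initialization
        x = local_minimize(f, x0)               # any local method, e.g. gradient descent
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

# Example: a smooth function with several local minima
f = lambda x: float(np.sin(3 * x[0]) + np.cos(2 * x[1]) + 0.1 * (x[0]**2 + x[1]**2))
local = lambda f, x0: minimize(f, x0, method="BFGS").x
print(random_restarts(local, f, dim=2))
```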
