Simulated annealing for convex optimization

Adam Kalai (TTI-Chicago) and Santosh Vempala (MIT). Bar Ilan University, 2004

100-million dollar endowment (thanks, Toyoda!) • 12 tenure-track slots, 18 visitors • On University of Chicago campus • Optional teaching • Advising graduate students

Outline • Simulated annealing • A method for blind search: $f: X \to \mathbb{R}$, $\min_{x \in X} f(x)$ • Neighbor structure $N(x) \subseteq X$ • Useful in practice • Difficult to analyze • Convex optimization in high dimensions • A generalization of linear programming • Minimize a linear function over a convex set $S \subset \mathbb{R}^n$ • Example: $\min 2x_1 + 5x_2 - 11x_3$ with $x_1^2 + 5x_2^2 + 3x_3^2 \le 1$ • Set $S$ specified by membership oracle $M: \mathbb{R}^n \to \{0,1\}$ • $M(x) = 1 \iff x \in S$ • Difficult, cannot use most linear programming techniques [GLS81, BV02] • Simulated annealing gives the best known run-time guarantees for this problem. It is optimal among a class of random search techniques.
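As a concrete reading of the example on this slide, here is a minimal sketch of the membership oracle and the linear objective. The function names `membership_oracle` and `objective` are illustrative choices, not names from the talk.

```python
def membership_oracle(x):
    """Return 1 if x is in S = {x : x1^2 + 5*x2^2 + 3*x3^2 <= 1}, else 0."""
    x1, x2, x3 = x
    return 1 if x1 * x1 + 5 * x2 * x2 + 3 * x3 * x3 <= 1 else 0


def objective(x):
    """Linear objective 2*x1 + 5*x2 - 11*x3 from the slide's example."""
    x1, x2, x3 = x
    return 2 * x1 + 5 * x2 - 11 * x3
```

An optimizer in this model may only call `membership_oracle`; it never sees the inequality that defines S.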

Steepest descent

Random Search

Simulated Annealing [KGV83] Phase 1: Hot (Random) Phase 2: Warm (Bias down) Phase 3: Cold (Descend)

Simulated Annealing • $f: X \to \mathbb{R}$, $\min_{x \in X} f(x)$ • Proceed in phases $i = 0, 1, 2, \ldots, m$ • Temperature $T_i = T_0 (1 - \epsilon)^i$ (geometric temperature schedule) • In phase $i$, do a random walk with stationary distribution $\pi_i$: $\pi_i(x) \propto e^{-f(x)/T_i}$ (Boltzmann distribution) • $i = 0$: near uniform $\to$ $i = m$: near optima • Metropolis filter for stationary dist. $\pi$: From $x$, pick random neighbor $y$. If $\pi(y) > \pi(x)$, move to $y$. If $\pi(y) \le \pi(x)$, move to $y$ with prob. $\pi(y)/\pi(x)$
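The phases, geometric schedule and Metropolis filter on this slide can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm analyzed in the talk: the cooling rate `eps`, the phase and step counts, and the symmetric neighbor function are all hypothetical choices, and the best point seen so far is tracked only for convenience.

```python
import math
import random


def simulated_annealing(f, neighbors, x0, t0=1.0, eps=0.05, phases=100,
                        steps_per_phase=50, rng=None):
    """Minimize f by annealing with a Metropolis filter.

    Phase i targets pi_i(x) proportional to exp(-f(x) / T_i),
    with the geometric schedule T_i = t0 * (1 - eps) ** i.
    Assumes a symmetric neighbor structure.
    """
    rng = rng or random.Random(0)
    x = best = x0
    for i in range(phases):
        temp = t0 * (1 - eps) ** i
        for _ in range(steps_per_phase):
            y = rng.choice(neighbors(x))
            # pi_i(y) / pi_i(x) = exp(-(f(y) - f(x)) / temp)
            delta = f(y) - f(x)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                x = y
            if f(x) < f(best):
                best = x
    return best


# Toy usage: integers 0..20 with neighbors x-1 and x+1 (clamped at the ends).
best = simulated_annealing(lambda x: (x - 7) ** 2,
                           lambda x: [max(x - 1, 0), min(x + 1, 20)],
                           x0=20)
print(best)
```

Early phases accept most uphill moves (near-uniform walk); late phases accept almost none (descent).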

Simulated Annealing • Great blind search technique • Works well in practice • Little theory • Exponential time • Planted graph bisection [JS93] • Fractal functions [S91]

Convex optimization • Minimize $f(x) = c \cdot x$ (height) over $x \in S \subset \mathbb{R}^n$ (hill) • Find the bottom of the hill using few pokes (membership queries) • Ellipsoid method: $O^*(n^{10})$ queries • Random walks [BV02]: $O^*(n^5)$ queries • Convex and linear slope; $n$ = # dimensions

Walking in a convex set • Metropolis filter for stationary dist. $\pi$: From $x$, pick random neighbor $y$. If $\pi(y) > \pi(x)$, move to $y$. If $\pi(y) \le \pi(x)$, move to $y$ with prob. $\pi(y)/\pi(x)$

Walking in a high-dimensional convex set

Hit and run • To sample with stationary dist. $\pi$: • Pick a random direction through the point • $C = S \cap$ line in direction • Take a random point from $\pi|_C$
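One hit-and-run step can be sketched with nothing but a membership oracle. The sketch below handles only the uniform target distribution and makes several assumptions that are not from the talk: the helper names, the bisection search for the chord endpoints, and a known bound `radius` on the diameter of S. The current point must already lie in S.

```python
import math
import random


def random_direction(n, rng):
    """Uniform random unit vector in R^n (a normalized Gaussian vector)."""
    d = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]


def chord_end(x, d, inside, radius, tol=1e-9):
    """Largest t in [0, radius] found by bisection with x + t*d in S.

    Relies on convexity: along the ray, points are inside up to the
    boundary and outside afterwards.
    """
    lo, hi = 0.0, radius
    if inside([xi + hi * di for xi, di in zip(x, d)]):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if inside([xi + mid * di for xi, di in zip(x, d)]):
            lo = mid
        else:
            hi = mid
    return lo


def hit_and_run_step(x, inside, radius, rng):
    """One hit-and-run step for the uniform distribution on convex S."""
    d = random_direction(len(x), rng)
    t_plus = chord_end(x, d, inside, radius)
    t_minus = chord_end(x, [-v for v in d], inside, radius)
    t = rng.uniform(-t_minus, t_plus)  # uniform point on the chord C
    return [xi + t * di for xi, di in zip(x, d)]
```

For a Boltzmann target, the last step would instead draw `t` from the exponential density restricted to the chord; that variant is not shown here.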

Hit and run • Start from a point $x$, random from dist. $\pi$ • After $O^*(n^3)$ steps, you have a new random point, “almost independent” from $x$ [LV03] • Difficult analysis

Random walks for optimization [BV02] • Each phase, volume shrinks by a factor of $\approx 2/3$ • In $n$ dimensions, $O(n)$ phases to halve distance to opt.

Annealing is slightly faster • $\min_{x \in S} c \cdot x$ • Use distributions: • $\pi_i(x) \propto e^{-c \cdot x / T_i}$ (Boltzmann distribution) • $T_i = T_0 (1 - 1/\sqrt{n})^i$ (geometric temperature schedule) • After $O(\sqrt{n})$ phases, halve distance to opt. • That’s compared to $O(n)$ phases [BV02]
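A short worked bound on the phase count, assuming the geometric schedule cools by a factor of $(1 - 1/\sqrt{n})$ per phase. That rate is an assumption made here because it matches the $O^*(n^{0.5})$ phase count quoted in the run-time summary of this talk.

```latex
\left(1 - \frac{1}{\sqrt{n}}\right)^{k} \;\le\; e^{-k/\sqrt{n}} \;\le\; \frac{1}{2}
\qquad\text{once}\qquad k \;\ge\; \sqrt{n}\,\ln 2 .
```

So the temperature halves after about $\sqrt{n}\,\ln 2$ phases. Any bound on the expected objective gap that scales linearly with $T_i$ halves at the same rate.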

Annealing Optimality • Assumptions: • Sequence of distributions $\pi_1, \pi_2, \ldots$ • Each density $d_i$ is log-concave • Consecutive densities $d_i$, $d_{i+1}$ overlap • Requires at least $\Omega^*(\sqrt{n})$ phases • Simulated Annealing does it in $O^*(\sqrt{n})$ phases

Lower bound idea • mean $m_i = E_{\pi_i}[c \cdot x]$ • variance $\sigma_i^2 = E_{\pi_i}[(c \cdot x - m_i)^2]$ • overlap • lemma: $m_i - m_{i+1} \le (\sigma_i + \sigma_{i+1}) \ln(2P)$ • follows from log-concavity of $\pi_i$ • log-concave $\Rightarrow$ $\Pr(t$ std devs from mean$) < e^{-t}$ • In worst case, e.g. cone, small std dev • $\sigma_i \le (m_i - \min c \cdot x)/\sqrt{n}$
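One way the two bounds on this slide might combine into a phase lower bound is sketched below. It assumes the worst-case standard deviation bound is $\sigma_i \le (m_i - \min c \cdot x)/\sqrt{n}$, as in the cone example, and introduces the shorthand $g_i = m_i - \min c \cdot x$ and $a = \ln(2P)/\sqrt{n}$ with $0 < a < 1$; both shorthands are notation added here, and $P$ is the overlap quantity from the lemma.

```latex
\begin{align*}
g_i - g_{i+1} &= m_i - m_{i+1}
  \le (\sigma_i + \sigma_{i+1})\ln(2P)
  \le a\,(g_i + g_{i+1}), \\
g_{i+1} &\ge \frac{1-a}{1+a}\, g_i , \\
g_k \le \tfrac{1}{2}\, g_0
  &\;\Longrightarrow\; \left(\frac{1+a}{1-a}\right)^{k} \ge 2
  \;\Longrightarrow\; k \ge \frac{\ln 2}{\ln\frac{1+a}{1-a}}
  \approx \frac{\ln 2}{2a} = \frac{\sqrt{n}\,\ln 2}{2\ln(2P)} .
\end{align*}
```

Under these assumptions, halving the expected gap takes on the order of $\sqrt{n}/\ln(2P)$ phases.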

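Worst case: a cone • $\min_{x \in S} x_0$ (a linear program) • $S = \{ x \in \mathbb{R}^n \mid -x_0 \le x_1, x_2, \ldots, x_{n-1} \le x_0 \le 10 \}$ • Uniform dist. on $S \mid x_0 < \epsilon$ • mean $\approx \epsilon - \epsilon/n$ • std dev $\approx \epsilon/n$ • Boltzmann dist. $e^{-x_0/T}$ • mean $\approx nT$ • std dev $\approx \sqrt{n}\,T$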
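The cone's means and standard deviations can be checked numerically. The sketch below samples only the height coordinate $x_0$, using the fact that the cross-section of the cone at height $x_0$ has volume proportional to $x_0^{n-1}$. The symbols `eps` (the truncation height) and `temp` (the Boltzmann temperature), the sample sizes, and the function names are assumptions for illustration; the cap $x_0 \le 10$ is ignored because the chosen temperature keeps samples far below it.

```python
import math
import random
import statistics


def cone_height_uniform(n, eps, rng):
    """Height x0 of a uniform point in the cone truncated at x0 < eps.

    The density of x0 is proportional to x0**(n-1) on (0, eps),
    so x0 = eps * U**(1/n) for U uniform on (0, 1).
    """
    return eps * rng.random() ** (1.0 / n)


def cone_height_boltzmann(n, temp, rng):
    """Height x0 under density proportional to exp(-x0/temp) on the cone.

    The marginal of x0 is proportional to x0**(n-1) * exp(-x0/temp),
    a Gamma distribution with shape n and scale temp.
    """
    return rng.gammavariate(n, temp)


rng = random.Random(0)
n, eps, temp, m = 50, 1.0, 0.01, 20000
u = [cone_height_uniform(n, eps, rng) for _ in range(m)]
b = [cone_height_boltzmann(n, temp, rng) for _ in range(m)]
# Compare with eps - eps/n and eps/n.
print(statistics.mean(u), statistics.pstdev(u))
# Compare with n*temp and sqrt(n)*temp.
print(statistics.mean(b), statistics.pstdev(b))
```

The Boltzmann distribution spreads over a range about $\sqrt{n}$ times wider, relative to its mean, than the truncated uniform distribution, which is what lets annealing take larger steps between phases.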

Any convex shape • Fix convex set $S$ and direction $c$ • Fix mean $m = E[c \cdot x]$ • $d(x) = f(c \cdot x)$, log-concave • Conjecture: The log-concave distribution over $S$ with largest variance $\sigma_i^2 = E_{\pi_i}[(c \cdot x - m_i)^2]$ is a Boltzmann dist. (exponential dist.)

Upper bound basics • Dist. $\pi_i \propto e^{-c \cdot x / T_i}$ • Lemma: $E_{\pi_i}[c \cdot x] \le (\min_{x \in S} c \cdot x) + n|c|T_i$

Upper bound difficulties • Not sufficient that distributions overlap • An expected warm start: Shape may change

Shape estimation • Estimate covariance with $O^*(n)$ samples • Similar issues with hit and run
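A minimal sketch of the covariance estimate, assuming the samples have already been drawn by the random walk. The function name `estimate_covariance` and the $1/m$ normalization are choices made here for illustration.

```python
def estimate_covariance(samples):
    """Empirical covariance matrix of a list of equal-length points."""
    m, n = len(samples), len(samples[0])
    mean = [sum(p[j] for p in samples) / m for j in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for p in samples:
        d = [p[j] - mean[j] for j in range(n)]  # centered point
        for j in range(n):
            for k in range(n):
                cov[j][k] += d[j] * d[k] / m
    return cov
```

The resulting matrix describes the rough shape of the set; a walk can use it to rescale directions so that a long, thin set looks rounder.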

Shape re-estimation • Shape estimate is covariance matrix (normalized) • OK as long as relative estimates are accurate within a constant factor • In most cases shape changes little • No need for re-estimation • Cube, ball, cone, … • In worst case, shape may change every phase • Increase run-time by factor of n • Differs from simulated annealing

Run-time guarantees • Annealing: $O^*(n^{0.5})$ phases • State-of-the-art walks [LV03] • Worst case: $O^*(n)$ samples per phase (for shape) • $O^*(n^3)$ steps per sample • Total: $O^*(n^{4.5})$ (compare to $O^*(n^{10})$ [GLS81], $O^*(n^5)$ [BV02])
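The total is the product of the three factors listed on this slide:

```latex
\underbrace{O^*(n^{0.5})}_{\text{phases}} \times
\underbrace{O^*(n)}_{\text{samples per phase}} \times
\underbrace{O^*(n^{3})}_{\text{steps per sample}}
\;=\; O^*(n^{4.5}) .
```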

Conclusions • Random search is useful for convex optimization [BV02] • Simulated annealing can be analyzed for convex optimization [KV04] • It’s opt among random search procedures • Annoying shape re-estimation • Difficult analyses of random walks [LV02] • Weird: no local minima! • Analyzed for other problems?

Reverse annealing [LV03] • Start near single point $v$ • Idea: • Sample from density $\propto e^{-|x - v|/T_i}$ in phase $i$ • Temperature increases • Move from single point to uniform dist. • Estimate volume increase each time • Able to do in $O^*(n^4)$ rather than $O(n^{4.5})$ • Similar algorithm analysis
