Simulated annealing for convex optimization



Simulated annealing for convex optimization. Adam Kalai: TTI-Chicago; Santosh Vempala: MIT. Bar Ilan University, 2004.




Presentation Transcript

### Simulated annealing for convex optimization

Adam Kalai: TTI-Chicago

Santosh Vempala: MIT

Bar Ilan University

2004

100-million dollar endowment (thanks, Toyoda!)
• 12 tenure-track slots, 18 visitors
• On University of Chicago campus
• Optional teaching
• Advising graduate students
Outline

Simulated annealing gives the best known run-time guarantees for this problem.

It is optimal among a class of random search techniques.

• Simulated annealing
• A method for blind search:
• f: X → ℝ, min_{x∈X} f(x)
• Neighbor structure N(x) ⊆ X
• Useful in practice
• Difficult to analyze
• A generalization of linear programming
• Minimize a linear function over a convex set S ⊂ ℝⁿ
• Example: min 2x₁ + 5x₂ − 11x₃ with x₁² + 5x₂² + 3x₃² ≤ 1
• Set S specified by a membership oracle M: ℝⁿ → {0,1}
• M(x) = 1 ⟺ x ∈ S
• Difficult; cannot use most linear programming techniques [GLS81, BV02]
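A minimal Python sketch of the example above (the names `M` and `objective` are illustrative): minimize 2x₁ + 5x₂ − 11x₃ over the ellipsoid x₁² + 5x₂² + 3x₃² ≤ 1, where S is visible only through the 0/1 membership oracle.

```python
def M(x):
    """Membership oracle: M(x) = 1 iff x is in S (the ellipsoid)."""
    x1, x2, x3 = x
    return 1 if x1**2 + 5 * x2**2 + 3 * x3**2 <= 1 else 0

def objective(x):
    """The linear function c . x with c = (2, 5, -11)."""
    x1, x2, x3 = x
    return 2 * x1 + 5 * x2 - 11 * x3

print(M((0.0, 0.0, 0.0)))  # 1: the center lies in S
print(M((2.0, 0.0, 0.0)))  # 0: outside S
```

An optimizer in this model may only "poke" S through M; it never sees the inequality defining S.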

In high dimensions

Simulated Annealing [KGV83]

Phase 1: Hot (Random)

Phase 2: Warm (Bias down)

Phase 3: Cold (Descend)

Phase 1: Hot (Random)

Phase 2: Warm (Bias down)

Phase 3: Cold (Descent)

Simulated Annealing
• f:X!, minx2X f(x)
• Proceed in phases i=0,1,2,…,m
• Temperature Ti = T0(1-)i
• In phase i, do a random walk with stationary distributioni:i(x) / e-f(x)/Ti
• i=0: near uniform ! i=m: near optima

Geometric temperature schedule

Boltzmann distribution
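The phase loop above can be sketched as follows. This is a toy sketch, not the paper's algorithm: `f`, `neighbors`, and the parameter defaults are placeholders.

```python
import math
import random

def anneal(f, neighbors, x0, T0=1.0, eps=0.1, m=100):
    """Simulated annealing with the geometric schedule T_i = T0*(1-eps)**i.
    In phase i the Metropolis walk targets pi_i(x) ~ exp(-f(x)/T_i)."""
    x = x0
    for i in range(m):
        T = T0 * (1 - eps) ** i              # geometric cooling
        y = random.choice(neighbors(x))      # random neighbor of x
        delta = f(y) - f(x)
        # Metropolis filter: always accept downhill, else with prob e^(-delta/T)
        if delta <= 0 or random.random() < math.exp(-delta / T):
            x = y
    return x

# Toy usage: minimize f(x) = x^2 over the integers with neighbors x-1, x+1.
best = anneal(lambda v: v * v, lambda v: [v - 1, v + 1], 0)
```

Early (hot) phases accept almost any move; late (cold) phases essentially only descend, matching the hot/warm/cold picture above.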

Metropolis filter for stationary dist :

From x, pick random neighbor y.

If (y)>(x), move to y.

If (y)·(x) move to y with prob. (y)/(x)

Simulated Annealing
• Great blind search technique
• Works well in practice
• Little theory
• Exponential time
• Planted graph bisection [JS93]
• Fractal functions [S91]
Convex optimization

minimize f(x) = c·x = height

x ∈ S = hill

Find the bottom of the hill

using few pokes (membership queries)

Convex and linear slope

Convex optimization

minimize f(x) = c·x = height

x ∈ S ⊂ ℝⁿ = hill

Find the bottom of the hill

using few pokes (membership queries)

• Ellipsoid method: O*(n^10) queries
• Random walks [BV02]: O*(n^5) queries

Convex and linear slope

n = # dimensions

Walking in a convex set

Metropolis filter for stationary dist:

From x, pick random neighbor y.

If (y)>(x), move to y.

If (y)·(x), move to y

with prob. (y)/(x)

Hit and run
• To sample with stationary dist. π
• Pick a random direction through the point
• C = S ∩ line in that direction
• Take a random point from π|C


Hit and run
• Start from a point x, random from dist. π
• After O*(n^3) steps, you have a new random point, “almost independent” from x [LV03]
• Difficult analysis


Random walks for optimization [BV02]
• Each phase, volume decreases by a factor of ≈ 2/3
• In n dimensions, O(n) phases to halve distance to opt.
Annealing is slightly faster
• min_{x∈S} c·x
• Use distributions:
• πᵢ(x) ∝ e^(−c·x/Tᵢ)
• After O*(√n) phases, halve distance to opt.
• That’s compared to O(n) phases [BV02]
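For this linear objective, πᵢ restricted to any chord is a (truncated) exponential in the chord parameter, so a hit-and-run step can sample it exactly by inverse CDF. A sketch, where `alpha` stands for (c·u)/Tᵢ along chord direction u (this parameterization is my assumption, not spelled out on the slide):

```python
import math
import random

def sample_chord(a, b, alpha):
    """Sample t in [a, b] with density proportional to exp(-alpha*t)
    by inverting the CDF.  alpha = 0 degenerates to the uniform case."""
    if abs(alpha) < 1e-12:
        return random.uniform(a, b)
    u = random.random()
    # Invert CDF(t) = (e^(-alpha*a) - e^(-alpha*t)) / (e^(-alpha*a) - e^(-alpha*b))
    w = math.exp(-alpha * a) + u * (math.exp(-alpha * b) - math.exp(-alpha * a))
    return -math.log(w) / alpha
```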

Boltzmann distribution

Geometric temperature schedule

Annealing Optimality
• Assumptions:
• Sequence of distributions1,2,…
• Each density diis log-concave:
• Consecutive densities di, di+1overlap:
• Requires at least*( ) phases
• Simulated Annealing does it in O*( ) phases
Lower bound idea
• Mean mᵢ = Eᵢ[c·x]
• Variance σᵢ² = Eᵢ[(c·x − mᵢ)²]
• Overlap P of consecutive distributions
• Lemma: mᵢ − mᵢ₊₁ ≤ (σᵢ + σᵢ₊₁) ln(2P)
• Follows from log-concavity of πᵢ
• Log-concave ⇒ Pr(t std. devs from the mean) < e^(−t)
• In the worst case, e.g. a cone, the std. dev. is small: σᵢ ≤ (mᵢ − min c·x)/√n
Worst case: a cone
• min_{x∈S} x₀
• S = { x ∈ ℝⁿ | −x₀ ≤ x₁, x₂, …, x_{n−1} ≤ x₀ ≤ 10 }
• Uniform dist. on S | x₀ < ε:
• mean ≈ ε − ε/n
• std. dev. ≈ ε/n
• Boltzmann dist. e^(−x₀/T):
• mean ≈ nT
• std. dev. ≈ √n·T
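The Boltzmann figures can be sanity-checked: under the weight e^(−x₀/T), the cone's height coordinate has density ∝ x₀^(n−1)·e^(−x₀/T), i.e. a Gamma(n, T) distribution (ignoring the x₀ ≤ 10 cap, which is my simplification), with mean nT and standard deviation √n·T — large relative to the mean, which is what lets annealing make √n-scale progress per phase.

```python
import random

n, T, N = 25, 1.0, 20000
# Height of the cone under the Boltzmann weight: density ~ x0^(n-1) * e^(-x0/T),
# which is Gamma(shape=n, scale=T).
samples = [random.gammavariate(n, T) for _ in range(N)]
mean = sum(samples) / N
std = (sum((s - mean) ** 2 for s in samples) / N) ** 0.5
print(mean, std)  # roughly n*T = 25 and sqrt(n)*T = 5
```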

linear program

Any convex shape
• Fix a convex set S and direction c.
• Fix the mean m = E[c·x]
• dπ(x) = f(c·x), log-concave
• Conjecture: the log-concave distribution over S with the largest variance σ² = E[(c·x − m)²] is a Boltzmann (exponential) distribution.
Upper bound basics
• Dist i/ e-c¢x/Ti
• Lemma: Ei[c ¢ x] · (minx 2 S c ¢ x ) + n|c|Ti
Upper bound difficulties
• Not sufficient that distributions overlap
• An expected warm start is needed

Shape may change

Shape estimation

Estimate covariance with O*(n) samples

Similar issues with hit and run

Shape re-estimation
• Shape estimate is covariance matrix (normalized)
• OK as long as relative estimates are accurate within a constant factor
• In most cases shape changes little
• No need for re-estimation
• Cube, ball, cone, …
• In worst case, shape may change every phase
• Increase run-time by factor of n
• Differs from simulated annealing
Run-time guarantees
• Annealing: O*(√n) phases
• State-of-the-art walks [LV03]
• Worst case: O*(n) samples per phase (for shape)
• O*(n^3) steps per sample
• Total: O*(n^4.5) (compare to O*(n^10) [GLS81], O*(n^5) [BV02])
Conclusions
• Random search is useful for convex optimization [BV02]
• Simulated annealing can be analyzed for convex optimization [KV04]
• It is optimal among random search procedures
• Annoying shape re-estimation
• Difficult analyses of random walks [LV02]
• Weird: no local minima!
• Analyzed for other problems?
Reverse annealing [LV03]
• Start near a single point v
• Idea:
• Sample from density ∝ e^(−|x−v|/Tᵢ) in phase i
• Temperature increases
• Move from a single point to the uniform dist.
• Estimate the volume increase each time
• Able to do this in O*(n^4) rather than O*(n^4.5)
• Similar algorithm analysis