
# Simulated annealing for convex optimization

Simulated annealing for convex optimization. Adam Kalai: TTI-Chicago. Santosh Vempala: MIT. Bar Ilan University, 2004. 100-million dollar endowment (thanks, Toyoda!). 12 tenure-track slots, 18 visitors. On University of Chicago campus. Optional teaching. Advising graduate students.

Presentation Transcript

### Simulated annealing for convex optimization

Adam Kalai: TTI-Chicago

Santosh Vempala: MIT

Bar Ilan University

2004

• 100-million dollar endowment (thanks, Toyoda!)

• 12 tenure-track slots, 18 visitors

• On University of Chicago campus

• Optional teaching

Simulated annealing gives the best known run-time guarantees for this problem.

It is optimal among a class of random search techniques.

• Simulated annealing

• A method for blind search:

• f:X!, minx2X f(x)

• Neighbor structure N(x) µ X

• Useful in practice

• Difficult to analyze

• A generalization of linear programming

• Minimize a linear function over a convex set $S \subset \mathbb{R}^n$

• Example: $\min\ 2x_1 + 5x_2 - 11x_3$ with $x_1^2 + 5x_2^2 + 3x_3^2 \le 1$

• Set $S$ specified by membership oracle $M: \mathbb{R}^n \to \{0,1\}$

• $M(x) = 1 \Leftrightarrow x \in S$

• Difficult, cannot use most linear programming techniques [GLS81,BV02]
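As a minimal sketch of the membership-oracle model, the ellipsoid example above can be written as a function that only answers "inside or not". The function names are illustrative and do not come from the slides.

```python
def membership_oracle(x):
    """Return 1 if x lies in S = {x : x1^2 + 5*x2^2 + 3*x3^2 <= 1}, else 0."""
    x1, x2, x3 = x
    return 1 if x1 * x1 + 5 * x2 * x2 + 3 * x3 * x3 <= 1.0 else 0


def objective(x):
    """Linear objective from the example: 2*x1 + 5*x2 - 11*x3."""
    x1, x2, x3 = x
    return 2 * x1 + 5 * x2 - 11 * x3
```

An optimizer in this model may call `membership_oracle` as often as it likes, but it never sees the inequality that defines $S$.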

In high dimensions

Simulated Annealing [KGV83]

Phase 1: Hot (Random)

Phase 2: Warm (Bias down)

Phase 3: Cold (Descend)

• f:X!, minx2X f(x)

• Proceed in phases i=0,1,2,…,m

• Temperature Ti = T0(1-)i

• In phase i, do a random walk with stationary distributioni:i(x) / e-f(x)/Ti

• i=0: near uniform ! i=m: near optima

Geometric temperature schedule

Boltzmann distribution

Metropolis filter for stationary dist :

From x, pick random neighbor y.

If (y)>(x), move to y.

If (y)·(x) move to y with prob. (y)/(x)

• Great blind search technique

• Works well in practice

• Little theory

• Exponential time

• Planted graph bisection [JS93]

• Fractal functions [S91]

minimize $f(x) = c \cdot x$ = height

$x \in S$ = hill

Find the bottom of the hill

using few pokes (membership queries)

Convex and

linear slope

• Ellipsoid method: $O^*(n^{10})$ queries

• Random walks [BV02]: $O^*(n^5)$ queries

$n$ = number of dimensions

Metropolis filter for stationary dist:

From x, pick random neighbor y.

If (y)>(x), move to y.

If (y)·(x), move to y

with prob. (y)/(x)

• To sample with stationary distribution $\pi$:

• Pick a random direction through the current point

• $C = S \cap$ line in that direction

• Take a random point from $\pi|_C$ ($\pi$ restricted to $C$)
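The three steps above can be sketched for a Boltzmann target $\propto e^{-c \cdot x/T}$ over a convex set given only by a membership oracle. This is a hedged illustration, not the authors' implementation. It assumes the current point is inside $S$ and that $S$ fits in a ball of known radius `radius`, so the chord ends can be found by bisection. The helper names are hypothetical.

```python
import math
import random


def far_extent(x, u, oracle, radius, tol=1e-9):
    """Largest t >= 0 (up to tol) with x + t*u in S, by bisection on the oracle."""
    lo, hi = 0.0, 2.0 * radius  # x is inside S; x + 2R*u is assumed outside
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if oracle([xi + mid * ui for xi, ui in zip(x, u)]):
            lo = mid
        else:
            hi = mid
    return lo


def hit_and_run_step(x, c, temperature, oracle, radius, rng):
    """One hit-and-run move for the density proportional to exp(-c.x/T) on S."""
    # Random direction: a normalized Gaussian vector is uniform on the sphere.
    g = [rng.gauss(0.0, 1.0) for _ in x]
    norm = math.sqrt(sum(gi * gi for gi in g))
    u = [gi / norm for gi in g]
    # Chord C = S intersected with the line {x + t*u}, as the interval [t_lo, t_hi].
    t_hi = far_extent(x, u, oracle, radius)
    t_lo = -far_extent(x, [-ui for ui in u], oracle, radius)
    length = t_hi - t_lo
    # Along the chord the density is proportional to exp(-a*t), a = (c.u)/T.
    a = sum(ci * ui for ci, ui in zip(c, u)) / temperature
    rate = abs(a)
    v = rng.random()
    if rate * length < 1e-12:
        s = v * length  # essentially uniform on the chord
    else:
        # Inverse CDF of an exponential with this rate truncated to [0, length].
        s = -math.log1p(v * math.expm1(-rate * length)) / rate
    s = min(s, length)
    # Measure s from whichever end of the chord has the higher density.
    t = t_lo + s if a >= 0 else t_hi - s
    return [xi + t * ui for xi, ui in zip(x, u)]
```

Each step costs two bisection searches, so the number of oracle queries per step is logarithmic in `radius / tol`.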

• Start from a point $x$, random from distribution $\pi$

• After $O^*(n^3)$ steps, you have a new random point, “almost independent” of $x$ [LV03]

• Difficult analysis

• Each phase, volume decreases by $\approx 2/3$

• In $n$ dimensions, $O(n)$ phases to halve distance to opt.

• $\min_{x \in S} c \cdot x$

• Use distributions:

• $\pi_i(x) \propto e^{-c \cdot x/T_i}$

• $T_{i+1} = T_i\,(1 - 1/\sqrt{n})$

• After $O(\sqrt{n})$ phases, halve distance to opt.

• That’s compared to O(n) phases [BV02]

Boltzmann distribution

Geometric temperature schedule

• Assumptions:

• Sequence of distributions1,2,…

• Each density diis log-concave:

• Consecutive densities di, di+1overlap:

• Requires at least*( ) phases

• Simulated Annealing does it in O*( ) phases

• mean mi = Ei[c ¢ x]

• variancei2 = Ei[(c ¢ x – mi)2]

• overlap

• lemma: mi – mi+1· (i+i+1)ln(2P)

• follows from log-concavity ofi

• log-concave ! P(t std dev’s from mean) < e-t

• In worst case, e.g. cone, small std dev

• $\sigma_i \le (m_i - \min c \cdot x)/\sqrt{n}$

• $\min_{x \in S} x_0$

• $S = \{x \in \mathbb{R}^n \mid -x_0 \le x_1, x_2, \ldots, x_{n-1} \le x_0 \le 10\}$

• Uniform dist. on $S \mid x_0 < \delta$

• mean $\approx \delta - \delta/n$

• std dev $\approx \delta/n$

• Boltzmann dist. $\propto e^{-x_0/T}$

• mean $\approx nT$

• std dev $\approx \sqrt{n}\,T$
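The cone figures can be checked with a short calculation. This is a sketch under stated assumptions: $\delta$ (truncation height) and $T$ (temperature) are symbol names chosen here, and $nT \ll 10$ so the cap $x_0 \le 10$ can be ignored. At height $x_0$ the cross-section of $S$ is a cube of side $2x_0$ in $n-1$ dimensions, so its volume scales as $x_0^{n-1}$.

```latex
\begin{align*}
\text{Uniform on } S \cap \{x_0 < \delta\}: \quad
  & p(x_0) \propto x_0^{\,n-1} \text{ on } [0,\delta], \\
  & E[x_0] = \frac{n}{n+1}\,\delta \approx \delta - \frac{\delta}{n}, \qquad
    \operatorname{sd}(x_0) = \frac{\delta}{n+1}\sqrt{\frac{n}{n+2}} \approx \frac{\delta}{n}. \\[4pt]
\text{Boltzmann, density } \propto e^{-x_0/T}: \quad
  & p(x_0) \propto x_0^{\,n-1} e^{-x_0/T}
    \quad (\text{Gamma, shape } n,\ \text{scale } T), \\
  & E[x_0] = nT, \qquad \operatorname{sd}(x_0) = \sqrt{n}\,T .
\end{align*}
```

Relative to the distance from the optimum, the uniform distribution has standard deviation of order $1/n$, while the Boltzmann distribution has order $1/\sqrt{n}$. This matches the gap between $O(n)$ and $O(\sqrt{n})$ phases.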

linear program

• Fix convex set S and direction c.

• Fix mean m = E[c ¢ x]

• d(x)=f(c¢x), log-concave

• Conjecture:The log-concave distributionover S with largest variancei2 = Ei[(c ¢ x – mi)2] is a Boltzmann dist. (exponential dist.)

• Dist i/ e-c¢x/Ti

• Lemma: Ei[c ¢ x] · (minx 2 S c ¢ x ) + n|c|Ti

• Not sufficient that distributions overlap

• An expected warm start:

Shape may change

Estimate covariance with $O^*(n)$ samples

Similar issues with hit and run

• Shape estimate is covariance matrix (normalized)

• OK as long as relative estimates are accurate within a constant factor

• In most cases shape changes little

• No need for re-estimation

• Cube, ball, cone, …

• In worst case, shape may change every phase

• Increases run time by a factor of $n$
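The shape estimate can be illustrated with a plain empirical covariance over sampled points. This is a minimal sketch with an illustrative function name. It omits the normalization mentioned above and does not show how the walk uses the estimate.

```python
def empirical_covariance(samples):
    """Return the sample mean and the n x n sample covariance of a list of points."""
    k = len(samples)
    n = len(samples[0])
    mean = [sum(p[j] for p in samples) / k for j in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for p in samples:
        d = [p[j] - mean[j] for j in range(n)]
        for a in range(n):
            for b in range(n):
                # Unbiased estimate: divide by k - 1.
                cov[a][b] += d[a] * d[b] / (k - 1)
    return mean, cov
```

Because only constant-factor accuracy is needed, the estimate can be reused across phases until the shape changes noticeably.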

• Differs from simulated annealing

• Annealing: $O^*(n^{0.5})$ phases

• State-of-the-art walks [LV03]

• Worst case: $O^*(n)$ samples per phase (for shape)

• $O^*(n^3)$ steps per sample

• Total: $O^*(n^{4.5})$ (compare to $O^*(n^{10})$ [GLS81], $O^*(n^5)$ [BV02])
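The total is the product of the three counts listed above:

```latex
\[
\underbrace{O^*(n^{0.5})}_{\text{phases}}
\times
\underbrace{O^*(n)}_{\text{samples per phase}}
\times
\underbrace{O^*(n^{3})}_{\text{steps per sample}}
= O^*(n^{4.5}).
\]
```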

• Random search is useful for convex optimization [BV02]

• Simulated annealing can be analyzed for convex optimization [KV04]

• It is optimal among random search procedures

• Annoying shape re-estimation

• Difficult analyses of random walks [LV02]

• Weird: no local minima!

• Analyzed for other problems?

• Start near single point v

• Idea

• Sample from density $\propto e^{-|x-v|/T_i}$ in phase $i$

• Temperature increases

• Move from single point to uniform dist

• Estimate volume increase each time

• Able to do in $O^*(n^4)$ rather than $O(n^{4.5})$

• Similar algorithm analysis