
Stochastic Optimization and Simulated Annealing


Presentation Transcript


  1. Stochastic Optimization and Simulated Annealing. Psychology 85-419/719, January 25, 2001

  2. In Previous Lecture... • Discussed constraint satisfaction networks, which have: • Units, weights, and a “goodness” function • Updating a unit’s state involves computing its input from the other units • Each update is guaranteed to locally increase goodness • But it is not guaranteed to reach the globally best goodness
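As a concrete reminder, here is a minimal Python sketch of that goodness computation, using the standard constraint-satisfaction form (the name goodness and the variable names are illustrative; the exact formula used in class may differ slightly):

```python
def goodness(state, weights, external):
    """Goodness of a binary activation state: the weighted agreement
    between every pair of units, plus the external input to each active unit."""
    n = len(state)
    pairwise = sum(weights[i][j] * state[i] * state[j]
                   for i in range(n) for j in range(i + 1, n))
    ext = sum(external[i] * state[i] for i in range(n))
    return pairwise + ext
```

Updating one unit at a time from its net input can only raise this quantity, which is exactly why the plain update rule can get stuck at a local optimum.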

  3. The General Problem: Local Optima [Figure: goodness plotted against activation state, with the local optima and the true (global) optimum marked]

  4. How To Solve the Problem of Local Optima? • Exhaustive search? • Nah. Takes too long. n units have 2^n possible states (if binary) • Random re-starts? • Seems wasteful. • How about something that generally goes in the right direction, with some randomness?

  5. Sometimes It Isn’t Best To Always Go Straight Towards The Goal • Rubik’s Cube: undo some moves in order to make progress • Baseball: sacrifice fly • Navigation: move away from the goal to get around obstacles

  6. Randomness Can Help Us Escape Bad Solutions [Figure: goodness plotted against activation state; random jumps let the state escape a local optimum and reach a better one]

  7. So, How Random Do We Want to Be? • We can take a cue from physical systems • In metallurgy, a metal can reach a very strong (stable) state by: • Melting it, which scrambles its molecular structure • Gradually cooling it • The resulting molecular structure is very stable • New terminology: reduce energy (which is roughly the negative of goodness)

  8. Simulated Annealing • The odds that a unit is on are a function of: • The input to the unit, net • The temperature, T • Specifically, P(output = 1) = 1 / (1 + e^(-net/T))
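A minimal Python sketch of that update rule, assuming the standard logistic/Boltzmann form (the names p_on, update_unit, net, and T are illustrative, not from the course software):

```python
import math
import random

def p_on(net, T):
    """Probability that a unit's output is 1, given its net input and temperature T."""
    return 1.0 / (1.0 + math.exp(-net / T))

def update_unit(net, T):
    """Stochastically set the unit's state: 1 with probability p_on(net, T), else 0."""
    return 1 if random.random() < p_on(net, T) else 0
```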

  9. Picking it Apart... • As net increases, the probability that the output is 1 increases • e is raised to -net/T; so as net gets big, e^(-net/T) goes to zero, and the probability goes to 1/(1+0) = 1

  10. The Temperature Term • When T is big, the exponent -net/T goes to zero • e (or anything nonzero) to the zero power is 1 • So the probability that the output is 1 goes to 1/(1+1) = 0.5

  11. The Temperature Term (2) • When T gets small, the exponent -net/T gets big in magnitude • The effect of net becomes amplified: the probability curve gets steeper, closer to a step function

  12. Different Temperatures... [Figure: probability that the output is 1 (from 0 to 1) plotted against net input, with separate curves for low, medium, and high temperature; the lower the temperature, the steeper the curve]
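Numerically, the flattening looks like this: a quick check at a fixed net input of 2, reusing the logistic rule from slide 8 (the name p_on is just illustrative):

```python
import math

def p_on(net, T):                           # same logistic rule as slide 8
    return 1.0 / (1.0 + math.exp(-net / T))

for T in (0.5, 1.0, 5.0):                   # low, medium, high temperature
    print(T, round(p_on(2.0, T), 3))
# approximately: T=0.5 -> 0.982, T=1.0 -> 0.881, T=5.0 -> 0.599
# low temperature pushes the probability toward 0 or 1;
# high temperature pushes it toward 0.5
```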

  13. OK, So At What Rate Do We Reduce Temperature? [Figure: temperature T, on a scale from 0 to 100, gradually decreasing over time] • In general, temperature must be decreased very slowly to guarantee convergence to the global optimum • In practice, we can get away with a more aggressive annealing schedule
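As one illustration of an “aggressive” schedule, the sketch below settles a tiny binary network while multiplying the temperature by a constant factor after each sweep (a geometric schedule). The weights, starting temperature, and cooling factor are made-up values for illustration, not the settings used in the course:

```python
import math, random

def anneal(weights, bias, T_start=10.0, T_min=0.05, cool=0.9):
    """Settle a binary network stochastically while gradually lowering the temperature."""
    n = len(weights)
    state = [random.randint(0, 1) for _ in range(n)]
    T = T_start
    while T > T_min:
        for i in range(n):                          # one sweep over all units
            net = bias[i] + sum(weights[i][j] * state[j]
                                for j in range(n) if j != i)
            p = 1.0 / (1.0 + math.exp(-net / T))    # same rule as slide 8
            state[i] = 1 if random.random() < p else 0
        T *= cool                                   # cool a little after each sweep
    return state

# Two mutually supporting units and one unit that conflicts with both:
W = [[0, 2, -2],
     [2, 0, -2],
     [-2, -2, 0]]
b = [0.5, 0.5, 0.5]
print(anneal(W, b))   # usually settles into the best state, [1, 1, 0]
```

A schedule slow enough to guarantee the global optimum would lower T far more gradually; multiplying by 0.9 each sweep is the “more aggressive” kind of schedule the slide mentions.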

  14. Putting it Together... • We can represent facts, etc. as units • Knowledge about these facts is encoded in the weights • Network processing fills in gaps, makes inferences, forms interpretations • Stable attractors form; the weights and input sculpt these attractors • Randomness in the updating process helps the network settle into more stable (higher-goodness) attractors
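One way to picture “filling in gaps”: clamp the units whose values you know and let the rest settle while cooling. A hypothetical sketch along the lines of the previous one, using the same toy weights (the function complete and the known mask are illustrative, not course code):

```python
import math, random

W = [[0, 2, -2], [2, 0, -2], [-2, -2, 0]]   # same toy weights as before
b = [0.5, 0.5, 0.5]

def complete(weights, bias, known, T_start=10.0, T_min=0.05, cool=0.9):
    """Fill in a partial pattern: units listed in `known` stay clamped,
    the others are updated stochastically while the temperature drops."""
    n = len(weights)
    state = [known.get(i, random.randint(0, 1)) for i in range(n)]
    T = T_start
    while T > T_min:
        for i in range(n):
            if i in known:                           # clamped unit: leave it alone
                continue
            net = bias[i] + sum(weights[i][j] * state[j]
                                for j in range(n) if j != i)
            state[i] = 1 if random.random() < 1.0 / (1.0 + math.exp(-net / T)) else 0
        T *= cool
    return state

print(complete(W, b, known={0: 1}))   # clamp unit 0 on; usually infers [1, 1, 0]
```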

  15. Stable Attractors Can Be Thought Of As Memories • How many stable patterns can be remembered by a network with N units? • There are 2^N possible patterns… • … but only about 0.15*N of them will be stable • To remember 100 things, you need about 100/0.15 ≈ 667 units! • (then again, the brain has about 10^12 neurons…)
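The arithmetic on that slide, written out (the 0.15-patterns-per-unit figure is the slide’s rule of thumb; published capacity estimates vary):

```python
capacity_per_unit = 0.15            # roughly how many stable patterns each unit buys
patterns_needed = 100
units_needed = patterns_needed / capacity_per_unit
print(round(units_needed))          # about 667 units to store 100 patterns
```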

  16. Human Performance, When Damaged (some examples) • Category coordinate errors • Naming a CAT as a DOG • Superordinate errors • Naming a CAT as an ANIMAL • Visual errors (deep dyslexics) • Naming SYMPATHY as SYMPHONY • or, naming SYMPATHY as ORCHESTRA

  17. The Attractors We’ve Talked About Can Be Useful In Understanding This [Figure: attractor basins for CAT and COT. Normal performance: the input “CAT” settles into the CAT attractor. A visual error: the input “CAT” settles into the nearby COT attractor] (see Plaut, Hinton, & Shallice)

  18. Properties of Human Memory • Details tend to go first, more general things next. Not all-or-nothing forgetting. • Things tend to be forgotten, based on • Salience • Recency • Complexity • Age of acquisition?

  19. Do These Networks Have These Properties? • Sort of. • Graceful degradation. Features vanish as a function of strength of input to them. • Complexity: more complex / arbitrary patterns can be more difficult to retain • Salience, recency, age of acquisition? • Depends on learning rule. Stay tuned

  20. Next Time: Psychological Implications: The IAC Model of Word Perception • Optional reading: McClelland and Rumelhart ’81 (handout) • Rest of this class: lab session. Help installing software, help with homework.
