
Stochastic Relaxation, Simulated Annealing, Global Minimizers

This presentation discusses the use of Stochastic Relaxation and Simulated Annealing to escape local minima and find global minimizers. It explores different types of relaxation methods, including variable-by-variable relaxation. The Metropolis Algorithm is explained in detail. The presentation also discusses the application of Simulated Annealing to the 2D Ising model and the bisectioning problem. The concept of the Lowest Common Configuration is introduced to improve the global minimization process. Finally, the presentation explores the challenges of heating-cooling scheduling and suggests the use of memory to optimize the process.


Presentation Transcript


  1. Stochastic Relaxation, Simulated Annealing, Global Minimizers

  2. Different types of relaxation • Variable by variable relaxation – strict minimization • Changing a small subset of variables simultaneously – Window strict minimization relaxation • Stochastic relaxation – may increase the energy – should be followed by strict minimization

  3. Complex landscape of E(X)

  4. How to escape local minima? • First go uphill, then possibly hit a lower basin • In order to go uphill, an increase in E(x) must be allowed • Add stochasticity: allow E(x) to increase with a probability governed by an external temperature-like parameter T. The Metropolis Algorithm (Kirkpatrick et al. 1983): Assume xold is the current state, define xnew to be a neighboring state and delE = E(xnew) - E(xold); then if delE < 0, replace xold by xnew; else choose xnew with probability P(xnew) = exp(-delE/T) and xold with probability P(xold) = 1 - P(xnew)
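
A minimal Python sketch of this acceptance rule (not from the slides; the energy function and the neighbor-proposal routine are assumed to be supplied by the caller):

```python
import math
import random

def metropolis_step(x_old, energy, propose_neighbor, T):
    """One Metropolis step: accept downhill moves always,
    uphill moves with probability exp(-delE / T)."""
    x_new = propose_neighbor(x_old)       # pick a neighboring state
    delE = energy(x_new) - energy(x_old)
    if delE < 0:
        return x_new                      # strict improvement: always accept
    # uphill move: accept with probability exp(-delE/T); at T = 0 never accept
    if T > 0 and random.random() < math.exp(-delE / T):
        return x_new
    return x_old
```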

  5. The probability to accept an increasing energy move

  6. The Metropolis Algorithm • As T → 0 and when delE > 0: P(xnew) → 0 • At T = 0: strict minimization • High T randomizes the configuration away from the minimum • Low T cannot escape local minima • Starting from a high T, the slower T is decreased, the lower the E(x) that is achieved • The slow reduction in T allows the material to reach a more ordered configuration: increase the size of its crystals and reduce their defects

  7. Fast cooling – amorphous solid

  8. Slow cooling - crystalline solid

  9. SA for the 2D Ising model: E = -Σ_<ij> s_i s_j, where i and j are nearest neighbors (figure: spin configuration) Eold = -2

  10. SA for the 2D Ising model: E = -Σ_<ij> s_i s_j, where i and j are nearest neighbors (figure: spin configurations before and after a flip) Eold = -2, Enew = 2

  11. SA for the 2D Ising model: E = -Σ_<ij> s_i s_j, where i and j are nearest neighbors (figure: spin configurations before and after a flip) Eold = -2, Enew = 2, delE = Enew - Eold = 4 > 0, P(Enew) = exp(-4/T)

  12. SA for the 2D Ising model: E = -Σ_<ij> s_i s_j, where i and j are nearest neighbors (figure: spin configurations before and after a flip) Eold = -2, Enew = 2, delE = Enew - Eold = 4 > 0, P(Enew) = exp(-4/T) = 0.3 => T = -4/ln 0.3 ≈ 3.3. Reduce T by a factor a, 0 < a < 1: Tn+1 = a·Tn
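
To make the procedure concrete, here is a hedged Python sketch of SA on a 2D Ising lattice with periodic boundaries and geometric cooling; the parameter values (T0, the cooling factor alpha, the number of sweeps) are illustrative choices, not prescriptions from the slides:

```python
import numpy as np

def ising_energy_change(s, i, j):
    """delE for flipping spin (i, j) in E = -sum_<ij> s_i s_j,
    with periodic boundary conditions."""
    n, m = s.shape
    nbrs = (s[(i - 1) % n, j] + s[(i + 1) % n, j]
            + s[i, (j - 1) % m] + s[i, (j + 1) % m])
    return 2.0 * s[i, j] * nbrs

def anneal_ising(s, T0=3.3, alpha=0.9, sweeps_per_T=10, n_temps=30, rng=None):
    """Simulated annealing of a 2D Ising lattice with geometric cooling Tn+1 = alpha*Tn."""
    rng = np.random.default_rng() if rng is None else rng
    T = T0
    for _ in range(n_temps):
        for _ in range(sweeps_per_T * s.size):
            i, j = rng.integers(s.shape[0]), rng.integers(s.shape[1])
            delE = ising_energy_change(s, i, j)
            if delE <= 0 or rng.random() < np.exp(-delE / T):
                s[i, j] *= -1              # accept the flip
        T *= alpha                         # geometric cooling
    return s
```

Starting from, say, s = np.ones((32, 32), dtype=int) with a stripe of flipped spins gives a setup along the lines of Exc#7 below.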

  13. Exc#7: SA for the 2D Ising model (see Exc#1). Consider the following cases:
  1. For h1 = h2 = 0, set a stripe of width 3, 6 or 12 with opposite sign
  2. For h1 = -0.1, h2 = 0.4, set -1 at h1 and +1 at h2
  3. Repeat 2 with two 8x8 squares of plus spins with h2 = 0.4, located apart from each other
  Calculate T0 to allow 10% flips of a spin surrounded by 4 neighbors of the same sign. Use faster / slower cooling schedules.
  a. What were the starting T0 and E in each case?
  b. How was T0 decreased? How many sweeps were employed?
  c. What was the final configuration? Was the global minimum achievable? If not, try a different T0.
  d. Is it harder to flip a wider stripe?
  e. Is it harder to flip 2 squares than just one?

  14. SA for the bisectioning problem (figure: partitions R and R', node i on the cut)

  15. SA for the bisectioning problem: “individual” temperature (figure: partitions R and R', node i on the cut). The probability of i to belong to R' depends on Si = Σ_{j in R'} aij / Σ_j aij: P(i in R') = 1 if delE <= 0, and exp[-delE/(T·Si)] if delE > 0

  16. SA for the bisectioning problem: “individual” temperature (figure: partitions R and R', node i on the cut). The probability of i to belong to R' should increase if a bigger change along the cut line is made. If delE is small enough, it is expected that further moves will indeed eventually produce a lower E.

  17. SA for the bisectioning problem: how to choose T (figure: partitions R and R', node i on the cut) • Calculate delE/Si along the cut line and sort the values • Decide upon the % of changes desired • Find the appropriate T by demanding P(%) = 0.5
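
A small Python sketch of the two formulas above: the individual-temperature acceptance probability and the percentile-based choice of T. How the desired percentage maps to an index in the sorted list of delE/Si values is an assumption of this sketch:

```python
import math

def individual_acceptance(delE, S_i, T):
    """P(i moves to R'): 1 for non-increasing moves, exp[-delE/(T*S_i)] otherwise."""
    if delE <= 0:
        return 1.0
    return math.exp(-delE / (T * S_i))

def choose_T(ratios, accept_fraction):
    """Pick T so the move at the desired quantile of the sorted delE/S_i values
    is accepted with probability 0.5, i.e. exp(-r/T) = 0.5 => T = r / ln 2."""
    ratios = sorted(r for r in ratios if r > 0)
    if not ratios:
        return 0.0
    k = min(int(accept_fraction * len(ratios)), len(ratios) - 1)
    return ratios[k] / math.log(2.0)
```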

  18. SA for linear ordering problems: multiple choices for a variable • Try to move node i up to k positions to the right and to the left: choose between the 2k+1 possibilities • For j = -k,..,-1,1,..,k: P(j) = z·min[1, exp(-delE(j)/T(j))] • For j = 0: P(0) = z·min_j[1 - P(j)/z] • z is calculated from the normalization Σ_j P(j) = 1 • T(j) is calculated a priori for each j, aiming at a certain acceptance rate (e.g. 60%)
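
A hedged sketch of choosing one of the 2k+1 shifts. It treats the "stay" move j = 0 simply as weight 1 and normalizes all weights by z, which glosses over the slide's special formula for P(0); delE and T are assumed to be callables returning the energy change and the per-shift temperature:

```python
import math
import random

def move_weight(dE, Tj):
    """min[1, exp(-dE/Tj)], written so very negative dE cannot overflow."""
    return 1.0 if dE <= 0 else math.exp(-dE / Tj)

def choose_shift(delE, T, k):
    """Pick one of the shifts j = -k..k with probability proportional to its weight."""
    shifts = list(range(-k, k + 1))
    weights = [1.0 if j == 0 else move_weight(delE(j), T(j)) for j in shifts]
    z = sum(weights)                       # normalization so probabilities sum to 1
    r, acc = random.random() * z, 0.0
    for j, w in zip(shifts, weights):
        acc += w
        if r <= acc:
            return j
    return shifts[-1]
```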

  19. The Metropolis Algorithm (cont.) • May result in very slow processing • Still, SA is considered to be a powerful global minimizer • Instead of a very slow cooling schedule, repeat heating-cooling several times

  20. Heating-cooling scheduling (figure: T vs. number of relaxation sweeps)

  21. The Metropolis Algorithm (cont.) • May result in very slow processing • Still, SA is considered to be a powerful global minimizer • Instead of a very slow cooling schedule, repeat heating-cooling several times and keep track of the best-so-far configuration: • The best-so-far has a non-increasing E • It acts as an outside observer • The best-so-far is actually the calculated minimum

  22. Heating-cooling scheduling (figure: T vs. number of relaxation sweeps; store the best-so-far)

  23. The Metropolis Algorithm (cont.) • May result in very slow processing • Still, SA is considered to be a powerful global minimizer • Instead of a very slow cooling schedule, repeat heating-cooling several times and keep track of the best-so-far configuration: • The best-so-far has a non-increasing E • It acts as an outside observer • The best-so-far is actually the calculated minimum • Problem: heating may destroy already achieved minima in various subregions • Add “memory” of the best-so-far for those subregions
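
A schematic sketch of repeated heating-cooling with an outside-observer best-so-far; anneal_cycle is a hypothetical placeholder for one reheat-and-cool run (e.g. the Metropolis loop sketched above):

```python
def repeated_annealing(x0, energy, anneal_cycle, n_cycles, T_hot):
    """Repeat heating-cooling several times; the best-so-far configuration is
    kept outside the annealing process, so its energy never increases."""
    best_x, best_E = x0, energy(x0)
    x = x0
    for _ in range(n_cycles):
        x = anneal_cycle(x, T_hot)        # reheat to T_hot, then cool down to T = 0
        E = energy(x)
        if E < best_E:                    # the best-so-far only improves
            best_x, best_E = x, E
    return best_x, best_E
```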

  24. Lowest Common Configuration (figure: energy landscape; the global minimum is marked)

  25. Lowest Common Configuration (figure: energy landscape with configuration C1 and the global minimum)

  26. Lowest Common Configuration (figure: energy landscape with configurations C1, C2 and the global minimum)

  27. Lowest Common Configuration (figure: energy landscape with C1, C2, their LCC and the global minimum) E(LCC(C1, C2)) <= min[E(C1), E(C2)]

  28. Heating-cooling scheduling (figure: T vs. number of relaxation sweeps; apply the LCC)

  29. Heating-cooling scheduling (figure: T vs. number of relaxation sweeps) best-so-far ← LCC(best-so-far, the new T=0 configuration)

  30. Exc#8: LCC for the bisectioning problem (figure: partitions R and R', node i on the cut). Given 2 partitions, find a linear-time algorithm for the construction of their LCC

  31. Exc#8*: LCC for linear ordering problems. Find a (nearly) linear-time algorithm (e.g. sorting is allowed) for the LCC of 2 permutations, in which subpermutations are detected and chosen into the best-so-far

  32. Multilevel Simulated Annealing • Do not increase T by much: avoid destroying the global solution inherited from the coarser levels • Reduce T quickly: typically 2-3 values of T>0 (followed by strict minimization) are sufficient • Repeat heating-cooling several times per level • Accumulate the minimal solution into the best-so-far by applying the LCC at the end of T=0 • Interpolate the best-so-far to the next level
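
A schematic, heavily simplified Python sketch of the multilevel loop described above; the coarse-to-fine hierarchy, the per-level annealing cycle, the interpolation, and the LCC are all problem-specific and appear here only as hypothetical placeholders:

```python
def multilevel_sa(levels, anneal_cycle, interpolate, lcc, n_cycles=3):
    """Schematic multilevel SA over a coarse-to-fine hierarchy 'levels'.
    anneal_cycle(x, level) should use only a few, quickly decreasing T > 0
    values followed by strict minimization (T = 0)."""
    best = anneal_cycle(levels[0].initial_solution(), levels[0])   # coarsest level
    for level in levels[1:]:
        best = interpolate(best, level)       # inherit the coarser-level solution
        for _ in range(n_cycles):             # mild heating, quick cooling, repeated
            candidate = anneal_cycle(best, level)
            best = lcc(best, candidate)       # accumulate the minimum via the LCC
    return best
```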

  33. Genetic algorithm: a global minimizer

  34. Genetic algorithm • A global search technique inspired by evolutionary biology • Start from a population of individuals (randomly generated) – this is the 1st generation • The next generation follows by: 1. selection of individuals from the current generation to breed the next generation, according to some fitness measure 2. crossover (recombination) of pairs of (randomly chosen) parents to produce offspring 3. mutations applied randomly to enhance the diversity of the individuals in the generation
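
A generic GA skeleton following the three steps above; the selection, crossover, and mutation operators (and the energy/fitness function, where lower is better) are placeholders to be supplied per problem:

```python
import random

def genetic_algorithm(init_population, energy, select, crossover, mutate,
                      n_generations, mutation_rate=0.05):
    """Generic GA loop: selection, crossover, and random mutation per generation."""
    population = init_population()              # 1st generation (random individuals)
    for _ in range(n_generations):
        parents = select(population, energy)    # fitter (lower-energy) individuals breed
        offspring = []
        while len(offspring) < len(population):
            p1, p2 = random.sample(parents, 2)  # randomly chosen pair of parents
            child = crossover(p1, p2)
            if random.random() < mutation_rate:
                child = mutate(child)           # keep the population diverse
            offspring.append(child)
        population = offspring
    return min(population, key=energy)          # best individual found
```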

  35. A genetic algorithm for the linear arrangement problem P=1 • Initial population: 1. select a starting vertex 2. build the permutation by the greedy frontal increase minimization algorithm

  36. A genetic algorithm for the linear arrangement problem P=1 • Initial population: 1. select a starting vertex 2. build the permutation by the greedy frontal increase minimization algorithm (figure: frontier Fi) Choose a node from Fi: the one most strongly connected to the already placed nodes
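
A sketch of this greedy frontal construction, assuming the graph is given as a dict of neighbor-weight dicts; it orders only the component reachable from the starting vertex:

```python
def greedy_placement(adj, start):
    """Build an initial permutation: repeatedly take from the frontier the node
    most strongly connected to the nodes already placed.
    adj: dict mapping node -> {neighbor: weight}."""
    placed = [start]
    placed_set = {start}
    frontier = set(adj[start])
    while frontier:
        # connection strength of each frontier node to the already placed prefix
        best = max(frontier,
                   key=lambda v: sum(w for u, w in adj[v].items() if u in placed_set))
        placed.append(best)
        placed_set.add(best)
        frontier.remove(best)
        frontier |= set(adj[best]) - placed_set
    return placed
```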

  37. A genetic algorithm for the linear arrangement problem P=1 • Initial population: 1. select a starting vertex 2. build the permutation by the greedy frontal increase minimization algorithm • Selection of survivors is based on E(x) • Recombine two randomly chosen parents: Parent 1: 5 7 2 3 8 6 9 1 4 Parent 2: 6 1 2 4 3 5 9 8 7

  38. A genetic algorithm for the linear arrangement problem P=1 • Initial population: 1. select a starting vertex 2. build the permutation by the greedy frontal increase minimization algorithm • Selection of survivors is based on E(x) • Recombine two randomly chosen parents: Parent 1: 5 7 2 3 8 6 9 1 4 Parent 2: 6 1 2 4 3 5 9 8 7 Offspring: 2 9

  39. A genetic algorithm for the linear arrangement problem P=1 • Initial population: 1. select a starting vertex 2. build the permutation by the greedy frontal increase minimization algorithm • Selection of survivors is based on E(x) • Recombine two randomly chosen parents: Parent 1: 5 7 2 3 8 6 9 1 4 Parent 2: 6 1 2 4 3 5 9 8 7 Offspring: 2 9 _ 326 _ 795 _

  40. A genetic algorithm for the linear arrangement problem P=1 • Initial population: 1. select a starting vertex 2. build the permutation by the greedy frontal increase minimization algorithm • Selection of survivors is based on E(x) • Recombine two randomly chosen parents: Parent 1: 5 7 2 3 8 6 9 1 4 Parent 2: 6 1 2 4 3 5 9 8 7 Offspring: 2 9 _ 32 6 _ 795 _ _ 32 687954

  41. A genetic algorithm for the linear arrangement problem P=1 • Initial population: 1. select a starting vertex 2. build the permutation by the greedy frontal increase minimization algorithm • Selection of survivors is based on E(x) • Recombine two randomly chosen parents: Parent 1: 5 7 2 3 8 6 9 1 4 Parent 2: 6 1 2 4 3 5 9 8 7 Offspring: 2 9 _ 32 6 _ 795 _ _ 32 687954 1 32 687954

  42. A genetic algorithm for the linear arrangement problem P=1 • Initial population: 1. select a starting vertex 2. build the permutation by the greedy frontal increase minimization algorithm • Selection of survivors is based on E(x) The next generation is constructed by: 1. Recombination of 2 randomly chosen parents 2. Improving the E(x) of the offspring by local processing, e.g. by Simulated Annealing 3. Choosing the best individuals from the pool of parents and children

  43. Spectral Sequencing: a global minimizer

  44. Spectral Sequencing: a global minimizer • Given a weighted graph where wij is the edge weight between the nodes i and j • Define the graph Laplacian A by aij = -wij, aii = Σ_j wij • A is symmetric positive semidefinite • Consider the eigenvalue problem Ax = λx • Arranging the nodes of the graph according to the eigenvector associated with the 2nd smallest eigenvalue has been shown by Hall (1970) to be the solution to the problem min Σ_ij wij (xi - xj)^2 for real variables x
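
A compact sketch of spectral sequencing with dense NumPy linear algebra; the weight matrix representation is an assumption of this sketch:

```python
import numpy as np

def spectral_ordering(W):
    """Order graph nodes by the eigenvector of the graph Laplacian associated
    with the 2nd smallest eigenvalue (the Fiedler vector).
    W: symmetric weight matrix, W[i, j] = w_ij >= 0."""
    A = np.diag(W.sum(axis=1)) - W          # Laplacian: a_ii = sum_j w_ij, a_ij = -w_ij
    eigvals, eigvecs = np.linalg.eigh(A)    # eigh returns eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # eigenvector of the 2nd smallest eigenvalue
    return np.argsort(fiedler)              # node order along the Fiedler vector
```

For large graphs the dense eigh call would be replaced by a sparse or multilevel eigensolver, as the next slide suggests.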

  45. Spectral Sequencing: a global minimizer • SS has been used extensively to solve a large variety of ordering problems: • Linear ordering problems: P = 1, 2, … • Partitioning problems • Embedding into lower dimensions, etc. • To calculate the eigenvectors, use a multilevel method • The direct use of multilevel to solve the original problem produces better results than using the ordering dictated by SS

  46. P=2: Multilevel approach vs. Spectral method (table: result ratios per graph). The results of the multilevel approach were obtained without post-processing! Ilya Safro, Dorit Ron, A. Brandt: J. Graph Alg. Appl. 10 (2006) 237-258
