
Presentation Transcript


  1. Mathematical Models of GAs. Notes from Chapter 4 of Mitchell's An Introduction to GAs and from Neal's research. CS 536 – Spring 2006.

  2. GA Theory? Why GA theory? Because the GA is a black box. Serious, organized GA theory research is relatively new: FOGA 1 was held in 1990, while GAs became fairly popular in the 1980s (the first ICGA was in 1985). Note that the mathematical theory of biological evolution has been around since at least 1916 (the first issue of the Journal of Genetics). Various high-level classifications of EA theory exist; here is one taxonomy: Schema Theory, Markov Models, Vose Models, Statistical Mechanics, Perturbation Models, and No Free Lunch.

  3. No Free Lunch. There is no free lunch for genetic algorithms: given any two optimization algorithms, their performance is exactly equal when averaged over the space of all possible functions to be optimized. It is therefore not possible to make statements like "my mousetrap is provably better than yours for all mice."

  4. Markov Models of GAs. One of the first descriptions was by Nix and Vose. It builds a probabilistic model of GA behavior. U is the Z x Z transition matrix, where Z is the number of possible populations; for 10 bits and 10 individuals that is roughly 3 x 10^23 states. The probability distribution over populations at time t+1, p(t+1), is given by p(t+1) = U * p(t).
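The size of that state space is easy to check. Below is a quick sketch (mine, not from the slides) that counts the distinct populations of size n over 2^l genotypes, i.e., the number of multisets C(n + 2^l - 1, n):

```python
from math import comb

# Number of distinct populations (multisets) of size n drawn from 2**l genotypes:
# Z = C(n + 2**l - 1, n), the state count of a Nix & Vose-style Markov chain.
l, n = 10, 10                      # 10-bit genomes, population size 10
Z = comb(n + 2**l - 1, n)
print(f"Z = {Z:.2e}")              # about 3.7e+23 states
```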

  5. Single-Individual Example Markov Model. A 2-bit, mutation-only GA with a single genome and mutation rate mu = 0.1.
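The figure for this slide is not in the transcript. As a minimal sketch of what such a model looks like, here is a 4 x 4 transition matrix for a 2-bit, mutation-only, single-genome chain with mu = 0.1, iterated via p(t+1) = U * p(t); the construction is mine, so the slide's exact figure may differ:

```python
import numpy as np

mu = 0.1                                    # per-bit mutation rate
genomes = [0b00, 0b01, 0b10, 0b11]

def hamming(a, b):
    return bin(a ^ b).count("1")

# U[y, x] = probability that mutating genome x yields genome y
U = np.array([[mu ** hamming(x, y) * (1 - mu) ** (2 - hamming(x, y))
               for x in genomes] for y in genomes])

p = np.array([1.0, 0.0, 0.0, 0.0])          # start at genome 00 with certainty
for _ in range(50):
    p = U @ p                               # p(t+1) = U p(t)
print(p)                                    # tends toward the uniform distribution
```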

  6. Dortmund Models. A group of researchers from the University of Dortmund is actively researching simple EAs and building Markov models of them. The (1+1) EA:
     1) Choose mutation rate p_m ∈ (0, 1/2]
     2) Choose x ∈ {0,1}^n uniformly at random
     3) Create y by flipping each bit of x independently with probability p_m
     4) If f(y) >= f(x), set x := y
     5) Continue at line 3
     Note: the model on the preceding slide is a (1,1) EA, i.e., a random walk (drunkard's walk).
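A direct rendering of this pseudocode in Python might look like the following. The fitness function, the choice p_m = 1/n, and the iteration cap are illustrative assumptions, since the pseudocode itself loops forever:

```python
import random

def one_plus_one_ea(f, n, p_m=None, max_iters=100_000):
    """(1+1) EA: mutate the single parent, keep the offspring iff it is no worse."""
    p_m = p_m if p_m is not None else 1.0 / n            # common choice, not fixed by the slide
    x = [random.randint(0, 1) for _ in range(n)]         # step 2: uniform random start
    for _ in range(max_iters):                           # stand-in for "continue at line 3"
        y = [bit ^ (random.random() < p_m) for bit in x] # step 3: flip each bit w.p. p_m
        if f(y) >= f(x):                                 # step 4: accept if not worse
            x = y
    return x

# Example: ONEMAX, the number of 1-bits
best = one_plus_one_ea(lambda bits: sum(bits), n=20)
print(sum(best))
```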

  7. Example (1+1) EA Model

  8. Metropolis Selection. This is a modified (1+1) EA with Metropolis selection. The (1+1) Metropolis EA:
     1) Choose mutation rate p_m ∈ (0, 1/2]
     2) Choose α ∈ (1, ∞)
     3) Choose x ∈ {0,1}^n uniformly at random
     4) Create y by flipping each bit of x independently with probability p_m
     5) If f(y) >= f(x), set x := y
     6) Else set x := y with probability 1/α^(f(x)-f(y))
     7) Continue at line 4
     This EA accepts 'worsenings' with some (usually small) probability. If α depends on t (i.e., is not constant), this is a simulated annealing algorithm.
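Here is a sketch of the Metropolis variant under the same assumptions as the previous sketch (illustrative fitness, finite iteration cap). The key line is the acceptance of worsenings with probability 1/α^(f(x)-f(y)):

```python
import random

def metropolis_ea(f, n, p_m, alpha, max_iters=100_000):
    """(1+1) EA with Metropolis selection."""
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(max_iters):
        y = [bit ^ (random.random() < p_m) for bit in x]
        if f(y) >= f(x):
            x = y                                      # improvements always accepted
        elif random.random() < alpha ** (f(y) - f(x)): # = 1 / alpha**(f(x) - f(y))
            x = y                                      # worsenings accepted with small probability
    return x

best = metropolis_ea(lambda bits: sum(bits), n=20, p_m=0.05, alpha=2.0)
print(sum(best))
```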

  9. Metropolis EA Model

  10. (1+1) EA with Cyclic Mutation. This is a modified (1+1) EA with a cyclic mutation operator:
     1) Set initial mutation rate p_m := 1/n
     2) Choose x ∈ {0,1}^n uniformly at random
     3) Create y by flipping each bit of x independently with probability p_m
     4) If f(y) >= f(x), set x := y
     5) Set p_m := 2 * p_m; if p_m > 1/2, set p_m := 1/n
     6) Continue at line 3
     Note: this EA is a provably better performer on some fitness functions than the classic (1+1) EA.
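A sketch of the cyclic-mutation variant follows; the initial rate p_m = 1/n is my reading of the first step, chosen to be consistent with the reset in step 5:

```python
import random

def cyclic_one_plus_one_ea(f, n, max_iters=100_000):
    """(1+1) EA whose mutation rate doubles each step and wraps back to 1/n."""
    p_m = 1.0 / n                                        # assumed initial rate (see step 5)
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(max_iters):
        y = [bit ^ (random.random() < p_m) for bit in x]
        if f(y) >= f(x):
            x = y
        p_m *= 2                                         # step 5: double the rate...
        if p_m > 0.5:
            p_m = 1.0 / n                                # ...and cycle back once it passes 1/2
    return x
```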

  11. (2+1) EA with Crossover. This is a simple steady-state GA with crossover and the smallest possible population. The (2+1) EA with crossover:
     1) Choose mutation rate p_m ∈ (0, 1/2]
     2) Choose population P := {x, y}, where x, y ∈ {0,1}^n are chosen uniformly at random
     3) Create offspring z:
        with probability 1/3, z := mutate(x)
        with probability 1/3, z := mutate(y)
        with probability 1/3, z := mutate(crossover(x, y))
     4) Set P := {x, y, z} - {a}, where a is the individual with the worst fitness
     5) Continue at line 3
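A sketch of this (2+1) EA follows. The slide does not fix the crossover operator or how fitness ties are broken, so uniform crossover and a simple "keep the two fittest" rule are assumptions here:

```python
import random

def two_plus_one_ea(f, n, p_m, max_iters=100_000):
    """Steady-state (2+1) EA: population {x, y}, one offspring z per step."""
    def mutate(z):
        return [bit ^ (random.random() < p_m) for bit in z]

    def crossover(a, b):                        # uniform crossover (assumed)
        return [random.choice(pair) for pair in zip(a, b)]

    x = [random.randint(0, 1) for _ in range(n)]
    y = [random.randint(0, 1) for _ in range(n)]
    for _ in range(max_iters):
        r = random.random()
        if r < 1 / 3:
            z = mutate(x)
        elif r < 2 / 3:
            z = mutate(y)
        else:
            z = mutate(crossover(x, y))
        # drop the worst-fitness individual of {x, y, z}
        x, y = sorted([x, y, z], key=f, reverse=True)[:2]
    return x                                    # x is the fitter survivor
```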

  12. Proofs. The expected running time of the (1+1) EA on arbitrary binary functions is at most n^n. The expected running time of the (1+1) EA on ONEMAX is O(n ln n). The expected running time of the (1+1) EA on binary functions is O(4^n log n). The (2+1) crossover EA can outperform the (1+1) EA [or the (2+1) EA without crossover] on some Royal Road functions.
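The O(n ln n) bound on ONEMAX is easy to probe empirically. The sketch below (my own experiment, not from the slides) averages the number of iterations the (1+1) EA needs to reach the all-ones string and compares it with n ln n; the ratio should stay roughly constant as n grows:

```python
import math
import random

def onemax_runtime(n):
    """Iterations until the (1+1) EA with p_m = 1/n optimizes ONEMAX."""
    x = [random.randint(0, 1) for _ in range(n)]
    t = 0
    while sum(x) < n:
        y = [bit ^ (random.random() < 1.0 / n) for bit in x]
        if sum(y) >= sum(x):
            x = y
        t += 1
    return t

for n in (20, 40, 80):
    avg = sum(onemax_runtime(n) for _ in range(20)) / 20
    print(n, round(avg), "vs n*ln(n) =", round(n * math.log(n)),
          "ratio:", round(avg / (n * math.log(n)), 2))
```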

  13. Part 2 (Monday): the two-armed bandit problem; go over the Schema Theorem; talk about Royal Road functions; talk about Vose's infinite population model.

  14. Exploitation vs. Exploration. John Holland's invention of GAs was meant as an implementation of a proposed general principle for adaptation in complex systems: adaptation requires the correct balance between "exploitation" and "exploration." "Exploitation": adaptation consists of spreading useful traits once they are discovered. "Exploration": adaptation also consists of "searching" for new useful traits.

  15. Two-Armed Bandit Problem. You are given n quarters to play with and do not know the average payoffs of the two arms. What is the optimal way to allocate your quarters between the two arms so as to maximize your earnings (or minimize your losses) over the n arm-pulls? And what is the relationship to schemas?

  16. Two-Armed Bandit Problem • A slot machine has two arms, A1 and A2, with mean payoffs m1 and m2 and variances s1 and s2. • The payoff processes are stationary and independent. • The gambler is given N coins. Goal: maximize payoff. He does not know the m's or the s's and must estimate them by playing coins on the arms. • What is the optimal strategy for allocating trials to the arms? • He needs to both gather information and use it at the same time. • "On-line learning": the payoff at each trial counts toward performance.
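The following toy simulation (mine, not Holland's analysis) just makes the trade-off concrete: a fixed exploration budget is split evenly between the arms, after which the observed-better arm is exploited. Holland's actual result allocates exponentially increasing trials to the observed-better arm; this sketch is only meant to show the on-line flavour of the problem:

```python
import random

def bandit_run(n_coins, m1, m2, s1, s2, explore_frac=0.1):
    """Explore both arms with a fraction of the coins, then exploit the
    arm with the higher observed mean payoff (a naive strategy)."""
    pull = lambda arm: random.gauss(m1, s1) if arm == 1 else random.gauss(m2, s2)
    payoffs, total = {1: [], 2: []}, 0.0

    n_explore = max(2, int(explore_frac * n_coins))
    for t in range(n_explore):                  # exploration: alternate arms
        arm = 1 + t % 2
        reward = pull(arm)
        payoffs[arm].append(reward)
        total += reward

    best = max(payoffs, key=lambda a: sum(payoffs[a]) / len(payoffs[a]))
    for _ in range(n_coins - n_explore):        # exploitation: play the observed best
        total += pull(best)
    return total

print(bandit_run(1000, m1=1.0, m2=0.8, s1=1.0, s2=1.0))
```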

  17. How does this relate to schemas? In Holland's theory, each arm roughly corresponds to a possible "strategy" to test. The question is: if one strategy (arm) seems good, how much time should you spend exploiting it, and how much time should you spend exploring other, possibly better, strategies? Holland claims the GA explores schemata via 'implicit parallelism'.

  18. Chalkboard Discussion of Schema Theorem • Example Schema • Schema Theorem • Simplified Version • Interpretation • Counter Example • Building Block Hypothesis

  19. Critiques of the Schema Theorem • Mühlenbein: "...the Schema Theorem is almost a tautology, only describing proportional selection..." • This is a bit unfair: nothing has appeared to challenge the mathematics; only the assumptions are challenged. • Vose showed that a tiny change in the mutation rate can cause a large change in the GA's trajectory. • This butterfly effect is a hallmark of non-linear dynamical systems. • The stochastic and dynamic nature of the equation is ignored, pushing the equation beyond what it can sustain. • See board. • The theorem is fundamentally NOT predictive of real GA behavior. • One must track all schemata to make real predictions about the trajectory.

  20. Royal Road Function

  21. More Critiques of the Schema Theorem • Rudolph showed the two-armed-bandit analogy fundamentally breaks down: Holland's 'optimal' strategy is far outperformed by an alternative approach. • Macready & Wolpert use a Bayesian framework to argue that the 'optimal strategy' is not optimal: even if we accept that the GA obeys the 'exponentially increasing trials' of the theorem, this is NOT the optimal way to resolve a competition between schemata. • The assumption that hyperplane/schemata competitions can be isolated and solved independently is false. • Building Block Hypothesis: it failed to predict performance on Royal Road functions, where the GA is outperformed by the (1+1) EA, i.e., with no crossover. • Neither the Schema Theorem nor the BBH is a recursive equation that can be iterated and solved the way they have been; they are 'expectations', i.e., stochastic/random.

  22. Vose's Simple Genetic Algorithm & Model
     1) Randomly initialize the population
     2) Select two individuals from the population via the selection function
     3) Combine the individuals via the crossover function
     4) Mutate the child via the mutation function
     5) Place the mutated child into the next generation's population
     6) Loop at step 2 until the next population is full
     In the next few slides we will delete step 3 (no crossover).
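One possible Python rendering of this loop is sketched below. The slide does not specify the selection, crossover, or mutation operators; fitness-proportional selection, one-point crossover, and bitwise mutation are common choices for the simple GA and are assumed here:

```python
import random

def simple_ga(f, n_bits, pop_size, p_m, generations):
    """Generational simple GA: select two parents, cross them over,
    mutate the child, and fill the next population one child at a time."""
    def select(pop, fits):                      # step 2: fitness-proportional selection
        return random.choices(pop, weights=fits, k=1)[0]

    def crossover(a, b):                        # step 3: one-point crossover (assumed)
        cut = random.randint(1, n_bits - 1)
        return a[:cut] + b[cut:]

    def mutate(z):                              # step 4: bitwise mutation
        return [bit ^ (random.random() < p_m) for bit in z]

    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [f(ind) for ind in pop]
        nxt = []
        while len(nxt) < pop_size:              # step 6: loop until the population is full
            a, b = select(pop, fits), select(pop, fits)
            nxt.append(mutate(crossover(a, b))) # step 5: child enters the next generation
        pop = nxt
    return max(pop, key=f)

best = simple_ga(lambda bits: sum(bits), n_bits=20, pop_size=30, p_m=0.05, generations=50)
print(sum(best))
```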

  23. Vose Infinite Population Model. A discrete dynamical system (called a map) that maps an input population to an output population. The population is represented as a 'vector of proportions'; 2-bit genome example: p = (0.1, 0.2, 0.5, 0.2). The size of the vector is s = 2^d (d is the length of the binary chromosome string). The elements of the vector are in [0,1] and sum to 1 (the simplex property). The fitness vector is f = (f(x_0), f(x_1), ..., f(x_{s-1})), where f(x_k) is the fitness of the kth individual.

  24. Vose Infinite Population Model (2)
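The formula for the map G is on the slide's figure and not in the transcript. For the mutation-plus-proportional-selection case (crossover deleted, as announced two slides back), a standard form is G(p) = U S p / (f · p), where S = diag(f) and U is the mutation matrix; here is a sketch under that assumption:

```python
import numpy as np

def G(p, f, U):
    """Infinite-population map for proportional selection followed by mutation.
    Selection rescales each proportion by its fitness (dividing by the mean
    fitness f.p keeps the vector on the simplex); mutation then mixes it."""
    p = np.asarray(p, dtype=float)
    selected = f * p / np.dot(f, p)      # proportional selection
    return U @ selected                  # mutation: columns of U sum to 1
```

Iterating this G is just a normalized power iteration on the matrix US, which is why its fixed point is the Perron eigenvector discussed on the next slides.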

  25. Vose Infinite Population Model (3). From dynamical systems and GA theory we know: US is a positive matrix (all entries are strictly positive for a non-zero mutation rate and positive fitnesses). Only one normalized eigenvector lies in the simplex (via Perron-Frobenius). The eigenvalues of US are the average fitnesses of the populations given by the corresponding normalized eigenvectors. The largest eigenvalue corresponds to the lone eigenvector inside the simplex. The output of G(p) is the 'expected' next population of a real GA with a very large population. The fixed point is the 'expected' long-term population of a real GA run for a very large number of generations.

  26. Vose Infinite Population Model (4). Example: f(00) = 3, f(01) = 2, f(10) = 1, f(11) = 4, q = 0.1.
     S = | 3 0 0 0 |     U = | 0.81 0.09 0.09 0.01 |     US = | 2.43 0.18 0.09 0.04 |
         | 0 2 0 0 |         | 0.09 0.81 0.01 0.09 |          | 0.27 1.62 0.01 0.36 |
         | 0 0 1 0 |         | 0.09 0.01 0.81 0.09 |          | 0.27 0.02 0.81 0.36 |
         | 0 0 0 4 |         | 0.01 0.09 0.09 0.81 |          | 0.03 0.18 0.09 3.24 |
     Fixed points are population vectors such that p = G(p).
     Eigenvectors of US                      Eigenvalues of US
     ( 0.736,  0.155,  0.105,  0.665)        3.29
     ( 0.779,  0.205,  0.108, -0.092)        2.48
     (-0.299,  1.601, -0.145, -0.156)        1.53
     (-0.060,  0.023,  1.077, -0.040)        0.78
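These numbers can be reproduced (up to rounding and eigenvector scaling conventions) with a few lines of NumPy; building U from the per-bit rate q is an assumption consistent with the matrix shown above:

```python
import numpy as np

f = np.array([3.0, 2.0, 1.0, 4.0])              # f(00), f(01), f(10), f(11)
q = 0.1                                         # per-bit mutation rate

def hamming(a, b):
    return bin(a ^ b).count("1")

U = np.array([[q ** hamming(x, y) * (1 - q) ** (2 - hamming(x, y))
               for x in range(4)] for y in range(4)])
US = U @ np.diag(f)                             # matches the US shown above

vals, vecs = np.linalg.eig(US)
lead = int(np.argmax(vals.real))
p_star = vecs[:, lead].real
p_star = p_star / p_star.sum()                  # normalize onto the simplex

print("eigenvalues:", np.round(np.sort(vals.real)[::-1], 2))   # ~ [3.29 2.48 1.53 0.78]
print("fixed point p* =", np.round(p_star, 3))
print("mean fitness at p* =", round(float(f @ p_star), 2))     # equals the leading eigenvalue
```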

  27. Results of the Markov Chain Model (Nix & Vose, 1991) • Nix and Vose used the theory of Markov chains to show: • For large n, trajectories of the Markov chain converge to iterates of G (the infinite population model) with probability arbitrarily close to 1. • For large n, if G has a single fixed point, the GA asymptotically spends all of its time at that fixed point. • Extended models (Vose, 1993; 2001): • Short-term GA behavior: dominated by the initial population • Long-term GA behavior: determined only by the structure of the GA surface

  28. Problems with Exact Models • In principle, exact models can be used to predict every aspect of GA behavior. • In practice? • The required matrices are intractably large. • The view is too microscopic. • We need to reduce dimensionality.
