
Brief introduction to genetic algorithms and genetic programming


Presentation Transcript


  1. Brief introduction to genetic algorithms and genetic programming A.E. Eiben, Free University Amsterdam

  2. Genetic algorithm(s) • Developed: USA in the 1970s • Early names: J. Holland, K. DeJong, D. Goldberg • Typically applied to: discrete optimization • Attributed features: not too fast; a good solver for combinatorial problems • Special: many variants, e.g. of reproduction models and operators; formerly "the GA", nowadays "a GA", GAs

  3. Representation • Phenotype space ↔ Genotype space = {0,1}^L • Encoding (representation): phenotype → genotype • Decoding (inverse representation): genotype → phenotype [Diagram: example bit strings such as 10010001 and 10010010 mapped between the two spaces]
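As a minimal illustration (the slide shows only the diagram), here is what the encoding/decoding pair might look like in Python when the phenotype is an integer and the genotype an L-bit string; the 8-bit length matches the slide's example strings:

```python
L = 8  # genotype length: genotype space = {0,1}^L

def encode(phenotype: int) -> str:
    """Encoding (representation): map an integer phenotype to an L-bit genotype."""
    return format(phenotype, f"0{L}b")

def decode(genotype: str) -> int:
    """Decoding (inverse representation): map a genotype back to the phenotype."""
    return int(genotype, 2)

assert decode(encode(145)) == 145   # round trip
print(encode(145))                  # '10010001', one of the slide's strings
```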

  4. GA: crossover (1) Crossover is used with probability pc • 1-point crossover: • Choose a random point on the two parents (same for both) • Split parents at this crossover point • Create children by exchanging tails • n-point crossover: • Choose n random crossover points • Split along those points • Glue parts, alternating between parents • Uniform crossover: • Assign 'heads' to one parent, 'tails' to the other • Flip a coin for each gene of the first child • Make an inverse copy of the gene for the second child
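An illustrative Python sketch (not from the slides) of the three crossover variants above, on genotypes stored as lists of bits:

```python
import random

def one_point_crossover(p1, p2):
    """1-point: split both parents at the same random point, exchange tails."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def n_point_crossover(p1, p2, n):
    """n-point: split at n random points, glue parts alternating between parents."""
    points = sorted(random.sample(range(1, len(p1)), n)) + [len(p1)]
    c1, c2, swap, prev = [], [], False, 0
    for point in points:
        a, b = (p2, p1) if swap else (p1, p2)
        c1 += a[prev:point]
        c2 += b[prev:point]
        swap, prev = not swap, point
    return c1, c2

def uniform_crossover(p1, p2):
    """Uniform: coin flip per gene for child 1; child 2 gets the inverse copy."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < 0.5:          # 'heads': take from parent 1
            c1.append(g1); c2.append(g2)
        else:                              # 'tails': take from parent 2
            c1.append(g2); c2.append(g1)
    return c1, c2
```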

  5. GA: crossover (2)

  6. GA: mutation Mutation: • Alter each gene independently with a probability pm • Relatively large chance of not being mutated at all (exercise: L = 100, pm = 1/L)
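A sketch of bit-flip mutation, plus the arithmetic behind the slide's exercise: with L = 100 and pm = 1/L, the chance of an individual escaping mutation entirely is (1 − 1/100)^100 ≈ e^(−1) ≈ 0.37:

```python
import random

def bitflip_mutation(genotype, pm):
    """Alter each gene independently with probability pm."""
    return [1 - g if random.random() < pm else g for g in genotype]

# The slide's exercise: probability that no gene of a length-L individual
# is flipped is (1 - pm)**L, which for pm = 1/L tends to 1/e.
L, pm = 100, 1 / 100
print((1 - pm) ** L)   # ~0.366
```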

  7. Crossover OR mutation? • Decade-long debate: which one is better / necessary / the main operator vs. a background operator • Answer (at least, rather wide agreement): • it depends on the problem, but • in general, it is good to have both • each has a different role • a mutation-only EA is possible, a xover-only EA would not work

  8. Crossover OR mutation? (cont’d) Exploration: discovering promising areas in the search space, i.e. gaining information on the problem. Exploitation: optimising within a promising area, i.e. using that information. There is co-operation AND competition between them. Crossover is explorative: it makes a big jump to an area somewhere "in between" two (parent) areas. Mutation is exploitative: it creates random small diversions, thereby staying near (i.e., in the area of) the parent.

  9. Crossover OR mutation? (cont’d) • Only crossover can combine information from two parents • Only mutation can introduce new information (alleles) • Crossover does not change the allele frequencies of the population (thought experiment: 50% 0's on the first bit in the population, ?% after performing n crossovers) • To hit the optimum you often need a 'lucky' mutation

  10. Selection • Main idea: better individuals get a higher chance • Secondary idea: chances proportional to fitness • Implementation: roulette wheel technique • Assign to each individual a part of the roulette wheel ("unfair": size proportional to its fitness) • Spin the wheel n times to select n individuals (fair) [Figure: wheel for fitness(A) = 3, fitness(B) = 1, fitness(C) = 2, giving slices of 3/6 = 50%, 1/6 ≈ 17%, 2/6 ≈ 33%]
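A sketch of the roulette wheel technique, reproducing the slide's A/B/C example (illustrative only):

```python
import random

def roulette_wheel(population, fitnesses, n):
    """Spin the wheel n times; each individual owns a slice of size
    proportional to its fitness ('unfair' wheel, fair spins)."""
    total = sum(fitnesses)
    selected = []
    for _ in range(n):
        spin = random.uniform(0, total)
        cumulative = 0.0
        for individual, f in zip(population, fitnesses):
            cumulative += f
            if spin <= cumulative:
                selected.append(individual)
                break
    return selected

# Slide's example: slices of 3/6 = 50%, 1/6 ~ 17%, 2/6 ~ 33%.
print(roulette_wheel(["A", "B", "C"], [3, 1, 2], 6))
```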

  11. Selection (cont’d) Fitness proportional selection (FPS): the expected number of times individual i is selected for mating is f_i / f̄, i.e. its fitness divided by the mean fitness of the population. Disadvantages: • Outstanding individuals take over the entire population very quickly → danger of premature convergence • Low selection pressure when fitness values are near each other • Behaves differently on transposed versions of the same function
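The third disadvantage is easy to see numerically: adding a constant to every fitness (a transposed version of the same function) changes the selection probabilities, here flattening them almost completely:

```python
def fps_probabilities(fitnesses):
    """Selection probability of each individual under FPS: f_i / sum(f)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

f = [1.0, 2.0, 3.0]
print(fps_probabilities(f))                       # [0.167, 0.333, 0.5]
print(fps_probabilities([fi + 100 for fi in f]))  # ~[0.330, 0.333, 0.336]
```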

  12. Selection (cont’d) Tournament selection: • Pick k individuals randomly, without replacement • Select the best of these k by comparing their fitness values • k is called the tournament size • Selection is repeated as many times as necessary
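A sketch of tournament selection as described (the without-replacement sampling applies within a single tournament):

```python
import random

def tournament_selection(population, fitness, k, n):
    """Repeat n times: pick k individuals at random without replacement
    and keep the best of them by fitness (k = tournament size)."""
    selected = []
    for _ in range(n):
        contestants = random.sample(population, k)
        selected.append(max(contestants, key=fitness))
    return selected
```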

  13. Generational GA reproduction cycle • Select parents for the mating pool (size of mating pool = population size) • Shuffle the mating pool • For each consecutive pair apply crossover with probability pc • For each new-born apply mutation (bit-flip with probability pm) • Replace the whole population by the resulting mating pool
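Tying the steps together, a hedged sketch of one such cycle, reusing roulette_wheel, one_point_crossover and bitflip_mutation from the sketches above and assuming an even population size:

```python
import random

def generational_cycle(population, fitness, pc, pm):
    """One cycle of the generational GA described above (illustrative)."""
    n = len(population)
    fitnesses = [fitness(ind) for ind in population]
    pool = roulette_wheel(population, fitnesses, n)   # select mating pool
    random.shuffle(pool)                              # shuffle it
    offspring = []
    for p1, p2 in zip(pool[0::2], pool[1::2]):        # consecutive pairs
        if random.random() < pc:                      # crossover with prob. pc
            p1, p2 = one_point_crossover(p1, p2)
        offspring += [p1, p2]
    # mutate every new-born, then replace the whole population
    return [bitflip_mutation(child, pm) for child in offspring]
```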

  14. Generational GA reproduction cycle [Figure: a worked cycle from Generation t (string1, string2, string3, string4, …) through the mating pool (e.g. string2, string2, string4, string1, …) to Generation t+1 (e.g. child1(2,4), mut(child2(2,4)), mut(string1), string2, …)] • Notes: • Offspring can be: clone, pure mutant, pure crossing, or mutated crossing • Generational replacement: the whole population is deleted/replaced • To be discussed: no survival of the fittest here?

  15. An example after Goldberg ‘89 (1) • Simple problem: max x² over {0, 1, …, 31} • GA approach: • Representation: binary code, e.g. 01101 → 13 • Population size: 4 • 1-point xover, no mutation (just an example!) • Roulette wheel selection • Random initialisation • One generational cycle worked by hand (shown on the next slides)
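A sketch of this setup, reusing generational_cycle from above; pc = 1.0 and the seed are assumptions (the slide fixes neither):

```python
import random

def decode5(bits):                 # binary code, e.g. [0,1,1,0,1] -> 13
    return int("".join(map(str, bits)), 2)

def fitness(bits):                 # the problem: max x**2 over {0,...,31}
    return decode5(bits) ** 2

random.seed(1)                                        # random initialisation
population = [[random.randint(0, 1) for _ in range(5)] for _ in range(4)]
# One generational cycle: roulette wheel + 1-point xover, no mutation.
population = generational_cycle(population, fitness, pc=1.0, pm=0.0)
print([decode5(b) for b in population])
```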

  16. An example after Goldberg ‘89 (2)

  17. An example after Goldberg ‘89 (3)

  18. The simple GA • Has been the subject of many (early) studies • Is often used as a benchmark for novel GAs • Shows many shortcomings, e.g. • Representation is too restrictive • Mutation & crossover only applicable to bit-string & integer representations • Selection mechanism sensitive to converging populations with close fitness values • Generational population model can be improved with explicit survivor selection

  19. Genetic programming • Developed: USA in the 1990s • Early names: J. Koza • Typically applied to: machine learning tasks • Attributed features: competes with neural nets and the like; slow; needs huge populations (thousands) • Special: non-linear chromosomes: trees, graphs; mutation possible but not necessary (disputed!)

  20. Motivation Why introduce yet another EA? Because fixed-length linear representations are too rigid: • Elements of a search space may vary in length • A linear representation may be too 'unnatural' • A complex variable hierarchy cannot be (easily) mapped onto linear structures Example search space: graphs without restriction on size and structure

  21. Credit score example (1) • Given: a lot of historical data on: • customer profiles and • a creditworthiness index (good/bad) • Needed: a model that classifies good customers (to be used for evaluating loan applicants) [Table: data description for customer profiles]

  22. Credit score example (2) [Parse tree: an AND node whose subtrees are (NOC = 2) and (S > 80000)] • A possible model for classification: IF (NOC = 2) AND (S > 80000) THEN good • In general: IF formula THEN good • Need to find the right formula • Natural representation of formulas: parse trees • Natural fitness of models: percentage of correctly classified cases

  23. GP: representation • Problem domain: modelling (forecasting, regression, classification, data mining, robot control) • Fitness: the performance on a given (training) data set, e.g. the number of hits/matches/good predictions • Representation: implied by the problem domain, i.e. individual = model = parse tree • parse trees sometimes viewed as LISP expressions ⇒ GP = evolving computer programs • parse trees sometimes viewed as just-another-genotype ⇒ GP = a GA sub-dialect
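One common way (an illustrative choice, not prescribed by the slides) to hold such parse trees in Python is as nested lists in prefix form, which also makes the LISP-expression view concrete:

```python
import math
import operator

# A tree is a terminal ('x' or a number) or a list [function, child, ...].
FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "cos": math.cos}

def evaluate(tree, x):
    """Recursively evaluate a parse tree at the point x."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, *children = tree
    return FUNCTIONS[op](*(evaluate(c, x) for c in children))

print(evaluate(["*", "x", ["+", "x", 1]], 3))   # x*(x+1) at x=3 -> 12
```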

  24. GP: mutation • Replace a randomly chosen subtree by a randomly generated (sub)tree

  25. GP: crossover Exchange randomly selected subtrees in the parents
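A combined sketch of both tree operators just shown, for the nested-list representation above; random_tree stands for any generator of random trees (e.g. the grow method of slide 27):

```python
import copy
import random

def all_positions(tree, path=()):
    """Yield the path (tuple of child indices) of every subtree."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_positions(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_subtree(tree, path, new):
    if not path:                      # replacing the root
        return new
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = new
    return tree

def subtree_mutation(tree, random_tree):
    """Replace a randomly chosen subtree by a randomly generated one."""
    tree = copy.deepcopy(tree)
    path = random.choice(list(all_positions(tree)))
    return set_subtree(tree, path, random_tree())

def subtree_crossover(t1, t2):
    """Exchange randomly selected subtrees between the two parents."""
    t1, t2 = copy.deepcopy(t1), copy.deepcopy(t2)
    p1 = random.choice(list(all_positions(t1)))
    p2 = random.choice(list(all_positions(t2)))
    s1, s2 = get_subtree(t1, p1), get_subtree(t2, p2)
    return set_subtree(t1, p1, s2), set_subtree(t2, p2, s1)
```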

  26. GP: selection • Standard GA selection is usual • Sometimes overselection is used to increase efficiency: • rank the population by fitness and divide it into two groups: • group 1: best c% of the population • group 2: the other (100 − c)% • when executing selection: • 80% of selection operations choose from group 1 • 20% from group 2 • for pop. size = 1000, 2000, 4000, 8000 the portion c is c = 32%, 16%, 8%, 4% • the percentages come from a rule of thumb
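A sketch of overselection under these rules; picking uniformly within each group is a simplifying assumption (the slide does not say how individuals are chosen inside a group):

```python
import random

def overselection(population, fitness, n, c=0.32):
    """Rank by fitness; 80% of picks come from the best c of the
    population, 20% from the rest (c = 32% for pop. size 1000)."""
    ranked = sorted(population, key=fitness, reverse=True)
    cut = max(1, int(c * len(ranked)))
    group1, group2 = ranked[:cut], ranked[cut:] or ranked[:cut]
    return [random.choice(group1 if random.random() < 0.8 else group2)
            for _ in range(n)]
```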

  27. Generating random trees Given: • a function set F and • a terminal set T, both satisfying the closure property. Trees are randomly generated by: • Full method: each branch has depth Dmax (pre-specified); nodes at depth < Dmax are drawn from F, nodes at depth = Dmax from T • Grow method: the maximum branch depth is Dmax (pre-specified) • Ramped half-and-half: for each D ≤ Dmax an equal number of trees • half of them with the full method • half of them with the grow method
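Illustrative sketches of all three methods; representing the function set as (name, arity) pairs and the 50% early-stop chance in the grow method are assumptions:

```python
import random

def full_tree(functions, terminals, depth):
    """Full method: every branch reaches exactly the maximum depth."""
    if depth == 0:
        return random.choice(terminals)
    op, arity = random.choice(functions)
    return [op] + [full_tree(functions, terminals, depth - 1)
                   for _ in range(arity)]

def grow_tree(functions, terminals, depth):
    """Grow method: depth is only an upper bound; branches may stop early."""
    if depth == 0 or random.random() < 0.5:   # early-stop chance is assumed
        return random.choice(terminals)
    op, arity = random.choice(functions)
    return [op] + [grow_tree(functions, terminals, depth - 1)
                   for _ in range(arity)]

def ramped_half_and_half(functions, terminals, max_depth, n):
    """Equal numbers of trees per depth 1..max_depth, half full, half grow."""
    trees = []
    for d in range(1, max_depth + 1):
        for i in range(n // max_depth):
            method = full_tree if i % 2 == 0 else grow_tree
            trees.append(method(functions, terminals, d))
    return trees

F = [("+", 2), ("*", 2), ("cos", 1)]   # function set with arities
T = ["x", 1]                           # terminal set
print(ramped_half_and_half(F, T, max_depth=6, n=12)[0])
```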

  28. Mutation of trees • Replace a randomly chosen subtree by a randomly generated (sub)tree

  29. Crossover of trees Exchange randomly selected subtrees in the parents

  30. Standard parameters in GP (1) Qualitative variables Initialisation: ramped half-and-half. Fitness: adjusted fitness is used. Selection: • fitness proportionate, • elitist strategy is not used, • overselection is used for populations of M ≥ 1000. Overselection for population size = 1000: • rank the population by fitness and divide it into two groups: • group 1: best c = 32% of the population, group 2: the other 68% • 80% of selection operations choose from group 1, 20% from group 2 • for pop. size = 2000, 4000, 8000 the portion c is c = 16%, 8%, 4% • motivation: to increase efficiency; the percentages come from a rule of thumb

  31. Standard parameters in GP (2) Major numerical parameters • Population size M = 500. • Maximum number of generations G = 51. Minor numerical parameters • Probability pm of mutation = 0% !!! • Probability pr of reproduction = 10% • Probability pc of crossover = 90% • Probability pip of choosing internal points for xover = 90% • Maximum size Di for initial random S-expressions = 6 • Maximum size Dc for S-expressions during the run = 17 … and some "exotic" options usually set at 0 (e.g. permutation, editing, encapsulation, decimation)

  32. Simple symbolic regression (1) Given a number of sample points in ℝ²: (x1, y1), (x2, y2), …, (xn, yn), find a one-dimensional numerical function f(x) such that ∀i ∈ {1, …, n}: f(xi) = yi. In the present test 20 sample points are generated by: x⁴ + x³ + x² + x
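A sketch of the fitness evaluation for this problem, reusing evaluate() from the parse-tree sketch above; the sample interval [-1, 1] is an assumption (the slides do not specify it):

```python
def target(x):
    return x**4 + x**3 + x**2 + x

xs = [-1 + 2 * i / 19 for i in range(20)]   # 20 sample points
samples = [(x, target(x)) for x in xs]

def raw_fitness(tree):
    """Sum of absolute errors over the sample points; 0 = perfect fit."""
    return sum(abs(evaluate(tree, x) - y) for x, y in samples)

# x + x*(x + x*(x*(x + 1))) expands to x**4 + x**3 + x**2 + x:
perfect = ["+", "x", ["*", "x", ["+", "x", ["*", "x", ["*", "x", ["+", "x", 1]]]]]]
print(raw_fitness(perfect))   # ~0 up to floating-point rounding
```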

  33. Simple symbolic regression (2) Specification of the GP for the symbolic regression problem

  34. Simple symbolic regression (3) Graphical representation of an individual (and the benchmark formula x⁴ + x³ + x² + x)

  35. Simple symbolic regression (5) [Parse tree of the best individual, built from +, •, −, cos and X nodes] Best individual, representing a perfect solution: X + (X • (X + (X • (X • (X + (COS(X − X) − (X − X))))))) = X + (X • (X + (X • (X • (X + 1))))) = X + X⁴ + X³ + X² (note COS(X − X) − (X − X) = cos(0) − 0 = 1)
