Lidia Yamamoto and Manolis Sifalakis University of Basel http://cn.cs.unibas.ch. CS321 HS 2009 Autonomic Computer Systems Evolutionary Computation I November 17, 2009. Overview. Part I (today) Genetic Algorithms (GA) Genetic Programming (GP) Part II (Thursday) Representations
Lidia Yamamoto and Manolis Sifalakis, University of Basel, http://cn.cs.unibas.ch • CS321 HS 2009 Autonomic Computer Systems • Evolutionary Computation I • November 17, 2009
Overview • Part I (today) • Genetic Algorithms (GA) • Genetic Programming (GP) • Part II (Thursday) • Representations • Dynamic environments
Overview of Today’s Lecture • Evolutionary Computation, Part I • Introduction • (Self-)Optimization • Basic definitions from genetics • Evolutionary Computation • Common definitions, basic algorithm • Genetic Algorithms (GA) • Genetic Programming (GP) • Example: Symbolic regression with TinyGP
Textbooks • Melanie Mitchell, “An Introduction to Genetic Algorithms”, MIT Press, 1998. • W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. "Genetic Programming, An Introduction". Morgan Kaufmann Publishers, Inc., 1998. • R. Poli, W. B. Langdon, and N. F. McPhee. "A Field Guide to Genetic Programming". Published via http://lulu.com, 2008. http://www.gp-field-guide.org.uk/.
Optimization and Self-Optimization • Optimization (Operations Research) • maximize/minimize objective function subject to constraints • search for a solution in a vast search space or solution space (space of all possible solutions) • linear/non-linear • Self-Optimization (IBM Autonomic Computing) • ability of the system to optimize its operations based on a given target operation profile (objectives and constraints) • system continually seeks to improve its performance and efficiency, in order to meet end-users' needs with minimal human intervention • also able to track and respond to profile changes
Optimization (Operations Research) • General formulation of an optimization problem: maximize f(x) subject to the constraints g_i(x) • f(x) = objective function • g_i(x) = constraints • Simple example: 1 variable (x), no constraints • [Figure: plot of f(x) over the search space x, marking the local optima and the global optimum (best solution)]
Heuristic Optimization • Search space often discrete, and too large for exhaustive search • Heuristic optimization seeks to explore promising regions of the search space through a beamed search: • refines promising solutions seen so far • does not guarantee that a global optimum will be found • might be trapped in local optima • goal is to provide a satisfactory solution in reasonable time • Some heuristic optimization algorithms: • Evolutionary algorithms: Inspired by genetics and Darwinian evolution • Swarm algorithms: Inspired by the behavior of social insects (ants, bees...), flocks of birds, fish, etc.
Evolutionary Computation • Heuristic optimization method • Beamed search in a vast space of possible solutions • Inspired by Darwinian evolution: • Biological evolution by natural selection • Survival of the fittest • Advantages: • Simplicity • Potential parallelism • Able to work with limited information: only fitness improvement • Disadvantages: • Computation cost (e.g. large populations, complex fitness) • No guarantees (like all heuristics)
A Eukaryotic Cell • Prokaryotic = without nucleus, e.g. bacteria • Eukaryotic = with nucleus, e.g. plant and animal cells • [Figure: eukaryotic cell, labeled: membrane, cytoplasm, nucleus, genome (genetic material), organelles and other cell components (e.g. mitochondria, ribosomes...)]
The Genome • Each cell nucleus contains one or more chromosomes • A chromosome contains a number of genes • A gene is a region of genomic (DNA) sequence defining a functional block • Chromosomes may occur in one or more copies within a cell: • Diploid cell: has two copies of each chromosome (e.g. in humans and most animals) • Haploid cell: only one copy (e.g. gametes) • Polyploid cell: several copies (typically a power of 2, e.g. tomatoes, crops)
The Genome • [Figure: cell nucleus containing chromosomes; a chromosome pair (diploid organism); genes along a chromosome; the DNA double helix with bases A, T, G, C]
The Genome • A gene is the basic unit of heredity in a living organism • region of genomic sequence • can be seen as a functional block • typically encodes a protein, which leads to a trait, e.g. eye color • Locus: the position of a gene in the chromosome • Allele: each possible alternative DNA sequence for a given gene locus (e.g. leading to different traits such as red or white flowers)
The Genome • Genotype: the genetic material of an individual • Phenotype: the ensemble of observable traits (e.g. flower color, leaf shape) resulting from the genotype of an individual • [Figure: homologous chromosomes, one carrying the allele for white flowers and the other the allele for red flowers]
Recombination and Mutation • DNA is able to replicate with high fidelity (with the help of enzymes) • However, mutations (errors in the copy process) might still occur, e.g. due to chemical agents, radiation, etc. • Recombination (crossover) during sexual reproduction: chromosomes swap genes
Darwinian Evolution • Reproduction = replication + (unlimited) heritable variation • Replication of the DNA sequence • Cell replication • Organism reproduction • Variation: mutation, recombination • Fitness = Reproduction rate • how fast an organism (or species) is able to reproduce • Selection: survival of the fittest • exponential growth + finite resources = competition • outcome: competitive exclusion (survival of the fittest)
Evolutionary Computation • Genetic Algorithms (GA) • goal: find an optimum solution (e.g. combination of parameters) to an instance of a problem • candidate solutions are typically strings • Genetic Programming (GP) • goal: find an optimum program able to solve any instance of the problem • candidate solutions are programs, e.g. linear (e.g. assembly language), tree (e.g. LISP), graph (dataflow, cartesian GP) • Other variants not covered in this course: • Evolution Strategies (ES), Evolutionary Programming (EP)
Evolutionary Computation: Basic Concepts • Individual: a candidate solution to the problem to be solved • Population: set of candidate solutions • Genotype: (compact) representation of individuals; can be broken down into chromosomes, genes, alleles,... • GA: string of bits, integers, characters, etc. Examples: • bin: 01101011 • hex: 39fe87ac3b46 • dec: 2039384757 • alpha: addbadbaaccd • GP: program, e.g. linear, tree, graph,... Examples: • linear: I1: R1 = R2 + R3; I2: R3 = R4 * R1; I3: Jump I2 • LISP tree: (* (+ 3 a) (if (< a b) (- x y) (/ x 3)))
Evolutionary Computation: Basic Concepts • Genetic operators: variation functions that transform a set of individuals (parents) into a new set (offspring) • Common operators: • Mutation: random change in genotype, with low probability • example: parent 01010010001 → offspring 01010101001 • Crossover: recombine portions of two genotypes • example (the space marks the crossover point): parent 1 = 1011 1100101, parent 2 = 0010 011100010 → offspring 1 = 1011 011100010, offspring 2 = 0010 1100101
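The two operators on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names, the per-bit mutation probability `pm`, and the explicit `point` argument are assumptions made for the example. The crossover call reuses the parent strings from the slide.

```python
import random
from typing import Tuple


def mutate(genotype: str, pm: float, rng: random.Random) -> str:
    """Flip each bit independently with (low) probability pm (assumed scheme)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < pm else bit
        for bit in genotype
    )


def one_point_crossover(parent1: str, parent2: str, point: int) -> Tuple[str, str]:
    """Swap the tails of two genotypes after the crossover point."""
    offspring1 = parent1[:point] + parent2[point:]
    offspring2 = parent2[:point] + parent1[point:]
    return offspring1, offspring2


rng = random.Random(0)  # seeded so that runs are repeatable
print(mutate("01010010001", pm=0.1, rng=rng))
print(one_point_crossover("10111100101", "0010011100010", point=4))
```

Note that the slide's parents have different lengths, so the offspring lengths are swapped as well; a fixed-length GA would use parents of equal length.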
Evolutionary Computation: Basic Concepts • Phenotype: (expanded) individual that can be evaluated for fitness (e.g. program) • can be the same as the genotype (direct encoding) • or different: indirect encoding, via a genotype-phenotype map • example: genotype "0 23 198 54" → (map) → phenotype "if (a > 10) then x = 20 else print b"
Evolutionary Computation: Basic Concepts • Fitness: measure of how good a candidate solution is • tested on a number of test cases (training set) • expressed as a fitness function: e.g. error between ideal and obtained solution (on training case); absolute or relative performance measure • Selection strategy: algorithm that selects the individuals in the population that will build the next generation • Principle: "survival of the fittest": best fit individuals have a higher chance of being selected • Selected individuals undergo variation through genetic operators to form the next generation
Evolutionary Computation: Basic Algorithm • pop = initial population • generation = 0 • evaluate(pop) • while bestfit(pop) not good enough, and not stop: • parents = select(pop) • children = recombine(parents) with probability pr • mutate(children) with probability pm • pop = children + select(pop) • evaluate(pop) • generation = generation + 1
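The loop above can be made concrete with a small Python sketch. Everything beyond the slide's pseudocode is an assumption chosen for illustration: the toy OneMax fitness (number of ones in a bitstring), tournament selection of size 2, one-point crossover, keeping the single best individual as the `select(pop)` survivor, and all parameter values.

```python
import random


def evolve(fitness, target, genome_len=20, pop_size=30,
           pr=0.7, pm=0.02, max_generations=200, seed=0):
    """Basic evolutionary loop on bitstrings (illustrative parameters)."""
    rng = random.Random(seed)
    # pop = initial population (random bitstrings)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    generation = 0

    def select(population):
        # Assumed strategy: tournament of size 2, the fitter one wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    # while bestfit(pop) not good enough, and not stop
    while fitness(max(pop, key=fitness)) < target and generation < max_generations:
        children = []
        while len(children) < pop_size - 1:
            p1, p2 = select(pop), select(pop)
            if rng.random() < pr:  # recombine with probability pr
                point = rng.randrange(1, genome_len)
                child = p1[:point] + p2[point:]
            else:
                child = p1[:]
            # mutate: flip each bit with probability pm
            child = [1 - g if rng.random() < pm else g for g in child]
            children.append(child)
        # pop = children + select(pop): here the survivor is the current best
        pop = children + [max(pop, key=fitness)]
        generation += 1
    return max(pop, key=fitness), generation


best, generations = evolve(fitness=sum, target=20)
print(generations, sum(best), best)
```

Because the best individual always survives, the best fitness never decreases from one generation to the next in this sketch; the pseudocode itself does not prescribe that choice.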
Fitness Landscape • n-dimensional landscape: • the fitness function is the objective function f(x) • x is the genotype to be optimized • peaks: local optima • 3-D case: intuitive visualization: z = f(x, y) • [Figure: example of an initial population (red dots) on a fitness landscape, axes x, y, z]
Fitness Landscape • Convergence example • [Figure: two plots of the fitness landscape, axes x, y, z, illustrating convergence of the population]
Fitness Evaluation • Examples of fitness functions: • Error fitness function: sum of distances (error) between expected ($p_i$) and obtained ($o_i$) solution on training case $i$ out of $n$ cases, for individual $p$: $f(p) = \sum_{i=1}^{n} |p_i - o_i|$; in this case, best fitness = 0 • Relative performance measure, e.g. success rate (best fitness = 100%) • Absolute performance measure, e.g. amount of credits won, amount of food found (best fitness = infinite)
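As a small illustration of the error fitness function described above (sum of distances between expected and obtained values over the training cases), here is a Python sketch. The function name and the use of the absolute difference as the distance are assumptions for this example.

```python
def error_fitness(expected, obtained):
    """Sum of absolute errors over all training cases; 0 is the best fitness."""
    return sum(abs(p - o) for p, o in zip(expected, obtained))


# Hypothetical training set: target values and the outputs of one individual.
expected = [1.0, 4.0, 9.0, 16.0]
obtained = [1.0, 3.5, 9.0, 18.0]
print(error_fitness(expected, obtained))
```

With an error-based fitness, lower is better, so a selection strategy that expects "higher is fitter" needs the value negated or otherwise transformed.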
Selection Strategies • Selection is typically stochastic: "survival of the fittest": • best individuals have a higher chance of being selected • selected individuals become "parents" and produce "offspring" • offspring form the next generation of individuals to be evaluated • Examples of selection strategies: • Fitness-proportional selection (Roulette wheel) • Ranking selection • Tournament selection
Selection Strategies • Fitness-proportional or Roulette Wheel selection: the probability of selection $p_i$ of individual $i$ (out of a population of $m$ individuals) is proportional to its fitness $f_i$: $p_i = f_i / \sum_{j=1}^{m} f_j$ • Ranking selection: probability of selection is a function of the rank of the individual, from worst to best fitness • Tournament selection: a group of randomly selected individuals "compete" in a tournament; winners (best fitness) produce offspring which will replace losers.
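Two of these strategies can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: fitness values are non-negative with a positive sum, higher fitness is better, and the tournament function only returns the winner (the replacement of losers by offspring mentioned above is left out). All function and parameter names are made up for the example.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.random() * total  # a point on the wheel, in [0, total)
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


def tournament_select(population, fitnesses, k, rng):
    """Draw k distinct contestants at random and return the fittest one."""
    contestants = rng.sample(range(len(population)), k)
    winner = max(contestants, key=lambda i: fitnesses[i])
    return population[winner]


rng = random.Random(42)
pop = ["a", "b", "c", "d"]
fits = [1.0, 2.0, 3.0, 4.0]
print(roulette_select(pop, fits, rng), tournament_select(pop, fits, 2, rng))
```

Roulette wheel selection depends on the absolute fitness values, whereas tournament selection only depends on their ordering, which makes it insensitive to how the fitness is scaled.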
Genetic Algorithms • Genetic Algorithms: goal: find an optimum solution (e.g. combination of parameters) to an instance of a problem • Became popular via John Holland, 1970s • Example: solving the Travelling Salesman Problem (NP-hard): • given a set of cities and connections (roads) between them, find the shortest tour that visits all cities (and each city only once). • individual = candidate tour (sequence of cities) • fitness = length of tour (the shorter the better) • initial population = set of randomly generated tours • iterate by evaluating, selecting, mutating and recombining tours until stop criterion (e.g. max number of generations,
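The TSP encoding described on this slide (individual = tour, fitness = tour length, random initial population) can be sketched in Python. Several details are assumptions for illustration only: cities are points in the plane, every pair of cities is connected by a straight road, the tour returns to its starting city, and a swap of two cities is used as the mutation.

```python
import math
import random


def tour_length(tour, coords):
    """Length of the closed tour visiting the cities in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def random_tour(n_cities, rng):
    """A random permutation of the cities: one individual."""
    tour = list(range(n_cities))
    rng.shuffle(tour)
    return tour


def swap_mutation(tour, rng):
    """Exchange two cities; the result is still a valid permutation."""
    i, j = rng.sample(range(len(tour)), 2)
    child = tour[:]
    child[i], child[j] = child[j], child[i]
    return child


rng = random.Random(0)
coords = [(0, 0), (0, 1), (1, 1), (1, 0)]  # hypothetical city positions
population = [random_tour(len(coords), rng) for _ in range(5)]
best = min(population, key=lambda t: tour_length(t, coords))
print(best, tour_length(best, coords))
```

Plain one-point crossover on two tours would generally duplicate some cities and drop others, so permutation-preserving operators are the usual choice for this representation.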
Sample Iteration Ci = individuals (chromosomes) with two genes: X and Y Fitness function: parents: C4-C3, C4-C1 recombine by swapping genes
Simple GA Example: Discover the hidden sentence • Goal: Find out what's the hidden sentence: e.g. "this is a test" (without blanks); only the length of the target sentence is known to the GA • (generalization of the OneMax problem: maximize the number of ones in a bitstring) • Fitness: Similarity measure between obtained and target string, using a string alignment (edit distance) algorithm: • 100% similarity = identical strings = optimum • Fixed-length genotype (character string) • Mutations: • point mutation (letter replacement, e.g. "ABCD" → "ABXD"); • shift (rotate) left/right (e.g. "ABCD" → "BCDA") • Two-point crossover • Tournament selection, size 4 (two best remain in population, and their offspring replace the two worst)
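The variation operators listed on this slide can be sketched in Python as below. This is not the code of the lecture's program; the function names, the lowercase a-z alphabet, and the explicit cut positions `i` and `j` are assumptions for the example, and the edit-distance fitness is left out.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet: a-z


def point_mutation(s, rng):
    """Replace one randomly chosen letter (it may draw the same letter again)."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]


def rotate_left(s):
    """Shift (rotate) left by one position, e.g. "ABCD" -> "BCDA"."""
    return s[1:] + s[:1]


def rotate_right(s):
    """Shift (rotate) right by one position, e.g. "ABCD" -> "DABC"."""
    return s[-1:] + s[:-1]


def two_point_crossover(a, b, i, j):
    """Swap the segment [i, j) between two equal-length strings."""
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


rng = random.Random(0)
print(point_mutation("thisisatesc", rng))
print(rotate_left("ABCD"), rotate_right("ABCD"))
print(two_point_crossover("aaaaaa", "bbbbbb", 2, 4))
```

All four operators keep the string length unchanged, which matches the fixed-length genotype used in this example.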
Simple GA Example: Typical run
  > ./strga "thisisatest"
  gen= 1, fit= -27, best=qthakeqsfzt
  gen= 2, fit= 9, best=ophrispstet
  gen= 3, fit= 9, best=ophrispstet
  gen=10, fit= 45, best=qthisattest
  gen=15, fit= 81, best=thisisatesc
  gen=18, fit= 100, best=thisisatest
  Size of search space=3.670344e+15
  Population size=1000 tested=45070
  Percent of search space explored: 5.176626e-10 %
• Size of search space = number of all possible strings of same length as target (11 characters in example) using alphabet a-z = 26^11 possible strings • Typically only a small fraction of the (huge) search space is explored • But this problem is quite (too) easy for a GA (smooth fitness landscape)
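The search-space size printed in the run is the number of strings of length 11 over the 26-letter alphabet a-z, which can be checked with a couple of lines of Python (the variable names are only for illustration):

```python
alphabet_size = 26   # letters a-z
target_length = 11   # len("thisisatest")

search_space = alphabet_size ** target_length
print(search_space)           # exact integer
print(f"{search_space:e}")    # same notation as the program output
```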
Search for solutions aggressively • Hill climbing: always chooses the best solution so far • Always reaches a hilltop • But maybe not the highest top: can be trapped in a local maximum • E.g. Steepest Ascent
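A minimal steepest-ascent hill climber shows both properties: it always stops on a hilltop, and which hilltop depends on where it starts. The two-peak function, the integer search space 0..20, and all names below are invented for this illustration.

```python
def hill_climb(f, x, neighbors, max_steps=1000):
    """Move to the best neighbor until no neighbor improves on x."""
    for _ in range(max_steps):
        best = max(neighbors(x), key=f)
        if f(best) <= f(x):
            return x  # hilltop reached (possibly only a local maximum)
        x = best
    return x


def two_peaks(x):
    """Toy landscape: local maximum at x=3 (value 5), global at x=15 (value 10)."""
    if x < 8:
        return -(x - 3) ** 2 + 5
    return -(x - 15) ** 2 + 10


def neighbors(x):
    """Integer neighbors of x inside the search space 0..20."""
    return [n for n in (x - 1, x + 1) if 0 <= n <= 20]


print(hill_climb(two_peaks, 0, neighbors))   # starts on the slope of the lower peak
print(hill_climb(two_peaks, 12, neighbors))  # starts on the slope of the higher peak
```

Starting from 0 the climber stops on the lower peak and never sees the higher one, which is the trap described on the slide.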
Search for solutions with a GA • Solution discovered probabilistically • Problem: cannot guarantee discovery of a hilltop • May reach the global maximum by wandering through the search space
GA: Real-World Applications • Numerical and Combinatorial Optimisation • Job-Shop Scheduling, Traveling salesman • Economics • Bidding and trading strategies, stock trends • Ecology • Host-parasite co-evolution, resource flow, biological arms races • Population Genetics • Viability of gene propagation • Social systems • Evolution of social behavior in insect colonies • Computer art • Automatic generation of graphics, music
GA for Computer Art • Interactive evolution: user (computer artist) determines fitness • Examples by Karl Sims, http://www.karlsims.com/ • Genetic Images (1993) • Competing virtual creatures (1994) • Galapagos (3D animated forms) (1997)