Just what are “building blocks”? How do (should) Evolutionary Algorithms “work”?

Chris Stephens and Jorge Cervantes, Instituto de Ciencias Nucleares, UNAM. FOGA 2007, 9/1/2007. stephens@nucleares.unam.mx

Theory: what should it do?
• It’s mathematically rigorous
• It’s intuitive
• It’s useful for practitioners
• It’s exact
• It unifies phenomena
• It predicts well

Theory: what’s the best approach?
• “Old” Schema Theory and the BBH
• Statistical Mechanics Approach
• Dynamical Systems Model
• Engineering “Rules of thumb”
• Coarse Grained models
• Population Biology Models

The Problem of Theory…
[Diagram: the “ideal” relationship between Theory and Experiment]

The Problem of Theory… in EC
[Diagram: Theory, Experiment, New Algorithms and New Applications, joined by question marks]
• New Applications, e.g. Multi-Resource Traveling Gravedigger Problem with Variable Coffin Size
• New Algorithms: “Most algorithms are NEVER used (except by the people who created them)” - Darrell Whitley, GECCO 2003 tutorial

The Problem of Theory… The EC Expectation Gap
• What theoreticians think practitioners are and what practitioners think theoreticians should be
• What practitioners think theoreticians are and what theoreticians think practitioners should be

EC Theory – the “Bare Necessities”: the choice of representation
• “Objects”, Dim = |X|
• GAs, e.g. (1,0,0)
• ES, e.g. (1.321,2.463,3.149)
• GP, Linear GP, Variable-length GAs, …
[Diagram: axes x, y, z]

EC Theory – the “Bare Necessities”
• Objects have fitness: f → Selection
• Objects have interactions: Mutation and Recombination (m – recombination “mode”)
• Selection + Mutation + Recombination → Dynamics
[Diagram labels: objects i, j, k]

In mathematics…
• Finite population model determined by Markov chain.
• In the infinite population limit for haploids, the dynamics is a single equation (implicit summation over repeated indices) built from:
  • the probability to mutate genotype J to genotype I
  • the probability to implement recombination
  • the probability that, given recombination takes place, it is implemented with mode m
  • the probability to select genotype I
  • the conditional probability for “child” J given “parents” K and L and a mode m
• That’s most of standard population genetics and evolutionary computation!
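A form of the infinite-population equation that is consistent with the term definitions above can be sketched as follows. The symbol names are assumptions chosen to match the legend: $M_I^{\;J}$ for the mutation probability, $p_c$ for the recombination probability, $p_c(m)$ for the conditional probability of mode $m$, $P'_I$ for the selection probability and $\lambda_J^{\;KL}(m)$ for the child-given-parents probability.

```latex
P_I(t+1) \;=\; M_I^{\;J}\Big[(1-p_c)\,P'_J(t)
  \;+\; p_c \sum_m p_c(m)\,\lambda_J^{\;KL}(m)\,P'_K(t)\,P'_L(t)\Big]
```

The first term in the bracket is the "selected but not recombined" route and the second is the "two parents, one mode" route; mutation is applied to whichever object results.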

• Select an object J; don’t recombine it with another
• Or: select two “parents” K and L; recombine them with respect to a recombination mode m, applied with probability p_c p_c(m), to obtain a “child” J
• Mutate it to object I
• Ω coupled non-linear difference equations
• There are Ω^3 different λ_J^KL
• Most of them are zero
• In object/string basis, for a given m, more than one K and L can give rise to J
• Equation is written covariantly (in terms of tensors) and therefore is valid in any coordinate system
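The select, recombine, mutate procedure above can be iterated numerically for small string lengths. The following is a minimal sketch, assuming binary strings, fitness-proportional selection, mask-based recombination and independent bit-flip mutation; the function name `step`, its parameters and the fitness values are illustrative choices, not taken from the slides.

```python
def step(P, fitness, N, p_c, mask_probs, mu):
    """One generation of the infinite-population model on N-bit strings.

    P          : list of 2**N genotype proportions
    fitness    : list of 2**N positive fitness values
    p_c        : probability to implement recombination
    mask_probs : dict {mask (int): probability of that mode}, summing to 1
    mu         : per-bit mutation probability
    """
    size = 1 << N
    full = size - 1

    # Selection: fitness-proportional (an assumed selection scheme).
    fbar = sum(fitness[i] * P[i] for i in range(size))
    sel = [fitness[i] * P[i] / fbar for i in range(size)]

    # Recombination: the child takes the mask bits from parent K, the rest from L.
    rec = [(1 - p_c) * s for s in sel]
    for m, pm in mask_probs.items():
        for k in range(size):
            for l in range(size):
                child = (k & m) | (l & (full ^ m))
                rec[child] += p_c * pm * sel[k] * sel[l]

    # Mutation: independent bit flips, probability depends on Hamming distance.
    new = [0.0] * size
    for j in range(size):
        for i in range(size):
            d = bin(i ^ j).count("1")
            new[i] += (mu ** d) * ((1 - mu) ** (N - d)) * rec[j]
    return new


N = 3
P0 = [1 / 8] * 8                    # random (uniform) population
fit = [1.0] * 7 + [2.0]             # needle at 111, illustrative values
masks = {0b100: 0.5, 0b110: 0.5}    # the two 1-point crossover masks
P1 = step(P0, fit, N, p_c=0.7, mask_probs=masks, mu=0.01)
```

The triple loop over masks and parent pairs is what makes the direct approach impractical for long strings: the state vector alone has 2^N entries.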

Two Questions…
1) Can we understand anything “qualitatively” from them? How does genetic dynamics “work”? (Why and when are recombination and mutation useful?) What are the effective degrees of freedom/collective modes?
2) Can we “solve” them? Put them on the computer. Not very feasible for N = 100!

Can we make things simpler? Consider only one operator…
• Selection only – can get exact solution in terms of “objects”, e.g. strings (microscopic degrees of freedom are good “coordinates” for selection)
• Mutation only – can get exact solution by Fourier transforming (coordinate transformation to the Walsh/Fourier basis); this diagonalizes the mutation matrix, and the solutions are “normal modes” (collective/effective degrees of freedom)
Can answer both 1) and 2) in these cases.
But what about recombination?
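The claim that the Walsh basis diagonalizes the mutation matrix can be checked directly for a small case. This is a sketch assuming independent bit-flip mutation with rate `mu` on `N`-bit strings; the helper names are illustrative and the code uses plain Python lists rather than a linear algebra library.

```python
def popcount(x):
    return bin(x).count("1")

def mutation_matrix(N, mu):
    """M[i][j] = probability that string j mutates to string i."""
    size = 1 << N
    return [[mu ** popcount(i ^ j) * (1 - mu) ** (N - popcount(i ^ j))
             for j in range(size)] for i in range(size)]

def walsh_matrix(N):
    """W[i][j] = (-1)^(number of loci where both i and j have a 1)."""
    size = 1 << N
    return [[(-1) ** popcount(i & j) for j in range(size)] for i in range(size)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

N, mu = 3, 0.1
W = walsh_matrix(N)
M = mutation_matrix(N, mu)

# W is symmetric and W*W = 2^N * identity, so W M W / 2^N is M in the Walsh basis.
D = [[x / (1 << N) for x in row] for row in matmul(matmul(W, M), W)]
```

In this basis `D` is diagonal, with entry `(1 - 2*mu) ** popcount(k)` for mode `k`: each Walsh mode decays independently, at a rate set by its order.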

Holland’s Schema theorem for schemata of length l and order Nm
• Consider schemata/marginals and neglect the construction term
• Smaller for longer schemata: tight linkage is beneficial because tightly linked genes are more likely to cross over together
• Smaller for higher order schemata
• Bigger for fitter schemata
• Dynamic schema fitness is population dependent
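For reference, a commonly quoted form of the schema theorem that shows the three dependencies listed above is sketched below. The notation is an assumption: $\xi$ is a schema of defining length $l$ and order $o$ on strings of $N$ loci, $p_c$ is the crossover probability for 1-point crossover and $p_m$ the per-bit mutation rate.

```latex
P(\xi,t+1) \;\ge\; \frac{f(\xi,t)}{\bar f(t)}\,P(\xi,t)\,
  \left[1 - p_c\,\frac{l-1}{N-1}\right]\,(1-p_m)^{o}
```

The fitness ratio is bigger for fitter schemata, the bracket is smaller for longer schemata, and the last factor is smaller for higher order schemata. It is an inequality precisely because the construction term has been dropped.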

“Building block” Hypothesis: A GA works by combining short, low-order, highly fit schemata (“building blocks”) into fitter higher order schemata
• But how would we recognise one if we saw one?
• Building what?
• How many of them are there?
• Just how are they combined together?
• When is recombination beneficial?
• How does the effect of recombination depend on the fitness landscape (and on other operators/parameters)?

Fitness landscape “linkage”
• Epistatic genes can be loosely linked or tightly linked
• Understand the “linkage” (epistatic) patterns of the fitness landscape (linkage learning)
• Create a representation so that epistatic genes are tightly linked
But… what is the relationship between “landscape blocks” and “building blocks”?

Does recombination favour tight linkage?
• Perform a “coarse graining” (i.e. write it in terms of schemata) of the RHS of the exact microscopic equations or, equivalently, do a linear coordinate transformation
• This introduces the selection-weighted linkage disequilibrium (SWLD) coefficient
• It depends on population state, fitness landscape and recombination distribution
• It gives a complete description of the utility of recombination mode by mode and generation by generation
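One way to write the coarse-grained equation, sketched here without mutation, is the following. The notation is assumed: $P'_I$ is the probability to select string or schema $I$, and $I_m$ and $I_{\bar m}$ are the two schemata that mode (mask) $m$ takes from the first and second parent respectively.

```latex
P_I(t+1) \;=\; P'_I(t) \;-\; p_c \sum_m p_c(m)\,\Delta_I(m,t),
\qquad
\Delta_I(m,t) \;=\; P'_I(t) \;-\; P'_{I_m}(t)\,P'_{I_{\bar m}}(t)
```

With this sign convention a negative $\Delta_I(m,t)$ means that mode $m$ adds to the proportion of $I$ relative to selection alone, and a positive one means it subtracts.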

Building Block schemata
• Object/string construction is now written in terms of schemata/marginals: Building Block schemata
• These BBs are not the same as those of the “building block” hypothesis – they are not necessarily short or low-order or even fit!
• For every recombination mode/channel there is a corresponding unique BB pair
• The number of BB schemata is precisely defined (e.g. 2^N for binary strings)
• They form a coordinate basis (many in fact, one for each object)
• Hierarchical solutions – objects have BBs, these BBs have their BBs etc.
• Hierarchy can be represented diagrammatically
This is how recombination “works”: for a given object/schema it specifies the ONLY ways it can be built

Recombination via a particular channel increases/decreases the proportion (effective fitness) of a given string or schema I when Δ < 0 / Δ > 0 respectively, where Δ is the selection-weighted linkage disequilibrium coefficient for that channel
• If Δ < 0, the “channel” is “non-deceptive”: higher probability to select the Building Blocks of the string/schema than the string/schema itself. Favours “loose linkage”
• If Δ > 0, the “channel” is “deceptive”: lower probability to select the Building Blocks of the string/schema than the string/schema itself. Favours “tight linkage”
• Standard two-bit deception: f(0*) > f(1*) > 0, i.e. Δ > 0

Example: three loci, 1-point crossover
• Level 1 BBs – BBs of the string (e.g. optimum)
• Level 2 BBs – BBs of the BBs
• Level 3 BBs – BBs of the BBs of the BBs – there aren’t any, hierarchy terminates at O(1) BBs
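The three-locus hierarchy can be enumerated mechanically. The sketch below assumes schemata written as strings over `0`, `1` and `*`, and masks written as bit strings in which `1` marks loci taken from the first parent; the function names are illustrative. Only one mask per cut point is listed, since the complementary mask gives the same BB pair in the opposite order.

```python
def building_blocks(schema, masks):
    """Return the BB pairs of a schema, one per mask that genuinely splits it."""
    pairs = []
    for mask in masks:
        left = "".join(c if m == "1" else "*" for c, m in zip(schema, mask))
        right = "".join(c if m == "0" else "*" for c, m in zip(schema, mask))
        # A mask that leaves one side with no defined loci does not split the schema.
        if set(left) == {"*"} or set(right) == {"*"}:
            continue
        pairs.append((left, right))
    return pairs

def hierarchy(schema, masks, level=1, out=None):
    """Collect BB pairs level by level: BBs of the string, BBs of the BBs, ..."""
    out = {} if out is None else out
    for pair in building_blocks(schema, masks):
        out.setdefault(level, set()).add(pair)
        for bb in pair:
            hierarchy(bb, masks, level + 1, out)
    return out

one_point = ["100", "110"]          # 1-point crossover on three loci
levels = hierarchy("111", one_point)
for level in sorted(levels):
    print(level, sorted(levels[level]))
```

For the string `111` this gives two level 1 pairs, (`1**`, `*11`) and (`11*`, `**1`), two level 2 pairs, and no level 3 entry, because order-one schemata cannot be split further.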

Landscape blocks
• Modular landscapes: m=1 NIAH; m=N “counting ones”
• f_0=0, Royal Road function
• Concatenated traps
Useful metrics: compare the relative effects of two operator sets; e.g. recombination and selection vs selection only, or recombination and selection vs selection and mutation

What can theory tell us about selecto-recombinative EAs?

Predictions
• First, the obvious – if a string or schema does not exist in the population then recombination can only increase its proportion
• If it does exist then there exists a critical proportion for any string/schema such that, if its proportion exceeds it, recombination is “bad”; the critical proportion is population, mask/mode and landscape dependent
• To see the interaction between the biases of selection and recombination, consider a random population

Predictions
• For 1-block NIAH, N=4: only one landscape block. Recombination is disadvantageous for all masks
• For 4-block NIAH, N=4: maximum number of landscape blocks. Recombination is advantageous for all masks
• For 2-block NIAH, N=4: intermediate number of landscape blocks. The relative advantage of recombination is mask dependent: 0011 is compatible with the landscape blocks but 0001 isn’t
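These three cases can be reproduced for a random (uniform) population with a few lines of code. The sketch below makes several assumptions that are not in the slides: fitness is a baseline `f0` plus the number of complete all-ones blocks, `f0 = 1`, and the coefficient is defined as Δ = P'(string) − P'(BB1)·P'(BB2), so that Δ > 0 means the mask is disadvantageous for the optimum. The function names are illustrative.

```python
def block_fitness(x, N, n_blocks, f0=1.0):
    """Assumed landscape: f0 plus the number of contiguous blocks that are all ones."""
    size = N // n_blocks
    block = (1 << size) - 1
    return f0 + sum(((x >> (b * size)) & block) == block for b in range(n_blocks))

def swld(N, n_blocks, mask, f0=1.0):
    """Delta for the all-ones string under a given mask, uniform population."""
    states = range(1 << N)
    full = (1 << N) - 1
    fit = [block_fitness(x, N, n_blocks, f0) for x in states]
    total = sum(fit)
    sel = [f / total for f in fit]          # selection probabilities P'_x

    def schema(m):
        # Selection probability of the schema with ones on the loci in m.
        return sum(sel[x] for x in states if (x & m) == m)

    return sel[full] - schema(mask) * schema(full ^ mask)

N = 4
for n_blocks in (1, 2, 4):
    for mask in (0b0011, 0b0001):
        print(n_blocks, format(mask, "04b"), swld(N, n_blocks, mask))
```

Under these assumptions Δ is positive for both masks with one block, negative for both with four blocks, and for two blocks it is negative for 0011 but positive for 0001, which cuts through a landscape block.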

Predictions
• Only in “extreme” cases can you say whether recombination is uniformly good or bad
• The more epistatic (“unmodular”) the landscape, the worse the effect of recombination; the less epistatic, the better
• Better to ask which recombination distribution is good or bad
• Which recombination distribution is best depends on the landscape
• The best recombination distributions are those whose BBs are compatible with the landscape’s blocks, i.e. the underlying modularity
• It also depends on the population and therefore should be time dependent (first search with highly mixing recombination to explore for blocks, then restrict the mixing to exploit them)

When is recombination bad?
• Lower order BBs preferred
• Shorter BBs preferred
• Recombination leads to LESS production of the optimal string or ANY optimal BB or schema than selection only

When is recombination good?
• Preference for O(1) BBs near the string boundary
• Higher order BBs/schemata preferred
• Longer BBs/schemata preferred
• Recombination leads to MORE production of ANY optimal string or optimal BB or schema than selection only

And what about here?
• Preference for O(1) BBs near the string boundary
• Recombination favours longer optimal schemata; but these aren’t BBs!
• This level 2 O(2) BB is favoured
• These BBs are only favoured after a certain amount of time
• These level 1 O(2) BBs are suppressed
So, is recombination good or bad?

So, what do the Deltas tell us?
[Figure: Deltas plotted against masks]
• Recombination is particularly bad in trying to construct these O(2) BBs/optimal schemata – because of their tight linkage!
• Better to construct the needle with these masks than these; asymmetric BBs preferred
• Recombination is better at constructing these O(2) optimal schemata – because of their loose linkage! But they’re not BBs!
Recombination is bad for ANY mask… but some masks are worse than others!

So, what do the Deltas tell us?
• Better to construct the optimum with these masks than these – “symmetric” BBs preferred
• Recombination is particularly good in trying to construct these O(2) BBs – because of their tight linkage!
Recombination is good for ANY mask… but some masks are better than others!

So, what do the Deltas tell us?
• Splitting up landscape blocks that are also BBs is very BAD
• Getting the optimum from recombining BBs that aren’t landscape blocks isn’t good (note: no sign changes)
• Getting the optimum from recombining BBs that are also landscape blocks is good
• Preference for the mask 0011, the only one that respects the landscape blocks
Recombination is good for SOME masks but BAD for others, and this depends on the landscape!

And for finite populations…?

2-point crossover, popsize = 13, 1000 repetitions
• The more crossover the better it gets!
• The hard part here is to find the BBs in the first place. Lots of crossover helps with that.

“2-point” crossover, popsize = 13, 100 repetitions
• Better to cut at block boundaries
• Lots of crossover gives random search (or worse)
• Here mutation first finds the blocks, then crossover joins them together

“2-point” crossover, popsize = 25, 100 repetitions
• Mutation is bad once you’ve got the BBs; here it is easier to get the O(1) BBs!

Conclusions
• Recombination works by joining together BBs (not the BBH ones!) – that’s the only way it works
• Objects have BBs which have their BBs which …
• BB basis is the appropriate mathematical description of recombination along with the SWLD coefficients
• Can glean qualitative information from the infinite population equations that is also valid for finite populations
• Recombination is only absolutely good or bad in the extreme situations of maximum and minimum epistasis, and even then it’s good if you don’t have the string/schema you want
• In other cases it depends on the fitness landscape and especially its modularity
• It seems to be particularly beneficial in “modular” landscapes

Conclusions
• Instead of asking if recombination is good or bad, better to ask what is a good recombination distribution
• If recombination distributions are allowed to evolve they will do so to respect landscape modularity
• Possible explanation for recombination hotspots
• Coevolution of recombination hotspots and modular landscapes
• Remember that a gene is a “building block”, O(1) in terms of “loci” but O(thousands) in terms of nucleotides
• Modularity can be lots of intragene epistasis but weak intergene epistasis
• Difference between “counting ones” (nucleotides) versus “counting ones” (genes)
