Richard A. Watson Natural Systems group ECS, Southampton University, UK

Recombining Building Blocks (again) in simple building block functions and in natural genomes, or How two scales of optimisation can be better than one, or Another real royal road

Overview • Work in progress, no clever analysis, just some concepts • Relationship between sex, problem decomposition, coevolution • Discussion of concatenated BB functions • Mutation vs crossover in fully-deceptive blocks • When is a block not a block? (when it’s an atom with a big alphabet) • Two scales of optimisation in partially deceptive subfunctions • What does that mean for a biological model? • Generalisations • Natural properties of genomes

Objectives (wish list) • Understand sufficient conditions (if any) under which an evolving population is required to use building-block combination • A ‘Royal Road’, based on building blocks, allowing use of linkage information – and one that isn’t too contrived • Specifically, is there a kind of problem space where • GA >> hill climber (same assumptions of epistasis and linkage), • that uses the population for combination of building blocks/search over combinations of blocks/movement in block-sequence space (not just preserving ‘common good’) • And, keep it simple/intuitive, and biologically plausible • e.g. not HIFF (Sudholt), not gap function, not intersecting ridges

Some thoughts about contrivance • In general • Some (diverse) points in genotype space are easy to reach • Recombination of genotypes from these areas results in a jump to a new genotype that is high in fitness, this peak is otherwise unreachable • Why is the peak just where it is? • The building block intuition is that • A-good, B-good→A+B-better • A-easy, B-easy→A+B-easy??

block 1 block 2 ----------- block 1 block 2 ----------- T klogk Intersecting ridges block 1 block 2 ----------- klogk block 1 block 2

An old favourite: Concatenated trap functions [Figure: trap function; fitness (ticks at k-1 and k) against unitation (0, 1, 2, …)] • i.e. • Separable sub-functions • With sub-optimal peak in each sub-function • In much prior work with these it is supposed that: • Selection on bits not sufficient and not necessary • Selection on blocks is necessary and sufficient • precluding all utility of selection on bits, as in fully-deceptive trap functions, maximises the advantage of selecting on blocks (?)
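
A minimal sketch of a concatenated fully-deceptive trap, assuming the common form in which a block of k bits scores k when all bits are 1 and k - 1 - u otherwise (u is the unitation, i.e. the number of ones). The function names and the exact slope are assumptions for illustration; the slide only fixes the two peak heights, k - 1 and k.

```python
def trap(block):
    """Fully-deceptive trap on one block (assumed form).

    The global optimum is all ones (fitness k); everywhere else the
    bit-wise gradient points towards all zeros (fitness k - 1).
    """
    k = len(block)
    u = sum(block)  # unitation: number of ones in the block
    return k if u == k else k - 1 - u


def concatenated_trap(genome, k):
    """Sum of independent traps over consecutive blocks of k bits."""
    assert len(genome) % k == 0, "genome length must be a multiple of k"
    return sum(trap(genome[i:i + k]) for i in range(0, len(genome), k))


# One optimal block and one block sitting on the deceptive peak.
example = [1, 1, 1, 1] + [0, 0, 0, 0]
example_fitness = concatenated_trap(example, k=4)
```

Because every single-bit step away from all zeros lowers fitness, selection on bits gives no help in finding the all-ones block, which is the point made on this slide.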

However • If the selective gradient within a block is not useful, then all means to find a good block are going to be exponential in k. Hence popsize needs to be exp. in k. • Hence the assumption that block size, k, must be small (constant, not a fraction of L) • After all, if algorithmic advantage is gained by dividing a problem into pieces, then the smaller the pieces the better, right? • And if it gets too small, so that mutation on bits is sufficient, we can always mislead it → hence, fully-deceptive traps • Exp in k is not a problem if k is small, and • At least it’s not exponential in L, right? • Well, yes – but this scenario doesn’t require recombination of good blocks… the smaller the blocks, the easier it is to do without BB recombination

Crossover vs mutation without linkage info • Uniform xover = ‘common good’ + macromutation • 111111111111111111000000111111 parent A • 111111000000111111111111111111 parent B • 111111??????111111??????111111 offspring • This utilises information from the population to ‘identify’ which parts to randomise – giving expected time exponential in k (not L), regardless of linkage. • A mutation hill-climber couldn’t do this without linkage information. • But it doesn’t transfer good blocks from one individual to another – there’s no building-block story required • Overall time still exp in k (at best) → requires small blocks.
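
The ‘common good + macromutation’ reading of uniform crossover can be sketched as follows, using the two parents from this slide. The helper names are hypothetical; the only assumption is standard uniform crossover, where each offspring bit is taken from either parent with equal probability.

```python
import random

PARENT_A = "111111111111111111000000111111"
PARENT_B = "111111000000111111111111111111"


def disagreement_mask(parent_a, parent_b):
    """Show shared bits as-is and mark disagreeing positions with '?'."""
    return "".join(a if a == b else "?" for a, b in zip(parent_a, parent_b))


def uniform_crossover(parent_a, parent_b, rng):
    """Take each bit from either parent with equal probability.

    Where the parents agree the bit is preserved ('common good');
    where they disagree the result is effectively a random bit.
    """
    return "".join(rng.choice((a, b)) for a, b in zip(parent_a, parent_b))


mask = disagreement_mask(PARENT_A, PARENT_B)
child = uniform_crossover(PARENT_A, PARENT_B, random.Random(0))
```

The mask reproduces the offspring pattern on the slide: only the two blocks on which the parents disagree are randomised, so no good block is actually transferred between individuals.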

Crossover vs ‘macromutation’ with linkage info • E.g. two-point xover = ‘common good’ + recombination of segments that disagree • 111111111111111111000000111111 parent A • 111111000000111111111111111111 parent B • 111111AAAAAA111111BBBBBB111111 offspring • Utilises common good, AND if linkage is tight, expected time to find appropriate crossover points is better than L^2. • Does transfer good blocks from one individual to another! • However, if popsize is exponential in k, then overall time is still exponential in k (at best) → requires small blocks. • And, if use of linkage information is allowable, a macromutation hill-climber is also order L^2·2^k • i.e. pick two crossover points, randomise all bits in between.
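
A small sketch contrasting the two operators discussed here, under the assumption that crossover points are chosen as a pair 0 <= i < j <= L. Function names are hypothetical. Two-point crossover copies a whole segment from the other parent, whereas macromutation randomises the same segment and so only hits a specific k-bit block with probability 2^-k.

```python
import random

PARENT_A = "111111111111111111000000111111"
PARENT_B = "111111000000111111111111111111"


def two_point_crossover(parent_a, parent_b, i, j):
    """Copy parent_b's segment [i, j) into parent_a."""
    return parent_a[:i] + parent_b[i:j] + parent_a[j:]


def macromutation(genome, i, j, rng):
    """Randomise every bit in [i, j), leaving the rest untouched."""
    segment = "".join(rng.choice("01") for _ in range(j - i))
    return genome[:i] + segment + genome[j:]


def count_point_pairs(length):
    """Number of ways to choose crossover points 0 <= i < j <= length."""
    return length * (length + 1) // 2


# Parent B's good block at positions 18..23 replaces parent A's bad one.
child = two_point_crossover(PARENT_A, PARENT_B, 18, 24)
mutant = macromutation(PARENT_A, 18, 24, random.Random(0))
pairs = count_point_pairs(len(PARENT_A))
```

The number of point pairs grows as roughly L^2/2, which is where the L^2 factor in both bounds comes from; the extra 2^k for macromutation is the cost of guessing the block contents.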

Reconsider assumptions • Fully-deceptive trap function assumes optimisation at only the block scale is ideal. i.e. • Selection on blocks is necessary and sufficient • Selection on bits not sufficient and not necessary • After all, to the extent that selection on bits is useful, selection on blocks seems redundant…? That is, if … • selection on bits can find good blocks, • and selection on blocks finds good genotypes then • shouldn’t it be the case that selection only on bits is sufficient to find fit genotypes?

… • In order to show that selection on blocks is required • Selection on bits must be insufficient. • But maybe • Selection on blocks may not be sufficient on its own either, and • Selection on bits might be required too. • Finding utility in selecting on blocks need not preclude some utility in selecting on bits. Can we utilise selection on bits to find good blocks, without precluding utility of selecting on blocks?

Consider, partially-deceptive sub-function [Figure: partially-deceptive sub-function; fitness (ticks at k-1 and k) against unitation (0, 1, 2, …)] • Hill-climbing (i.e. selection on bits) will reach one or the other optimum in time O(k log k) • likewise, mutation and selection, even in a small population • Prob. 0.5 of reaching high optimum, in each block. • As before, two-point recombination can bring good blocks together in time at worst L^2 (to find req. crossover points) • There’s no need for any process that’s exponential in k, so we can use large k, without impeding the GA. • k can be a constant fraction of L, and it’s all still polynomial • (So long as good blocks are maintained in the population) • Whereas hill-climber will take time exponential in L
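
A minimal sketch of a partially-deceptive sub-function and a bit-flip hill-climber on it. The V-shaped form below (valley at unitation k/2, peaks of height k - 1 at all zeros and k at all ones) is an assumption chosen to match the peak heights and the 50/50 basin split described on the slide; the names `two_peak` and `hill_climb` are hypothetical.

```python
import random


def two_peak(block):
    """Assumed partially-deceptive sub-function of unitation u.

    Fitness falls linearly from k - 1 at u = 0 to 0 at u = k/2,
    then rises linearly to k at u = k.
    """
    k = len(block)
    u = sum(block)
    if 2 * u < k:
        return (k - 1) * (1 - 2 * u / k)
    return k * (2 * u / k - 1)


def hill_climb(block, rng):
    """Accept single-bit flips that strictly improve fitness until none do."""
    block = list(block)
    improved = True
    while improved:
        improved = False
        for i in rng.sample(range(len(block)), len(block)):
            candidate = block.copy()
            candidate[i] ^= 1  # flip one bit
            if two_peak(candidate) > two_peak(block):
                block = candidate
                improved = True
    return block


rng = random.Random(1)
k = 8
starts = [[rng.randint(0, 1) for _ in range(k)] for _ in range(20)]
ends = [hill_climb(s, rng) for s in starts]
```

Every run ends at all zeros or all ones, so selection on bits is useful for finding a block-level optimum but cannot guarantee the higher one.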

Likelihood of hill-climber succeeding • Consider 2 blocks: There are 4 possible results of a local hill climber: ab, Ab, aB, AB. • If HC doesn’t arrive at AB it will be doomed • But if some indivs arrive at Ab and some arrive at aB, then they can be crossed to find AB • But if you had enough diversity to find both Ab and aB you would have enough diversity to find AB! • But this isn’t true for many blocks • the probability of finding B blocks all correct in one individual is exp. small in B • whereas there’s a reasonable probability of having each block correct in at least one individual, even with small popsize
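
The two probabilities compared on this slide can be worked out directly, assuming each block independently reaches its high optimum with probability 0.5 in each individual (the figure quoted for the partially-deceptive sub-function). The function names and the independence assumption are for illustration only.

```python
def p_one_individual_all_high(num_blocks, p_high=0.5):
    """Probability that a single hill-climber gets every block right."""
    return p_high ** num_blocks


def p_population_covers_all_blocks(num_blocks, pop_size, p_high=0.5):
    """Probability that each block is right in at least one individual."""
    p_block_missing = (1 - p_high) ** pop_size  # nobody has this block
    return (1 - p_block_missing) ** num_blocks


single = p_one_individual_all_high(25)            # 2**-25, about 3e-8
covered = p_population_covers_all_blocks(25, 20)  # close to 1
```

With 25 blocks, a single individual almost never has all of them, while a population of only 20 almost always holds every good block somewhere; recombination is then needed to bring them together.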

Given large k (constant fraction of L) • Uniform xover fails, and macromutation (even with the use of linkage information) also fails, • And more to the point, both of these become far removed from performance of two-point crossover with tight linkage, O(L^2) • Large k is required to properly separate performance of algs that do not recombine building-blocks from performance of algs that do • In concatenated trap functions with large k • Selection on bits not sufficient but is necessary • Selection on blocks is necessary but not sufficient • Local optimisation at both scales is required

So what? • Partially deceptive trap functions with large k are (in principle) better at distinguishing the ability of a sexual population from that of an asexual population and hill-climbers • The contribution is more conceptual than technical – • I suggest that thinking of a GA as a mechanism for manipulating blocks has blinded us to the importance of also utilising selection on bits • The root of the problem is that a block is not really a block (wrt selection) if there’s no selection ‘inside’ it – it’s just an ‘atom’ with a big alphabet. And then there’s no point to it – fully-deceptive blocks reduce to ‘block-wise one-max’ where each step is exp in k. • But utilising two scales of optimisation, local optimisation at bit scale (via mutational variation), and local optimisation at block scale (via recombination), can do something interesting • mutation doesn’t merely provide variation for recombination to act on, it provides the ability to follow selective gradients in nucleotide sequence space ≠ selective gradients in allele frequency space

Generalising required properties of intra-block epistasis • Doesn’t have to be as contrived as a ‘trap’ function in the sense of complementary optima • Single-peaked won’t do – selection on blocks is not required • Random intra-gene landscapes won’t do – selection on bits is not useful for finding good blocks • NKs won’t do either – direct trade-off between • Utility of local adaptation vs • Non-utility/insufficiency of local adaptation • Necessary and sufficient: multiple peaks, with smooth-ish slopes, with significant separation between them

Does that help with the evolutionary biology perspective of sex? • Genes as blocks • Usually think of blocks meaning groups of ‘genes’ • But the blocks should contain many mutational units (not many genes) • In natural genomes, mutational scale is very different from recombinational units

Assumptions for an epistasis model of a natural genome • Genes contain many nucleotides • Nucleotides within a gene are strongly epistatic • Epistasis between genes is relatively weak • Thus, large disjoint sets of nucleotides are grouped both functionally and physically • Intra-gene epistasis creates multiple local optima (in nucleotide-sequence-space) • That are significantly distant from one another in nucleotide sequence space • That have significantly different fitnesses • Following selective gradients in nucleotide sequence space from a given ancestral sequence will not always result in discovery of the same local optimum

Inter-gene interactions • Assume that the fitness of a genotype, G, will be given by the fitness of each of the genes, g_1, g_2, …, g_B, with no epistasis between genes (i.e. multiplicative fitness): $F(G) = \prod_{i=1}^{B} f(g_i)$ • where f(g_i) is the fitness contribution of the ith gene.
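
Multiplicative fitness over genes can be sketched in a few lines. The gene-level function used in the example (fraction of ones) is a placeholder assumption, not the intra-gene landscape of the talk; `genome_fitness` is a hypothetical name.

```python
import math


def genome_fitness(genes, gene_fitness):
    """Fitness of a genotype as the product of its genes' contributions."""
    return math.prod(gene_fitness(g) for g in genes)


def fraction_of_ones(gene):
    """Placeholder gene-level fitness, for illustration only."""
    return gene.count("1") / len(gene)


value = genome_fitness(["11", "10", "11"], fraction_of_ones)
```

Because the contributions multiply, one poor gene scales down the whole genotype regardless of how good the others are, yet changing one gene never alters the relative ranking of alleles at another.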

e.g. Intra-gene landscape (i.e. one sub-function) • The fitness contribution of each gene, g, will be an epistatic function of some nucleotides that it contains defined using P randomly positioned peaks: • Where b_p is a locally optimal sequence, ω_p is the height of peak b_p, and h(x) defines the shape of the peak

Intra-gene landscape, 2D illustration • Using 10 randomly positioned peaks with heights ω_j = 1/(1+j) to create a range of heights.
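
One way such a multi-peaked intra-gene landscape could be built is sketched below. Several details are assumptions, not taken from the slides: peaks are combined by taking the maximum, distance is Hamming distance, the peak shape is h(d) = 1/(1 + d), and j runs from 1 to P. Only the heights 1/(1+j) and the use of P randomly positioned peaks come from the text.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def make_gene_landscape(gene_length, num_peaks, rng):
    """Build a gene-level fitness function from randomly placed peaks."""
    peaks = [tuple(rng.randint(0, 1) for _ in range(gene_length))
             for _ in range(num_peaks)]
    heights = [1 / (1 + j) for j in range(1, num_peaks + 1)]  # assumed j = 1..P

    def fitness(gene):
        # Assumed combination rule: the best peak, discounted by distance.
        return max(w / (1 + hamming(gene, b)) for w, b in zip(heights, peaks))

    return peaks, heights, fitness


rng = random.Random(0)
peaks, heights, fitness = make_gene_landscape(gene_length=25, num_peaks=10, rng=rng)
top = fitness(peaks[0])
```

Under these assumptions a low peak that happens to fall close to a higher one may not be a true local optimum, so the number of distinct optima can be smaller than P.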

Intra-gene landscape; A couple of 1D illustrations • Not as contrived as the trap sub-function – it’s just a multi-peaked sub-function • Hill climbing from a random start position might not always reach the same peak.

Overall landscape • Represented using the product of two 1-D intra-gene landscapes

Simulation parameters • 25 genes of 25 bits each. • 10 randomly positioned peaks in each gene. • “if competing block solutions are maintained long enough”… i.e. first good-ish block doesn’t fix before others have chance to find a better one. • population subdivision • 30 demes of 10 individuals each. Rank selection • Migration rate = 1 individual per deme per 10 generations is a migrant. • (Converged initialisation) • 30 runs per point; ave of ave, and ave of best

Asexuals, to low-rate crossover, to uniform • A low per-locus crossover rate, that can keep nucleotides of a gene together but assort genes, is preferred over asexuals (1000 fold) and over free recombination of nucleotides (10000 fold). [Figure: fitness after 1000 generations against crossover rate]

Block-wise crossover, and shuffled control [Figure: fitness after 1000 generations against linkage model]

Subdivision not essential

Mutation sensitivity (block-wise crossover)

Observations • Crossover enables large specific changes in genotype space • Not small ones • Not random ones • Low mutation won’t do • High mutation won’t do • Both is OK • in two phases • Still needs linkage info (nearest basin boundary is k/2 bits away; if mutations affect many blocks, it’ll trade improvements for steps backward) • Q: what can a GA do that MMHC with two scales of mutation cannot? • Do the things that are put together _need_ to be found in parallel? • Gap function!

Conclusions • Simple concat BB functions DO show adv of recombining blocks if blocks are large, not fully deceptive, and tightly linked • Has ‘natural’ interpretation • If you don’t need/can’t use any selective gradients on parts within a block, then finding a block takes time exp in block size → blocks must be small • Although dividing a problem into small subproblems seems like a good idea, the other way to look at it is that the smaller the block, the better able mutation is to find the block by chance (and the less essential the combination of blocks is)

Conclusions 2 • The benefit of sexual recombination in natural genomes may derive simply from their most basic genetic architecture: the fact that genomes contain thousands of functionally and physically particulate genes, each composed of thousands of strongly epistatic nucleotides.

Motives from Evolutionary Biology • In population genetics, sex defines the units of selection, and forces selection to act on these ‘particulate’ units… specifically, genes (not organisms, not genotypes, hence ‘selfish gene’) • Thus, evolution = movement in allele freq space • In Evolutionary Biology view (both Wright and Fisher) • Alleles of different genes are essentially unlinked (freely recombining = uniform xover) • If they’re tightly linked then they behave as if a single allele • There is never any utility in discussing selection on combinations of alleles – either a pair of alleles recombine or they’re one allele • Is that view correct?

Contrast with EC • Some kinds of EC propose sex as a means to manipulate ‘building-blocks’ – to select on ‘composite’ things. • In a building-block function, we suppose that • Selection on bits not sufficient • Selection on blocks is necessary • Ideally(?), precluding the utility of selection on bits, as in fully-deceptive trap functions, should secure the requirement and maximise advantage of selecting on blocks • But that just means that the blocks are really macro ‘atoms’ – like a locus with 2^k alleles

No use for block concept? • If selection acts only on blocks – then they’re not really blocks (wrt selection) • If selection acts on bits, then it can’t act on blocks as well • Maybe there’s no use for block concept? • Is there ever any circumstance where two scales of selection are required to understand what’s going on? – forcing us to treat one of these scales as a combination of units from the lower scale, rather than a macro unit.

Interaction of recombination and mutation • Everybody knows that (even if you believe that recombination does something clever and important with blocks) mutation is required to provide the variation for recombination to act on. Is that all it does? • Note that recombination and point mutation in biological systems operate on units at very different scales: • Spontaneous point mutation facilitates movements in nucleotide sequence space • Recombination manipulates alleles, each containing thousands of nucleotides; hence facilitates movement in allele sequence space • What if • mutation doesn’t merely provide variation for recombination to act on, it provides the ability to follow selective gradients in nucleotide sequence space ≠ selective gradients in allele frequency space • Interaction of mutation and recombination → two scales of optimisation

block 1 block 2 ----------- block 1 block 2 ----------- T k Simple analysis: with recombination block 1 block 2 ----------- k block 1 block 2

For pop gen • Pop gen of sex view depends on alleles being mutational neighbours – here neighbours in crossover variation are a subset of mutational neighbours • No neighbours in allele seq space that are not neighbours in nucleotide seq space • But natural alleles differ at many nucleotide sites

notes • It means we can use a simplified linkage model: inter-block crossover points only (no partial linkage nec.)

So, can selection operate on both bits and blocks?

Either genes are tightly linked and behave as if a single allele, or they’re unlinked • Blocks are only meaningful to selection if they’re heritable – and bits are only meaningful to selection if they’re heritable (individually). • If sex defines the unit of selection, does that mean the internals of building-blocks are irrelevant? • In order for a composite to be a meaningful composite, its parts have to be significant. • However, recombination and spontaneous point mutation operate at quite different scales. • Is variation at these two different scales (together) the key to unifying EC BB view of sex with EB view that sex defines the units of selection?

Are natural genes ‘atomic’ or ‘composite’ things? • Are ‘genes’ in a GA ‘genes’ or ‘nucleotides’? • If sex defines the unit of selection, are the internals of the gene irrelevant? • Are building-blocks in a GA ‘genes’ or ‘groups of genes’? • For a block to be a group, it must have meaningful component parts.

Overview • (Supposed) Preconditions for utility of selecting on blocks • Selection on bits not sufficient and not necessary • Selection on blocks is sufficient and necessary • An old favourite (that does not require BB combination!) • Concatenated trap functions (fully-deceptive, with small k) • Preserving common alleles + macro-mutation (exp in k) • Uniform vs two-point • Concatenated trap functions with large k • Selection on bits not sufficient but is necessary • Selection on blocks is not sufficient but is necessary • Local optimisation at both scales is required • A slightly less contrived (more believable) example
