Competent Program Evolution: MOSES Defense Synopsis

Competent Program Evolution Dissertation Defense Moshe Looks December 11th, 2006

Synopsis • Competent optimization requires adaptive decomposition • This is problematic in program spaces • Thesis: we can do it by exploiting semantics • Results: it works!

General Optimization • Find a solution s in S • Maximize/minimize f(s) • f: S → ℝ • To solve this faster than O(|S|), make assumptions about f

Near-Decomposability • Complete separability would be nice… • Near-decomposability (Simon, 1969) is more realistic • [Figure: weaker interactions vs. stronger interactions]

Exploiting Separability • Separability = independence assumptions • Given a prior over the solution space • represented as a probability vector • Sample solutions from the model • Update model toward higher-scoring points • Iterate... • Works well when interactions are weak
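The sample/update loop on this slide can be sketched as a univariate estimation-of-distribution algorithm in the spirit of PBIL/UMDA. This is a minimal illustration under stated assumptions, not the algorithm used in the dissertation: the onemax objective, population size, elite size, learning rate, and generation count are all hypothetical choices made for the example.

```python
import random


def onemax(bits):
    """Fully separable toy objective (assumed for illustration): the number of 1 bits."""
    return sum(bits)


def univariate_eda(f, length, pop_size=50, n_select=10, rate=0.3,
                   generations=60, seed=0):
    """Probability-vector search: every bit position is modeled independently."""
    rng = random.Random(seed)
    p = [0.5] * length  # uniform prior over the solution space
    best = None
    for _ in range(generations):
        # Sample solutions from the model.
        pop = [[1 if rng.random() < pi else 0 for pi in p]
               for _ in range(pop_size)]
        pop.sort(key=f, reverse=True)
        if best is None or f(pop[0]) > f(best):
            best = pop[0]
        # Update the model toward the highest-scoring samples.
        elite = pop[:n_select]
        for i in range(length):
            freq = sum(ind[i] for ind in elite) / n_select
            p[i] = (1 - rate) * p[i] + rate * freq
    return best, p


best, model = univariate_eda(onemax, 20)
print(onemax(best), [round(q, 2) for q in model])
```

Because each position has its own probability, the model cannot express interactions between bits. That is why this style of search works well only when interactions are weak.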

Exploiting Near-Decomposability • Bayesian optimization algorithm (BOA) • represent problem decomposition as a Bayesian Network • learned greedily, via a network scoring metric • Hierarchical BOA • uses Bayesian networks with local structure • allows smaller model-building steps • leads to more accurate models • restricted tournament replacement • promotes diversity • Solves the linkage problem • Competence: solving hard problems quickly, accurately, and reliably

Program Learning • Solutions encode executable programs • execution maps programs to behaviors • exec: P → B • find a program p in P • maximize/minimize f(exec(p)) • f: B → ℝ • To be useful, make assumptions about exec, P, and B

Properties of Program Spaces • Open-endedness • Over-representation • many programs map to the same behavior • Compositional hierarchy • intrinsically organized into subprograms • Chaotic Execution • similar programs may have very different behaviors

Properties of Program Spaces • Simplicity prior • simpler programs are more likely • Simplicity preference • smaller programs are preferable • Behavioral decomposability • f: B → ℝ is separable / nearly decomposable • White box execution • execution function is known and constant

Thesis • Program spaces not directly decomposable • Leverage properties of program spaces as inductive bias • Leading to competent program evolution

Representation-Building • Organize programs in terms of commonalities • Ignore semantically meaningless variation • Explore plausible variations

Representation-Building • Common regions must be aligned • Redundancy must be identified • Create knobs for plausible variations

Representation-Building • What about… • changing the phase? • averaging two inputs instead of picking one? • … • [Figure: behavior (semantic) space vs. program (syntactic) space]

Statics & Dynamics • Representations span a limited subspace of programs • Conceptual steps in representation-building: • reduction to normal form (x, x + 0 → x) • neighborhood enumeration (generate knobs) • neighborhood reduction (get rid of some knobs) • Create demes to maintain a sample of many representations • deme: a sample of programs living in a common representation • intra-deme optimization: use the hBOA • inter-deme: • based on dominance relationships
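The "reduction to normal form" step can be illustrated with a toy bottom-up rewriter over arithmetic expressions. It is a sketch only: the tuple encoding and the handful of identity rules (x + 0 → x, x · 1 → x, x · 0 → 0, constant folding) are assumptions chosen to mirror the slide's x + 0 → x example. They are not MOSES's actual reduction rules, which the slides do not list for this domain.

```python
def reduce_expr(e):
    """Rewrite an expression bottom-up into a toy normal form.

    Expressions are variables (str), integer constants (int), or
    (op, left, right) tuples with op in {'+', '*'}.
    """
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    a, b = reduce_expr(a), reduce_expr(b)
    if op == '+':
        if a == 0:
            return b  # 0 + x -> x
        if b == 0:
            return a  # x + 0 -> x
    elif op == '*':
        if a == 0 or b == 0:
            return 0  # x * 0 -> 0
        if a == 1:
            return b  # 1 * x -> x
        if b == 1:
            return a  # x * 1 -> x
    if isinstance(a, int) and isinstance(b, int):
        return a + b if op == '+' else a * b  # fold constants
    return (op, a, b)


print(reduce_expr(('*', ('+', 'x', 0), 1)))  # x
```

Syntactically distinct programs such as x, x + 0, and (x + 0) · 1 all collapse to the same representative. This is the semantically meaningless variation that representation-building aims to ignore.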

Meta-Optimizing Semantic Evolutionary Search (MOSES) • 1. Create an initial deme based on a small set of knobs (i.e., the empty program) and random sampling in knob-space • 2. Select a deme and run hBOA on it • 3. Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes) • 4. For each such program: • create a new representation centered around the program • create a new random sample within this representation • add as a deme • 5. Repeat from step 2

Artificial Ant • [Figure: map of the ant's food trail] • Eat all food pellets within 600 steps • Existing evolutionary methods perform not significantly better than random search • Space contains many regularities • To apply MOSES: • three reduction rules for normal form • e.g., left, left, left → right • separate knobs for rotation, movement, & conditionals • no neighborhood reduction needed
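A rule like "left, left, left → right" can be sketched as collapsing each run of consecutive turns to its net rotation modulo four quarter turns. This is a hypothetical illustration, not the three rules actually used. It assumes action names `'left'`, `'right'`, and `'move'`, and it treats a straight-line action sequence only. Representing a half turn as two lefts is an arbitrary canonical choice.

```python
def reduce_rotations(actions):
    """Collapse each run of consecutive turns into its net effect.

    Net rotation is counted in quarter turns (left = +1, right = -1),
    so three lefts become one right and a left/right pair cancels.
    """
    out = []
    net = 0

    def flush():
        nonlocal net
        turns = net % 4
        if turns == 1:
            out.append('left')
        elif turns == 2:
            out.extend(['left', 'left'])  # canonical choice for a half turn
        elif turns == 3:
            out.append('right')
        net = 0

    for act in actions:
        if act == 'left':
            net += 1
        elif act == 'right':
            net -= 1
        else:
            flush()  # any other action ends the run of turns
            out.append(act)
    flush()
    return out


print(reduce_rotations(['left', 'left', 'left', 'move']))  # ['right', 'move']
```

Choosing one canonical form for each net rotation removes the left/right symmetry that makes otherwise equivalent ant programs look different syntactically.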

Artificial Ant • How does MOSES do it? • Searches a greatly reduced space • Exploits key dependencies: • “[t]hese symmetries lead to essentially the same solutions appearing to be the opposite of each other. E.g. either a pair of Right or pair of Left terminals at a particular location may be important.” – Langdon & Poli, “Why ants are hard” • hBOA modeling learns linkage between rotation knobs • Eliminate modeling and the problem still gets solved • but with much higher variance • computational effort rises to 36,000

Elegant Normal Form (Holman, ’90) • Hierarchical normal form for Boolean formulae • Reduction process takes time linear in formula size • 99% of random 500-literal formulae reduced in size by over 98%

Syntactic vs. Behavioral Distance • Is there a correlation between syntactic and behavioral distance? • 5000 unique random formulae of arity 10 with 30 literals each • qualitatively similar results for arity 5 • Computed the set of pairwise • behavioral distances (truth-table Hamming distance) • syntactic distances (tree edit distance, normalized by tree size) • The same computation on the same formulae reduced to ENF
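The behavioral half of this comparison (truth-table Hamming distance) can be sketched in a few lines. The tree edit distance used for the syntactic half is not reproduced here. Formulae are modeled as Python functions of Boolean arguments purely for illustration, and the example functions are hypothetical.

```python
from itertools import product


def truth_table(f, arity):
    """Evaluate a Boolean function on all 2**arity input rows."""
    return [bool(f(*row)) for row in product([False, True], repeat=arity)]


def behavioral_distance(f, g, arity):
    """Hamming distance between the truth tables of f and g."""
    return sum(a != b for a, b in zip(truth_table(f, arity),
                                      truth_table(g, arity)))


def f_and(x1, x2):
    return x1 and x2


def f_or(x1, x2):
    return x1 or x2


def f_and_redundant(x1, x2):
    # Syntactically larger, behaviorally identical to f_and.
    return x1 and (x2 or x2) and x1


print(behavioral_distance(f_and, f_or, 2))             # 2
print(behavioral_distance(f_and, f_and_redundant, 2))  # 0
```

The second pair shows why raw syntax can mislead. Two formulae at nonzero tree edit distance can have behavioral distance 0, and reducing both to a normal form is meant to shrink such gaps.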

Syntactic vs. Behavioral Distance • Is there a correlation between syntactic and behavioral distance? • [Figure panels: Random Formulae | Reduced to ENF]

Neighborhoods & Knobs • What do neighborhoods look like, behaviorally? • 1000 unique random formulae, arity 5, 100 literals each • qualitatively similar results for arity 10 • Enumerate all neighbors (edit distances <2) • compute behavioral distance from source • Neighborhoods in MOSES defined based on ENF • neighbors are converted to ENF, compared to original • used to heuristically reduce total neighborhood size

Neighborhoods & Knobs • What do neighborhoods look like, behaviorally? • [Figure panels: Random Formulae | Reduced to ENF]

Hierarchical Parity-Multiplexer • Study decomposition in a Boolean domain • Multiplexer function of arity k1 computed from k1 parity functions of arity k2 • total arity is k1·k2 • Hypothesis: • parity subfunctions will exhibit tighter linkages
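The composite function can be sketched directly from this definition. Several details are assumptions not given on the slide: odd parity (XOR) for the subfunctions, adjacent k2-bit groups feeding them, and the first address bit being most significant.

```python
from itertools import product


def parity(bits):
    """Odd parity (XOR) of a group of bits."""
    return sum(bits) % 2 == 1


def multiplexer(inputs):
    """k-multiplexer with k = a + 2**a: the first a inputs address one data input."""
    k = len(inputs)
    a = 0
    while a + 2 ** a < k:
        a += 1
    if a + 2 ** a != k:
        raise ValueError(f"multiplexer arity {k} is not of the form a + 2**a")
    address = 0
    for bit in inputs[:a]:  # most significant address bit first
        address = 2 * address + int(bit)
    return inputs[a + address]


def parity_multiplexer(bits, k1, k2):
    """k1-multiplexer whose inputs are k1 parity functions over adjacent k2-bit groups."""
    if len(bits) != k1 * k2:
        raise ValueError("expected k1 * k2 input bits")
    inner = [parity(bits[i * k2:(i + 1) * k2]) for i in range(k1)]
    return multiplexer(inner)


# 2-parity-3-multiplexer: total arity 3 * 2 = 6, so 64 truth-table rows.
table = [parity_multiplexer(row, 3, 2) for row in product([0, 1], repeat=6)]
print(sum(table), len(table))
```

Each inner parity depends only on its own group of bits. Variables within a group therefore interact strongly, while groups interact only through the multiplexer. This is the structure the linkage hypothesis predicts the model will find.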

Hierarchical Parity-Multiplexer • Computational effort decreases 42% with model-building (on 2-parity-3-multiplexer) • Parity subfunctions (adjacent pairs) have tightest linkages • Hypothesis validated

Program Growth • 5-parity, minimal program size ~ 53

Program Growth • 11-multiplexer, minimal program size ~ 27

Where do the Cycles Go? • N is population size, O(n^1.05) • l is program size • a is the arity of the space • n is representation size, O(a · program size) • c is number of test cases

Supervised Classification • Goals: • accuracies comparable to SVM • superior accuracy vs. GP • simpler classifiers vs. SVM and GP

Supervised Classification • How much simpler? • Consider average-sized formulae learned for the 6-multiplexer • MOSES • 21 nodes • max depth 4 • GP (after reduction to ENF!) • 50 nodes • max depth 7 and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3))))) or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

Supervised Classification • Datasets taken from recent comp. bio. papers • Chronic fatigue syndrome (101 cases) • based on 26 SNPs • genes either in homozygosis, in heterozygosis, or not expressed • 56 binary features • Lymphoma (77 cases) & aging brains (19 cases) • based on gene expression levels (continuous) • 50 most-differentiating genes selected • preprocessed into binary features based on medians • All experiments based on 10 independent runs of 10-fold cross-validation
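The median-based preprocessing of continuous expression levels can be sketched as follows. Two details are assumptions not stated on the slide: values equal to the median map to 0, and medians are computed over whatever samples are passed in. The slide also does not say whether the medians were recomputed within each cross-validation training fold.

```python
from statistics import median


def binarize_by_median(rows):
    """Map each continuous feature to 1 if above its median, else 0.

    rows: list of samples, each a list of feature values (same length).
    Returns the binary matrix and the per-feature medians used.
    """
    medians = [median(column) for column in zip(*rows)]
    binary = [[1 if value > m else 0 for value, m in zip(row, medians)]
              for row in rows]
    return binary, medians


expression = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
binary, medians = binarize_by_median(expression)
print(medians)  # [2.5, 25.0]
print(binary)   # [[0, 0], [0, 0], [1, 1], [1, 1]]
```

Returning the medians lets the same thresholds be applied to held-out samples rather than recomputed on test data.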

Quantitative Results • Classification average test accuracy:

Quantitative Results • Benchmark performance: • artificial ant • 6x less computational effort vs. EP, 20x less vs. GP • parity problems • 1.33x less vs. EP, 4x less vs. GP on 5-parity • found solutions to 6-parity (none found by EP or GP) • multiplexer problems • 9x less vs. GP on 11-multiplexer

Qualitative Results • Requirements for competent program evolution • all requirements for competent optimization • + exploit semantics • + recombine programs only within bounded subspaces • Bipartite conception of problem difficulty • program-level: adapted from the optimization case • deme-level: theory based on global properties of the space (deme-level neutrality, deceptiveness, etc.)

Qualitative Results • Representation-building for programs: • parameterization based on semantics • transforms program space properties • to facilitate program evolution • probabilistic modeling over sets of program transformations • models compactly represent problem structure

Competent Program Evolution • Competent: not just good performance • explainability of good results • robustness • Vision: representations are important • program learning is unique • representations must be specialized • based on semantics • MOSES: meta-optimizing semantic evolutionary search • exploiting semantics and managing demes

Committee • Dr. Ron Loui (WashU, chair) • Dr. Guy Genin (WashU) • Dr. Ben Goertzel (Virginia Tech, Novamente LLC) • Dr. David E. Goldberg (UIUC) • Dr. John Lockwood (WashU) • Dr. Martin Pelikan (UMSL) • Dr. Robert Pless (WashU) • Dr. William Smart (WashU)
