Competent Program Evolution


  1. Competent Program Evolution Dissertation Defense Moshe Looks December 11th, 2006

  2. Synopsis • Competent optimization requires adaptive decomposition • This is problematic in program spaces • Thesis: we can do it by exploiting semantics • Results: it works!

  3. General Optimization • Find a solution s in S • Maximize/minimize f(s) • f : S → ℝ • To solve this faster than O(|S|), make assumptions about f

  4. Near-Decomposability • Complete separability would be nice… • Near-decomposability (Simon, 1969) is more realistic [Figure: a nearly decomposable system, with stronger interactions within modules and weaker interactions between them]

  5. Exploiting Separability • Separability = independence assumptions • Given a prior over the solution space • represented as a probability vector • Sample solutions from the model • Update model toward higher-scoring points • Iterate... • Works well when interactions are weak
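
The model-sample-update loop above is the core of PBIL/compact-GA-style optimization. A minimal runnable sketch of the idea (the onemax fitness and all parameter values below are illustrative, not from the slides):

```python
import random

def onemax(bits):
    # Illustrative separable fitness: the count of 1-bits.
    return sum(bits)

def pbil(n_bits=20, pop_size=50, n_best=5, lr=0.1, generations=100):
    probs = [0.5] * n_bits  # prior over the solution space: a probability vector
    for _ in range(generations):
        # Sample solutions from the model.
        pop = [[int(random.random() < p) for p in probs] for _ in range(pop_size)]
        # Update the model toward the higher-scoring points.
        best = sorted(pop, key=onemax, reverse=True)[:n_best]
        for i in range(n_bits):
            mean = sum(b[i] for b in best) / n_best
            probs[i] += lr * (mean - probs[i])
    return probs

print([round(p, 2) for p in pbil()])  # probabilities drift toward 1.0
```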

  6. Exploiting Near-Decomposability • Bayesian optimization algorithm (BOA) • represent problem decomposition as a Bayesian Network • learned greedily, via a network scoring metric • Hierarchical BOA • uses Bayesian networks with local structure • allows smaller model-building steps • leads to more accurate models • restricted tournament replacement • promotes diversity • Solves the linkage problem • Competence: solving hard problems quickly, accurately, and reliably
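
Restricted tournament replacement, mentioned above as hBOA's diversity mechanism, can be sketched as follows (bit-list solutions, Hamming distance, and window size w are illustrative choices):

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def rtr_insert(population, fitness, offspring, w=10):
    # Pick a random window and find its member most similar to the offspring;
    # the offspring replaces it only if strictly fitter. Competing against the
    # nearest member (rather than a random one) keeps distinct niches alive.
    window = random.sample(range(len(population)), min(w, len(population)))
    nearest = min(window, key=lambda i: hamming(population[i], offspring))
    if fitness(offspring) > fitness(population[nearest]):
        population[nearest] = offspring

pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
rtr_insert(pop, sum, [1] * 8)
```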

  7. Program Learning • Solutions encode executable programs • execution maps programs to behaviors • exec : P → B • find a program p in P • maximize/minimize f(exec(p)) • f : B → ℝ • To be useful, make assumptions about exec, P, and B
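
A toy instance of this setting, assuming nested-tuple Boolean programs (all representations here are illustrative): exec maps a program to its behavior, here its truth table, and f scores that behavior against a target.

```python
from itertools import product

def run(prog, env):
    if isinstance(prog, str):                 # a variable leaf, e.g. "x0"
        return env[prog]
    op, *args = prog
    vals = [run(a, env) for a in args]
    return not vals[0] if op == "not" else {"and": all, "or": any}[op](vals)

def execute(prog, arity=2):                   # exec : P -> B (behavior = truth table)
    return tuple(run(prog, {f"x{i}": b for i, b in enumerate(bits)})
                 for bits in product([False, True], repeat=arity))

def score(behavior, target):                  # f : B -> R
    return sum(b == t for b, t in zip(behavior, target))

target = execute(("or", "x0", "x1"))
print(score(execute(("and", "x0", "x1")), target))   # 2 of 4 cases agree
```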

  8. Properties of Program Spaces • Open-endedness • Over-representation • many programs map to the same behavior • Compositional hierarchy • intrinsically organized into subprograms • Chaotic Execution • similar programs may have very different behaviors

  9. Properties of Program Spaces • Simplicity prior • simpler programs are more likely • Simplicity preference • smaller programs are preferable • Behavioral decomposability • f : B → ℝ is separable / nearly decomposable • White box execution • execution function is known and constant

  10. Thesis • Program spaces not directly decomposable • Leverage properties of program spaces as inductive bias • Leading to competent program evolution

  11. Representation-Building • Organize programs in terms of commonalities • Ignore semantically meaningless variation • Explore plausible variations

  12. Representation-Building • Common regions must be aligned • Redundancy must be identified • Create knobs for plausible variations
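
One hypothetical way to picture a knob (a sketch, not the dissertation's actual data structures): a location in a shared program skeleton together with a discrete set of plausible settings at that location.

```python
from dataclasses import dataclass, field

@dataclass
class Knob:
    # A point of plausible variation in a shared program skeleton;
    # each setting is one syntactic alternative at that location.
    location: str        # where in the skeleton, e.g. a node path
    settings: list       # e.g. ["x1", "not(x1)", "absent"]
    current: int = 0     # index of the active setting

@dataclass
class Representation:
    # A skeleton plus its knobs; turning the knobs sweeps out the
    # subspace of programs that the representation spans.
    skeleton: str
    knobs: list = field(default_factory=list)

rep = Representation("and(or(x1 ?) ?)",
                     [Knob("arg 2 of or", ["x2", "not(x2)", "absent"])])
```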

  13. Representation-Building • What about… • changing the phase? • averaging two inputs instead of picking one? [Figure: correspondence between behavior (semantic) space and program (syntactic) space]

  14. Statics & Dynamics • Representations span a limited subspace of programs • Conceptual steps in representation-building: • reduction to normal form (e.g., x + 0 → x) • neighborhood enumeration (generate knobs) • neighborhood reduction (get rid of some knobs) • Create demes to maintain a sample of many representations • deme: a sample of programs living in a common representation • intra-deme optimization: use the hBOA • inter-deme: • based on dominance relationships
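
A minimal sketch of the reduction step, assuming programs are nested tuples and using an illustrative rule set: rewriting to a normal form collapses semantically identical variants (like x + 0 and x) onto one syntax.

```python
def reduce_expr(e):
    # Recursively rewrite an expression tree to a simpler normal form.
    if not isinstance(e, tuple):
        return e
    op, *args = e
    args = [reduce_expr(a) for a in args]
    if op == "+":
        args = [a for a in args if a != 0]          # x + 0 -> x
        if len(args) == 0: return 0
        if len(args) == 1: return args[0]
    if op == "*" and 0 in args:                      # x * 0 -> 0
        return 0
    return (op, *args)

print(reduce_expr(("+", "x", ("*", "y", 0))))        # -> "x"
```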

  15. Meta-Optimizing Semantic Evolutionary Search (MOSES) • 1. Create an initial deme based on a small set of knobs (i.e., derived from the empty program) and random sampling in knob-space • 2. Select a deme and run hBOA on it • 3. Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes) • 4. For each such program: create a new representation centered around the program; create a new random sample within this representation; add it as a deme • 5. Repeat from step 2
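
A deliberately toy, runnable schematic of this loop: bit-tuples stand in for programs, single-bit flips for a representation's knobs, and a random sample of neighbors for hBOA; only the deme-management skeleton mirrors the procedure above.

```python
import random

def neighbors(center):
    # Single-bit flips stand in for turning a representation's knobs.
    return [center[:i] + (1 - center[i],) + center[i + 1:]
            for i in range(len(center))]

def moses_sketch(score, arity=8, iterations=20, sample_size=5):
    demes = [(0,) * arity]                        # step 1: deme from the empty program
    for _ in range(iterations):
        center = max(demes, key=score)            # step 2: select a deme
        pool = random.sample(neighbors(center), sample_size)  # stand-in for hBOA
        for prog in pool:                         # step 3: deme-creation criterion
            if score(prog) > score(center) and prog not in demes:
                demes.append(prog)                # step 4: new deme around prog
    return max(demes, key=score)                  # repetition handled by the loop

print(moses_sketch(sum))                          # maximize the number of 1-bits
```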

  16. Artificial Ant [Figure: the Santa Fe trail, a grid of food pellets the ant must traverse] • Eat all food pellets within 600 steps • Existing evolutionary methods perform no better than random search to a significant degree • Space contains many regularities • To apply MOSES: • three reduction rules for normal form • e.g., left, left, left → right • separate knobs for rotation, movement, & conditionals • no neighborhood reduction needed
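
One of the reduction rules above, sketched over flat action lists rather than program trees (an illustrative simplification): three quarter-turns one way normalize to one quarter-turn the other way.

```python
def reduce_turns(actions):
    # Rewrite left,left,left -> right (and symmetrically) as actions stream in.
    out = []
    for a in actions:
        out.append(a)
        if out[-3:] == ["left"] * 3:
            out[-3:] = ["right"]
        elif out[-3:] == ["right"] * 3:
            out[-3:] = ["left"]
    return out

print(reduce_turns(["move", "left", "left", "left", "move"]))
# -> ['move', 'right', 'move']
```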

  17. Artificial Ant • How does MOSES do it? • Searches a greatly reduced space • Exploits key dependencies: • “[t]hese symmetries lead to essentially the same solutions appearing to be the opposite of each other. E.g. either a pair of Right or pair of Left terminals at a particular location may be important.” – Langdon & Poli, “Why ants are hard” • hBOA modeling learns linkage between rotation knobs • Eliminate modeling and the problem still gets solved • but with much higher variance • computational effort rises to 36,000

  18. Elegant Normal Form (Holman, ’90) • Hierarchical normal form for Boolean formulae • Reduction process takes time linear in formula size • 99% of random 500-literal formulae are reduced in size by over 98%

  19. Syntactic vs. Behavioral Distance • Is there a correlation between syntactic and behavioral distance? • 5000 unique random formulae of arity 10 with 30 literals each • qualitatively similar results for arity 5 • Computed the set of pairwise • behavioral distances (truth-table Hamming distance) • syntactic distances (tree edit distance, normalized by tree size) • The same computation on the same formulae reduced to ENF
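
The behavioral distance used here is easy to make concrete (formulae represented as Python callables is an illustrative choice): evaluate both formulae on every input combination and count disagreements.

```python
from itertools import product

def truth_table(formula, arity):
    # formula is a callable over `arity` booleans.
    return [formula(*bits) for bits in product([False, True], repeat=arity)]

def behavioral_distance(f, g, arity):
    # Truth-table Hamming distance: number of inputs where f and g differ.
    tf, tg = truth_table(f, arity), truth_table(g, arity)
    return sum(a != b for a, b in zip(tf, tg))

# e.g. x0 AND x1 vs. x0 OR x1 disagree on 2 of the 4 rows:
print(behavioral_distance(lambda a, b: a and b, lambda a, b: a or b, 2))
```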

  20. Syntactic vs. Behavioral Distance • Is there a correlation between syntactic and behavioral distance? [Figure: distance scatter plots; panels: random formulae, reduced to ENF]

  21. Neighborhoods & Knobs • What do neighborhoods look like, behaviorally? • 1000 unique random formulae, arity 5, 100 literals each • qualitatively similar results for arity 10 • Enumerate all neighbors (edit distance < 2) • compute behavioral distance from source • Neighborhoods in MOSES defined based on ENF • neighbors are converted to ENF, compared to original • used to heuristically reduce total neighborhood size

  22. Neighborhoods & Knobs • What do neighborhoods look like, behaviorally? [Figure: neighborhood behavioral-distance distributions; panels: random formulae, reduced to ENF]

  23. Hierarchical Parity-Multiplexer • Study decomposition in a Boolean domain • Multiplexer function of arity k1 computed from k1 parity functions of arity k2 • total arity is k1·k2 • Hypothesis: • parity subfunctions will exhibit tighter linkages
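
A sketch of this hierarchical test function, shown for the 2-parity-3-multiplexer (k1 = 3, k2 = 2, total arity 6); the decomposition into helper functions is illustrative.

```python
def parity(bits):
    # True iff an odd number of bits are set.
    return sum(bits) % 2 == 1

def multiplexer(bits):
    # 3-multiplexer: 1 address bit selects one of 2 data bits.
    address, data = bits[0], bits[1:]
    return data[int(address)]

def parity_multiplexer(raw, k2=2):
    # Each multiplexer input is the parity of k2 consecutive raw inputs.
    mid = [parity(raw[i:i + k2]) for i in range(0, len(raw), k2)]
    return multiplexer(mid)

print(parity_multiplexer([1, 0, 0, 1, 1, 1]))
```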

  24. Hierarchical Parity-Multiplexer • Computational effort decreases 42% with model-building (on 2-parity-3-multiplexer) • Parity subfunctions (adjacent pairs) have the tightest linkages • Hypothesis validated

  25. Program Growth • 5-parity, minimal program size ~ 53

  26. Program Growth • 11-multiplexer, minimal program size ~ 27

  27. Where do the Cycles Go? • N is the population size, O(n^1.05) • l is the program size • a is the arity of the space • n is the representation size, O(a · l) • c is the number of test cases

  28. Supervised Classification • Goals: • accuracies comparable to SVM • superior accuracy vs. GP • simpler classifiers vs. SVM and GP

  29. Supervised Classification • How much simpler? • Consider average-sized formulae learned for the 6-multiplexer • MOSES: 21 nodes, max depth 4 • GP (after reduction to ENF!): 50 nodes, max depth 7, e.g.: and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3))))) or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

  30. Supervised Classification • Datasets taken from recent comp. bio. papers • Chronic fatigue syndrome (101 cases) • based on 26 SNPs • genes either homozygous, heterozygous, or not expressed • 56 binary features • Lymphoma (77 cases) & aging brains (19 cases) • based on gene expression levels (continuous) • 50 most-differentiating genes selected • preprocessed into binary features based on medians • All experiments based on 10 independent runs of 10-fold cross-validation
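
The median-based binarization described above is straightforward; a sketch (the column-oriented input layout is an assumption):

```python
from statistics import median

def binarize_by_median(columns):
    # columns: one list of expression values per gene, one value per case.
    # Each continuous feature becomes binary by thresholding at its median.
    out = []
    for col in columns:
        m = median(col)
        out.append([int(v > m) for v in col])
    return out

print(binarize_by_median([[0.2, 1.5, 0.9, 3.1]]))   # -> [[0, 1, 0, 1]]
```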

  31. Quantitative Results • Classification average test accuracy: [Table of accuracies by dataset and method not preserved in the transcript]

  32. Quantitative Results • Benchmark performance: • artificial ant • 6x less computational effort vs. EP, 20x less vs. GP • parity problems • 1.33x less vs. EP, 4x less vs. GP on 5-parity • found solutions to 6-parity (none found by EP or GP) • multiplexer problems • 9x less vs. GP on 11-multiplexer

  33. Qualitative Results • Requirements for competent program evolution • all requirements for competent optimization • + exploit semantics • + recombine programs only within bounded subspaces • Bipartite conception of problem difficulty • program-level: adapted from the optimization case • deme-level: theory based on global properties of the space (deme-level neutrality, deceptiveness, etc.)

  34. Qualitative Results • Representation-building for programs: • parameterization based on semantics • transforms program space properties • to facilitate program evolution • probabilistic modeling over sets of program transformations • models compactly represent problem structure

  35. Competent Program Evolution • Competent: not just good performance • explainability of good results • robustness • Vision: representations are important • program learning is unique • representations must be specialized • based on semantics • MOSES: meta-optimizing semantic evolutionary search • exploiting semantics and managing demes

  36. Committee • Dr. Ron Loui (WashU, chair) • Dr. Guy Genin (WashU) • Dr. Ben Goertzel (Virginia Tech, Novamente LLC) • Dr. David E. Goldberg (UIUC) • Dr. John Lockwood (WashU) • Dr. Martin Pelikan (UMSL) • Dr. Robert Pless (WashU) • Dr. William Smart (WashU)
