
Evolving Word-Aligners via Genetic Programming




Presentation Transcript


  1. Evolving Word-Aligners via Genetic Programming Ben Heilers CS 294-5 December 10, 2004

  2. Goal • To evolve word aligners through genetic programming. • In other words, instead of building a word aligner, building a system which may possibly find a better word aligner than I myself could.

  3. GP vs. GA • Genetic Algorithms are algorithms which apply natural reproductive operations to search. • The search is across the function space instead of the solution space. • Genetic Programming is a newer form of GA, where the members of the “population” are actual programs and not just mathematical functions.

  4. [ chart source: Sette S, Boullart L. “Genetic programming: principles and applications.” Engineering Applications of Artificial Intelligence, Vol. 14, Dec. 2001, pg 728 ]

  5. Problems with Genetic Programming One danger of Genetic Algorithms in general is that once the fitness reaches a plateau, the population tends to degrade. I hope to have overcome this by saving the best individual of each generation. [source: Schmiedle F, Drechsler N, Grosse D, Drechsler R. “Heuristic learning based on genetic programming.” Genetic Programming & Evolvable Machines, Vol. 3, Dec. 2002, pg 376 ]
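The plateau guard described above is plain elitism: carry the best individual unchanged into the next generation. A minimal sketch, with a population of bare fitness scores standing in for full GP individuals (all names here are hypothetical, not taken from the actual project code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of elitism in a generational GP loop (illustrative names only).
public class Elitism {
    // An individual is reduced to its fitness score here; a real GP
    // individual would also carry its program AST.
    public static List<Double> nextGeneration(List<Double> population) {
        // Preserve the champion so a fitness plateau cannot degrade
        // the best solution found so far.
        double best = population.stream().max(Double::compare).orElse(0.0);
        List<Double> next = new ArrayList<>();
        next.add(best); // elitism: copy the best individual unchanged
        // Fill out the generation by (placeholder) reproduction.
        while (next.size() < population.size()) {
            next.add(best * 0.9); // stand-in for select + crossover + mutate
        }
        return next;
    }
}
```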

  6. Problems with Genetic Programming (cont.) • Random initial population: Most literature suggests starting with a completely random initial population. However, I hoped to boost the process by starting with a few varieties of the baseline Word Aligner from assignment 4, and then sending these through a few generations with increased mutation rate.

  7. Problems with Genetic Programming (cont.) • Bloat: a phenomenon by which evolved programs tend to grow in size over generations. I hope to avoid this by enforcing a maximum number of nodes if it becomes a problem (but so far it has not).
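If a node cap does become necessary, the check is just a tree walk. A minimal sketch over a toy binary AST (illustrative only; the project's real trees are Eclipse JDT ASTs):

```java
// Sketch of a bloat check: count AST nodes and reject oversized offspring.
public class BloatCheck {
    // Toy binary-tree AST node (illustrative stand-in for a JDT ASTNode).
    public static class Node {
        public Node left, right;
        public Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Size of the subtree rooted at n, counting n itself.
    public static int countNodes(Node n) {
        if (n == null) return 0;
        return 1 + countNodes(n.left) + countNodes(n.right);
    }

    // An offspring is kept only if it fits under the node budget.
    public static boolean withinLimit(Node root, int maxNodes) {
        return countNodes(root) <= maxNodes;
    }
}
```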

  8. Abstract Syntax Trees • Common practice in GP is to represent programs as trees instead of bit strings. • Eclipse IDE has a complete toolkit already set up for representing java classes as ASTs. (org.eclipse.jdt.core.dom) • Can then use Visitor Design Pattern for ease of traversing ASTs.
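The JDT classes themselves need an Eclipse environment, but the Visitor idea behind org.eclipse.jdt.core.dom can be sketched self-contained: traversal logic lives in a visitor class rather than in the node classes, so new traversals can be added without touching the AST. A toy expression AST, with all names illustrative:

```java
// Toy Visitor pattern over a tiny expression AST, illustrating the idea
// behind Eclipse JDT's ASTVisitor (all names here are illustrative).
public class VisitorDemo {
    public interface Expr { int accept(Visitor v); }
    public interface Visitor { int visit(Num n); int visit(Add a); }

    public static class Num implements Expr {
        public final int value;
        public Num(int value) { this.value = value; }
        public int accept(Visitor v) { return v.visit(this); }
    }

    public static class Add implements Expr {
        public final Expr left, right;
        public Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public int accept(Visitor v) { return v.visit(this); }
    }

    // One traversal (evaluation) written without modifying the node classes;
    // a node-counting or mutation visitor would be added the same way.
    static class Evaluator implements Visitor {
        public int visit(Num n) { return n.value; }
        public int visit(Add a) { return a.left.accept(this) + a.right.accept(this); }
    }

    public static int eval(Expr e) { return e.accept(new Evaluator()); }
}
```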

  9. ASTHierarchy [source: http://www-scf.usc.edu/~csci410/espresso/espressoAST.pdf ]

  10. Steady State vs. Generational Model • Steady State models our own life cycles: parents and offspring co-exist. • The Generational Model is more akin to seasonal crops, fish, and insects: parents die after reproducing and before children are born. • I chose the Generational Model.

  11. Selection Process • Tournament Selection • Proportionate Fitness Selection • Most of the computing resources in a GA tend to go towards selecting the next generation, so I chose the Proportionate Fitness model as it is quicker.

  12. Selection: Fitness Proportionate Members of next generation are chosen at random, with chance of being selected proportionate to their fitness.
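A minimal sketch of fitness-proportionate ("roulette-wheel") selection, assuming non-negative fitness scores (names illustrative):

```java
import java.util.List;
import java.util.Random;

// Sketch of fitness-proportionate (roulette-wheel) selection.
public class RouletteSelection {
    // Pick an index with probability proportional to its fitness.
    // Assumes all fitness values are non-negative.
    public static int select(List<Double> fitnesses, Random rng) {
        double total = 0;
        for (double f : fitnesses) total += f;
        double spin = rng.nextDouble() * total; // spin the wheel
        double cumulative = 0;
        for (int i = 0; i < fitnesses.size(); i++) {
            cumulative += fitnesses.get(i);
            if (spin < cumulative) return i;
        }
        return fitnesses.size() - 1; // guard against floating-point edge cases
    }
}
```

To build a whole generation, select is simply called once per offspring slot; individuals with zero fitness (e.g. programs that fail to compile) can never be chosen.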

  13. Choosing a Fitness Function • Heuristic function used to rank members of population. • From assignment 4, we have definitions of Precision, Recall, and AER • Precision = % of alignments proposed by candidate which reference solution considers as possible • Recall = % of alignments reference solution considers sure that candidate guesses correctly • AER ≈ 1 – avg(Precision, Recall)

  14. Choosing a Fitness Function (cont.) • Since the goal is to maximize Precision and Recall, and minimize AER, I chose a simple Fitness evaluation of: • 10 * [ w1 * P + w2 * R + w3 * (1 – AER) ] • Currently using w1 = w2 = w3 = 1. • Thus fitnesses may range from 0 to 30.
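The fitness evaluation above is a one-liner; a sketch using the slide's formula with the current weights w1 = w2 = w3 = 1 (class and method names are illustrative):

```java
// The fitness evaluation described above: 10 * [ w1*P + w2*R + w3*(1 - AER) ].
public class Fitness {
    public static double score(double precision, double recall, double aer,
                               double w1, double w2, double w3) {
        return 10 * (w1 * precision + w2 * recall + w3 * (1 - aer));
    }

    // Current setting: w1 = w2 = w3 = 1, so scores range from 0 to 30.
    public static double score(double precision, double recall, double aer) {
        return score(precision, recall, aer, 1, 1, 1);
    }
}
```

Plugging in the best-so-far numbers from the results slides (P = 0.3535, R = 0.2909, AER = 0.6678) reproduces the reported fitness of about 9.766.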

  15. Mutation • Mutation rates are usually low percentages; I chose 0.05 based on textbooks and papers. It is widely acknowledged that, to date, choosing the right value is still a process of trial and error.

  16. Problems with Mutation • Mutation involves selecting a part of the AST and creating a new node at this location. If the population were bit strings, this might be easy. As is, we must be careful not to create erroneous programs. • I am still working on this. I hope to overcome it with the techniques on the following slides, and by starting with a population many times larger than the final one.

  17. Examples of Current Mutations • Casting Issues: H5 = 1073225951 << 1370307115; • Casting Issues: D3 = S1 = “ “; • New AST nodes must have values initialized: MISSING = 805857798 ^ -1499906716; • Initializers in for loops: for ( S3 = “ “; S3 < 1580192439; S3++ ) { B3 = false; }

  18. Mutation (cont.) • I chose to guide process of mutation in a few ways: • Limit variable names to a few pre-initialized ints, doubles, lists. • Only do mutation and crossover within the alignSentencePair method which is called by the Fitness Function.

  19. Mutation (cont.) • Set up protected methods to disregard exceptions and handle casting:

  double getDouble(List L, int i) {
      if (L == null || L.size() == 0) return 0.0;
      if (i >= L.size()) i = L.size() - 1;
      if (i < 0) i = 0;
      return ((Double) L.get(i)).doubleValue();
  }

  20. Crossover • Two forms: uniform and n-point. • In uniform crossover, we roll the dice for a chance of crossover at every point, whereas n-point crossover limits the number of crossover points to n. • I chose 3-point crossover, choosing the 3 points by comparing ASTs for legal switching spots (i.e. leaving both methods with return statements, and not crossing over halfway through a for loop of one method).
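A sketch of n-point crossover on flat gene sequences (the real system crosses over AST subtrees at pre-validated points; equal-length lists keep the illustration simple, and all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of n-point crossover: alternate which parent contributes
// each segment, switching parents at every crossover point.
public class NPointCrossover {
    // points must be sorted ascending; parents must be the same length.
    public static <T> List<T> crossover(List<T> a, List<T> b, int... points) {
        List<T> child = new ArrayList<>();
        boolean fromA = true; // start copying from parent a
        int prev = 0;
        for (int p : points) {
            child.addAll((fromA ? a : b).subList(prev, p));
            fromA = !fromA; // switch parents at each crossover point
            prev = p;
        }
        child.addAll((fromA ? a : b).subList(prev, a.size()));
        return child;
    }
}
```

With 3 points the child takes four alternating segments: a, b, a, b. On ASTs, each "point" additionally has to be checked for legality as the slide describes.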

  21. Format of a GP run • Taking baseline word aligners, run through a few generations with high mutation rate, on a much larger population than later desired. • Run through this “random” population and filter out programs with errors. • Repeat N times: Compile files and run a single generation. • Test best of each generation on full set of sentence pairs.

  22. Results/Challenges • Still having trouble with mutation. • Even now, most mutations result in errors; the mutated programs therefore receive fitness scores of zero and are removed from the population, so the evolutionary process rarely produces significant results. • Thus no true evolution occurs from generation to generation; any mutations that survive tend to be irrelevant.

  23. Results/Challenges (cont.) • The training sentences need to be few in number, since a large population of word aligners must be run each generation. • I have tried to prevent over-fitting by choosing the training sentences randomly from the test sentence pairs each generation. • However, the fitness values produced on the small set of training sentences (about 10) still do not necessarily correlate with the fitness values on the full set of testing data (about 450).

  24. Fitness Values per Generation (fitness values on training data; fitness values on testing data)

  Gen.   Precision   Recall   AER      Fitness
   0     0.3658      0.2258   0.6864   9.0520
   1     0.3658      0.2258   0.6864   9.0520
   2     0.3535      0.2909   0.6678   9.7660
   3     0.3658      0.2258   0.6864   9.0520
   4     0.3658      0.2258   0.6864   9.0520
   5     0.3658      0.2258   0.6864   9.0520
   6     0.3935      0.1889   0.6966   8.8580
   7     0.3658      0.2258   0.6864   9.0520
   8     0.3658      0.2258   0.6864   9.0520
   9     0.3658      0.2258   0.6864   9.0520
  10     0.3658      0.2250   0.6864   9.0520
  11     0.3658      0.2250   0.6864   9.0520
  12     0.3658      0.2250   0.6864   9.0520
  13     0.3658      0.2250   0.6864   9.0520

  25. Results • On 447 sentence pairs used in assignment 4, using Fitness of 10*[ P + R + (1-AER) ]:

  Baseline Results • Precision: 0.3659 • Recall: 0.2259 • AER: 0.6865 • Fitness: 9.0526

  Best Results So Far • Precision: 0.3535 • Recall: 0.2909 • AER: 0.6678 • Fitness: 9.7660

  26. Of the starting designs, the one on the left has the highest fitness, at 9.0520. It aligns each French position I3 to the English position I4 closest to it, maximizing 50 / [ 1 + |I3 – I4| ]. Precision: 0.3658, Recall: 0.2258, AER: 0.6864.

  public Alignment alignSentencePair(SentencePair sentencePair) {
      alignment = new Alignment();
      I1 = numEnglishWordsInSentence(sentencePair);
      I2 = numFrenchWordsInSentence(sentencePair);
      for (I3 = 0; I3 < I1; I3++) {
          I4 = -1;
          D1 = 0;
          for (I5 = 0; I5 < I2; I5++) {
              D2 = 50 / (1 + abs(I3 - I5));
              if (D2 >= D1) { D1 = D2; I4 = I5; }
          }
          addAlignment(alignment, I4, I3, true);
      }
      return alignment;
  }

  The mutation on the right, where both for loops iterate over the length of the English sentence, managed a slightly higher fitness, at 9.7660. Precision: 0.3535, Recall: 0.2909, AER: 0.6678.

  public Alignment alignSentencePair(SentencePair sentencePair) {
      alignment = new Alignment();
      H5 = H5;
      I1 = numEnglishWordsInSentence(sentencePair);
      I1 = numFrenchWordsInSentence(sentencePair);
      I2 = numEnglishWordsInSentence(sentencePair);
      for (I3 = 0; I3 < I1; I3++) {
          I4 = -1;
          D1 = 0;
          for (I5 = 0; I5 < I2; I5++) {
              D2 = 50 / (1 + abs(I3 - I5));
              if (D2 >= D1) { D1 = D2; I4 = I5; }
          }
          addAlignment(alignment, I4, I3, true);
      }
      return alignment;
  }

  27. Next / TODO • Fix mutation: produce more usable, more phenotypically diverse Java classes
