Slides on recursive descent parsing: how context-free grammars written in BNF go beyond regular-expression lexing, and how grammar rules turn into parsing code. Opens with a news item on machine translation, some of it based on Bayesian statistical techniques, used by allied armed forces in Iraq.
News blurb o’ the day • Allied armed forces in Iraq using machine translation + AIM to communicate • Many possible MT techniques; some based on Bayesian statistical techniques • Ex: see “le chat noir” <-> “the black cat”; estimate Pr[“black cat”|“chat noir”] • When you see “chat” next, estimate the maximum-probability word to associate with it • Much more difficult than your spam filters -- need to handle entire phrases, words out of order, idiom, etc.
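A minimal sketch of the estimation idea, assuming we already have aligned (French, English) pairs to count. It uses a plain relative-frequency (maximum-likelihood) estimate, not a full Bayesian MT model, and the class and method names (`PhraseTable`, `observe`, `probability`, `best`) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy estimator for Pr[english | french] from co-occurrence counts. */
final class PhraseTable {
    // counts.get(french).get(english) = number of times the pair was seen together
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Record one observed (french, english) pairing. */
    void observe(String french, String english) {
        counts.computeIfAbsent(french, k -> new HashMap<>())
              .merge(english, 1, Integer::sum);
    }

    /** Relative-frequency estimate: count(french, english) / count(french). */
    double probability(String english, String french) {
        Map<String, Integer> row = counts.get(french);
        if (row == null) return 0.0;
        int total = 0;
        for (int c : row.values()) total += c;
        return row.getOrDefault(english, 0) / (double) total;
    }

    /** Most frequently paired English phrase, or null if the French phrase is unseen. */
    String best(String french) {
        Map<String, Integer> row = counts.get(french);
        if (row == null) return null;
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : row.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

After observing “chat” paired with “cat” three times and with some other word once, `probability("cat", "chat")` is 0.75 and `best("chat")` returns “cat”. Handling whole phrases, reordering, and idiom is exactly what this toy leaves out.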
Recursive Descent Parsing Or: Before you can understand this sentence, first, you must understand this sentence...
Recursive Descent Parsing • A translation between streams of tokens and complex structures like trees (or tree-like data structs) • One step beyond lexing • Requires more sophisticated structures
Lexical analysis, revisited • Rules equivalent to regular expressions • Can only represent sequences, indefinite repetition (i.e., “*” or “+” operators), and finite cases (“[]” and “|” operators) • Can be recognized in linear time • Equivalent to a finite state machine
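A small sketch of a regex-driven lexer, to make the “rules equivalent to regular expressions” point concrete. The token classes below (integers, alphabetic words, a few punctuation marks) are an assumption chosen for illustration, and `TinyLexer` is a hypothetical name, not the course's lexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TinyLexer {
    // One alternative per token class; whitespace is matched too so it can be skipped.
    private static final Pattern TOKEN =
        Pattern.compile("\\s+|[0-9]+|[a-zA-Z]+|[\\[\\](){}=,]");

    /** Splits the input into token strings, scanning left to right. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {   // no token class matches at this offset
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            String text = m.group();
            if (!text.trim().isEmpty()) tokens.add(text);   // drop whitespace runs
            pos = m.end();
        }
        return tokens;
    }
}
```

Each token is recognized by looking only at the characters in front of the scanner; there is no way for these rules to check that brackets are balanced, which is where parsing comes in.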
R.D. Parsing and CFGs • Rules can be recursive • Technically, based on “context free grammars” • Needs a full stack machine, not just a state machine • Stack can be unboundedly deep • Needs more than a finite number of states to run
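To see why a stack is needed, here is a sketch of a recognizer for a deliberately tiny recursive rule, S := ( "[" S "]" )*. The grammar and the class name `BracketRecognizer` are assumptions for illustration. The recursion uses the Java call stack as the parser's stack: one frame per level of nesting.

```java
final class BracketRecognizer {
    private final String s;
    private int pos = 0;

    private BracketRecognizer(String s) { this.s = s; }

    /** True if the whole string is a sequence of properly nested bracket groups. */
    static boolean accepts(String s) {
        BracketRecognizer r = new BracketRecognizer(s);
        return r.parseS() && r.pos == s.length();
    }

    // S := ( "[" S "]" )*
    private boolean parseS() {
        while (pos < s.length() && s.charAt(pos) == '[') {
            pos++;                                  // consume "["
            if (!parseS()) return false;            // recurse: one stack frame per nesting level
            if (pos >= s.length() || s.charAt(pos) != ']') return false;
            pos++;                                  // consume the matching "]"
        }
        return true;
    }
}
```

`accepts("[[][]]")` is true, while `accepts("[[]")` and `accepts("][")` are false. Because the nesting depth has no fixed bound, no fixed set of states can track it.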
CFGs and BNF • Write our rules in “Backus-Naur Form” (BNF) • Rules made up of two elements: • Terminals: actual tokens that could be found in the data -- “dog”, “127”, “{”, [a-zA-Z]+ • Non-terminals: names of rules • Rules must be of form: • LHS := term_1 op_1 term_2 op_2 ... term_N op_N • LHS is a non-terminal • term_i is a terminal or non-terminal • op_i is one of the operators we’ve met before -- +, *, |, ()
BNF from P2
FILE := ( CONTROL | PUZZLEDEF )*
CONTROL := ( OUTFILE | LOGFILE | ERRFILE | RESULTS | STATS | SEARCH-CTRL | "Run" | "Reset" )
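One way to read these rules as code: the “*” operator becomes a loop and the “|” operator becomes a branch on the next token. The sketch below handles only a cut-down rule, FILE := ( "Run" | "Reset" )*, because the other alternatives (OUTFILE, PUZZLEDEF, and so on) are not defined on these slides. `ControlSketch` and its return values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ControlSketch {
    /** Parses the reduced rule FILE := ( "Run" | "Reset" )* and returns the commands seen. */
    static List<String> parseFile(List<String> tokens) {
        List<String> commands = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {          // "*" becomes a loop
            String t = tokens.get(pos);
            if (t.equals("Run")) {             // "|" becomes a branch on the lookahead token
                commands.add("RUN");
            } else if (t.equals("Reset")) {
                commands.add("RESET");
            } else {
                throw new IllegalArgumentException(
                    "Unexpected token " + t + " at position " + pos);
            }
            pos++;
        }
        return commands;
    }
}
```

In a full parser each branch would call the method for the matching non-terminal instead of recording a string.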
Recursion...
N2KPUZZLE := "NToTheKPuzzle" "(" HNAME ")" "=" "{" "StartState" "=" NKPUZSTATE "GoalState" "=" NKPUZSTATE "}"
NKPUZSTATE := "[" ( NUMLIST | NKPUZSTATE ( "," NKPUZSTATE )* ) "]"
NUMLIST := NON-NEG-INTEGER ( "," NON-NEG-INTEGER )*
HNAME := [a-zA-Z]+
POS-INTEGER := [1-9][0-9]*
NON-NEG-INTEGER := [0-9]+
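A sketch of how the recursive NKPUZSTATE rule might turn into a recursive method. It assumes the input has already been split into token strings, and it returns nested `List<Object>` values rather than a real puzzle-state class; `NkStateParser`, `peek`, and `expect` are hypothetical names, not part of the project code.

```java
import java.util.ArrayList;
import java.util.List;

final class NkStateParser {
    private final List<String> tokens;
    private int pos = 0;

    private NkStateParser(List<String> tokens) { this.tokens = tokens; }

    /** Parses one complete NKPUZSTATE; rejects leftover tokens. */
    static List<Object> parse(List<String> tokens) {
        NkStateParser p = new NkStateParser(tokens);
        List<Object> state = p.parseState();
        if (p.pos != tokens.size()) {
            throw new IllegalArgumentException("Trailing tokens after state");
        }
        return state;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<end of input>";
    }

    private void expect(String literal) {
        if (!peek().equals(literal)) {
            throw new IllegalArgumentException("Expected " + literal + " but found " + peek());
        }
        pos++;
    }

    // NKPUZSTATE := "[" ( NUMLIST | NKPUZSTATE ( "," NKPUZSTATE )* ) "]"
    private List<Object> parseState() {
        expect("[");
        List<Object> items = new ArrayList<>();
        if (peek().equals("[")) {              // nested states: recurse
            items.add(parseState());
            while (peek().equals(",")) { pos++; items.add(parseState()); }
        } else {                               // NUMLIST := NON-NEG-INTEGER ( "," NON-NEG-INTEGER )*
            items.add(parseNumber());
            while (peek().equals(",")) { pos++; items.add(parseNumber()); }
        }
        expect("]");
        return items;
    }

    // NON-NEG-INTEGER := [0-9]+
    private Integer parseNumber() {
        String t = peek();
        if (!t.matches("[0-9]+")) {
            throw new IllegalArgumentException("Expected a non-negative integer but found " + t);
        }
        pos++;
        return Integer.valueOf(t);
    }
}
```

The single token of lookahead (“is the next token another `[`?”) is enough to choose between the NUMLIST branch and the nested branch, so the parser never has to back up.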
Turning it into code
public PuzState parseNKPuzzle(Lexer l) {
  Token t=l.next();
  if (!t.tokStr().equals("NToTheKPuzzle")) {
    throw new ParseException("Unexpected token " + t.tokStr() +
                             " found when expecting N^k-1 puzzle state");
  }
  t=l.next();
  if (!t.tokStr().equals("(")) {
    //...
  }
  t=l.next();
  if (t.getType()!=TT_HNAME) {
    // ...
  }
  String heuristic=t.tokStr();
Turning it into code
  // parse ")", "=", "{", "StartState",
  // "=". Now ready for NKPUZSTATE
  NkPuzStateRep sRep=parseNKPuzState(l);
  // now parse "GoalState", "="
  NkPuzStateRep gRep=parseNKPuzState(l);
  // parse "}" and you know you're done with
  // NKPUZ
  // now construct the actual puzzle object
  if (heuristic.equals("Manhattan")) {
    NkPuz p=new NkManhattanPuz(sRep,gRep);
    return p;
  }
  if (heuristic.equals("TileCount")) {
    NkPuz p=new NkTileCountPuz(sRep,gRep);
    return p;
  }
  // no known heuristic matched, so the input is invalid
  throw new ParseException("Unknown heuristic " + heuristic);
}