News blurb o' the day

• Allied armed forces in Iraq using machine translation+AIM to communicate

• Many possible MT techniques; some based on Bayesian statistical techniques

• Ex: see “le chat noir” <-> “the black cat”; estimate Pr[“black cat” | “chat noir”]

• When you see “chat” next, pick the word with the maximum estimated probability of being associated with it

• Much more difficult than your spam filters -- need to handle entire phrases, words out of order, idioms, etc.
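The estimate above can be sketched with plain counting. This is a toy, word-level illustration only, not how a real MT system works; the class `WordTranslationModel` and its methods are hypothetical names, and it assumes word pairs have already been aligned somehow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts how often a French word is aligned with each English word.
class WordTranslationModel {
    // counts.get(french).get(english) = number of aligned pairs seen so far
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one observed alignment, e.g. ("chat", "cat").
    void observe(String french, String english) {
        counts.computeIfAbsent(french, k -> new HashMap<>())
              .merge(english, 1, Integer::sum);
    }

    // Maximum-likelihood estimate of Pr[english | french]; 0 if french was never seen.
    double probability(String english, String french) {
        Map<String, Integer> row = counts.get(french);
        if (row == null) return 0.0;
        int total = 0;
        for (int c : row.values()) total += c;
        return row.getOrDefault(english, 0) / (double) total;
    }

    // Most frequently aligned English word for a French word, or null if unseen.
    String bestTranslation(String french) {
        Map<String, Integer> row = counts.get(french);
        if (row == null) return null;
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : row.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Phrases, reordering, and idioms are exactly what this word-by-word table cannot handle, which is why the real problem is much harder.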

## Recursive Descent Parsing

Or: Before you can understand this sentence, first, you must understand this sentence...

• A translation between streams of tokens and complex structures like trees (or tree-like data structs)

• One step beyond lexing

• Requires more sophisticated structures

### Lexical analysis, revisited

• Rules equivalent to regular expressions

• Can only represent sequences, indefinite repetition (i.e., “*” or “+” operators), and finite cases (“[]” and “|” operators)

• Can be recognized in linear time

• Equivalent to a finite state machine
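As a rough illustration of the last two bullets, here is a hand-written state machine for the regular expression [0-9]+. It is a sketch; the class and state names are made up for this example. Each character is looked at exactly once, which is where the linear running time comes from.

```java
// Hypothetical state machine for the regular expression [0-9]+
class DigitFsm {
    private enum State { START, IN_DIGITS, REJECT }

    // Returns true if the whole input is one or more decimal digits.
    static boolean accepts(String input) {
        State s = State.START;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean digit = (c >= '0' && c <= '9');
            switch (s) {
                case START:
                case IN_DIGITS:
                    // a digit keeps us in (or moves us to) the accepting state
                    s = digit ? State.IN_DIGITS : State.REJECT;
                    break;
                case REJECT:
                    return false;   // no way back once a bad character is seen
            }
        }
        return s == State.IN_DIGITS;   // accept only if at least one digit was read
    }
}
```

Note that the only memory is the single variable `s`; there is nothing that could count nesting depth.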

### R.D. Parsing and CFGs

• Rules can be recursive

• Technically, based on “context free grammars”

• Needs a full stack machine, not just a state machine

• Stack can be unboundedly deep

• Needs more than a finite number of states to run
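To see why a stack is needed, consider a small recursive rule such as NEST := "[" NEST* "]" (an example rule invented for this sketch, not one from the project). The recognizer below, with hypothetical names, uses the Java call stack as the stack machine: each open bracket is one pending call, and the input can nest as deeply as it likes.

```java
// Hypothetical recognizer for the recursive rule  NEST := "[" NEST* "]"
class BracketNest {
    private final String input;
    private int pos = 0;

    private BracketNest(String input) { this.input = input; }

    // True if the whole string is exactly one well-nested bracket group.
    static boolean matches(String s) {
        BracketNest p = new BracketNest(s);
        return p.parseNest() && p.pos == s.length();
    }

    // One call per "[": the call stack remembers how many brackets are still open.
    private boolean parseNest() {
        if (pos >= input.length() || input.charAt(pos) != '[') return false;
        pos++;                                    // consume "["
        while (pos < input.length() && input.charAt(pos) == '[') {
            if (!parseNest()) return false;       // recursive use of the same rule
        }
        if (pos >= input.length() || input.charAt(pos) != ']') return false;
        pos++;                                    // consume "]"
        return true;
    }
}
```

A finite state machine would need a separate state for every possible depth, so no fixed number of states is enough.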

### CFGs and BNF

• Write our rules in “Backus-Naur Form” (BNF)

• Rules made up of two elements:

• Terminals: actual tokens that could be found in the data -- “dog”, “127”, “{”, [a-zA-Z]+

• Non-terminals: names of rules

• Rules must be of form:

• LHS := term_1 op_1 term_2 op_2 ... term_N op_N

• LHS is a non-terminal

• term_i is a terminal or non-terminal

• op_i is one of the operators we’ve met before -- +, *, |, ()

### BNF from P2

    FILE := ( CONTROL | PUZZLEDEF )*

    CONTROL := ( OUTFILE |
                 LOGFILE |
                 ERRFILE |
                 RESULTS |
                 STATS |
                 SEARCH-CTRL |
                 "Run" |
                 "Reset" )

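One way to read the FILE rule as code: the "*" becomes a loop and the "|" becomes a choice made by looking at the next token. The sketch below is only an illustration under simplifying assumptions: tokens are plain strings in a list, only the "Run" and "Reset" alternatives of CONTROL are handled, and the PUZZLEDEF branch is a stand-in that skips to the closing "}" instead of really parsing. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of  FILE := ( CONTROL | PUZZLEDEF )*
class FileRuleSketch {
    // Returns a label for the alternative chosen for each item, in input order.
    static List<String> parseFile(List<String> tokens) {
        List<String> items = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {                      // the "*": repeat until input ends
            String t = tokens.get(pos);
            if (t.equals("Run") || t.equals("Reset")) {    // the "|": choose by first token
                items.add("CONTROL:" + t);
                pos++;
            } else if (t.equals("NToTheKPuzzle")) {
                // Stand-in for a real PUZZLEDEF parser: skip ahead to the closing "}".
                while (pos < tokens.size() && !tokens.get(pos).equals("}")) pos++;
                if (pos == tokens.size()) {
                    throw new IllegalArgumentException("Unterminated puzzle definition");
                }
                pos++;                                     // consume "}"
                items.add("PUZZLEDEF");
            } else {
                throw new IllegalArgumentException("Unexpected token " + t);
            }
        }
        return items;
    }
}
```

The remaining CONTROL alternatives would each add another branch, chosen the same way by their first token.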
### Recursion...

    N2KPUZZLE := "NToTheKPuzzle" "(" HNAME ")"
                 "=" "{"
                 "StartState" "=" NKPUZSTATE
                 "GoalState" "=" NKPUZSTATE
                 "}"

    NKPUZSTATE := "["
                  ( NUMLIST |
                    NKPUZSTATE ( "," NKPUZSTATE )* )
                  "]"

    NUMLIST := NON-NEG-INTEGER ( "," NON-NEG-INTEGER )*

    HNAME := [a-zA-Z]+

    POS-INTEGER := [1-9][0-9]*

    NON-NEG-INTEGER := [0-9]+
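The recursive NKPUZSTATE rule maps onto a method that calls itself. The following is a minimal sketch, assuming the input has already been lexed into a list of token strings; the class `NkStateParser` and the nested-list result are illustrative choices, not the project's actual `NkPuzStateRep`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for NKPUZSTATE and NUMLIST over a pre-lexed token list.
class NkStateParser {
    private final List<String> tokens;
    private int pos = 0;

    NkStateParser(List<String> tokens) { this.tokens = tokens; }

    // Look at the next token without consuming it.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<end of input>";
    }

    // Consume the next token, which must be exactly s.
    private void expect(String s) {
        if (!peek().equals(s)) {
            throw new IllegalArgumentException("Expected " + s + " but found " + peek());
        }
        pos++;
    }

    // NKPUZSTATE := "[" ( NUMLIST | NKPUZSTATE ( "," NKPUZSTATE )* ) "]"
    List<Object> parseState() {
        expect("[");
        List<Object> result = new ArrayList<>();
        if (peek().equals("[")) {              // second alternative: nested states
            result.add(parseState());          // the rule calls itself
            while (peek().equals(",")) {
                pos++;
                result.add(parseState());
            }
        } else {                               // first alternative: NUMLIST
            result.add(parseNumber());
            while (peek().equals(",")) {
                pos++;
                result.add(parseNumber());
            }
        }
        expect("]");
        return result;
    }

    // NON-NEG-INTEGER := [0-9]+
    private Integer parseNumber() {
        String t = peek();
        if (!t.matches("[0-9]+")) {
            throw new IllegalArgumentException("Expected a number but found " + t);
        }
        pos++;
        return Integer.valueOf(t);
    }
}
```

The choice between the two alternatives needs only one token of lookahead: a "[" means a nested state, anything else must start a NUMLIST.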

### Turning it into code

    public PuzState parseNKPuzzle(Lexer l) {
        // first token must be the "NToTheKPuzzle" keyword
        Token t = l.next();
        if (!t.tokStr().equals("NToTheKPuzzle")) {
            throw new ParseException("Unexpected" +
                                     " token " + t.tokStr() +
                                     " found when expecting" +
                                     " N^k-1 puzzle state");
        }
        // next comes "(", then the heuristic name
        t = l.next();
        if (!t.tokStr().equals("(")) { /* ... */ }
        t = l.next();
        if (t.getType() != TT_HNAME) { /* ... */ }
        String heuristic = t.tokStr();

### Turning it into code

        // parse ")", "=", "{", "StartState",
        // "=". Now ready for NKPUZSTATE
        NkPuzStateRep sRep = parseNKPuzState(l);
        // now parse "GoalState", "="
        NkPuzStateRep gRep = parseNKPuzState(l);
        // parse "}" and you know you're done with
        // NKPUZ
        // now construct the actual puzzle object
        if (heuristic.equals("Manhattan")) {
            NkPuz p = new NkManhattanPuz(sRep, gRep);
            return p;
        }
        if (heuristic.equals("TileCount")) {
            NkPuz p = new NkTileCountPuz(sRep, gRep);
            return p;
        }
        // any other heuristic name is an error
        // (assumed handling; not shown on the original slide)
        throw new ParseException("Unknown heuristic " + heuristic);
    }
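The repeated "get next token, compare, throw" pattern in the code above is a natural candidate for a small helper. Below is one possible sketch. It does not use the course's `Lexer` and `Token` classes; as an assumption for the sake of a self-contained example, any `Iterator<String>` of token strings stands in for the lexer, and the names `HeaderParserSketch`, `expect`, and `parseHeader` are hypothetical.

```java
import java.util.Iterator;

// Hypothetical stand-in for the lexer: any Iterator<String> of token strings.
class HeaderParserSketch {
    // Consume the next token and check that it is exactly the expected terminal.
    static void expect(Iterator<String> lexer, String expected) {
        String found = lexer.hasNext() ? lexer.next() : "<end of input>";
        if (!found.equals(expected)) {
            throw new IllegalStateException(
                "Unexpected token " + found + " found when expecting " + expected);
        }
    }

    // Parses  "NToTheKPuzzle" "(" HNAME ")" "=" "{"  and returns the heuristic name.
    static String parseHeader(Iterator<String> lexer) {
        expect(lexer, "NToTheKPuzzle");
        expect(lexer, "(");
        String heuristic = lexer.hasNext() ? lexer.next() : "";
        if (!heuristic.matches("[a-zA-Z]+")) {        // HNAME := [a-zA-Z]+
            throw new IllegalStateException(
                "Expected a heuristic name but found " + heuristic);
        }
        expect(lexer, ")");
        expect(lexer, "=");
        expect(lexer, "{");
        return heuristic;
    }
}
```

With a helper like this, a sequence of terminals in a BNF rule turns into a sequence of one-line calls, and the error messages stay consistent.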