Review:
• How do we define a grammar (what are the components of a grammar)?
• What is a context-free grammar?
• What is the language defined by a grammar?
• What is an ambiguous grammar?
• Why do we care about leftmost and rightmost derivations?

Example:
    <PROGRAM> -> 'program' id 'begin' <STMT_LIST> 'end'
    <STMT_LIST> -> <STMT> ';' <STMT_LIST> | <STMT>
    <STMT> -> id '=' <EXPR>
    <EXPR> -> <EXPR> <OP> <EXPR> | id
    <OP> -> '+' | '-' | '*' | '/'

Input program:
    program test begin t0 = t1 + t2; t3 = t0 * t4 end

<PROGRAM> ==>* program test begin t0 = t1 + t2; t3 = t0 * t4 end
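One possible leftmost derivation of the example program can be written out as a sketch. It assumes the scanner turns test, t0, t1, t2, t3 and t4 into the token id, and it abbreviates <STMT_LIST>, <STMT>, <EXPR> and <OP> as SL, S, E and O (shorthand introduced here, not in the grammar). Because the <EXPR> rule is ambiguous in general, other derivations exist for longer expressions.

```latex
\begin{align*}
\langle\mathit{PROGRAM}\rangle
  &\Rightarrow \texttt{program id begin}\ \mathit{SL}\ \texttt{end}
\end{align*}
% The statement list between begin and end is then derived as follows:
\begin{align*}
\mathit{SL}
  &\Rightarrow \mathit{S}\ \texttt{;}\ \mathit{SL} \\
  &\Rightarrow \texttt{id = }\mathit{E}\ \texttt{;}\ \mathit{SL} \\
  &\Rightarrow \texttt{id = }\mathit{E}\ \mathit{O}\ \mathit{E}\ \texttt{;}\ \mathit{SL} \\
  &\Rightarrow \texttt{id = id}\ \mathit{O}\ \mathit{E}\ \texttt{;}\ \mathit{SL} \\
  &\Rightarrow \texttt{id = id + }\mathit{E}\ \texttt{;}\ \mathit{SL} \\
  &\Rightarrow \texttt{id = id + id ;}\ \mathit{SL} \\
  &\Rightarrow \texttt{id = id + id ;}\ \mathit{S} \\
  &\Rightarrow \texttt{id = id + id ; id = }\mathit{E} \\
  &\Rightarrow \texttt{id = id + id ; id = }\mathit{E}\ \mathit{O}\ \mathit{E} \\
  &\Rightarrow \texttt{id = id + id ; id = id}\ \mathit{O}\ \mathit{E} \\
  &\Rightarrow \texttt{id = id + id ; id = id * }\mathit{E} \\
  &\Rightarrow \texttt{id = id + id ; id = id * id}
\end{align*}
```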

Parsing:
• The process of determining whether the start symbol can derive the program.
    • If it succeeds, the program is syntactically valid.
    • If it fails, the program is invalid.
• Two general approaches:
    • Expand from the start symbol to the whole program (top-down).
    • Reduce the whole program to the start symbol (bottom-up).

Parsing methods:
• Universal:
    • There exist algorithms that can parse any context-free grammar, but they are too inefficient to use in production compilers.
    • What is considered efficient? Scanning the program once, from left to right.
• Top-down parsing:
    • Builds the parse tree from the root to the leaves (using a leftmost derivation; why?).
    • Recursive descent and LL parsers.
• Bottom-up parsing:
    • Builds the parse tree from the leaves to the root.
    • Operator-precedence parsing and LR parsers (SLR, canonical LR, LALR).

Recursive descent parsing associates a procedure with each nonterminal in the grammar. It may require backtracking over the input string.
• Example:
    <type> -> <simple> | ^ id | array [ <simple> ] of <type>
    <simple> -> integer | char | num dotdot num

    /* <type>: choose a production from the lookahead token. */
    void type() {
        if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM)
            simple();                       /* <type> -> <simple> */
        else if (lookahead == '^') {        /* <type> -> ^ id */
            match('^'); match(ID);
        }
        else if (lookahead == ARRAY) {      /* <type> -> array [ <simple> ] of <type> */
            match(ARRAY); match('['); simple(); match(']'); match(OF); type();
        }
        else error();
    }

Example:
    <type> -> <simple> | ^ id | array [ <simple> ] of <type>
    <simple> -> integer | char | num dotdot num

    /* <simple>: each alternative starts with a different token. */
    void simple() {
        if (lookahead == INTEGER) match(INTEGER);
        else if (lookahead == CHAR) match(CHAR);
        else if (lookahead == NUM) {
            match(NUM); match(DOTDOT); match(NUM);
        }
        else error();
    }

    /* Consume the expected token t, or report a syntax error. */
    void match(token t) {
        if (lookahead == t) { lookahead = nexttoken(); }
        else error();
    }
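The two slides of parser code can be combined into one compilable sketch. Several pieces here are assumptions rather than part of the original: the token enum (including CARET, LBRACKET, RBRACKET and an END marker), a scanner that reads from a pre-tokenized array, an error() that sets a flag instead of aborting, and the hypothetical entry point parse_type().

```c
#include <stddef.h>

/* Token kinds; END marks the end of the input (assumed, not in the slides). */
typedef enum { INTEGER, CHAR, NUM, DOTDOT, ID, CARET, ARRAY,
               LBRACKET, RBRACKET, OF, END } token;

static const token *input;   /* pre-tokenized input (stand-in for a scanner) */
static size_t pos;           /* index of the current lookahead token */
static token lookahead;
static int failed;           /* set by error() instead of aborting */

static void error(void) { failed = 1; }

/* Advance to the next token; never moves past END. */
static token nexttoken(void) {
    if (input[pos] != END) pos++;
    return input[pos];
}

static void match(token t) {
    if (lookahead == t) lookahead = nexttoken();
    else error();
}

/* <simple> -> integer | char | num dotdot num */
static void simple(void) {
    if (lookahead == INTEGER) match(INTEGER);
    else if (lookahead == CHAR) match(CHAR);
    else if (lookahead == NUM) { match(NUM); match(DOTDOT); match(NUM); }
    else error();
}

/* <type> -> <simple> | ^ id | array [ <simple> ] of <type> */
static void type(void) {
    if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM)
        simple();
    else if (lookahead == CARET) { match(CARET); match(ID); }
    else if (lookahead == ARRAY) {
        match(ARRAY); match(LBRACKET); simple(); match(RBRACKET);
        match(OF); type();
    }
    else error();
}

/* Hypothetical entry point: returns 1 if tokens (terminated by END) form a <type>. */
int parse_type(const token *tokens) {
    input = tokens; pos = 0; failed = 0; lookahead = input[0];
    type();
    return !failed && lookahead == END;
}
```

For instance, calling parse_type on the token sequence ARRAY LBRACKET NUM DOTDOT NUM RBRACKET OF INTEGER END accepts, while dropping DOTDOT NUM from that sequence makes match(DOTDOT) report an error.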

Recursive descent parsing may require backtracking over the input string:
• Try the productions in turn and backtrack when one fails.
• E.g., S -> cAd, A -> ab | a, with input string cad: after matching c, the parser first tries A -> ab, matches a, fails to match b against d, backtracks to the position after c, and then succeeds with A -> a.
• A recursive-descent parser that needs no backtracking is called a predictive parser.
    • It looks at the input and must predict the right production every time to avoid backtracking.
    • It needs to know which first symbols can be generated by the right side of each production (using only one token of lookahead).
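The backtracking behavior on S -> cAd, A -> ab | a can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: tokens are single characters, the function names parse_A and parse_S are hypothetical, and the backtracks counter exists purely to make the retries visible.

```c
#include <stddef.h>

/* Try one alternative of A at position p in s.
   alt 0 is A -> ab, alt 1 is A -> a.
   Returns the position after the match, or -1 if the alternative fails. */
static int parse_A(const char *s, int p, int alt) {
    if (alt == 0)
        return (s[p] == 'a' && s[p + 1] == 'b') ? p + 2 : -1;
    return (s[p] == 'a') ? p + 1 : -1;
}

/* S -> c A d. Returns 1 if the whole string s is derived from S.
   *backtracks counts how many alternatives of A were abandoned. */
int parse_S(const char *s, int *backtracks) {
    *backtracks = 0;
    if (s[0] != 'c') return 0;
    for (int alt = 0; alt < 2; alt++) {
        int p = parse_A(s, 1, alt);          /* always restart A at position 1 */
        if (p >= 0 && s[p] == 'd' && s[p + 1] == '\0')
            return 1;
        (*backtracks)++;                     /* give up on this alternative */
    }
    return 0;
}
```

On the input cad, the first alternative A -> ab is abandoned once before A -> a succeeds; on cabd no alternative is abandoned.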

First(α): the set of tokens that can appear as the first symbol of one or more strings generated from α. If α is the empty string or can generate the empty string, then the empty string is also in First(α).
• Given productions A -> α | β, predictive parsing (looking one token ahead) requires First(α) and First(β) to be disjoint.
• Predictive parsing does not work on some kinds of grammars:
    • Left recursion: A -> Aw (expanding A leads to an infinite loop).
    • Common left factors: A -> aB | aC (First(aB) and First(aC) are not disjoint).
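First sets are usually computed by iterating to a fixed point. The following is a minimal sketch under several assumptions not in the original: grammar symbols are single ASCII characters, uppercase letters are nonterminals, everything else is a terminal, and an empty right-hand side stands for the empty string. Instead of storing the empty string inside the set, a separate nullable flag records it. All names (production, compute_first, in_first, is_nullable) are hypothetical.

```c
#include <stdbool.h>
#include <string.h>
#include <ctype.h>

typedef struct { char lhs; const char *rhs; } production;

/* first[X - 'A'][c] is true when terminal c is in First(X);
   nullable[X - 'A'] is true when X can derive the empty string. */
static bool first[26][128];
static bool nullable[26];

void compute_first(const production *g, int n) {
    memset(first, 0, sizeof first);
    memset(nullable, 0, sizeof nullable);
    bool changed = true;
    while (changed) {                        /* repeat until nothing new is added */
        changed = false;
        for (int i = 0; i < n; i++) {
            int a = g[i].lhs - 'A';
            const char *p = g[i].rhs;
            for (; *p; p++) {
                unsigned char x = (unsigned char)*p;
                if (isupper(x)) {            /* nonterminal: copy its First set */
                    for (int c = 0; c < 128; c++)
                        if (first[x - 'A'][c] && !first[a][c]) {
                            first[a][c] = true;
                            changed = true;
                        }
                    if (!nullable[x - 'A']) break;   /* cannot look past x */
                } else {                     /* terminal: it starts the string */
                    if (!first[a][x]) { first[a][x] = true; changed = true; }
                    break;
                }
            }
            /* Reached the end: every symbol was nullable (or the rhs is empty). */
            if (*p == '\0' && !nullable[a]) { nullable[a] = true; changed = true; }
        }
    }
}

bool in_first(char X, char c) { return first[X - 'A'][(unsigned char)c]; }
bool is_nullable(char X) { return nullable[X - 'A']; }
```

With the productions S -> AB, A -> a, A -> (empty) and B -> b, this gives First(S) = {a, b}: a comes from A, and b comes from B because A can vanish.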

Algorithm 4.1. Eliminating left recursion:
    Arrange the nonterminals in some order A1, A2, …, An
    for i = 1 to n do begin
        for j = 1 to i-1 do begin
            replace each production of the form Ai -> Aj w by Ai -> d1 w | d2 w | … | dk w, where Aj -> d1 | d2 | … | dk are all the current Aj productions
        end for
        eliminate the immediate left recursion among the Ai productions
    end for
(The algorithm can fail if the grammar has a cycle (A ==>+ A) or an ε-production (A -> ε).)
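The algorithm relies on a rule for removing immediate left recursion, which is not spelled out on the slide. The standard rule and a small worked example are sketched below. The example grammar (S -> Aa | b, A -> Sc | d) and the new nonterminal A' are chosen here for illustration.

```latex
% Immediate left recursion: alpha and beta are strings of grammar symbols,
% and beta does not start with A.
\begin{align*}
A \to A\alpha \mid \beta
\quad&\Longrightarrow\quad
A \to \beta A', \qquad A' \to \alpha A' \mid \varepsilon
\end{align*}
% Worked example with the nonterminals ordered S, A:
\begin{align*}
&S \to A\,a \mid b, \qquad A \to S\,c \mid d \\
&\text{$i=2,\ j=1$: expand } A \to S\,c: \quad A \to A\,a\,c \mid b\,c \mid d \\
&\text{eliminate immediate left recursion:} \quad
  A \to b\,c\,A' \mid d\,A', \qquad A' \to a\,c\,A' \mid \varepsilon
\end{align*}
```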

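Left factoring (to produce a grammar suitable for predictive parsing):
• Replace productions A -> αβ1 | αβ2 | … | αβn | γ, where α is the longest common prefix, by A -> αA' | γ and A' -> β1 | β2 | … | βn.
Example: S -> iEtS | iEtSeS | a, E -> b becomes S -> iEtSS' | a, S' -> eS | ε, E -> b.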
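Once the example grammar is left-factored into S -> iEtSS' | a, S' -> eS | ε, E -> b, one token of lookahead is enough to pick a production. The sketch below assumes single-character tokens and a hypothetical entry point parse_stmt(). It also makes one choice the grammar itself does not settle: when the lookahead is e, S' always takes eS, which attaches each e to the nearest unmatched i. The grammar is still ambiguous after left factoring, so this is a tie-break, not a consequence of the transformation.

```c
#include <stddef.h>

static const char *s;   /* remaining input; *s is the lookahead */
static int ok;          /* cleared on the first syntax error */

static void match(char c) {
    if (*s == c) s++;
    else ok = 0;
}

static void S(void);

/* E -> b */
static void E(void) { match('b'); }

/* S' -> e S | empty: take e S whenever the lookahead is e. */
static void Sprime(void) {
    if (*s == 'e') { match('e'); S(); }
    /* otherwise S' derives the empty string and consumes nothing */
}

/* S -> i E t S S' | a */
static void S(void) {
    if (*s == 'i') { match('i'); E(); match('t'); S(); Sprime(); }
    else if (*s == 'a') match('a');
    else ok = 0;
}

/* Hypothetical entry point: returns 1 if text is derived from S. */
int parse_stmt(const char *text) {
    s = text;
    ok = 1;
    S();
    return ok && *s == '\0';
}
```

For example, parse_stmt accepts ibta and ibtaea, and rejects ibt because no statement follows t.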
