This chapter delves into top-down parsing techniques, which aim to derive an input string from the starting symbol of a grammar, creating a parse tree in a preorder manner. It discusses various approaches including backtracking and non-backtracking (recursive descent) parsing, outlines the issues faced such as left recursion and the need for left factoring. Additionally, it describes algorithms for eliminating left recursion and building parsing tables, essential for predictive parsing, and highlights the significance of First and Follow sets in syntax analysis.
Objectives of Top-Down Parsing • an attempt to find a leftmost derivation for an input string. • an attempt to construct a parse tree for the input string starting from the root and creating the nodes of the parse tree in preorder.
Input string w: S =>lm ... =>lm ... =>lm w (each step is a leftmost derivation step)
Approaches of Top-Down Parsing
• 1. With backtracking (makes repeated scans of the input; a general form of top-down parsing)
• Method: create a procedure for each nonterminal.
e.g. S -> cAd
     A -> ab | a
L = { cabd, cad }

S( ) {
    if input-symbol == 'c' {
        Advance();
        if A( ) {
            if input-symbol == 'd' {
                Advance();
                return true;
            }
        }
    }
    return false;
}

A( ) {
    isave = input-pointer;
    if input-symbol == 'a' {
        Advance();
        if input-symbol == 'b' {
            Advance();
            return true;
        }
    }
    input-pointer = isave;    /* backtrack, then try A -> a */
    if input-symbol == 'a' {
        Advance();
        return true;
    }
    else return false;
}

Input: c a d
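The two procedures above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the names `parse`, `peek`, `advance`, `parse_S` and `parse_A` are hypothetical, and the input pointer is modelled as an integer index into the string.

```python
def parse(text: str) -> bool:
    """Backtracking recognizer for S -> c A d, A -> a b | a (illustrative sketch)."""
    pos = 0  # the input pointer

    def peek() -> str:
        # "$" stands in for the end of the input
        return text[pos] if pos < len(text) else "$"

    def advance() -> None:
        nonlocal pos
        pos += 1

    def parse_A() -> bool:
        nonlocal pos
        saved = pos                # isave = input-pointer
        if peek() == "a":          # first alternative: A -> a b
            advance()
            if peek() == "b":
                advance()
                return True
        pos = saved                # backtrack
        if peek() == "a":          # second alternative: A -> a
            advance()
            return True
        return False

    def parse_S() -> bool:
        if peek() == "c":
            advance()
            if parse_A() and peek() == "d":
                advance()
                return True
        return False

    # Accept only if S matched and the whole input was consumed.
    return parse_S() and pos == len(text)


print(parse("cabd"), parse("cad"), parse("cab"))  # True True False
```

Restoring `pos` is the only undo needed here because the sketch performs no semantic actions; a parser that updates a symbol table would have to roll those changes back as well.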
Problems for top-down parsing with backtracking:
(1) Left recursion (can cause a top-down parser to go into an infinite loop).
    Def. A grammar is said to be left-recursive if it has a nonterminal A s.t. there is a derivation A =>+ Aα for some α.
(2) Backtracking: the parser must undo not only the movement of the input pointer but also the semantic actions already taken, such as entries made in the symbol table.
(3) The order in which the alternatives are tried. (For the grammar shown above, try w = cabd where A -> a is applied first: A succeeds after matching only a, and S then fails on b.)
Elimination of Left-Recursion
With immediate left recursion: A -> Aα | β
==> transform into
    A -> βA'
    A' -> αA' | ε
(Figure: parse trees before and after the transformation.)
e.g. E -> E + T | T
     T -> T * F | F
     F -> (E) | id
After transformation:
     E -> TE'
     E' -> +TE' | ε
     T -> FT'
     T' -> *FT' | ε
     F -> (E) | id
General form (with left recursion):
    A -> Aα1 | Aα2 | ... | Aαn | β1 | β2 | ... | βm
After transformation:
==> A -> β1A' | β2A' | ... | βmA'
    A' -> α1A' | α2A' | ... | αnA' | ε
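The general transformation can be sketched as a small Python function. The representation is an assumption made for illustration: each alternative is a list of symbol strings, the empty list stands for ε, and the new nonterminal is named by appending a prime to the original name. The function name `eliminate_immediate` is hypothetical.

```python
def eliminate_immediate(head, alternatives):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm without immediate left recursion.

    Returns a dict mapping each nonterminal to its new list of alternatives.
    """
    # The alpha parts: alternatives that start with the head itself.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # The beta parts: every other alternative.
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: alternatives}  # nothing to do
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],  # [] is epsilon
    }


# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | epsilon
print(eliminate_immediate("E", [["E", "+", "T"], ["T"]]))
```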
What about left recursion that arises through a derivation of two or more steps?
e.g., S -> Aa | b
      A -> Ac | Sd | e
where S => Aa => Sda
Algorithm: Eliminating left recursion
Input: a context-free grammar G with no cycles (i.e., no derivation A =>+ A) and no ε-productions.
Method:
1. Arrange the nonterminals in some order A1, A2, ..., An.
2. for i = 1 to n do {
       for j = 1 to i-1 do
           replace each production of the form Ai -> Aj γ by the productions Ai -> δ1 γ | δ2 γ | ... | δk γ, where Aj -> δ1 | δ2 | ... | δk are all the current Aj-productions;
       eliminate the immediate left recursion among the Ai-productions;
   }
An Example
e.g. S -> Aa | b
     A -> Ac | Sd | e
Step 1: ==> S -> Aa | b
Step 2: ==> A -> Ac | Aad | bd | e
Step 3: ==> A -> bdA' | eA'
            A' -> cA' | adA' | ε
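The algorithm and the example above can be reproduced with a short Python sketch. The grammar representation (a dict from nonterminal to a list of alternatives, each a list of symbols, with the empty list for ε) and the function name `eliminate_left_recursion` are assumptions made for illustration; as in the slides, the input is assumed to have no cycles and no ε-productions.

```python
def eliminate_left_recursion(grammar, order):
    """Remove left recursion, processing the nonterminals in the given order."""
    g = {head: [list(alt) for alt in alts] for head, alts in grammar.items()}
    for i, a_i in enumerate(order):
        # Substitute earlier nonterminals that appear in leftmost position.
        for a_j in order[:i]:
            expanded = []
            for alt in g[a_i]:
                if alt and alt[0] == a_j:
                    expanded.extend(delta + alt[1:] for delta in g[a_j])
                else:
                    expanded.append(alt)
            g[a_i] = expanded
        # Eliminate immediate left recursion among the a_i productions.
        recursive = [alt[1:] for alt in g[a_i] if alt and alt[0] == a_i]
        if recursive:
            others = [alt for alt in g[a_i] if not alt or alt[0] != a_i]
            new_head = a_i + "'"
            g[a_i] = [beta + [new_head] for beta in others]
            g[new_head] = [alpha + [new_head] for alpha in recursive] + [[]]
    return g


GRAMMAR = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], ["e"]],   # "e" is a terminal here, as in the example
}
for head, alts in eliminate_left_recursion(GRAMMAR, ["S", "A"]).items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

With the order S, A the sketch yields the same productions as Step 3 of the example. A different ordering of the nonterminals can give a different but equivalent grammar.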
2. Non-backtracking (recursive-descent) parsing
Recursive descent: use a collection of mutually recursive routines to perform the syntax analysis.
Left factoring: A -> αβ1 | αβ2 ==> A -> αA'
                                    A' -> β1 | β2
Method:
1. For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, replace all the A-productions A -> αβ1 | αβ2 | ... | αβn | others by
       A -> αA' | others
       A' -> β1 | β2 | ... | βn
2. Repeat the transformation until no two alternatives of a nonterminal share a common prefix.
e.g. S -> iCtS | iCtSeS | a
     C -> b
==>  S -> iCtSS' | a
     S' -> eS | ε
     C -> b
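A minimal Python sketch of the left-factoring method follows. It assumes the same hypothetical grammar representation as before (alternatives as lists of symbols, the empty list for ε) and names each new nonterminal by appending primes; `common_prefix` and `left_factor` are illustrative names.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(grammar):
    """Repeat left factoring until no two alternatives share a prefix."""
    g = {head: [list(alt) for alt in alts] for head, alts in grammar.items()}
    changed = True
    while changed:
        changed = False
        for head in list(g):
            alts = g[head]
            # Longest prefix common to two or more alternatives of this head.
            best = []
            for i in range(len(alts)):
                for j in range(i + 1, len(alts)):
                    prefix = common_prefix(alts[i], alts[j])
                    if len(prefix) > len(best):
                        best = prefix
            if not best:
                continue
            n = len(best)
            new_head = head + "'"
            while new_head in g:          # keep the new name unique
                new_head += "'"
            g[new_head] = [alt[n:] for alt in alts if alt[:n] == best]
            g[head] = [best + [new_head]] + [alt for alt in alts if alt[:n] != best]
            changed = True
    return g


GRAMMAR = {
    "S": [["i", "C", "t", "S"], ["i", "C", "t", "S", "e", "S"], ["a"]],
    "C": [["b"]],
}
for head, alts in left_factor(GRAMMAR).items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

The sketch lists the ε alternative of S' first because it keeps the alternatives in their original order; the set of productions is the same as in the example.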
Predictive Parsing
Features:
- maintains a stack rather than recursive calls
- table-driven
Components:
1. An input buffer with an endmarker ($)
2. A stack with an endmarker ($) on the bottom
3. A parsing table, a two-dimensional array M[A,a], where A is a nonterminal symbol and a is the current input symbol (terminal/token).
Parsing Table (for the grammar S -> (S)S | ε)

M[A,a] |     (      |    )    |    $
   S   | S -> (S)S  | S -> ε  | S -> ε
Algorithm: Input: An input string w and a parsing table M for grammar G. Output: A leftmost derivation of w or an error indication.
Initially w$ is in the input buffer and S$ is on the stack, where S is the start symbol of the grammar.
Method:
    let X be the top stack symbol and a the next input symbol;
    while (X ≠ $) {
        if X is a terminal {
            if X == a then pop X from the stack and remove a from the input;
            else ERROR();
        } else {    /* X is a nonterminal */
            if M[X, a] = X -> Y1Y2...Yn then {
                1. pop X from the stack;
                2. push Yn, Yn-1, ..., Y1 onto the stack, with Y1 on top;
            } else ERROR();
        }
        let X be the top stack symbol and a the next input symbol;
    }
    if (the next input symbol == $) then accept else ERROR();
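The table-driven loop can be sketched in Python as below, using the parsing table for S -> (S)S | ε. This is an illustration under stated assumptions: the table is a dict keyed by (nonterminal, terminal) whose values are production bodies (the empty list for ε), every nonterminal has at least one table entry, and the names `predictive_parse` and `TABLE` are hypothetical.

```python
def predictive_parse(tokens, table, start):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    nonterminals = {head for head, _ in table}
    stack = ["$", start]            # the top of the stack is the end of the list
    buffer = list(tokens) + ["$"]
    pos = 0
    derivation = []
    while stack[-1] != "$":
        top, lookahead = stack[-1], buffer[pos]
        if top not in nonterminals:             # terminal: must match the input
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            stack.pop()
            pos += 1
        else:                                   # nonterminal: consult the table
            body = table.get((top, lookahead))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {lookahead}]")
            stack.pop()
            stack.extend(reversed(body))        # leftmost symbol ends up on top
            derivation.append((top, body))
    if buffer[pos] != "$":
        raise SyntaxError("input remains after the stack was emptied")
    return derivation


TABLE = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],     # S -> epsilon
    ("S", "$"): [],     # S -> epsilon
}
for head, body in predictive_parse("()", TABLE, "S"):
    print(head, "->", " ".join(body) or "ε")
```

Pushing the body in reverse keeps the leftmost symbol on top of the stack, which is why the productions come out in leftmost-derivation order.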
Construction of the parsing table for a predictive parser
First and Follow
Def. First(α) /* α denotes a string of grammar symbols */ is the set of terminals that begin the strings derived from α. If α =>* ε, then ε is also in First(α).
Def. Follow(A), where A is a nonterminal, is the set of terminals a that can appear immediately to the right of A in some sentential form, that is, the set of terminals a s.t. there exists a derivation of the form S =>* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form, then $ is in Follow(A).
Compute First(X) for all grammar symbols X:
1. If X is a terminal, then First(X) = {X}.
2. If X -> ε is a production, then ε is in First(X).
3. If X is a nonterminal and X -> Y1Y2...Yk is a production, then place a in First(X) if for some i, a is in First(Yi) and ε is in all of First(Y1), ..., First(Yi-1); that is, Y1...Yi-1 =>* ε. If ε is in First(Yj) for all j = 1, 2, ..., k, then add ε to First(X).
An Example
    E -> TE'
    E' -> +TE' | ε
    T -> FT'
    T' -> *FT' | ε
    F -> (E) | id
First(E) = First(T) = First(F) = { (, id }
First(E') = { +, ε }
First(T') = { *, ε }
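The First rules can be sketched as a fixed-point computation in Python, run here on the expression grammar of the example. The representation is an assumption for illustration: a dict from nonterminal to alternatives, each alternative a list of symbols, the empty list for ε, and any symbol that is not a key of the dict treated as a terminal. The names `first_sets`, `GRAMMAR` and `EPS` are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_sets(grammar):
    """Compute First(A) for every nonterminal A by iterating until nothing changes."""
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                before = len(first[head])
                all_nullable = True
                for symbol in body:
                    if symbol not in grammar:          # terminal (rule 1)
                        first[head].add(symbol)
                        all_nullable = False
                        break
                    first[head] |= first[symbol] - {EPS}   # rule 3
                    if EPS not in first[symbol]:
                        all_nullable = False
                        break
                if all_nullable:                       # rule 2, or every Yi derives ε
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


for head, symbols in first_sets(GRAMMAR).items():
    print(f"First({head}) =", sorted(symbols))
```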
Compute Follow(A) for all nonterminals A:
1. Place $ in Follow(S), where S is the start symbol and $ is the input buffer endmarker.
2. If there is a production A -> αBβ, then everything in First(β) except for ε is placed in Follow(B).
3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).
An Example
    E -> TE'
    E' -> +TE' | ε
    T -> FT'
    T' -> *FT' | ε
    F -> (E) | id        /* E is the start symbol */
Follow(E)  = { $, ) }          // rules 1 & 2
Follow(E') = { $, ) }          // rule 3
Follow(T)  = { +, $, ) }       // rules 2 & 3
Follow(T') = { +, $, ) }       // rule 3
Follow(F)  = { *, +, $, ) }    // rules 2 & 3
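The three Follow rules can be sketched in Python in the same style. To keep the block self-contained, the First sets of the nonterminals are entered by hand from the earlier example rather than computed; the grammar representation and the names `first_of_string`, `follow_sets`, `GRAMMAR` and `FIRST` are assumptions made for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
# First sets of the nonterminals, copied from the First example.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols, first):
    """First set of a string of grammar symbols; contains ε if the string can vanish."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})   # a terminal's First set is itself
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, first, start):
    follow = {head: set() for head in grammar}
    follow[start].add("$")                                   # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:                # only nonterminals have Follow
                        continue
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}                       # rule 2
                    if EPS in rest:                          # rule 3
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


for head, symbols in follow_sets(GRAMMAR, FIRST, "E").items():
    print(f"Follow({head}) =", sorted(symbols))
```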
Construct a Predictive Parsing Table
1. For each production A -> α of the grammar, do steps 2 and 3.
2. For each terminal a in First(α), add A -> α to M[A, a].
3. If ε is in First(α), add A -> α to M[A, b] for each terminal b in Follow(A). If ε is in First(α) and $ is in Follow(A), add A -> α to M[A, $].
4. Make each undefined entry of M be error.
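Steps 1 to 4 can be sketched in Python as follows, applied to the expression grammar. The First and Follow sets are entered by hand from the earlier examples so that the block stands alone; the representation and the names `first_of_string` and `build_table` are assumptions for illustration. Each table cell holds a list of production bodies, so a multiply-defined entry would show up as a list with more than one element, and a missing key plays the role of an error entry.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", "$", ")"}, "T'": {"+", "$", ")"},
    "F": {"*", "+", "$", ")"},
}


def first_of_string(symbols, first):
    """First set of a production body; contains ε if the whole body can vanish."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})   # a terminal's First set is itself
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    table = {}
    for head, alternatives in grammar.items():       # step 1
        for body in alternatives:
            body_first = first_of_string(body, first)
            targets = body_first - {EPS}              # step 2
            if EPS in body_first:                     # step 3 ($ is kept in Follow)
                targets |= follow[head]
            for terminal in targets:
                table.setdefault((head, terminal), []).append(body)
    return table                                      # step 4: absent keys mean error


table = build_table(GRAMMAR, FIRST, FOLLOW)
for (head, terminal), bodies in sorted(table.items()):
    print(f"M[{head}, {terminal}] =", [" ".join(b) or EPS for b in bodies])
```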
LL(1) grammar A grammar whose parsing table has no multiply-defined entries is said to be LL(1). First 'L' : scan the input from left to right. Second 'L': produce a leftmost derivation. '1' : use one input symbol to determine parsing action. * No ambiguous or left-recursive grammar can be LL(1).
Properties of LL(1) grammar
A grammar G is LL(1) iff whenever A -> α | β are two distinct productions of G, the following conditions hold:
(1) For no terminal a do both α and β derive strings beginning with a (based on step 2 of the table construction): First(α) ∩ First(β) = ∅.
(2) At most one of α and β can derive the empty string (based on step 3).
(3) If β =>* ε, then α does not derive any string beginning with a terminal in Follow(A) (based on steps 2 and 3): First(α) ∩ Follow(A) = ∅ (i.e., if First(A) contains ε, then First(A) ∩ Follow(A) = ∅).
Multiply-defined entries
If G is left-recursive or ambiguous, then M will have at least one multiply-defined entry.
e.g. S -> iCtSS' | a
     S' -> eS | ε
     C -> b
generates M[S', e] = { S' -> ε, S' -> eS }, a multiply-defined entry.
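The three LL(1) conditions can be checked pairwise with a short Python sketch, shown here on the alternatives of S'. The inputs are worked out by hand from the grammar above and are not given in the slides: First(eS) = {e}, First(ε) = {ε}, and Follow(S') = {e, $}. The function name `ll1_violations` and the tuple format of its result are hypothetical.

```python
EPS = "ε"


def ll1_violations(alt_firsts, follow):
    """Check the LL(1) conditions for one nonterminal.

    alt_firsts: one First set per alternative of the nonterminal.
    follow:     the Follow set of the nonterminal.
    Returns a list of (kind, i, j, offending_terminals) tuples.
    """
    problems = []
    for i in range(len(alt_firsts)):
        for j in range(i + 1, len(alt_firsts)):
            a, b = alt_firsts[i], alt_firsts[j]
            shared = (a & b) - {EPS}
            if shared:                                   # condition (1)
                problems.append(("first/first", i, j, shared))
            if EPS in a and EPS in b:                    # condition (2)
                problems.append(("both nullable", i, j, {EPS}))
            if EPS in b and (a - {EPS}) & follow:        # condition (3)
                problems.append(("first/follow", i, j, (a - {EPS}) & follow))
            if EPS in a and (b - {EPS}) & follow:        # condition (3), roles swapped
                problems.append(("first/follow", j, i, (b - {EPS}) & follow))
    return problems


# S' -> eS | ε, with Follow(S') = {e, $} (hand-computed assumption)
print(ll1_violations([{"e"}, {EPS}], {"e", "$"}))
```

The single violation reported is on the terminal e, which corresponds to the multiply-defined entry M[S', e].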
Difficulty in predictive parsing
• Left recursion elimination and left factoring make the resulting grammar hard to read and difficult to use for translation purposes.
Thus:
* Use a predictive parser for control constructs.
* Use operator precedence for expressions.
Assignment #3b Do exercises 4.3, 4.10, 4.13, 4.15