## Chapter 4


**Chapter 4** Syntax Analysis
Yu-Chen Kuo

**4.1 The Role of The Parser**
• A parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language.
• We expect the parser to report any syntax errors in an intelligible fashion. It should also recover from commonly occurring errors so that it can continue processing the remainder of its input.

**4.1 The Role of The Parser (cont.)**
[Figure]

**Three Types of Parsers**
• CYK algorithm and Earley's algorithm: too inefficient to use in production compilers
• Top-down methods
• Bottom-up methods

**Syntax Error Handling**
• Lexical error: misspelling an identifier, keyword, or operator
• Syntactic error: an arithmetic expression with unbalanced parentheses
• Semantic error: an operator applied to an incompatible operand
• Logical error: an infinitely recursive call

**Syntax Error Handling (cont.)**
• The error handler in a parser has simple-to-state goals:
  • It should report the presence of errors clearly and accurately
  • It should recover from each error quickly enough to be able to detect subsequent errors
  • It should not significantly slow down the processing of correct programs

**Error-Recovery Strategies**
• Panic mode
  • Discard input symbols one at a time until one of a designated set of synchronizing tokens is found
  • Synchronizing tokens: delimiters such as ; or end
  • Guaranteed not to go into an infinite loop
• Phrase level
  • The parser may perform local correction
  • Replace a prefix of the remaining input by some string that allows the parser to continue
  • E.g., replace , by ; or delete an extraneous ; or insert a missing ;
  • May lead to an infinite loop if we always insert something on the input ahead of the current input symbol

**Error-Recovery Strategies (cont.)**
• Error productions
  • Augment the grammar with productions that generate common erroneous constructs
• Global correction
  • Given an incorrect input string x and grammar G, find a parse tree for a related string y, such that the number of insertions, deletions, and
changes of tokens required to transform x into y is as small as possible
  • Too costly

**4.2 Context-Free Grammars**
• stmt → if expr then stmt else stmt
• Terminals: tokens
  • if, then, else
• Nonterminals: sets of strings
  • expr, stmt
• Start symbol
  • stmt
• Productions

**Example 4.2**
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
• Terminals: id, +, -, *, /, ↑, (, )
• Nonterminals: expr, op
• Start symbol: expr

**Notational Conventions**
• These symbols are terminals:
  • Lower-case letters: a, b, c
  • Operator symbols: +, -
  • Punctuation symbols: parentheses, comma
  • Digits: 0, 1, …, 9
  • Boldface strings: id, if
• These symbols are nonterminals:
  • Upper-case letters: A, B, C
  • The letter S: start symbol
  • Lower-case italic names: expr, stmt

**Notational Conventions (cont.)**
• Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols (terminals or nonterminals)
• Lower-case letters late in the alphabet, such as u, v, …, z, represent strings of terminals
• Lower-case Greek letters, such as α, β, γ, represent strings of grammar symbols
• A-productions (all productions with A on the left side): A → α1 | α2 | … | αk
• Start symbol: the left side of the first production

**Example 4.3**
E → E A E | ( E ) | - E | id
A → + | - | * | / | ↑
• By the notational conventions:
  • Nonterminals: E, A
  • Terminals: the remaining symbols

**Derivations**
E → E + E | E * E | ( E ) | - E | id
• E derives -E: E ⇒ -E
• The derivation of -(id) from E: E ⇒ -E ⇒ -(E) ⇒ -(id)
• αAβ ⇒ αγβ, if A → γ is a production
• ⇒ : a one-step derivation
• ⇒* : a derivation of zero or more steps
• ⇒+ : a derivation of one or more steps

**Derivations (cont.)**
• α ⇒* α, for any string α
• If α ⇒* β and β ⇒ γ, then α ⇒* γ
• L(G) denotes the language generated by G. Strings in L(G) contain only terminal symbols. w ∈ L(G) if and only if S ⇒+ w. The string w is called a sentence of G.
• If S ⇒* α, where α may contain nonterminals, we call α a sentential form of G.
• E.g., -(id + id) is a sentence of the grammar, because E ⇒* -(id + id)

**Leftmost & Rightmost Derivations**
• Leftmost derivation (⇒lm): the leftmost nonterminal is replaced at each step
  • -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
• Rightmost derivation (⇒rm): the rightmost nonterminal is replaced at each step
  • -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• If S ⇒*lm α, we call α a left-sentential form of G.
• If S ⇒*rm α, we call α a canonical sentential form of G.

**Parse Tree and Derivations**
[Figures]

**Ambiguity**
• A grammar is ambiguous if it produces any of the following:
  • More than one parse tree for some sentence
  • More than one leftmost derivation for some sentence
  • More than one rightmost derivation for some sentence

**4.3 Regular Expressions vs. Context-Free Grammars**
• Every language that can be described by a regular expression can also be described by a context-free grammar
• E.g., (a|b)*abb is generated by:
  A0 → aA0 | bA0 | aA1
  A1 → bA2
  A2 → bA3
  A3 → ε
• Every regular set is a context-free language

**Why use regular expressions to define the lexical syntax of a language?**
• Why not use a CFG for the lexical syntax?
  • The lexical rules of a language are frequently quite simple. We do not need a notation as powerful as grammars.
  • Regular expressions provide a more concise and easier-to-understand notation for tokens
  • An efficient lexical analyzer can be constructed automatically from regular expressions
  • Separating the syntactic structure of a language into lexical and nonlexical parts is a convenient way to modularize the front end of a compiler

**Why use regular expressions to define the lexical syntax of a language? (cont.)**
• Regular expressions are most useful for describing the structure of lexical constructs such as identifiers, constants, and keywords
• Grammars are most useful for describing nested structures such as balanced parentheses, matching begin-end's, and corresponding if-then-else's.
• Nested structures cannot be described by regular expressions.
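
The point about nested structures can be made concrete with a small recognizer. The sketch below assumes the grammar S → ( S ) S | ε for balanced parentheses and turns it directly into a recursive procedure; the function names `parse_s` and `is_balanced` are hypothetical, chosen only for this illustration. The recursion depth tracks the nesting depth, which is exactly the unbounded count that a regular expression cannot keep.

```python
def parse_s(text: str, pos: int = 0) -> int:
    """Match S -> ( S ) S | epsilon starting at pos.

    Returns the position just after the matched substring and raises
    SyntaxError if an opening parenthesis is never closed.
    """
    if pos < len(text) and text[pos] == "(":
        pos = parse_s(text, pos + 1)            # the inner S
        if pos >= len(text) or text[pos] != ")":
            raise SyntaxError(f"expected ')' at position {pos}")
        return parse_s(text, pos + 1)           # the trailing S
    return pos                                  # S -> epsilon matches nothing


def is_balanced(text: str) -> bool:
    """True if the whole string is derivable from S."""
    try:
        return parse_s(text) == len(text)
    except SyntaxError:
        return False
```

Calling `is_balanced("(()())")` accepts the string, while `is_balanced("(()")` and `is_balanced(")(")` reject theirs. Very deep nesting would hit Python's recursion limit, so this is a teaching sketch and not a production recognizer.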

**Verifying the Language Generated by a Grammar**
• To prove that L(G) = L, show that:
  • Every string generated by G is in L
  • Every string in L can be generated by G
• E.g., S → ( S ) S | ε generates all strings of balanced parentheses
• Every sentence derivable from S is balanced, by induction on the number of derivation steps n:
  • S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y (n steps)
  • S ⇒* x (fewer than n steps, so x must be balanced)
  • S ⇒* y (fewer than n steps, so y must be balanced)
• Every balanced string of length 2n is derivable from S, by induction on the length of the string:
  • Write w = ( x ) y, of length 2n, where ( x ) is the shortest nonempty balanced prefix of w
  • x and y have length less than 2n. They are both balanced and, by the inductive hypothesis, derivable from S
  • S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y = w

**Eliminating Ambiguity**
stmt → if expr then stmt
  | if expr then stmt else stmt
  | other

**Eliminating Ambiguity (cont.)**
• Disambiguating rule: match each else with the closest previous unmatched then
• The statement between a then and an else must be matched
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then matched_stmt else unmatched_stmt | if expr then stmt

**Eliminating Immediate Left Recursion**
• A grammar is left recursive if it has a nonterminal A such that A ⇒+ Aα for some string α
• Top-down parsing methods cannot handle left-recursive grammars, because top-down parsing corresponds to a leftmost derivation: expanding A leads back to A without consuming any input, so the parser can loop forever.
• Immediate left recursion A → Aα | β can be replaced by A → βA', A' → αA' | ε without changing the language

**Eliminating Immediate Left Recursion (cont.)**
• Non-immediate left recursion
S → Aa | b
A → Ac | Sd | ε
• S is left recursive because S ⇒ Aa ⇒ Sda

**Eliminating General Left Recursion**
• Input: a grammar G with no cycles (A ⇒+ A) and no ε-productions

**Eliminating General Left Recursion (cont.)**
• Non-immediate left recursion
S → Aa | b
A → Ac | Sd | ε
• Substituting the S-productions into A → Sd gives
A → Ac | Aad | bd | ε
• Eliminating the immediate left recursion among the A-productions gives
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε

**Left Factoring**
• When it is not clear which of two alternative productions to use to expand a nonterminal A, we rewrite the A-productions to defer the decision until we have seen enough of the input.
stmt → if expr then stmt | if expr then stmt else stmt
is left factored into
stmt → if expr then stmt S'
S' → else stmt | ε
• In general, A → αβ1 | αβ2 | … | αβn | γ is replaced by
A → αA' | γ
A' → β1 | β2 | … | βn

**Non-Context-Free Language Constructs**
• L1 = { wcw | w is in (a|b)* } is not context-free
• L1' = { wcw^R | w is in (a|b)* } is context-free
  • S → aSa | bSb | c
• L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 } is not context-free
• L2' = { a^n b^m c^m d^n | n ≥ 1 and m ≥ 1 } is context-free
  • S → aSd | aAd
  • A → bAc | bc
• L2'' = { a^n b^n c^m d^m | n ≥ 1 and m ≥ 1 } is context-free

**Non-Context-Free Language Constructs (cont.)**
• L3 = { a^n b^n c^n | n ≥ 0 } is not context-free
• L3' = { a^n b^n | n ≥ 1 } is context-free
  • S → aSb | ab
• A context-free grammar can keep count of two items but not three.
• A regular expression cannot keep count.

**Top-Down Parsing**
• Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.
• Equivalently, it constructs a parse tree for the input string starting from the root and creating the nodes of the parse tree in preorder.

**Recursive Descent Parsing**
• A general form of top-down parsing that may involve backtracking
• E.g., S → cAd, A → ab | a, w = cad
  • Expanding A → ab matches a but then fails to match b against d, so the parser backtracks and tries A → a, which succeeds.

**Predictive Parsers**
• By carefully writing a grammar, eliminating left recursion, and left factoring, we obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking.
(Such a parser is called a predictive parser.)
• A predictive parser is implemented by recursive procedures

**Predictive Parsers (cont.)**
type → simple | ↑id | array [ simple ] of type
simple → integer | char | num dotdot num

**Transition Diagrams for Predictive Parsers**
• We can create a transition diagram for a predictive parser
• For each nonterminal A:
  • Create an initial and a final state
  • For each production A → X1 X2 … Xn, create a path from the initial to the final state, with edges labeled X1, X2, …, Xn
• The parser follows the transition diagrams, matching terminals against the lookahead input symbols

**Transition Diagrams for Predictive Parsers (cont.)**
[Figures]

**Nonrecursive Predictive Parsing**
• It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls.
• The key problem during predictive parsing is that of determining the production to be applied for a nonterminal. The nonrecursive parser looks up the production to be applied in a parsing table.

**Nonrecursive Predictive Parsing (cont.)**
[Figure]
• The parser has an input buffer, a stack, a parsing table, and an output stream.
• The input buffer contains the string to be parsed, followed by $, a symbol used to indicate the end of the input string.
• The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol S of the grammar on top of $.

**Nonrecursive Predictive Parsing (cont.)**
• The output stream shows the derivation steps by which the grammar produces the input string.
• The parsing table is a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $. The entry M[A, a] gives the production to apply when A is on top of the stack and a is the current input symbol.

**Predictive Parsing Algorithm**
• Input.
A string w and a parsing table M for grammar G
• Output. A leftmost derivation of w, if w ∈ L(G)
• Method.
  • Put $S on the stack, where S is the start symbol of G
  • Put w$ in the input buffer
  • Execute the predictive parsing program (Fig. 4.14)

**Predictive Parsing Program**
[Figure]

**Example**
• Consider the non-left-recursive grammar for arithmetic expressions:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

**Example (parsing table M)**
[Figure]

**Example (Stack Moves)**
[Figure]

**FIRST and FOLLOW**
• The construction of a predictive parser is aided by the FIRST and FOLLOW functions.
• These functions help us construct the predictive parsing table.
• The sets of tokens given by FOLLOW can also be used as synchronizing tokens during panic-mode error recovery.
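
The predictive parsing algorithm described above (stack initialized to $S, input w$, table lookup on M[A, a]) can be sketched in a few lines. This is a minimal illustration, not the program of Fig. 4.14: the table entries for the expression grammar E → T E' are written out by hand here and should be read as an assumption of the sketch, and the names `TABLE` and `predictive_parse` are hypothetical.

```python
EPSILON = ()  # an empty right-hand side

# Assumed parsing table M[nonterminal][lookahead] -> right-hand side.
# Missing entries are error entries.
TABLE = {
    "E":  {"id": ("T", "E'"), "(": ("T", "E'")},
    "E'": {"+": ("+", "T", "E'"), ")": EPSILON, "$": EPSILON},
    "T":  {"id": ("F", "T'"), "(": ("F", "T'")},
    "T'": {"+": EPSILON, "*": ("*", "F", "T'"), ")": EPSILON, "$": EPSILON},
    "F":  {"id": ("id",), "(": ("(", "E", ")")},
}


def predictive_parse(tokens):
    """Return the productions of a leftmost derivation, in order.

    Raises SyntaxError when the table has no entry or a terminal mismatches.
    """
    buffer = list(tokens) + ["$"]   # input followed by the end marker
    stack = ["$", "E"]              # start symbol on top of $
    ip = 0                          # index of the current input symbol
    output = []
    while stack[-1] != "$":
        top, lookahead = stack[-1], buffer[ip]
        if top not in TABLE:        # terminal on top of the stack
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            stack.pop()
            ip += 1
        else:                       # nonterminal: consult the table
            rhs = TABLE[top].get(lookahead)
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {lookahead}]")
            stack.pop()
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
            output.append(f"{top} -> {' '.join(rhs) or 'ε'}")
    if buffer[ip] != "$":
        raise SyntaxError(f"unexpected input {buffer[ip]!r}")
    return output
```

For example, `predictive_parse(["id", "+", "id", "*", "id"])` returns the productions used in a leftmost derivation of id + id * id, starting with `E -> T E'`. The right-hand side is pushed in reverse so that its leftmost symbol is expanded first, which is what makes the output a leftmost derivation.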