Chapter 4

Chapter 4 Syntax Analysis Yu-Chen Kuo

4.1 The Role of the Parser
• A parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language.
• We expect the parser to report any syntax errors in an intelligible fashion. It should also recover from commonly occurring errors so that it can continue processing the remainder of its input.

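Three Types of Parsers
• Universal methods (the CYK algorithm and Earley's algorithm): can parse any grammar, but are too inefficient to use in production compilers
• Top-down methods: build the parse tree from the root down to the leaves
• Bottom-up methods: build the parse tree from the leaves up to the root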

Syntax Error Handling
• Lexical error:
  • misspelling an identifier, keyword, or operator
• Syntactic error:
  • an arithmetic expression with unbalanced parentheses
• Semantic error:
  • an operator applied to an incompatible operand
• Logical error:
  • an infinitely recursive call

Syntax Error Handling (cont.)
• The error handler in a parser has simple-to-state goals:
  • It should report the presence of errors clearly and accurately
  • It should recover from each error quickly enough to be able to detect subsequent errors
  • It should not significantly slow down the processing of correct programs

Error-Recovery Strategies
• Panic mode
  • Discard input symbols one at a time until one of a designated set of synchronizing tokens is found
  • Typical synchronizing tokens: ; and end
  • Guaranteed not to go into an infinite loop
• Phrase level
  • The parser may perform a local correction: replace a prefix of the remaining input by some string that allows parsing to continue
  • Examples: replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon
  • May lead to an infinite loop if we always insert something on the input ahead of the current input symbol
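The panic-mode bullet can be illustrated with a minimal Python sketch. The function name `panic_mode_skip`, the token list, and the choice of `;` and `end` as the synchronizing set are illustrative assumptions, not part of the slides. Because the position only ever moves forward, the loop cannot run forever.

```python
# Assumed synchronizing set, taken from the example on the slide.
SYNC_TOKENS = {";", "end"}

def panic_mode_skip(tokens, pos, sync=SYNC_TOKENS):
    """Discard tokens from index pos until a synchronizing token is found.

    Returns the index of the synchronizing token, or len(tokens) if the
    input runs out first. pos strictly increases, so the loop terminates.
    """
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return pos

# Hypothetical token stream: the error is detected at the "+" (index 4).
tokens = ["x", "=", "(", "1", "+", ";", "y", "=", "2", ";"]
resume_at = panic_mode_skip(tokens, 4)
print(resume_at, tokens[resume_at])
```

A real parser would resume parsing at (or just after) the returned position; which one depends on how the grammar uses the synchronizing token.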

Error-Recovery Strategies (cont.)
• Error productions
  • Augment the grammar with productions that generate common erroneous constructs, so the parser can recognize them and issue a specific diagnostic
• Global correction
  • Given an incorrect input string x and grammar G, find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible
  • Too costly in time and space to use in practice

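4.2 Context-Free Grammars
• Example production: stmt → if expr then stmt else stmt
• Terminals: the tokens
  • if, then, else
• Nonterminals: syntactic variables that denote sets of strings
  • expr, stmt
• Start symbol: the nonterminal whose set of strings is the language defined by the grammar
  • stmt
• Productions: rules that specify how terminals and nonterminals can be combined to form strings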

Example 4.2
  expr → expr op expr
  expr → ( expr )
  expr → - expr
  expr → id
  op → +
  op → -
  op → *
  op → /
  op → ↑
• Terminals: id, +, -, *, /, ↑, (, )
• Nonterminals: expr, op
• Start symbol: expr

Notational Conventions
• These symbols are terminals:
  • Lower-case letters early in the alphabet: a, b, c
  • Operator symbols: +, -
  • Punctuation symbols: parentheses, comma
  • Digits: 0, 1, …, 9
  • Boldface strings: id, if
• These symbols are nonterminals:
  • Upper-case letters early in the alphabet: A, B, C
  • The letter S: the start symbol
  • Lower-case italic names: expr, stmt

Notational Conventions (cont.)
• Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols (terminals or nonterminals)
• Lower-case letters late in the alphabet, chiefly u, v, …, z, represent strings of terminals
• Lower-case Greek letters, such as α, β, γ, represent strings of grammar symbols
• A-productions (all productions with A on the left): A → α1 | α2 | … | αk
• Start symbol: the left side of the first production, unless stated otherwise

Example 4.3
  E → E A E | ( E ) | - E | id
  A → + | - | * | / | ↑
• By the notational conventions:
  • Nonterminals: E, A
  • Terminals: the remaining symbols

Derivations
  E → E + E | E * E | ( E ) | - E | id
• "E derives -E" is written E ⇒ -E
• The derivation of -(id) from E: E ⇒ -E ⇒ -(E) ⇒ -(id)
• αAβ ⇒ αγβ if A → γ is a production
• ⇒ : derives in one step
• ⇒* : derives in zero or more steps
• ⇒+ : derives in one or more steps

Derivations (cont.)
• α ⇒* α, for any string α
• If α ⇒* β and β ⇒ γ, then α ⇒* γ
• L(G) denotes the language generated by G. Every string in L(G) consists only of terminal symbols. w ∈ L(G) if and only if S ⇒+ w. Such a string w is called a sentence of G.
• If S ⇒* α, where α may contain nonterminals, then α is called a sentential form of G.
• E.g., -(id + id) is a sentence of the grammar, because E ⇒* -(id + id)

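Leftmost & Rightmost Derivations
• Leftmost derivation (⇒lm): the leftmost nonterminal is replaced at every step
  • -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
• Rightmost derivation (⇒rm): the rightmost nonterminal is replaced at every step
  • -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• If S ⇒*lm α, then α is a left-sentential form of G.
• If S ⇒*rm α, then α is a right-sentential form of G. Rightmost derivations are also called canonical derivations.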
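To make the leftmost-derivation rule concrete, here is a small Python sketch that replays a leftmost derivation of -(id+id) from E. The helper name `leftmost_step` and the list-of-symbols representation are assumptions made for illustration only.

```python
NONTERMINALS = {"E"}

def leftmost_step(form, head, body):
    """Apply head -> body to the leftmost nonterminal of a sentential form.

    form and body are lists of grammar symbols. Raises ValueError if the
    leftmost nonterminal is not head, since the step would then not be a
    leftmost derivation step.
    """
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            if sym != head:
                raise ValueError(f"leftmost nonterminal is {sym}, not {head}")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no nonterminal left to expand")

# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
bodies = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
form = ["E"]
trace = ["".join(form)]
for body in bodies:
    form = leftmost_step(form, "E", body)
    trace.append("".join(form))
print(" => ".join(trace))
```

Every intermediate string in `trace` is a left-sentential form; only the last one is a sentence.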

Parse Tree and Derivations

Ambiguity
• A grammar is ambiguous if some sentence has:
  • more than one parse tree, or equivalently
  • more than one leftmost derivation, or equivalently
  • more than one rightmost derivation

4.3 Regular Expressions vs. Context-Free Grammars
• Every language that can be described by a regular expression can also be described by a context-free grammar
• Example: (a|b)*abb is generated by
  A0 → aA0 | bA0 | aA1
  A1 → bA2
  A2 → bA3
  A3 → ε
• Every regular set is a context-free language
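As a sanity check on the example, the following Python sketch treats the grammar for (a|b)*abb as a set of transitions between nonterminals and compares it with Python's `re` module. The dictionary encoding and the function names are illustrative assumptions; the grammar itself is the one on the slide.

```python
import re

# Each production A -> tB is stored as (t, B) under A.
# A3 -> ε is modeled by listing A3 as an accepting nonterminal.
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [],
}
ACCEPTING = {"A3"}

def generates(w, start="A0"):
    """Return True if the grammar derives w.

    Tracks the set of nonterminals that can remain after deriving the
    prefix read so far, exactly like simulating an NFA.
    """
    current = {start}
    for ch in w:
        current = {nxt for nt in current for (t, nxt) in GRAMMAR[nt] if t == ch}
    return bool(current & ACCEPTING)

def matches_regex(w):
    """Reference answer from the regular expression itself."""
    return re.fullmatch(r"(a|b)*abb", w) is not None

for w in ["abb", "aabb", "babb", "ab", "abba", ""]:
    print(repr(w), generates(w), matches_regex(w))
```

The two functions should agree on every string over {a, b}, which is what "can also be described by a context-free grammar" claims for this example.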

Why Use Regular Expressions to Define the Lexical Syntax of a Language?
• Why not use a CFG for the lexical syntax?
  • The lexical rules of a language are frequently quite simple, so we do not need a notation as powerful as grammars.
  • Regular expressions provide a more concise and easier-to-understand notation for tokens.
  • Efficient lexical analyzers can be constructed automatically from regular expressions.
  • Separating the syntactic structure of a language into lexical and nonlexical parts modularizes the front end of the compiler.

Why Use Regular Expressions to Define the Lexical Syntax of a Language? (cont.)
• Regular expressions are most useful for describing the structure of lexical constructs such as identifiers, constants, and keywords.
• Grammars are most useful for describing nested structures such as balanced parentheses, matching begin-end's, and corresponding if-then-else's.
• Nested structures cannot be described by regular expressions.

Verifying the Language Generated by a Grammar
• To prove that L(G) = L, show that:
  • Every string generated by G is in L
  • Every string in L can be generated by G
• Example: S → (S)S | ε generates all strings of balanced parentheses, and only such strings
• Every sentence derived from S is balanced (by induction on the number of derivation steps n)
  • An n-step derivation has the form S ⇒ (S)S ⇒* (x)S ⇒* (x)y
  • S ⇒* x and S ⇒* y each take fewer than n steps, so x and y are balanced by the inductive hypothesis
  • Hence (x)y is balanced
• Every balanced string of length 2n is derivable from S (by induction on n)
  • A nonempty balanced string w of length 2n can be written as w = (x)y
  • x and y are both balanced and shorter than 2n, so both are derivable from S by the inductive hypothesis
  • S ⇒ (S)S ⇒* (x)S ⇒* (x)y = w
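The claim can also be checked empirically. The Python sketch below is a hand-written recognizer that follows S → (S)S | ε directly (choose (S)S when the next symbol is "(", otherwise ε) and compares it with an independent counting definition of "balanced". The function names are assumptions for illustration; this complements the inductive proof and does not replace it.

```python
def parse_S(w, i=0):
    """Match S -> (S)S | ε at position i and return the position after it.

    Raises SyntaxError when a "(" has no matching ")".
    """
    if i < len(w) and w[i] == "(":
        j = parse_S(w, i + 1)            # the S inside the parentheses
        if j < len(w) and w[j] == ")":
            return parse_S(w, j + 1)     # the S after the closing parenthesis
        raise SyntaxError(f"expected ')' at position {j}")
    return i                             # S -> ε

def derivable(w):
    """True if S derives exactly w."""
    try:
        return parse_S(w) == len(w)
    except SyntaxError:
        return False

def balanced(w):
    """Independent definition: the depth never goes negative and ends at 0."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

for w in ["", "()", "(())()", "(()", ")("]:
    print(repr(w), derivable(w), balanced(w))
```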

Eliminating Ambiguity
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other

Eliminating Ambiguity (cont.)
• Disambiguating rule: match each else with the closest previous unmatched then
• The statement between a then and an else must be matched
  stmt → matched_stmt | unmatched_stmt
  matched_stmt → if expr then matched_stmt else matched_stmt
               | other
  unmatched_stmt → if expr then matched_stmt else unmatched_stmt
                 | if expr then stmt

Eliminating Immediate Left Recursion
• A grammar is left recursive if it has a nonterminal A such that A ⇒+ Aα for some string α
• Top-down parsing methods cannot handle left-recursive grammars: a top-down parser follows a leftmost derivation, so expanding A can lead back to A without consuming any input, and the parser may loop forever.
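The standard textbook transformation for immediate left recursion rewrites A → Aα1 | … | Aαm | β1 | … | βn as A → β1A' | … | βnA' and A' → α1A' | … | αmA' | ε. A minimal Python sketch follows; the function name, the tuple encoding of production bodies (with the empty tuple standing for ε), and the sample production E → E + T | T are illustrative assumptions. It assumes no production of the form A → A.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Remove immediate left recursion from the productions of one nonterminal.

    bodies is a list of tuples of grammar symbols; () stands for ε.
    Returns a dict mapping each resulting nonterminal to its bodies.
    """
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # from A -> A alpha
    betas = [b for b in bodies if not b or b[0] != head]     # from A -> beta
    if not alphas:
        return {head: list(bodies)}      # nothing to do
    new = head + "'"                     # fresh nonterminal A'
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }

result = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
for nt, alts in result.items():
    print(nt, "->", " | ".join(" ".join(b) or "ε" for b in alts))
```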

Eliminating Immediate Left Recursion (cont.)
• Non-immediate left recursion
  S → Aa | b
  A → Ac | Sd | ε
• S is left recursive because S ⇒ Aa ⇒ Sda

Eliminating General Left Recursion
• Input: a grammar G with no cycles (derivations A ⇒+ A) and no ε-productions
• Output: an equivalent grammar with no left recursion

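Eliminating General Left Recursion (cont.)
• Non-immediate left recursion
  S → Aa | b
  A → Ac | Sd | ε
• Substituting the S-productions into A → Sd gives
  A → Ac | Aad | bd | ε
• Eliminating the immediate left recursion among the A-productions gives
  S → Aa | b
  A → bdA' | A'
  A' → cA' | adA' | ε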
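The example can be reproduced with a short Python sketch of the usual procedure: fix an order A1, …, An of the nonterminals; for each Ai, replace every production Ai → Ajγ with j < i by the current Aj-productions, then remove the immediate left recursion from Ai. The function names and the dictionary encoding are assumptions for illustration. The example grammar has an ε-production, which the stated input condition excludes, but the procedure still works on it.

```python
def eliminate_immediate(head, bodies, grammar):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | ε."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        grammar[head] = bodies
        return
    new = head + "'"
    grammar[head] = [beta + (new,) for beta in betas]
    grammar[new] = [alpha + (new,) for alpha in alphas] + [()]

def eliminate_left_recursion(grammar, order):
    """Return a copy of grammar without left recursion.

    grammar maps a nonterminal to a list of bodies (sequences of symbols,
    empty for ε); order lists the nonterminals as A1, ..., An.
    """
    g = {a: [tuple(b) for b in bodies] for a, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Ai -> Aj gamma becomes Ai -> delta gamma for each Aj -> delta
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        eliminate_immediate(ai, g[ai], g)
    return g

example = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
converted = eliminate_left_recursion(example, ["S", "A"])
for nt, alts in converted.items():
    print(nt, "->", " | ".join(" ".join(b) or "ε" for b in alts))
```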

Left Factoring
• When it is not clear which of two alternative productions to use to expand a nonterminal A, we rewrite the A-productions to defer the decision until we have seen enough of the input.
  stmt → if expr then stmt
       | if expr then stmt else stmt
  becomes
  stmt → if expr then stmt S'
  S' → else stmt | ε
• In general, A → αβ1 | αβ2 | … | αβn | γ becomes
  A → αA' | γ
  A' → β1 | β2 | … | βn
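A single left-factoring step can be sketched in Python as follows. The function names and tuple encoding are illustrative assumptions. The sketch groups alternatives by their first symbol and factors out the longest prefix common to the whole group, so a complete implementation would repeat the step until no two alternatives share a prefix.

```python
def common_prefix(seqs):
    """Longest prefix shared by every sequence in seqs."""
    prefix = []
    for syms in zip(*seqs):
        if all(s == syms[0] for s in syms):
            prefix.append(syms[0])
        else:
            break
    return tuple(prefix)

def left_factor_once(head, bodies):
    """Factor the first group of alternatives that share a first symbol.

    bodies is a list of tuples of symbols; () stands for ε. Returns a dict
    with the rewritten productions for head and for the new nonterminal.
    """
    groups = {}
    for b in bodies:
        groups.setdefault(b[0] if b else None, []).append(b)
    for first, group in groups.items():
        if first is not None and len(group) > 1:
            alpha = common_prefix(group)
            new = head + "'"
            rest = [b for b in bodies if b not in group]   # the gamma alternatives
            return {
                head: [alpha + (new,)] + rest,
                new: [b[len(alpha):] for b in group],
            }
    return {head: list(bodies)}

factored = left_factor_once("stmt", [
    ("if", "expr", "then", "stmt"),
    ("if", "expr", "then", "stmt", "else", "stmt"),
])
for nt, alts in factored.items():
    print(nt, "->", " | ".join(" ".join(b) or "ε" for b in alts))
```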

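Non-Context-Free Language Constructs
• L1 = { wcw | w is in (a|b)* } is not context-free
• L1' = { wcw^R | w is in (a|b)* }, where w^R is w reversed, is context-free
  S → aSa | bSb | c
• L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 } is not context-free
• L2' = { a^n b^m c^m d^n | n ≥ 1 and m ≥ 1 } is context-free
  S → aSd | aAd
  A → bAc | bc
• L2'' = { a^n b^n c^m d^m | n ≥ 1 and m ≥ 1 } is context-free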

Non-Context-Free Language Constructs (cont.)
• L3 = { a^n b^n c^n | n ≥ 0 } is not context-free
• L3' = { a^n b^n | n ≥ 1 } is context-free
  S → aSb | ab
• A context-free grammar can keep count of two items but not three.
• A regular expression cannot keep count at all.

Top-Down Parsing
• Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.
• It constructs a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder.

Recursive Descent Parsing
• A general form of top-down parsing that may involve backtracking
• E.g., parse w = cad with the grammar
  S → cAd
  A → ab | a
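The backtracking in this example can be sketched with Python generators: each alternative of a nonterminal is tried in order, and a failed alternative simply yields nothing, which sends the parser back to try the next one. The names `derive` and `accepts` are assumptions for illustration, and the sketch would not terminate on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def derive(symbols, w, i):
    """Yield every position reachable after matching symbols against w[i:]."""
    if not symbols:
        yield i
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        for body in GRAMMAR[first]:           # try each alternative in order
            for j in derive(body, w, i):
                yield from derive(rest, w, j)
    elif i < len(w) and w[i] == first:        # terminal: must match the input
        yield from derive(rest, w, i + 1)

def accepts(w, start="S"):
    """True if some sequence of choices matches the whole input."""
    return any(j == len(w) for j in derive([start], w, 0))

for w in ["cad", "cabd", "cd", "cada"]:
    print(w, accepts(w))
```

On w = cad the first alternative A → ab matches a, then fails on b against d, so the parser backs up and succeeds with A → a.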

Predictive Parsers
• By carefully writing a grammar, eliminating left recursion from it, and left factoring the result, we can often obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking, i.e., a predictive parser.
• A predictive parser is implemented by recursive procedures, one for each nonterminal.

Predictive Parsers (cont.)
  type → simple | id | array [ simple ] of type
  simple → integer | char | num dotdot num
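For this grammar, a predictive parser needs one procedure per nonterminal and a single token of lookahead to choose the production. The Python sketch below is one possible rendering; the class name `Parser`, the method names, and the pre-split token list are assumptions, and no parse tree is built.

```python
class Parser:
    """Recursive-descent predictive parser for the type/simple grammar."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]    # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def type_(self):
        if self.lookahead in ("integer", "char", "num"):
            self.simple()                      # type -> simple
        elif self.lookahead == "id":
            self.match("id")                   # type -> id
        elif self.lookahead == "array":        # type -> array [ simple ] of type
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":          # simple -> num dotdot num
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")

def parses(tokens):
    """True if tokens form a complete type."""
    parser = Parser(tokens)
    try:
        parser.type_()
        parser.match("$")
        return True
    except SyntaxError:
        return False

print(parses("array [ num dotdot num ] of integer".split()))
print(parses("array [ id ] of integer".split()))
```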

Transition Diagrams for Predictive Parsers
• We can create a transition diagram for a predictive parser
• For each nonterminal A:
  • Create an initial and a final state
  • For each production A → X1 X2 … Xn, create a path from the initial to the final state, with edges labeled X1, X2, …, Xn
• The parser follows the transition diagrams, matching terminals against the lookahead input symbol

Nonrecursive Predictive Parsing
• It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls.
• The key problem during predictive parsing is determining which production to apply for a nonterminal. The nonrecursive parser looks up the production to apply in a parsing table.

Nonrecursive Predictive Parsing (cont.)
• The parser has an input buffer, a stack, a parsing table, and an output stream.
• The input buffer contains the string to be parsed, followed by $, a symbol used to indicate the end of the input string.
• The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol S of the grammar on top of $.

Nonrecursive Predictive Parsing (cont.)
• The output stream shows the derivation steps by which the grammar produces the input string.
• The parsing table is a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $. The entry M[A, a] gives the action to take when A is on top of the stack and a is the current input symbol.

Predictive Parsing Algorithm
• Input: a string w and a parsing table M for grammar G
• Output: a leftmost derivation of w if w ∈ L(G); otherwise, an error indication
• Method:
  • Put $S on the stack, where S is the start symbol of G
  • Put w$ in the input buffer
  • Execute the predictive parsing program (Fig. 4.14)

Predictive Parsing Program

Example
• Consider the non-left-recursive grammar for arithmetic expressions:
  E → TE'
  E' → +TE' | ε
  T → FT'
  T' → *FT' | ε
  F → (E) | id

Example (parsing table M)

Example (Stack Moves)
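The stack moves for the expression grammar can be reproduced with a short table-driven Python sketch. The `TABLE` dictionary below was written out by hand for the grammar in the example and should be read as an assumption about the contents of table M; the function name and the tuple encoding (empty tuple for ε) are also illustrative.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Hand-written parsing table: TABLE[(A, a)] is the body of the production
# to apply when nonterminal A is on top of the stack and a is the lookahead.
TABLE = {
    ("E", "id"): ("T", "E'"),       ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),  ("E'", ")"): (),  ("E'", "$"): (),
    ("T", "id"): ("F", "T'"),       ("T", "("): ("F", "T'"),
    ("T'", "+"): (),                ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (),                ("T'", "$"): (),
    ("F", "id"): ("id",),           ("F", "("): ("(", "E", ")"),
}

def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation of tokens.

    Raises SyntaxError if the input is not in the language.
    """
    stack = ["$", start]
    buf = list(tokens) + ["$"]
    ip = 0
    output = []
    while stack[-1] != "$":
        top, a = stack[-1], buf[ip]
        if top not in NONTERMINALS:            # terminal on top: must match
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:
            body = TABLE.get((top, a))
            if body is None:                   # empty table entry: error
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(body))       # leftmost body symbol ends on top
            output.append((top, body))
    if buf[ip] != "$":
        raise SyntaxError(f"unexpected {buf[ip]!r} after end of expression")
    return output

for head, body in predictive_parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(body) or "ε")
```

The printed productions, read top to bottom, are the steps of a leftmost derivation of id + id * id.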

FIRST and FOLLOW
• The construction of a predictive parser is aided by the FIRST and FOLLOW functions.
• These functions help us construct the predictive parsing table.
• The sets of tokens computed by FOLLOW can also be used as synchronizing tokens during panic-mode error recovery.
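As a rough illustration, the Python sketch below computes both functions for the expression grammar by fixed-point iteration. It assumes the standard textbook definitions, which are not spelled out here: FIRST(α) is the set of terminals that can begin a string derived from α (plus ε if α can derive ε), and FOLLOW(A) is the set of terminals that can appear immediately to the right of A in some sentential form (plus $ for the start symbol). All names in the code are illustrative.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of nonterminals."""
    out = set()
    for sym in symbols:
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                      # every symbol can derive ε
    return out

def compute_first():
    first = {a: set() for a in GRAMMAR}
    changed = True
    while changed:                    # repeat until no set grows
        changed = False
        for a, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

def compute_follow(first, start="E"):
    follow = {a: set() for a in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, bodies in GRAMMAR.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in GRAMMAR:
                        continue
                    tail = first_of_string(body[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:   # what follows a can also follow b
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for a in GRAMMAR:
    print(a, sorted(FIRST[a]), sorted(FOLLOW[a]))
```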
