
Chapter 4

Chapter 4. Syntax Analysis. 4.1 The Role of the Parser. A parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language.


Presentation Transcript


  1. Chapter 4 Syntax Analysis Yu-Chen Kuo

  2. 4.1 The Role of the Parser • A parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language. • We expect the parser to report any syntax errors in an intelligible fashion. It should also recover from commonly occurring errors so that it can continue processing the remainder of its input.

  3. 4.1 The Role of the Parser

  4. Three Types of Parsers • Universal methods (the CYK algorithm and Earley's algorithm): can parse any grammar, but are too inefficient to use in production compilers • Top-down methods • Bottom-up methods

  5. Syntax Error Handling • Lexical error: misspelling an identifier, keyword, or operator • Syntactic error: an arithmetic expression with unbalanced parentheses • Semantic error: an operator applied to an incompatible operand • Logical error: an infinitely recursive call

  6. Syntax Error Handling (cont.) • The error handler in a parser has simple-to-state goals: • It should report the presence of errors clearly and accurately • It should recover from each error quickly enough to be able to detect subsequent errors • It should not significantly slow down the processing of correct programs

  7. Error-Recovery Strategies • Panic mode: discard input symbols one at a time until one of a designated set of synchronizing tokens is found. Synchronizing tokens are usually delimiters such as ; or end. Guaranteed not to go into an infinite loop. • Phrase level: the parser may perform local correction by replacing a prefix of the remaining input with some string that allows it to continue, e.g., replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon. May lead to an infinite loop if we always insert something on the input ahead of the current input symbol.
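The panic-mode strategy can be sketched in a few lines of Python. This is a minimal illustration, not a full error handler: the function name `panic_mode_skip` and the token list are hypothetical, and the synchronizing set is simply the one used as an example on the slide.

```python
# Assumed synchronizing set, taken from the slide's example (; and end).
SYNC_TOKENS = {";", "end"}

def panic_mode_skip(tokens, pos, sync=SYNC_TOKENS):
    """Return the index of the first synchronizing token at or after pos.

    Every loop iteration advances pos by one, so the scan always
    terminates, at the latest when it runs off the end of the input.
    """
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return pos

# Hypothetical token stream: the parser detects an error at index 4 ("*").
tokens = ["x", "=", "1", "+", "*", "2", ";", "y", "=", "3", ";"]
resume_at = panic_mode_skip(tokens, 4)   # index of the ";" that ends the bad statement
```

Because the function only ever discards input, it cannot loop forever, which is the guarantee the slide mentions. The price is that everything between the error and the synchronizing token goes unchecked.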

  8. Error-Recovery Strategies (cont.) • Error productions: augment the grammar with productions that generate common erroneous constructs, so that the parser can recognize them and issue a specific diagnostic • Global correction: given an incorrect input string x and grammar G, find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible. Too costly to use in practice.

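  9. 4.2 Context-Free Grammars • Example production: stmt → if expr then stmt else stmt • Terminals: tokens, e.g., if, then, else • Nonterminals: syntactic variables that denote sets of strings, e.g., expr, stmt • Start symbol: one distinguished nonterminal, e.g., stmt • Productions: rules that specify how terminals and nonterminals can be combined to form strings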

  10. Example 4.2 • expr → expr op expr • expr → ( expr ) • expr → - expr • expr → id • op → + • op → - • op → * • op → / • op → ↑ • Terminals: id, +, -, *, /, ↑, (, ) • Nonterminals: expr, op • Start symbol: expr

  11. Notational Conventions • These symbols are terminals: • Lower-case letters early in the alphabet: a, b, c • Operator symbols: +, - • Punctuation symbols: parentheses, comma • Digits: 0, 1, …, 9 • Boldface strings: id, if • These symbols are nonterminals: • Upper-case letters early in the alphabet: A, B, C • The letter S: the start symbol • Lower-case italic names: expr, stmt

  12. Notational Conventions (cont.) • Upper-case letters late in the alphabet (X, Y, Z) represent grammar symbols (terminals or nonterminals) • Lower-case letters late in the alphabet (u, v, …, z) represent strings of terminals • Lower-case Greek letters (α, β, γ) represent strings of grammar symbols • A-productions (all productions with A on the left): A → α1 | α2 | … | αk • Start symbol: the left side of the first production

  13. Example 4.3 • E → E A E | ( E ) | - E | id • A → + | - | * | / | ↑ • By the notational conventions: nonterminals are E and A; the remaining symbols are terminals

  14. Derivations • Grammar: E → E + E | E * E | ( E ) | - E | id • E derives -E, written E ⇒ -E • The derivation of -(id) from E: E ⇒ -E ⇒ -(E) ⇒ -(id) • αAβ ⇒ αγβ if A → γ is a production • ⇒ : derives in one step • ⇒* : derives in zero or more steps • ⇒+ : derives in one or more steps

  15. Derivations (cont.) • α ⇒* α for any string α • If α ⇒* β and β ⇒ γ, then α ⇒* γ • L(G) denotes the language generated by G. Strings in L(G) contain only terminal symbols: w ∈ L(G) if and only if S ⇒+ w. Such a string w is called a sentence of G. • If S ⇒* α, where α may contain nonterminals, then α is a sentential form of G. • E.g., -(id + id) is a sentence of the grammar, because E ⇒* -(id + id)

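  16. Leftmost & Rightmost Derivations • Leftmost derivation (written ⇒lm; the leftmost nonterminal is replaced at each step): -(E+E) ⇒lm -(id+E) ⇒lm -(id+id) • Rightmost derivation (written ⇒rm; the rightmost nonterminal is replaced at each step): -(E+E) ⇒rm -(E+id) ⇒rm -(id+id) • If S ⇒*lm α, then α is a left-sentential form of G. • If S ⇒*rm α, then α is a right-sentential form of G. Rightmost derivations are also called canonical derivations.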
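The difference between the two derivation orders can be made concrete with a small sketch. The helper names `leftmost_step` and `rightmost_step` are hypothetical, and the helpers do not check that the replacement is a real production of the grammar; they only show which nonterminal gets rewritten.

```python
NONTERMINALS = {"E"}

def leftmost_step(form, rhs):
    """Replace the leftmost nonterminal in the sentential form with rhs."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")

def rightmost_step(form, rhs):
    """Replace the rightmost nonterminal in the sentential form with rhs."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in NONTERMINALS:
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")

form = ["-", "(", "E", "+", "E", ")"]
left = leftmost_step(form, ["id"])     # - ( id + E )
right = rightmost_step(form, ["id"])   # - ( E + id )
# Applying E -> id once more in either order reaches the same sentence.
sentence = leftmost_step(left, ["id"])
```

Both orders end at the same sentence -(id+id); only the intermediate sentential forms differ.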

  17. Parse Tree and Derivations

  18. Parse Tree and Derivations (cont.)

  19. Ambiguity • A grammar is ambiguous if it produces more than one parse tree for some sentence • Equivalently: more than one leftmost derivation for some sentence • Equivalently: more than one rightmost derivation for some sentence

  20. 4.3 Regular Expressions vs. Context-Free Grammars • Every language that can be described by a regular expression can also be described by a context-free grammar • Example: (a|b)*abb is generated by the grammar A0 → aA0 | bA0 | aA1, A1 → bA2, A2 → bA3, A3 → ε • Every regular set is a context-free language
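As a sanity check on the example, the grammar and the regular expression can be compared directly. The sketch below treats the right-linear grammar as a nondeterministic automaton whose states are the nonterminals; the function names are hypothetical, and the comparison is only exhaustive up to an arbitrarily chosen length of 6.

```python
import re
from itertools import product

# The grammar from the slide: nonterminal -> list of (terminal, next nonterminal).
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [],              # A3 has only the epsilon production
}
ACCEPTING = {"A3"}         # nonterminals that can derive the empty string

def generated_by_grammar(w):
    """Track every nonterminal that could remain after deriving the prefix read so far."""
    current = {"A0"}
    for ch in w:
        current = {nxt for nt in current for (t, nxt) in GRAMMAR[nt] if t == ch}
    return bool(current & ACCEPTING)

def matched_by_regex(w):
    return re.fullmatch(r"(a|b)*abb", w) is not None

# Compare the two descriptions on every string over {a, b} of length 0..6.
agree = all(
    generated_by_grammar("".join(p)) == matched_by_regex("".join(p))
    for n in range(7)
    for p in product("ab", repeat=n)
)
```

A bounded check like this is evidence rather than proof, but it catches transcription mistakes in either description quickly.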

  21. Why use regular expressions to define the lexical syntax of a language? • Why not use a CFG for the lexical syntax? • Lexical rules of a language are frequently quite simple. We do not need a notation as powerful as grammars. • Regular expressions provide a more concise and easier-to-understand notation for tokens • More efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars • Separating the syntactic structure of a language into lexical and nonlexical parts modularizes the front end of the compiler

  22. Why use regular expressions to define the lexical syntax of a language? (cont.) • Regular expressions are most useful for describing the structure of lexical constructs such as identifiers, constants, and keywords • Grammars are most useful for describing nested structures such as balanced parentheses, matching begin-end's, and corresponding if-then-else's • Nested structures cannot be described by regular expressions

  23. Verifying the Language Generated by a Grammar • To prove that L(G) = L, show two things: every string generated by G is in L, and every string in L can be generated by G • Example: S → (S)S | ε generates all strings of balanced parentheses • Every sentence derived from S is balanced, by induction on the number of steps n in the derivation: the derivation has the form S ⇒ (S)S ⇒* (x)S ⇒* (x)y; the derivations S ⇒* x and S ⇒* y each take fewer than n steps, so x and y are balanced by the inductive hypothesis, and therefore (x)y is balanced • Every balanced string of length 2n is derivable from S, by induction on n: write w = (x)y, where (x) is the shortest nonempty balanced prefix of w; x and y are balanced and shorter than 2n, so both are derivable from S, which gives S ⇒ (S)S ⇒* (x)S ⇒* (x)y = w
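The grammar S → (S)S | ε translates almost line for line into a recursive recognizer, which makes the claim easy to experiment with. This is a sketch under the assumption that the input is a plain Python string of parenthesis characters; `parse_S` and `is_balanced` are hypothetical names.

```python
def parse_S(s, i=0):
    """Match a prefix of s[i:] against S -> ( S ) S | epsilon.

    Returns the index just past the matched prefix.
    """
    if i < len(s) and s[i] == "(":
        i = parse_S(s, i + 1)                 # the inner S
        if i >= len(s) or s[i] != ")":
            raise SyntaxError(f"expected ')' at position {i}")
        return parse_S(s, i + 1)              # the trailing S
    return i                                  # S -> epsilon matches nothing

def is_balanced(s):
    """True when the whole string is derivable from S."""
    try:
        return parse_S(s) == len(s)
    except SyntaxError:
        return False
```

The two recursive calls correspond to the strings x and y in the proof: the first call consumes x, the second consumes y.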

  24. Eliminating Ambiguity • The "dangling else" grammar is ambiguous: stmt → if expr then stmt | if expr then stmt else stmt | other

  25. Eliminating Ambiguity (cont.) • Disambiguating rule: match each else with the closest previous unmatched then • Consequently, the statement between a then and an else must be matched • stmt → matched_stmt | unmatched_stmt • matched_stmt → if expr then matched_stmt else matched_stmt | other • unmatched_stmt → if expr then matched_stmt else unmatched_stmt | if expr then stmt

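  26. Eliminating Immediate Left Recursion • A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α • Top-down parsing methods cannot handle left-recursive grammars: a top-down parser follows a leftmost derivation, so expanding A can lead back to A without consuming any input • Immediate left recursion A → Aα | β is eliminated by rewriting the productions as A → βA', A' → αA' | ε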

  27. Eliminating Immediate Left Recursion (cont.) • Non-immediate left recursion: S → Aa | b, A → Ac | Sd | ε • S is left recursive because S ⇒ Aa ⇒ Sda, although it is not immediately left recursive

  28. Eliminating General Left Recursion • Input: a grammar G with no cycles (derivations A ⇒+ A) and no ε-productions

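  29. Eliminating General Left Recursion (cont.) • Non-immediate left recursion: S → Aa | b, A → Ac | Sd | ε • Substituting the S-productions into A → Sd gives A → Ac | Aad | bd | ε • Eliminating the immediate left recursion among the A-productions gives S → Aa | b, A → bdA' | A', A' → cA' | adA' | ε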
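The substitution-then-rewrite procedure can be sketched as follows. This is one way to code the standard algorithm, not necessarily the formulation on the missing algorithm slide: the grammar representation (a dict from nonterminal to a list of right-hand sides, with the empty list standing for ε), the function names, and the convention of naming the new nonterminal by appending an apostrophe are all assumptions made for the illustration.

```python
def eliminate_immediate(nt, alts):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    alphas = [alt[1:] for alt in alts if alt and alt[0] == nt]
    betas = [alt for alt in alts if not alt or alt[0] != nt]
    if not alphas:                      # no immediate left recursion
        return {nt: alts}
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }

def eliminate_left_recursion(grammar, order):
    """Process nonterminals in the given order, substituting earlier ones first."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    # Replace Ai -> Aj gamma by Ai -> delta gamma for each Aj -> delta.
                    expanded.extend(delta + alt[1:] for delta in g[aj])
                else:
                    expanded.append(alt)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g

grammar = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], []],
}
result = eliminate_left_recursion(grammar, ["S", "A"])
```

On the slide's example the result contains the alternative A → A', which comes from the ε-alternative of A. The example technically violates the "no ε-productions" precondition, but the procedure still works for this particular grammar.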

  30. Left Factoring • When it is not clear which of two alternative productions to use to expand a nonterminal A, we rewrite the A-productions to defer the decision until we have seen enough of the input • Example: stmt → if expr then stmt | if expr then stmt else stmt becomes stmt → if expr then stmt S', S' → else stmt | ε • In general: A → αβ1 | αβ2 | … | αβn | γ becomes A → αA' | γ, A' → β1 | β2 | … | βn
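A single left-factoring step can be sketched as below. The sketch handles one group of alternatives that share a first symbol and factors out their longest common prefix; a full implementation would repeat the step until no two alternatives share a prefix. The function names and the apostrophe naming of the new nonterminal are assumptions.

```python
from collections import defaultdict

def common_prefix(seqs):
    """Longest prefix shared by every sequence in seqs."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor_once(nt, alternatives):
    """Factor the first group of alternatives that start with the same symbol."""
    groups = defaultdict(list)
    for alt in alternatives:
        groups[alt[0] if alt else None].append(alt)
    for first, group in groups.items():
        if first is not None and len(group) > 1:
            alpha = common_prefix(group)
            new_nt = nt + "'"
            rest = [alt for alt in alternatives if alt not in group]
            return {
                nt: [alpha + [new_nt]] + rest,
                new_nt: [alt[len(alpha):] for alt in group],   # [] stands for epsilon
            }
    return {nt: alternatives}           # nothing to factor

factored = left_factor_once("stmt", [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
])
```

For the if-then-else example this yields stmt → if expr then stmt stmt' and stmt' → ε | else stmt, the same grammar as on the slide with the new nonterminal named stmt' instead of S'.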

  31. Non-Context-Free Language Constructs • L1 = { wcw | w is in (a|b)* } is not context-free • L1' = { wcw^R | w is in (a|b)* } is context-free: S → aSa | bSb | c • L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 } is not context-free • L2' = { a^n b^m c^m d^n | n ≥ 1 and m ≥ 1 } is context-free: S → aSd | aAd, A → bAc | bc • L2'' = { a^n b^n c^m d^m | n ≥ 1 and m ≥ 1 } is context-free

  32. Non-Context-Free Language Constructs (cont.) • L3 = { a^n b^n c^n | n ≥ 0 } is not context-free • L3' = { a^n b^n | n ≥ 1 } is context-free: S → aSb | ab • A context-free grammar can keep count of two items but not three • A regular expression cannot keep count at all

  33. Top-Down Parsing • Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string • Equivalently, it constructs a parse tree for the input string starting from the root and creating the nodes of the parse tree in preorder

  34. Recursive-Descent Parsing • A general form of top-down parsing that may involve backtracking • E.g., S → cAd, A → ab | a, with input w = cad
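The backtracking in this example can be sketched with Python generators, which make "try the next alternative" fall out of ordinary iteration. The function names are hypothetical, and the sketch only recognizes strings; it does not build a parse tree. It would recurse forever on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def derive(symbols, w, pos):
    """Yield every input position reachable after matching symbols against w from pos."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        for alt in GRAMMAR[head]:               # try the alternatives in order
            for mid in derive(alt, w, pos):     # each way of matching head
                yield from derive(rest, w, mid)
    elif pos < len(w) and w[pos] == head:       # terminal: must match the input
        yield from derive(rest, w, pos + 1)

def accepts(w):
    return any(end == len(w) for end in derive(["S"], w, 0))
```

On w = cad the first alternative A → ab matches a and then fails on b against d, so the generator for that alternative yields nothing and the loop moves on to A → a, which succeeds. That silent move to the next alternative is the backtracking step.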

  35. Predictive Parsers • By carefully writing a grammar, eliminating left recursion from it, and left factoring the result, we can often obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking. Such a parser is called a predictive parser. • A predictive parser is implemented with recursive procedures, one for each nonterminal

  36. Predictive Parsers (cont.) • type → simple | ↑id | array [ simple ] of type • simple → integer | char | num dotdot num
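A predictive parser for this grammar needs one procedure per nonterminal and a single token of lookahead. The sketch below assumes the input is already a list of token names, that the second alternative of type is the pointer type ↑id as in the usual textbook form of this grammar, and that the pointer arrow is spelled `^`. The class and function names are hypothetical.

```python
class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]     # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def type_(self):
        # The lookahead alone selects the production: no backtracking needed.
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise SyntaxError(f"unexpected token {self.lookahead!r}")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise SyntaxError(f"unexpected token {self.lookahead!r}")

def parses(tokens):
    """True when the whole token list is derivable from type."""
    parser = PredictiveParser(tokens)
    try:
        parser.type_()
        parser.match("$")
        return True
    except SyntaxError:
        return False
```

For instance, `parses("array [ num dotdot num ] of integer".split())` succeeds, while dropping `dotdot num` from the subrange makes `match` raise on the first unexpected token.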

  37. Transition Diagrams for Predictive Parsers • We can create a transition diagram for a predictive parser • For each nonterminal A: create an initial and a final state; then, for each production A → X1 X2 … Xn, create a path from the initial state to the final state with edges labeled X1, X2, …, Xn • The parser follows the transition diagrams, matching terminals against the lookahead input symbols

  38. Transition Diagrams for Predictive Parsers (cont.)

  39. Transition Diagrams for Predictive Parsers (cont.)

  40. Transition Diagrams for Predictive Parsers (cont.)

  41. Nonrecursive Predictive Parsing • It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than via recursive calls. • The key problem during predictive parsing is that of determining the production to be applied for a nonterminal. The nonrecursive parser looks up the production to be applied in a parsing table.

  42. Nonrecursive Predictive Parsing (cont.)

  43. Nonrecursive Predictive Parsing (cont.) • The parser has an input buffer, a stack, a parsing table, and an output stream. • The input buffer contains the string to be parsed followed by $, a symbol used to indicate the end of the input string. • The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol S of the grammar on top of $.

  44. Nonrecursive Predictive Parsing (cont.) • The output stream shows the derivation steps by which the grammar produces the input string. • The parsing table is a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $. The entry M[A, a] gives the production to apply when A is on top of the stack and a is the current input symbol.

  45. Predictive Parsing Algorithm • Input: a string w and a parsing table M for grammar G • Output: a leftmost derivation of w if w ∈ L(G); otherwise, an error indication • Method: • Put $S on the stack, where S is the start symbol of G • Put w$ in the input buffer • Execute the predictive parsing program (Fig. 4.14)

  46. Predictive Parsing Program

  47. Example • Consider the non-left-recursive grammar for arithmetic expressions: E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id

  48. Example (parsing table M)

  49. Example (Stack Moves)
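The table-driven parser described on the preceding slides can be sketched as below for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id. The table and program figures are not part of this transcript, so the table entries here are written out as an assumption: they are the entries this grammar's FIRST and FOLLOW sets produce. The function name `predictive_parse` and the dictionary representation of M are hypothetical.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parsing table M[A, a]; a missing key is an error entry, [] stands for epsilon.
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [],  ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "+"): [],                ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [],                ("T'", "$"): [],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}

def predictive_parse(tokens, start="E"):
    """Return the productions used, in order (a leftmost derivation)."""
    buffer = list(tokens) + ["$"]
    stack = ["$", start]                      # top of the stack is the list's end
    ip = 0
    output = []
    while stack[-1] != "$":
        top, a = stack[-1], buffer[ip]
        if top not in NONTERMINALS:           # terminal on top: must match the input
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))       # push the right side, leftmost symbol on top
            output.append((top, rhs))
    if buffer[ip] != "$":
        raise SyntaxError("input remains after the stack emptied")
    return output

steps = predictive_parse(["id", "+", "id", "*", "id"])
```

Printing `steps` lists the productions in the order a leftmost derivation of id + id * id applies them, starting with E → TE'.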

  50. FIRST and FOLLOW • The construction of a predictive parser is aided by the FIRST and FOLLOW functions. • These functions help us construct the predictive parsing table. • The sets of tokens produced by FOLLOW can also be used as synchronizing tokens during panic-mode error recovery.
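The transcript ends before the rules for computing the two functions, so the sketch below uses the standard fixed-point definitions as an assumption and applies them to the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id. The function names and the use of the string "ε" as a marker are hypothetical.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given the FIRST sets so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)            # the sequence is empty or every symbol can vanish
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:             # repeat until no set grows
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first, start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")     # the end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_string(alt[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:            # what follows nt also follows sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

With these sets, a production A → α goes into M[A, a] for every terminal a in FIRST(α), and, when α can derive ε, also for every symbol in FOLLOW(A).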
