
Chapter 4 Syntax Analysis


andreas

Presentation Transcript


  1. Chapter 4 Syntax Analysis
     Topics to cover:
     • Context-Free Grammars: Concepts and Notation
     • Writing and rewriting a grammar
     • Syntax Error Handling and Recovery
     Compiler Construction

  2. Introduction
     • Why CFG
       • A CFG gives a precise syntactic specification of a programming language.
       • An efficient parser can be generated automatically from a CFG.
       • A CFG enables automatic generation of translators.
       • Language extension becomes easier.
     • The role of the parser
       • Takes tokens from the scanner, parses them, and reports syntax errors.
       • In a syntax-directed translator the parser does more than parse: it also conducts type checking, semantic analysis, and IR generation.

  3. Example of CFG
     • A C– program is made out of functions, a function out of declarations and blocks, a block out of statements, a statement out of expressions, and so on.
         <program> → <global_decl_list>
         <global_decl_list> → <global_decl_list> <global_decl> | ε
         <global_decl> → <decl_list> <function_decl>
         <function_decl> → <type> id ( <param_list> ) { <block> }
         <block> → <decl_list> <statement_list> | ε
         <decl_list> → <decl_list> <decl> | <decl> | ε
         <decl> → <type_decl> | <var_decl>
         <type> → void | int | float
         <statement_list> → ….
         <statement> → { <block> }

  4. Notational Conventions
     • The following symbols are terminals:
       • Lower-case letters such as a, b, c
       • Operators (+, -, etc.) and punctuation symbols (parentheses, commas, etc.)
       • Digits such as 0, 1, 2
       • Boldface strings such as id or if

  5. Notational Conventions
     • Nonterminals
       • Upper-case letters such as A, B, C
       • The letter S, which is the start symbol
       • Lower-case italic names such as expr or stmt
     • Grammar symbols (terminals or nonterminals)
       • Upper-case letters late in the alphabet, such as X, Y, Z
     • Strings of terminals
       • Lower-case letters late in the alphabet, such as u, v, …, z
     • Strings of grammar symbols
       • Lower-case Greek letters, such as α, β, γ

  6. Example
         expr → expr op expr
         expr → ( expr )
         expr → - expr
         expr → id
         op → +
         op → -
         op → *
         op → /
         op → ↑
     Using the notational shorthand:
         E → E A E | ( E ) | - E | id
         A → + | - | * | / | ↑
     Nonterminals: E and A
     Start symbol: E

  7. Derivation
     • Given a string αAβ, if A → γ is a production, then we can replace αAβ by αγβ, written αAβ ⇒ αγβ.
       • ⇒ means "derives in one step"
       • ⇒+ means "derives in one or more steps"
       • ⇒* means "derives in zero or more steps"
     • The language L(G) generated by G is the set of terminal strings w such that S ⇒+ w. Such a string w is called a sentence of G.
     • If S ⇒* α, where α may contain nonterminals, we say α is a sentential form of G.

  8. Exercise
     • What is a sentence of the language L defined by the C++ grammar G?
       Answer: a C++ program.
     • Is the following string a sentence or a sentential form?
         int parse(<parameter_list>) {}
       Answer: a sentential form, because it still contains the nonterminal <parameter_list>.

  9. Derivation (cont.)
     Consider the following grammar G0:
         E → E + E | E * E | ( E ) | - E | id
     The string -(id + id) is a sentence of G0 because there is a derivation
         E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
     • Leftmost derivation: only the leftmost nonterminal is replaced at each step.
     • Rightmost derivation: only the rightmost nonterminal is replaced at each step.
     Exercise:
     • Is id - id a sentence of G0? No: G0 has no production for binary minus.
     • Is -id + id a sentence of G0? Yes: E ⇒ E + E ⇒ -E + E ⇒ -id + E ⇒ -id + id.
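
The leftmost derivation of -(id + id) can be replayed mechanically. The sketch below is only an illustration: the dictionary layout for G0 and the helper names `leftmost_step` and `derive` are assumptions of this example, not part of the slides.

```python
# Grammar G0 from the slide. Representing it as a dict of lists is an
# assumption of this sketch.
G0 = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]]}


def leftmost_step(form, rhs, grammar):
    """Apply one => step: replace the leftmost nonterminal in `form` by `rhs`."""
    for i, sym in enumerate(form):
        if sym in grammar:  # nonterminals are exactly the dict keys
            if rhs not in grammar[sym]:
                raise ValueError(f"{sym} -> {' '.join(rhs)} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to replace")


def derive(start, steps, grammar):
    """Return every sentential form produced by applying `steps` in order."""
    forms = [[start]]
    for rhs in steps:
        forms.append(leftmost_step(forms[-1], rhs, grammar))
    return forms


# Right-hand sides used, in order, by the derivation shown on the slide.
steps = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
forms = derive("E", steps, G0)
print(" => ".join(" ".join(f) for f in forms))

# The last form contains no nonterminal, so it is a sentence of G0.
is_sentence = all(sym not in G0 for sym in forms[-1])
```

Every intermediate entry of `forms` is a sentential form; only the last one, which contains terminals alone, is a sentence.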

  10. Parse Tree and Derivation
     A parse tree can be viewed as a graphical representation of a derivation that ignores the order of replacement.
         E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
     Parse tree (children are indented under their parent):
         E
           -
           E
             (
             E
               E
                 id
               +
               E
                 id
             )
     • Interior node: a nonterminal
     • Leaves: terminals
     • Children: the right-hand side of the production applied at the parent

  11. CFG is more powerful than RE
     • Every RE can be described by a CFG.
       • Example: (a|b)*abb
           A → aA | bA | abb
     • Converting an NFA into a CFG
       • For each state i of the NFA, create a nonterminal symbol Ai.
       • If state i goes to state j on input a, add the production Ai → aAj.
       • Add Ai → Aj if state i goes to state j on ε.
       • Add Ai → ε if state i is an accepting state.
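
The NFA-to-CFG rules above can be sketched in a few lines. The transition-list encoding, the function name `nfa_to_cfg`, and the particular four-state NFA for (a|b)*abb are assumptions chosen for this illustration.

```python
def nfa_to_cfg(transitions, accepting):
    """Build productions from an NFA.

    transitions: list of (state, symbol, next_state); symbol None means epsilon.
    accepting:   iterable of accepting states.
    Returns a dict mapping each nonterminal Ai to a list of right-hand sides,
    where an empty list stands for epsilon.
    """
    productions = {}
    for i, a, j in transitions:
        # Ai -> a Aj for a move on input a, Ai -> Aj for an epsilon move.
        rhs = [f"A{j}"] if a is None else [a, f"A{j}"]
        productions.setdefault(f"A{i}", []).append(rhs)
        productions.setdefault(f"A{j}", [])
    for i in accepting:
        productions.setdefault(f"A{i}", []).append([])  # Ai -> epsilon
    return productions


# A common NFA for (a|b)*abb: state 0 loops on a and b, then reads a, b, b.
nfa = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
cfg = nfa_to_cfg(nfa, accepting=[3])
for head, bodies in cfg.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

The start symbol of the resulting grammar is the nonterminal of the NFA's start state, here A0.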

  12. Why do we need RE?
     • REs are sufficiently powerful for lexical rules.
     • REs are more concise and easier to understand.
     • A more efficient lexical analyzer can be constructed from an RE than from a CFG.
     • Separating the lexical from the nonlexical part has several advantages, such as modularization and easier porting.
     • Exercise: what if we had no token definitions?

  13. Defects in CFG
     • Useless nonterminals
         S → A | B
         A → a
         B → Bb    <derives no terminal string>
         C → c     <unreachable>
     • Ambiguity
     • Top-down parsing issues
       • Left recursion
       • Left factoring
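
Both kinds of useless nonterminal in the example above can be detected with two small fixed-point computations. This is a sketch under assumptions: the grammar encoding and the function name `useless_nonterminals` are invented here, and the function only reports the two sets rather than removing anything from the grammar.

```python
def useless_nonterminals(grammar, start):
    """Return (non_generating, unreachable) sets of nonterminals."""
    # A nonterminal is generating if some right-hand side consists only of
    # terminals and generating nonterminals. Iterate until nothing changes.
    generating = set()
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            if nt in generating:
                continue
            for body in bodies:
                if all(s not in grammar or s in generating for s in body):
                    generating.add(nt)
                    changed = True
                    break

    # A nonterminal is reachable if it appears in some form derived from start.
    reachable, stack = {start}, [start]
    while stack:
        nt = stack.pop()
        for body in grammar[nt]:
            for s in body:
                if s in grammar and s not in reachable:
                    reachable.add(s)
                    stack.append(s)

    return set(grammar) - generating, set(grammar) - reachable


# The grammar from the slide.
G = {"S": [["A"], ["B"]], "A": [["a"]], "B": [["B", "b"]], "C": [["c"]]}
non_generating, unreachable = useless_nonterminals(G, "S")
print("derive no terminal string:", non_generating)
print("unreachable:", unreachable)
```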

  14. Ambiguity
     • A grammar is ambiguous if it produces more than one parse tree for some sentence.
     • Example 1: A+B+C (is it (A+B)+C or A+(B+C)?)
       • Improper production: expr → expr + expr | id
     • Example 2: A+B*C (is it (A+B)*C or A+(B*C)?)
       • Improper production: expr → expr + expr | expr * expr
     • Example 3: if E1 then if E2 then S1 else S2 (which then does the else match?)
       • Improper production: stmt → if expr then stmt | if expr then stmt else stmt

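  15. Two parse trees of example 3
     • Tree 1: the else belongs to the inner if.
         stmt → if E1 then stmt
         where the inner stmt → if E2 then S1 else S2
     • Tree 2: the else belongs to the outer if.
         stmt → if E1 then stmt else S2
         where the inner stmt → if E2 then S1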
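
Ambiguity in examples 1 and 2 of the Ambiguity slide can be made concrete by counting parse trees. The sketch below assumes the ambiguous grammar expr → expr + expr | expr * expr | id and counts trees by trying every operator as the root; the function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under expr -> expr + expr | expr * expr | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways expr can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try each operator strictly inside the span as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id + id".split()))
print(count_parse_trees("id + id * id".split()))
```

Any count greater than one for a single sentence shows that the grammar is ambiguous.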

  16. Eliminating Ambiguity
     • Operator associativity
         expr → expr + term | term
     • Operator precedence
         expr → expr + term | term
         term → term * factor | factor
     • Dangling else
         stmt → matched | unmatched
         matched → if expr then matched else matched | other
         unmatched → if expr then stmt | if expr then matched else unmatched
       Here other stands for any statement that is not a conditional.
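
The effect of the associativity and precedence rules can be seen by parsing with the layered grammar and printing the grouping. This is a sketch with assumptions: the slide leaves factor undefined, so the code treats any single non-operator token as a factor, and it uses loops in place of the left-recursive productions, which yields the same left-associative grouping.

```python
def parenthesize(tokens):
    """Return the fully parenthesized form implied by the layered grammar."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        # Assumption: a factor is any single token that is not an operator.
        nonlocal pos
        tok = peek()
        if tok is None or tok in ("+", "*"):
            raise SyntaxError(f"operand expected at position {pos}")
        pos += 1
        return tok

    def term():
        # term -> term * factor | factor, written as a loop.
        nonlocal pos
        left = factor()
        while peek() == "*":
            pos += 1
            left = f"({left}*{factor()})"
        return left

    def expr():
        # expr -> expr + term | term, written as a loop.
        nonlocal pos
        left = term()
        while peek() == "+":
            pos += 1
            left = f"({left}+{term()})"
        return left

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return result


print(parenthesize(list("A+B+C")))
print(parenthesize(list("A+B*C")))
```

Because * is introduced one level below + in the grammar, it binds tighter, and the left recursion in each rule makes both operators group to the left.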

  17. Eliminating Left Recursion
     • Immediate left recursion
       • Example: A → Aα | β
     • Transformation
       Given the productions
           A → Aα1 | Aα2 | … | β1 | β2 | …
       where no βi begins with A, we replace the A-productions by
           A → β1A' | β2A' | …
           A' → α1A' | α2A' | … | ε
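
The transformation for immediate left recursion can be sketched directly from the two-rule pattern above. The function name `eliminate_immediate` and the list-of-symbols encoding are assumptions of this example; an empty list stands for ε, and the sketch assumes no production of the form A → A.

```python
def eliminate_immediate(nt, bodies):
    """Remove immediate left recursion from the productions of `nt`.

    bodies is a list of right-hand sides, each a list of symbols.
    Returns a dict with the rewritten productions for nt and, if needed, nt'.
    """
    new_nt = nt + "'"
    # A -> A alpha : keep alpha. Everything else is a beta.
    alphas = [body[1:] for body in bodies if body and body[0] == nt]
    betas = [body for body in bodies if not body or body[0] != nt]
    if not alphas:
        return {nt: bodies}  # nothing to do
    return {
        nt: [beta + [new_nt] for beta in betas],             # A  -> beta A'
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],  # A' -> alpha A' | eps
    }


# expr -> expr + term | term
result = eliminate_immediate("expr", [["expr", "+", "term"], ["term"]])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```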

  18. Indirect Left Recursion
     • Example:
         S → Aa | b
         A → Ac | Sd | ε
     • Transformation (assuming no cycles A ⇒+ A)
       • Arrange the nonterminals in some order A1, A2, …, An.
       • Then apply:
           for i := 1 to n do begin
             for j := 1 to i-1 do begin
               replace each production Ai → Ajγ by Ai → δ1γ | δ2γ | …,
               where Aj → δ1 | δ2 | … are the current Aj-productions
             end
             eliminate the immediate left recursion among the Ai-productions
           end

  19. In the above example,
         S → Aa | b
         A → Ac | Sd | ε
     substituting the S-productions into A → Sd turns the A-productions into
         A → Ac | Aad | bd | ε
     Eliminating the immediate left recursion among the A-productions then yields
         S → Aa | b
         A → bdA' | A'
         A' → cA' | adA' | ε

  20. Algorithm 4.1 Eliminating Left Recursion
     • This algorithm systematically eliminates left recursion from a grammar, including indirect left recursion.
     • Precondition: the grammar has no cycles and no ε-productions.
       • A cycle is a derivation A ⇒+ A.
       • The precondition avoids producing productions of the form A → A during nonterminal replacement. For example, with A → BA and B → Ab | ε, replacing B by ε in A → BA gives A → A, and a cycle shows up.
       • ε-productions also make the algorithm more complex: A → BCD may turn into A → CD, so handling only the leftmost nonterminal is not sufficient.

  21. Indirect Left Recursion
         A → Bb | a
         B → Cc | b
         C → Dd | c
         D → Aa | d
     The left recursion is hidden behind several steps:
         A ⇒ Bb ⇒ Ccb ⇒ Ddcb ⇒ Aadcb
         C ⇒ Dd ⇒ Aad ⇒ Bbad ⇒ Ccbad
     We need to expose the immediate left recursions and then eliminate them, and some ordering is needed. Suppose we replace A → Bb by A → Ccb and then start with B:
         B ⇒ Cc ⇒ Ddc ⇒ Aadc ⇒ Ccbadc
     This would never expose the immediate left recursion in this example.

  22. Algorithm 4.1
         for i := 1 to n do begin
           for j := 1 to i-1 do begin
             replace each production of the form Ai → Ajγ by the productions
             Ai → δ1γ | δ2γ | …, where Aj → δ1 | δ2 | … are the current Aj-productions
           end
           eliminate the immediate left recursion among the Ai-productions
         end
     Key idea: for each nonterminal Ai, every leading reference to a lower-numbered nonterminal Aj (where j < i) is replaced, so that the remaining leading references are only to higher-numbered nonterminals.
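
A compact sketch of Algorithm 4.1 follows, applied to the grammar S → Aa | b, A → Ac | Sd | ε used earlier in the slides. The function name `eliminate_left_recursion`, the dictionary encoding, and the convention that an empty list stands for ε are assumptions of this example. The sketch does not check the no-cycles precondition.

```python
def eliminate_left_recursion(grammar, order):
    """Apply Algorithm 4.1 with the nonterminals taken in the given order."""
    g = {nt: [list(body) for body in bodies] for nt, bodies in grammar.items()}
    for i, ai in enumerate(order):
        # Replace each Ai -> Aj gamma (j < i) by Ai -> delta gamma
        # for every current Aj-production Aj -> delta.
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai-productions.
        alphas = [body[1:] for body in g[ai] if body and body[0] == ai]
        if alphas:
            betas = [body for body in g[ai] if not body or body[0] != ai]
            new_nt = ai + "'"
            g[ai] = [beta + [new_nt] for beta in betas]
            g[new_nt] = [alpha + [new_nt] for alpha in alphas] + [[]]
    return g


G = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
result = eliminate_left_recursion(G, order=["S", "A"])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

The output agrees with the worked example: S → Aa | b, A → bdA' | A', A' → cA' | adA' | ε.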

  23. Algorithm 4.1 (cont.)
         A1 → …
         A2 → Ai-1 γ | Ai+k η | …
         …
         Ai → Ai-1 α | A2 β | …
         …
         An
     After replacement, there will be no backward references.

  24. Left Factoring
     Consider the following grammar:
         A → αβ1 | αβ2
     After seeing only α, it is not easy to determine whether to expand A to αβ1 or to αβ2. A transformation called left factoring can be applied. The grammar becomes:
         A → αA'
         A' → β1 | β2

  25. Exercise
         stmt → if expr then stmt | if expr then stmt else stmt
     Matching this against the grammar form A → αβ1 | αβ2, what is α? β1? β2?
     • α: if expr then stmt
     • β1: ε
     • β2: else stmt
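
The exercise can be checked with a small left-factoring sketch. It is deliberately limited: it factors only the prefix shared by all alternatives of one nonterminal, and the names `common_prefix` and `left_factor` are assumptions of this example. An empty list stands for ε.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor(nt, bodies):
    """Rewrite A -> alpha beta1 | alpha beta2 as A -> alpha A', A' -> beta1 | beta2."""
    alpha = common_prefix(bodies)
    if len(bodies) < 2 or not alpha:
        return {nt: bodies}  # nothing to factor
    new_nt = nt + "'"
    return {
        nt: [alpha + [new_nt]],
        new_nt: [body[len(alpha):] for body in bodies],
    }


stmt = [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
]
factored = left_factor("stmt", stmt)
for head, bodies in factored.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

The alternatives of stmt' are exactly β1 = ε and β2 = else stmt from the exercise.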

  26. Syntax Error Handling
     • Different types of errors
       • Lexical
       • Syntactic
       • Semantic
       • Logical
     • Error handling goals
       • Report errors clearly and accurately.
       • Recover quickly.
       • Be fast: error handling should add little overhead to the processing of correct programs.

  27. Error Handling Strategies
     • Don't quit after detecting the first error.
     • Avoid introducing "spurious" errors.
     • Inhibit error messages that stem from errors uncovered too close together.
     • Simple error repair is sufficient, given the increasing emphasis on interactive computing and good programming environments.

  28. Error Recovery Strategies
     • Panic mode
       • Delete input tokens until one of a designated set of synchronizing tokens is found.
     • Phrase level
       • Apply a local correction to repair punctuation errors.
     • Error productions
       • Augment the grammar with error productions.
     • Global correction
       • Find a globally least-cost correction to the input string; costly to implement.
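
Panic-mode recovery can be illustrated on a toy language. Everything in this sketch is an assumption made for the example: the language consists only of statements of the form `id = id ;`, the semicolon is the single synchronizing token, and the function name `parse_statements` is hypothetical.

```python
def parse_statements(tokens, sync=(";",)):
    """Parse a sequence of `id = id ;` statements with panic-mode recovery.

    Returns (number of well-formed statements, list of error messages).
    """
    pattern = ["id", "=", "id", ";"]
    errors, ok, pos = [], 0, 0
    while pos < len(tokens):
        for expected in pattern:
            actual = tokens[pos] if pos < len(tokens) else "<eof>"
            if actual != expected:
                errors.append(
                    f"token {pos}: expected {expected!r}, found {actual!r}"
                )
                # Panic mode: discard tokens up to a synchronizing token,
                # consume it, and resume with the next statement.
                while pos < len(tokens) and tokens[pos] not in sync:
                    pos += 1
                pos += 1
                break
            pos += 1
        else:
            ok += 1  # all four tokens matched
    return ok, errors


source = "id = id ; id id ; id = id ;".split()
ok, errors = parse_statements(source)
print(ok, "statements parsed")
for message in errors:
    print(message)
```

The second statement is malformed, yet the parser reports one error and still parses the statement that follows, which is the point of not quitting after the first error.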
