
Lecture 6: Grammar Modifications for Compiler Construction

This lecture covers topics such as grammars for expressions and if-then-else, formal proofs of L(G), top-down parsing, left factoring, and removing left recursion.



  1. Lecture 6 Grammar Modifications CSCE 531 Compiler Construction • Topics • Grammars for expressions and if-then-else • Formal proofs of L(G) • Top-down parsing • Left factoring • Removing left recursion • Readings: 4.3-4.4 • Homework: 4.1, 4.2a, 4.6a, 4.11a January 30, 2006

  2. Overview • Last Time • Should have mentioned DFA minimization • Grammars, Derivations, Ambiguity • Lec05-Grammars: Slides 1-27 • Today’s Lecture • Ambiguity in classic programming language grammars • Expressions • If-Then-Else • Top-Down parsing • References • Sections 4.3-4.4 • Parse demos • http://ag-kastens.uni-paderborn.de/lehre/material/compiler/parsdemo/ • Chomsky Hierarchy – types of grammars and recognizers • http://en.wikipedia.org/wiki/Chomsky_hierarchy • Homework: 4.1, 4.2a, 4.6a, 4.11a

  3. DFA Minimization • Algorithm 3.6 in text • We will not cover this algorithm other than this slide. • Partition states into F and Q − F (final and non-final states) • Refine the partitioning as much as possible. • Refinement – a string x = x1x2…xt distinguishes between two states Si and Sk if, starting in each and following the path determined by x, one ends in an accepting state and the other ends in a non-accepting state. [Figure: x takes Si to an accepting state Sa, and takes Sk to a non-accepting state Sna]
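The refinement step on this slide can be sketched in code. Below is a minimal Python sketch of partition refinement (not the book's Algorithm 3.6 verbatim); the DFA encoding, a `delta` dict from (state, symbol) pairs to states, is my own choice.

```python
def minimize(states, alphabet, delta, finals):
    """Partition `states` into equivalence classes by refinement.
    delta maps (state, symbol) -> state; assumes a total transition function."""
    # Start with the two-block partition {F, Q - F}.
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]
    changed = True
    while changed:
        changed = False
        refined = []
        for block in partition:
            # Group states by which block each input symbol sends them to;
            # states that land in different blocks are distinguishable.
            groups = {}
            for s in block:
                key = tuple(next(i for i, b in enumerate(partition)
                                 if delta[(s, a)] in b)
                            for a in alphabet)
                groups.setdefault(key, set()).add(s)
            refined.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = refined
    return partition

# States 0 and 1 behave identically, as do the accepting states 2 and 3.
classes = minimize([0, 1, 2, 3], ["a"],
                   {(0, "a"): 2, (1, "a"): 3, (2, "a"): 2, (3, "a"): 3},
                   [2, 3])
```

Running this on the example collapses the four states into two equivalence classes, {0, 1} and {2, 3}.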

  4. LM Derivation of 5 * X + 3 * Y + 17 • Parse tree • E • E → E + T | E – T | T • T → T * F | T / F | F • F → id | num | ( E ) • E ⇒ E+T ⇒ E+T+T ⇒ T+T+T ⇒ • T*F+T+T ⇒ F*F+T+T ⇒ • num*F+T+T ⇒ • num*id+T+T ⇒ • num*id+T*F+T ⇒ • num*id+F*F+T ⇒ • num*id+num*F+T ⇒ …

  5. Notes on rewritten grammar • It is more complex: more nonterminals, more productions. • It requires more steps in the derivation. • But it does eliminate the ambiguity, so we make the right choices in derivations.

  6. Ambiguous Grammar 2: If-Else • Another classic ambiguity problem in programming languages is the IF-ELSE • Stmt → if Expr then Stmt | if Expr then Stmt else Stmt | other stmts • S → if E then S | if E then S else S | OS

  7. Ambiguity • This statement has two parse trees (two leftmost derivations): • if Expr1 then if Expr2 then Stmt1 else Stmt2

  8. Removing the ambiguity • To eliminate the ambiguity • We must rewrite the grammar to avoid generating the problem • We must associate each else with the innermost unmatched if • S → withElse

  9. Removing the IF-ELSE Ambiguity • Stmt → if Expr then Stmt | if Expr then Stmt else Stmt | other stmts • Stmt → MatchedStmt | UnmatchedStmt • MatchedStmt → if Expr then MatchedStmt else MatchedStmt | OtherStatements • UnmatchedStmt → if Expr then Stmt | if Expr then MatchedStmt else UnmatchedStmt

  10. Ambiguity • if Expr1 then if Expr2 then Stmt1 else Stmt2

  11. Ambiguity that is more than Grammar • The examples of ambiguity we have looked at are solved by tweaking the CFG • Overloading can create deeper ambiguity: in a = f(17), in some languages f could be either a function or a subscripted variable • Disambiguating this requires semantics, not just syntax: declarations and type information to say what “f” is • Requires an extra-grammatical solution • Must handle these with a different mechanism • Step outside the grammar rather than use a more complex grammar

  12. Regular versus Context-free Languages • A regular language is a set of strings that can be: • Recognized by a DFA, • Recognized by an NFA, or • Denoted by regular expressions. • Example of non-regular languages? • A context-free language is one that is generated by a context-free grammar. • S → 0S1 | ε
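The grammar S → 0S1 | ε on this slide generates {0ⁿ1ⁿ | n ≥ 0}, a classic non-regular language. A tiny Python sketch can make its derivations concrete (the string-rewriting encoding is mine):

```python
def derive(n):
    """Apply S -> 0S1 n times, then finish with S -> epsilon."""
    sentential = "S"
    for _ in range(n):
        sentential = sentential.replace("S", "0S1")
    return sentential.replace("S", "")  # final S -> epsilon step

# Each derivation yields a string of the form 0^n 1^n.
for n in range(5):
    assert derive(n) == "0" * n + "1" * n
```

No DFA can check that the count of 0s matches the count of 1s, which is exactly what each application of S → 0S1 enforces.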

  13. Formal verification of L(G) • Example 4.7: • Induction on the length of derivations of sentential forms • Formulate the inductive hypothesis in terms of sentential forms • Basis step n = 1 • Assume derivations of length n satisfy the inductive hypothesis. • Show that derivations of length n+1 also satisfy it.

  14. Regular Grammars (Linear Grammars) • A right-linear grammar is a restricted form of context-free grammar in which the productions have a special form: • N → T* N2 • N → T* • where N and N2 (possibly the same) are non-terminals and T* is a string of tokens • In these productions, if there is a non-terminal on the right-hand side then it is the last symbol • Linear grammars (right- and left-linear) are also called regular grammars. Why?

  15. DFA → Right-linear Grammar • Consider DFA M = (Q, Σ, δ, q0, F) • (notice re-ordering! and Q!) • Construct a grammar G = (N, T, P, S) where • N = Q, i.e., each state corresponds to a non-terminal • T = Σ • For each transition δ(Si, a) = Sj, we have a production • Si → a Sj • And for each state S in F we add a production • S → ε • Then L(M) = L(G). How would we formally prove this? • Thus regular languages are a subset of the context-free languages
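The construction on this slide is mechanical enough to code directly. A sketch, assuming a DFA encoded as a transition dict (my encoding, not the text's); the start symbol of the resulting grammar is the nonterminal for q0:

```python
def dfa_to_grammar(states, delta, finals):
    """Build right-linear productions from a DFA.
    delta maps (Si, a) -> Sj; returns {nonterminal: [rhs, ...]},
    where each rhs is a list of symbols and [] stands for epsilon."""
    prods = {s: [] for s in states}
    for (si, a), sj in delta.items():
        prods[si].append([a, sj])   # delta(Si, a) = Sj  gives  Si -> a Sj
    for s in finals:
        prods[s].append([])         # accepting state S  gives  S -> epsilon
    return prods

# Example: DFA over {a, b} accepting strings that end in b.
prods = dfa_to_grammar(
    ["S0", "S1"],
    {("S0", "a"): "S0", ("S0", "b"): "S1",
     ("S1", "a"): "S0", ("S1", "b"): "S1"},
    ["S1"])
# Yields S0 -> a S0 | b S1  and  S1 -> a S0 | b S1 | epsilon
```

Every production has at most one nonterminal, and it is last, so the result is right-linear by construction.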

  16. Example DFA → Regular Grammar (Fig 3.23, p. 117) • N0 → a N1 | b N0 • N1 → a N1 | b N2 • N2 → … • N3 → …

  17. Chomsky Hierarchy • Noam Chomsky, linguist: formal levels of grammars • Regular grammars, N → T* N • Context-free grammars, N → (N ∪ T)* • Context-sensitive grammars, αNω → αβω • We can rewrite N as β, but only in the “context” α_ω • Unrestricted grammars, α → β with α and β in (N ∪ T)* • Recognizers: • DFA (regular) • Pushdown automata, DFA augmented with a stack • Linear bounded Turing machine • Turing machine http://en.wikipedia.org/wiki/Chomsky_hierarchy

  18. Non-Context-Free Languages • Certain languages cannot have a context-free grammar that generates them; they are not context-free languages • Examples • Σ = { a, b, c }, L = {wcw | w is in Σ*} • {aⁿbⁿcⁿ | n > 0} • However they are context sensitive, or are they? • Well, not relevant for this course. • We would eliminate any non-context-free construct from a programming language! (at least for parsing) • S → abc | aSBc • cB → Bc • bB → bb • Alternative form of context-sensitive productions: α → β with |α| ≤ |β|

  19. Parsing Techniques • Top-down parsers • Start at the root and try to generate the parse tree • Pick a production and try to match the input • If we make a bad choice then backtrack and try another choice • Some grammars allow backtrack-free parsing • Bottom-up parsers • Start at the leaves and grow toward the root • As input is consumed, encode possibilities in an internal state • Start in a state valid for legal first tokens • Bottom-up parsers handle a large class of grammars

  20. Top-down Parsing Algorithm • Add the start symbol as the root of the parse tree • While the frontier of the parse tree != input { • Pick the “leftmost” non-terminal in the frontier, A • Choose an A-production, A → β1β2…βk, and expand the tree (other choices saved on a stack) • If a token is added to the frontier that does not match the input, backtrack and choose another production (if we run out of choices the parse fails.) • } • We now will look at modifications to grammars to facilitate top-down parsing.
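The loop above can be sketched as a recursive backtracking parser. This is a simplified sketch (it backtracks over production choices but commits to the first expansion of a nonterminal that succeeds), and the grammar encoding is my own. Note that a left-recursive production would make it recurse forever, which motivates the transformations on the following slides.

```python
def parse(grammar, symbol, tokens, pos=0):
    """Try to derive a prefix of tokens[pos:] from `symbol`.
    Returns the input position reached, or None on failure.
    Nonterminals are exactly the keys of `grammar`."""
    if symbol not in grammar:                    # terminal: must match input
        if pos < len(tokens) and tokens[pos] == symbol:
            return pos + 1
        return None
    for production in grammar[symbol]:           # try each A-production in turn
        p = pos
        for sym in production:
            p = parse(grammar, sym, tokens, p)
            if p is None:
                break                            # mismatch: try next production
        else:
            return p
    return None                                  # out of choices: fail

# Right-recursive toy grammar (a left-recursive one would never terminate).
G = {"E": [["T", "+", "E"], ["T"]],
     "T": [["num"]]}
# A parse succeeds iff the position returned equals len(tokens).
assert parse(G, "E", ["num", "+", "num"]) == 3
```

The caller must check that the whole input was consumed, since a successful partial derivation returns an intermediate position.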

  21. Reconsider Our Expression Grammar • First we number the productions for documentation • 1. E → E + T • 2. E → E – T • 3. E → T • 4. T → T * F • 5. T → T / F • 6. T → F • 7. F → id • 8. F → num • 9. F → ( E ) • Example: 5 * X + 3 * Y + 17 • Token seq.: num * id + num * id + num

  22. How do we choose which production? • It should be guided by trying to match the input • E.g., if the next input symbol is the token “if” and we are choosing between • S → if Expr then S else S • S → while Expr do S • What choice is best? Well, the choice is obvious!

  23. How do we choose which production? (continued) • But if the next input symbol is the token “if” and we are choosing between • S → if Expr then S else S • S → if Expr then S • What choice is best? Well, now the choice is not obvious!

  24. Other Grammar Modifications to Guide the Parser • Left Factoring • Stmt → if Expr then Stmt else Stmt | if Expr then Stmt • If the next tokens are “if” and “id” then we have no basis to choose; in fact we have to look ahead to see the “else” • Stmt → if Expr then Stmt Rest • Rest → else Stmt | ε • Left Recursion • A → Aα | β • Why recursive? • A ⇒ Aα ⇒ Aαα ⇒ Aααα ⇒ … ⇒ Aαⁿ ⇒ βαⁿ • What do we do? • A → βA’ and A’ → αA’ | ε • A ⇒ βA’ ⇒ βαA’ ⇒ βααA’ ⇒ … ⇒ βαⁿA’ ⇒ βαⁿ

  25. General Left Factoring Algorithm • Algorithm 4.2 • Input: a grammar G • Output: an equivalent left-factored grammar. • Method: • For each nonterminal A • find the longest prefix α common to two or more A-productions • A → αβ1 | αβ2 | … | αβm | ξ, where ξ represents the A-productions that don’t start with the prefix α • Replace with • A → αA’ | ξ • A’ → β1 | β2 | … | βm
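One pass of Algorithm 4.2 for a single nonterminal can be sketched as follows. The grammar encoding (RHSs as tuples) and the helper names are mine, and a full implementation would repeat the pass until no common prefix remains:

```python
def common_prefix(p, q):
    """Longest common prefix of two symbol sequences."""
    k = 0
    while k < len(p) and k < len(q) and p[k] == q[k]:
        k += 1
    return p[:k]

def left_factor(nt, productions):
    """One pass of left factoring on the productions of `nt`.
    productions: list of RHS tuples.  Returns {nonterminal: [rhs, ...]}."""
    # Find the longest prefix alpha common to two or more productions.
    best = ()
    for i, p in enumerate(productions):
        for q in productions[i + 1:]:
            pre = common_prefix(p, q)
            if len(pre) > len(best):
                best = pre
    if not best:
        return {nt: productions}              # nothing to factor
    prime = nt + "'"                          # primed name, as in the slides
    with_alpha = [p[len(best):] for p in productions if p[:len(best)] == best]
    others = [p for p in productions if p[:len(best)] != best]
    # A -> alpha A' | (the productions that don't start with alpha)
    return {nt: [best + (prime,)] + others, prime: with_alpha}

stmt = [("if", "Expr", "then", "Stmt", "else", "Stmt"),
        ("if", "Expr", "then", "Stmt")]
factored = left_factor("Stmt", stmt)
# Stmt -> if Expr then Stmt Stmt' ;  Stmt' -> else Stmt | epsilon
```

On the if-then-else productions this reproduces the Stmt/Rest rewrite from the previous slide, with the empty tuple playing the role of ε.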

  26. Left Factoring • A graphical explanation for the same idea: • A → αβ1 | αβ2 | αβ3 • becomes • A → αZ • Z → β1 | β2 | β3 • [Figure: the tree for A with α repeated on three branches is redrawn so that α is shared and Z chooses among β1, β2, β3] • From Engineering a Compiler by Keith D. Cooper and Linda Torczon

  27. Left Factoring • Graphically: • Factor → Identifier | Identifier [ ExprList ] | Identifier ( ExprList ) • gives no basis for choice; left-factored it becomes • Factor → Identifier Z • Z → [ ExprList ] | ( ExprList ) | ε • so the next word determines the correct choice • From Engineering a Compiler by Keith D. Cooper and Linda Torczon

  28. Eliminating Left Recursion: Expr Grammar • General approach for immediate left recursion • Replace A → Aα | β • with A → βA’ and A’ → αA’ | ε • So for the expression grammar • E → E + T | E – T | T • We rewrite the E productions as • E → T E’ • E’ → + T E’ | ε

  29. Eliminating Left Recursion: Expr Grammar • Replace T → T * F | F • with T → F T’ • T’ → * F T’ | ε • No replacing needed for the F productions, so the grammar becomes: • E → T E’ • E’ → + T E’ | - T E’ | ε • T → F T’ • T’ → * F T’ | / F T’ | ε • F → id | num | ( E )
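Once the left recursion is gone, the grammar above can be parsed top-down with one procedure per nonterminal and no backtracking. A minimal recursive-descent sketch (the token representation and helper names are mine):

```python
def parse_expr(tokens):
    """Return True iff `tokens` is a sentence of the transformed grammar."""
    pos = [0]                                # shared input cursor

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def eat(tok):
        assert peek() == tok, f"expected {tok}, got {peek()}"
        pos[0] += 1

    def E():        # E  -> T E'
        T(); Eprime()

    def Eprime():   # E' -> + T E' | - T E' | epsilon
        if peek() in ("+", "-"):
            eat(peek()); T(); Eprime()

    def T():        # T  -> F T'
        F(); Tprime()

    def Tprime():   # T' -> * F T' | / F T' | epsilon
        if peek() in ("*", "/"):
            eat(peek()); F(); Tprime()

    def F():        # F  -> id | num | ( E )
        if peek() == "(":
            eat("("); E(); eat(")")
        elif peek() in ("id", "num"):
            eat(peek())
        else:
            raise SyntaxError(f"unexpected {peek()}")

    E()
    return pos[0] == len(tokens)             # True iff all input consumed

# 5 * X + 17 tokenizes as num * id + num
assert parse_expr(["num", "*", "id", "+", "num"]) is True
```

Each ε-production shows up as a procedure that simply returns when the lookahead does not match, which is why eliminating left recursion was necessary: a procedure for E → E + T would call itself before consuming anything.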

  30. Eliminating Immediate Left Recursion • In general, consider all the A-productions • A → Aα1 | Aα2 | … | Aαn | β1 | β2 | … | βm • Replace them with • A → β1A’ | β2A’ | … | βmA’ • A’ → α1A’ | α2A’ | … | αnA’ | ε • But not all left recursion is immediate. Consider • S → Aa | Bb | c • A → Ca | aA | a • B → b B | b • C → Sc • Then S ⇒ Aa ⇒ Caa ⇒ Scaa, so S ⇒⁺ Scaa: the left recursion is indirect

  31. Eliminating Left Recursion Algorithm • Algorithm 4.1 Eliminating Left Recursion • Input: grammar with no cycles or ε-productions • Output: equivalent grammar with no left recursion • Arrange the nonterminals in some order A1, A2, …, An • for i = 1 to n do • for j = 1 to i−1 do • replace each production of the form Ai → Aj γ by the productions Ai → δ1γ | δ2γ | … | δkγ, where Aj → δ1 | δ2 | … | δk are the current Aj-productions • end • eliminate immediate left recursion among the Ai-productions • end
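Algorithm 4.1 can be sketched directly. The grammar encoding ({nonterminal: list of RHS tuples}, with dict key order serving as the nonterminal ordering) is my own, and the sketch assumes, as the algorithm does, no cycles or ε-productions in the input:

```python
def eliminate_immediate(a, prods):
    """Remove immediate left recursion from the productions of `a`."""
    recursive = [p[1:] for p in prods if p[:1] == (a,)]      # A -> A alpha
    rest = [p for p in prods if p[:1] != (a,)]               # A -> beta
    if not recursive:
        return {a: prods}
    prime = a + "'"
    return {a: [beta + (prime,) for beta in rest],
            prime: [alpha + (prime,) for alpha in recursive] + [()]}

def eliminate_left_recursion(grammar):
    order = list(grammar)
    g = {a: list(ps) for a, ps in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace Ai -> Aj gamma using the current Aj-productions.
            new = []
            for p in g[ai]:
                if p[:1] == (aj,):
                    new.extend(d + p[1:] for d in g[aj])
                else:
                    new.append(p)
            g[ai] = new
        g.update(eliminate_immediate(ai, g[ai]))
    return g

G = {"E": [("E", "+", "T"), ("T",)], "T": [("num",)]}
out = eliminate_left_recursion(G)
# E -> T E' ;  E' -> + T E' | epsilon ;  T -> num
```

The primed nonterminals are appended after the original ordering, matching the observation on the next slide that the new nonterminals carry no left recursion of their own.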

  32. Eliminating Left Recursion • How does this algorithm work? • 1. Impose an arbitrary order on the non-terminals • 2. Outer loop cycles through the non-terminals in that order • 3. Inner loop ensures that a production expanding Ai has no non-terminal Aj in its rhs, for j < i • 4. Last step in the outer loop converts any direct recursion on Ai to right recursion using the transformation shown earlier • 5. New non-terminals are added at the end of the order & have no left recursion • At the start of the i-th outer loop iteration: for all k < i, no production that expands Ak contains a non-terminal As in its rhs, for s < k

  33. Example • Order of symbols: G, E, T • G → E • E → E + T | T • T → E ~ T | id • From Engineering a Compiler by Keith D. Cooper and Linda Torczon

  34. Example • Order of symbols: G, E, T • 1. Ai = G: • G → E • E → E + T | T • T → E ~ T | id • From Engineering a Compiler by Keith D. Cooper and Linda Torczon

  35. Example • Order of symbols: G, E, T • 1. Ai = G: • G → E • E → E + T | T • T → E ~ T | id • 2. Ai = E: • G → E • E → T E' • E' → + T E' | ε • T → E ~ T | id • From Engineering a Compiler by Keith D. Cooper and Linda Torczon

  36. Example • Order of symbols: G, E, T • 1. Ai = G: • G → E • E → E + T | T • T → E ~ T | id • 2. Ai = E: • G → E • E → T E' • E' → + T E' | ε • T → E ~ T | id • 3. Ai = T, Aj = E: • G → E • E → T E' • E' → + T E' | ε • T → T E' ~ T | id • From Engineering a Compiler by Keith D. Cooper and Linda Torczon

  37. Example • Order of symbols: G, E, T • 1. Ai = G: • G → E • E → E + T | T • T → E ~ T | id • 2. Ai = E: • G → E • E → T E' • E' → + T E' | ε • T → E ~ T | id • 3. Ai = T, Aj = E: • G → E • E → T E' • E' → + T E' | ε • T → T E' ~ T | id • 4. Ai = T: • G → E • E → T E' • E' → + T E' | ε • T → id T' • T' → E' ~ T T' | ε • From Engineering a Compiler by Keith D. Cooper and Linda Torczon

  38. Predictive Parsing • Basic idea • Given A → α | β, the parser should be able to choose between α and β • FIRST sets • For some rhs α in G, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α • That is, x ∈ FIRST(α) iff α ⇒* xγ, for some γ • If A → α and A → β both appear in the grammar, and FIRST(α) ∩ FIRST(β) = ∅ • This would appear to allow the parser to make a correct choice with a lookahead of exactly one symbol! (If there are no ε-productions, then it does.)
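The FIRST sets the slide defines can be computed by iterating to a fixed point. A sketch under my own encoding (nonterminals are the dict's keys, and the empty string stands in for ε):

```python
def first_sets(grammar):
    """grammar: {nonterminal: [rhs tuples]}.  Terminals are any symbols
    that are not keys.  Returns {nonterminal: set of terminals}, with ""
    standing in for epsilon."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                             # iterate to a fixed point
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                before = len(first[a])
                nullable = True
                for sym in rhs:
                    if sym not in grammar:     # terminal: it starts the string
                        first[a].add(sym)
                        nullable = False
                        break
                    first[a] |= first[sym] - {""}
                    if "" not in first[sym]:   # sym cannot vanish: stop here
                        nullable = False
                        break
                if nullable:                   # whole rhs can derive epsilon
                    first[a].add("")
                if len(first[a]) != before:
                    changed = True
    return first

G = {"E": [("T", "E'")],
     "E'": [("+", "T", "E'"), ()],
     "T": [("num",), ("(", "E", ")")]}
F = first_sets(G)
# FIRST(E) = FIRST(T) = {num, (} ;  FIRST(E') = {+, epsilon}
```

With these sets, a predictive parser picks the unique production whose FIRST set contains the lookahead token, exactly the disjointness condition stated above.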
