1 / 157

Compilation 0368 -3133

Learn about context-free languages (CFLs), context-free grammars (CFGs), and top-down parsing in compiler design. Understand how to construct parse trees and check the validity of an input.

Download Presentation

Compilation 0368 -3133

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compilation 0368-3133 Lecture 3: Syntax Analysis: CFLs, PDAs, Top-Down parsing Noam Rinetzky

  2. Source text txt Executable code exe The Real Anatomy of a Compiler Process text input LexicalAnalysis SyntaxAnalysis Sem.Analysis LexicalAnalysis SyntaxAnalysis tokens characters AST Annotated AST Codegeneration Intermediate code optimization Intermediate code generation IR IR Symbolic Instructions Target code optimization Machine code generation Write executable output SI MI

  3. Syntax Analysis (1) Context Free Languages Context Free Grammars Pushdown Automata

  4. Op(*) Op(+) Id(b) Num(23) Num(7) Frontend: Scanning & Parsing program text ((23 + 7) * x) Lexical Analyzer token stream Grammar: E ... | Id Id  ‘a’ | ... | ‘z’ Parser valid syntaxerror Abstract Syntax Tree

  5. Op(*) Op(+) Id(b) Num(23) Num(7) From scanning to parsing program text ((23 + 7) * x) Lexical Analyzer token stream Grammar: E ... | Id Id  ‘a’ | ... | ‘z’ Parser valid syntaxerror Abstract Syntax Tree

  6. Parsing • Construct a structured representation of the input • Challenges • How do you describe the programming language? • How do you check validity of an input? • Is a sequence of tokens a valid program in the language? • How do you construct the structured representation? • Where do you report an error?

  7. Some foundations

  8. Context free languages (CFLs) • L01 = { 0n1n | n > 0 } • Lpolyndrom = {pp’ | p ∊ Σ* , p’=reverse(p)} • Lpolyndrom#= {p#p’ | p ∊ Σ* , p’=reverse(p), # ∉ Σ}

  9. Context free grammars (CFG) G = (V,T,P,S) • V – non terminals (syntactic variables) • T – terminals (tokens) • P – derivation rules • Each rule of the form V (T V)* • S – start symbol

  10. What can CFGs do? • Recognize CFLs • S0T1 • T0T1 | ℇ

  11. Recognizing CFLs • Context Free Grammars (CFG) • Nondeterministic push down automata (PDA) • Deterministic push down automata • language-defining power ~ ~

  12. Broad kinds of parsers • Parsers for arbitrarygrammars • Earley’s method, CYK method • Usually, not used in practice (though might change) • Top-Down parsers • Construct parse tree in a top-down matter • Find the leftmost derivation • Bottom-Up parsers • Construct parse tree in a bottom-up manner • Find the rightmost derivation in a reverse order

  13. The plan • Context free grammars • Top down parsing: Intuition • Brute force parsing • Predictive parsing • Push-down Automata • Earley parsing

  14. Context free grammars (CFGs)

  15. Context free grammars (CFGs) G = (V,T,P,S) • V– non terminals (syntactic variables) • T – terminals (tokens) • P – derivation rules • Each rule of the form V(T ∪V)* • S – start symbol

  16. Example grammar S shorthand for Statement S  S;S S id :=E E id E num E  E+E E shorthand for Expression

  17. CFG terminology S  S;S S id :=E E id E num E  E+E Symbols:Terminals(tokens): ; := id numNon-terminals: S E L Start non-terminal: SConvention: the non-terminal appearingin the first derivation rule Grammar productions (rules)N  μ

  18. CFG terminology • Derivation - a sequence of replacements of non-terminals using the derivation rules • Language - the set of strings of terminals derivable from the start symbol • Sentential form - the result of a partial derivation • May contain non-terminals

  19. Derivations • Show that a sentence ω is in a grammar G • Start with the start symbol • Repeatedly replace one of the non-terminals by a right-hand side of a production • Stop when the sentence contains only terminals • Given a sentence αNβ and rule NµαNβ => αµβ • ω is in L(G) if S =>* ω

  20. Derivation sentence grammar x := z; y := x + z S S;S S id := E | … E id | E + E | E * E |… S S ; S id := E S ; S S;S id := E id := E ; S id := E id := E ; id := id S id := E id := id id := E + E ; Eid id := id id := E + id ; E E + E id := id id := id + id ; E id E id <id,”x”><ASS><id,”z”> <SEMI> <id,”y”><ASS><id,”x”><PLUS><id,”z”>

  21. Parse trees: Traces of Derivations • Tree nodes are symbols, children ordered left-to-right • Each internalnode is non-terminal • Children correspond to one of its productions N  µ1 … µk • Root is start non-terminal • Leaves are tokens • Yieldof parse tree: left-to-right walk over leaves N µ1 µk …

  22. Parse Tree S S S S ; S S E id := E id := E S ; id := id S ; ; id := id id := E E E id := id id := E + E ; x:= z ; y := x + z id := id id := E + id ; ; id id id := id id := id + id id := id ; +

  23. Language of a CFG • A sentence ω is in L(G) (valid program) if • There exists a corresponding derivation • There exists a corresponding parse tree

  24. Questions • How did we know which rule to apply on every step? • Would we always get the same result? • Does it matter?

  25. Ambiguity x := y+z*w S S ; S S id := E | … E id | E + E | E * E | … S S id := E id := E * E E + E E id id E * E E + E id id id id

  26. Leftmost/rightmost Derivation • Leftmost derivation • always expand leftmost non-terminal • Rightmost derivation • Always expand rightmost non-terminal • Allows us to describe derivation by listing the sequence of rules • always know what a rule is applied to

  27. Leftmost Derivation x := z; y := x + z S S;S S id := E E id | E + E | E * E | (E) S S ; S id := E S ; S S;S ; id := id S S id := E ; id := id id := E E id id := E + E id := id ; S id := E id := id ; id := id + E E E + E ; id := id id := id + id E id E id ; x := z y := x + z

  28. Rightmost Derivation <id,”x”> ASS <id,”z”> ; <id,”y”> ASS <id,”x”> PLUS <id,”z”> S S;S S id := E | … E id | E + E | E * E | … S S ; S S id := E ; S S;S S id := E + E ; S id := E S id := E + id ; E E + E S id := id + id ; E id id := id + id id := E ; E id id := id id := id + id ; S id := E E id <id,”x”> ASS <id,”z”> ; <id,”y”> ASS <id,”x”> PLUS <id,”z”>

  29. Sometimes there are two parse trees 1 + 2 + 3 Arithmetic expressions:E id E num E  E+EE  E*EE  (E ) “1 + (2 + 3)” “(1 + 2) + 3” E E E E E E E E E E num(1) + num(2) + num(3) num(1) + num(2) + num(3) Leftmost derivationEE + Enum + Enum + E +Enum + num +E num + num + num Rightmost derivationEE +EE +numE + E + numE + num + numnum + num + num

  30. Is ambiguity a problem? 1 + 2 + 3 Arithmetic expressions:E id E num E  E+EE  E*EE  (E ) “1 + (2 + 3)” “(1 + 2) + 3” E E E E E E E E E E num(1) + num(2) + num(3) = 6 num(1) + num(2) + num(3) = 6 Leftmost derivationEE + Enum + Enum + E +Enum + num +E num + num + num Rightmost derivationEE +EE +numE + E + numE + num + numnum + num + num

  31. Problematic ambiguity example 1 + 2 * 3 Arithmetic expressions:E id E num E  E+EE  E*EE  (E ) “1 + (2 * 3)” “(1 + 2) * 3” This is what we usually want: * has precedence over + E E E E E E E E E E num(1) + num(2) * num(3) = 7 num(1) + num(2) * num(3) = 9 Leftmost derivationEE + Enum + Enum + E * Enum + num * E num + num * num Rightmost derivationEE * EE *numE + E * numE + num * numnum + num * num

  32. Ambiguous grammars • A grammar is ambiguousif there exists a sentence for which there are • Two different leftmost derivations • Two different rightmost derivations • Two different parse trees • Property of grammars, not languages • Some languages are inherently ambiguous • No unambiguous grammars exist • No algorithm to detect whether arbitrary grammar is ambiguous

  33. Drawbacks of ambiguous grammars • Ambiguous semantics • 1 + 2 * 3 = 7 or 9 • Parsing complexity • May affect other phases • Solutions • Transform grammar into non-ambiguous • Handle as part of parsing method • Using special form of “precedence”

  34. Transforming ambiguous grammars to non-ambiguous by layering Unambiguous grammar E  E + T E  T T  T * F T  F F  id F  num F  ( E ) Ambiguous grammarE  E + EE  E * EE  id E  num E  ( E ) Each layer takes care of one way of composing sub-strings to form a string:1: by +2: by *3: atoms Layer 1 Layer 2 Layer 3

  35. Transformed grammar: * precedes + Unambiguous grammar E  E + T E  T T  T * F T  F F  id F  num F  ( E ) Ambiguous grammarE  E + EE  E * EE  id E  num E  ( E ) Parse tree E Derivation E=> E + T=> T + T=> F + T=> 1 + T=> 1 + T * F=> 1 + F * F=> 1 + 2 * F=> 1 + 2 * 3 E T T T Let’s derive 1 + 2 * 3 F F F 1 + 2 * 3

  36. Transformed grammar: + precedes * Ambiguous grammarE  E + EE  E * EE  id E  num E  ( E ) Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) Parse tree E Derivation E=> E * T=> T * T=> T + F * T=> F + F * T=> 1 + F * T=> 1 + 2 * T=> 1 + 2 * F=> 1 + 2 * 3 E T Let’s derive 1 + 2 * 3 T T F F F 1 + 2 * 3

  37. Another example for layering Ambiguous grammarP ε | P P | ( P ) P P P P P P P P P P P ( ( ε ) ( ε ) ) ( ( ε ) ( ε ) ) ε

  38. Another example for layering Unambiguous grammar S P S | ε P  ( S ) Ambiguous grammarP ε | P P | ( P ) Takes care of “concatenation” Takes care of nesting s P s s s P P s S S ( ( ε ) ( ε ) ε ) ε

  39. “dangling-else” example p. 174 Ambiguous grammar S ifEthenS S |ifEthenS else S | other Unambiguousgrammar ? This is what we usually want: match else to closest unmatched then ifE1then ifE2thenS1else S2 ifE1then (ifE2thenS1else S2) ifE1then (ifE2thenS1)else S2 S S if E then S else S if E then S E1 E1 S2 if E then S else S if E then S E2 S1 S2 E2 S1

  40. Broad kinds of parsers • Parsers for arbitrarygrammars • Earley’s method, CYK method • Usually, not used in practice (though might change) • Top-down parsers • Construct parse tree in a top-down matter • Find the leftmost derivation • Bottom-up parsers • Construct parse tree in a bottom-up manner • Find the rightmost derivation in a reverse order

  41. Intuition: Top-down parsing • Begin with start symbol • Guess the productions • Check if parse tree yields user's program

  42. Intuition: Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) Recall: Non standard precedence … E E T T T F F F 1 + 2 * 3

  43. Intuition: Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) We need this rule to get the * E 1 + 2 * 3

  44. Intuition: Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T 1 + 2 * 3

  45. Intuition: Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T T F 1 + 2 * 3

  46. Intuition: Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T T T F F 1 + 2 * 3

  47. Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T T T F F F 1 + 2 * 3

  48. Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) Recall: Non standard precedence … E E T T T F F F 1 + 2 * 3

  49. Top-down parsing

More Related