Lecture 3: Parsing - PowerPoint PPT Presentation

lecture 3 parsing n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Lecture 3: Parsing PowerPoint Presentation
Download Presentation
Lecture 3: Parsing

play fullscreen
1 / 63
Download Presentation
Lecture 3: Parsing
93 Views
akamu
Download Presentation

Lecture 3: Parsing

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Lecture 3: Parsing CS 540 George Mason University

  2. Static Analysis - Parsing Syntatic/semantic structure Syntatic structure tokens Scanner (lexical analysis) Parser (syntax analysis) Semantic Analysis (IC generator) Code Generator Source language Target language Code Optimizer • Syntax described formally • Tokens organized into syntax tree that describes structure • Error checking Symbol Table CS 540 Spring 2009 GMU

  3. Static Analysis - Parsing We can use context free grammars to specify the syntax of programming languages. CS 540 Spring 2009 GMU

  4. Context Free Grammars Definition: A context free grammar is a formal model that consists of: • A set of tokens or terminal symbols Vt • A set of nonterminal symbols Vn • Start symbol S (in Vn) • Finite set of productions of form: A  R1…Rn (n >= 0) where A in Vn, R in Vn Vt CS 540 Spring 2009 GMU

  5. CFG Examples Indicates a production Vt = {+,-,0..9}, Vn = {L,D}, s = {L} L  L + D | L – D | D D  0 | … | 9 Vt={(,)}, Vn = {L}, s = {L} L  ( L ) L L  e Shorthand for multiple productions CS 540 Spring 2009 GMU

  6. Languages CS 540 Spring 2009 GMU

  7. Any regular language can be expressed using a CFG Starting with a NFA: • For each state Si in the NFA • Create non-terminal Ai • If transition (Si,a) = Sk, create production Ai a Ak • If transition (Si,e) = Sk, create production Ai Ak • If Si is a final state, create production Ai e • If Si is the NFA start state, s = Ai • What does the existence of this algorithm tell us about the relationship between regular and context free languages? CS 540 Spring 2009 GMU

  8. NFA to CFG Example ab*a a a 1 2 3 b A1 a A2 A2  b A2 A2  a A3 A3  e CS 540 Spring 2009 GMU

  9. Writing Grammars When writing a grammar (or RE) for some language, the following must be true: • All strings generated are in the language. • Your grammar produces all strings in the language. CS 540 Spring 2009 GMU

  10. Try these: • Integers divisible by 2 • Legal postfix expressions • Floating point numbers with no extra zeros • Strings of 0,1 where there are more 0 than 1 CS 540 Spring 2009 GMU

  11. Parsing • The task of parsing is figuring out what the parse tree looks like for a given input and language. • If a string is in the given language, a parse tree must exist. • However, just because a parse tree exists for some string in a given language doesn’t mean a given algorithm can find it. CS 540 Spring 2009 GMU

  12. Parse Trees The parse tree for some string in a language that is defined by the grammar G as follows: • The root is the start symbol of G • The leaves are terminals or e. When visited from left to right, the leaves form the input string • The interior nodes are non-terminals of G • For every non-terminal A in the tree with children B1 … Bk, there is some production A  B1… Bk CS 540 Spring 2009 GMU

  13. Parse Tree for (())() L  ( L ) L L  e L L L ) ( ) ( L ) L ( L L e e e e CS 540 Spring 2009 GMU

  14. Parse Tree for (())() L  ( L ) L L  e L L L ) ( ) ( L ) L ( L L e e e e CS 540 Spring 2009 GMU

  15. Parsing • Parsing algorithms are based on the idea of derivations. • General Algorithms: • LL (top down) • LR (bottom up) CS 540 Spring 2009 GMU

  16. Single Step Derivation Definition: Given a A b (with a,b in (Vn Vt)*) and a production A  g, a A b a gb is a single step derivation. Examples: L + D L – D + D L  L - D ( L ) ( L )  ( ( L ) L ) ( L ) L  (L) L • Greek letters (a,b,c,…) • denote a (possibly empty) • sequence of terminals and • non-terminals. CS 540 Spring 2009 GMU

  17. Derivations Definition: A sequence of the form: w0 w1 …  wn is a derivation of wn from w0(w0* wn) L production L  ( L ) L ( L ) L production L  e ( ) L production L  e ( ) L * () If wi has non-terminal symbols, it is referred to as sentential form. CS 540 Spring 2009 GMU

  18. L * (( )) ( ) CS 540 Spring 2009 GMU

  19. L(G), the language generated by grammar G is {w in Vt*: s * w, for start symbol s} • We’ve just shown that both () and (())() are in L(G) for the previous grammar. CS 540 Spring 2009 GMU

  20. Leftmost Derivations • A derivation where the leftmost nonterminal is always chosen • If a string is in a given language (i.e. a derivation exists), then a leftmost derivation must exist • Rightmost derivation defined as you would expect CS 540 Spring 2009 GMU

  21. Leftmost Derivation for (())() CS 540 Spring 2009 GMU

  22. Rightmost Derivation for (())() CS 540 Spring 2009 GMU

  23. Ambiguity Two (or more) parse trees or leftmost derivations for some string in the language E  E + E E  E – E E  0 | … | 9 E E E + E E - E 4 2 E + E E - E 2 – 3 + 4 3 4 2 3 CS 540 Spring 2009 GMU

  24. Two leftmost derivations EE + E E E – E E – E + E  2 – E  2 – E + E  2 – E + E  2 – 3 + E 2 – 3 + E  2 – 3 + 4  2 – 3 + 4 CS 540 Spring 2009 GMU

  25. An ambiguous grammar can sometimes be made unambiguous: E  E + T | E – T | T T  0 | … | 9 • Precedence can be specified as well: E  E + T | E – T | T T  T * F | T / F | F F  ( E ) | 0 | …| 9 enforces the correct associativity CS 540 Spring 2009 GMU

  26. Input: begin simplestmt; simplestmt; end P P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end SS SS SS S S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  27. Top Down (LL) Parsing P P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end SS begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  28. Top Down (LL) Parsing P P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end SS SS S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  29. Top Down (LL) Parsing P P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end SS SS S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  30. Top Down (LL) Parsing P P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end SS SS SS S S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  31. Top Down (LL) Parsing P P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end SS SS SS S S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  32. Top Down (LL) Parsing 1 P P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end 2 SS 4 SS 3 SS 6 S 5 S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  33. Bottomup (LR) Parsing P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  34. Bottomup (LR) Parsing P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end S S begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  35. Bottomup (LR) Parsing P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end SS S S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  36. Bottomup (LR) Parsing P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end SS SS S S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  37. Bottomup (LR) Parsing P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end SS SS SS S S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  38. Bottomup (LR) Parsing 6 P P begin SS end SS  S ; SS SS  e S  simplestmt S  begin SS end 5 SS 4 SS 1 SS 3 S 2 S e begin simplestmt ; simplestmt ; end CS 540 Spring 2009 GMU

  39. Parsing • General Algorithms: • LL (top down) • LR (bottom up) • Both algorithms are driven by the input grammar and the input to be parsed • Two important sets: FIRST and FOLLOW CS 540 Spring 2009 GMU

  40. FIRST Sets FIRST(a) is the set of all terminal symbols that can begin some sentential form in a derivation that starts with a a …  a b • FIRST(a) = {a in Vt | a * ab }  { e } if a*e • Example: S  simple | begin S end FIRST(S) = {simple, begin} CS 540 Spring 2009 GMU

  41. To compute FIRST across strings of terminals and non-terminals: FIRST(e) = { e } FIRST(Aa) = A if A is a terminal FIRST(A)  FIRST(a) if A *e FIRST(A) otherwise CS 540 Spring 2009 GMU

  42. Computing FIRST sets Initially FIRST(A) (for all A) is empty • For productions A  c b, where c in Vt Add { c } to FIRST(A) • For productions A  e Add { e } to FIRST(A) • For productions A  a B b, where a*e and NOT (B *e) Add FIRST(aB) to FIRST(A) • For productions A  a, where a*e Add FIRST(a) and { e } to FIRST(A) same thing as e FIRST(B) CS 540 Spring 2009 GMU

  43. S  a S e S  B B  b B e B  C C  c C e C  d FIRST(C) = {c,d} FIRST(B) = FIRST(S) = Example 1 Start with the ‘simplest’ non-terminal CS 540 Spring 2009 GMU

  44. S  a S e S  B B  b B e B  C C  c C e C  d FIRST(C) = {c,d} FIRST(B) = {b,c,d} FIRST(S) = Example 1 Now that we know FIRST(C) … CS 540 Spring 2009 GMU

  45. S  a S e S  B B  b B e B  C C  c C e C  d FIRST(C) = {c,d} FIRST(B) = {b,c,d} FIRST(S) = {a,b,c,d} Example 1 CS 540 Spring 2009 GMU

  46. P  i | c | n T S Q  P | a S | d S c S T R  b |e S  e | R n | e T  R S q FIRST(P) = {i,c,n} FIRST(Q) = FIRST(R) = {b,e} FIRST(S) = FIRST(T) = Example 2 CS 540 Spring 2009 GMU

  47. P  i | c | n T S Q  P | a S | d S c S T R  b | e S  e | R n | e T  R S q FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,d} FIRST(R) = {b,e} FIRST(S) = FIRST(T) = Example 2 CS 540 Spring 2009 GMU

  48. P  i | c | n T S Q  P | a S | d S c S T R  b | e S  e | R n | e T  R S q FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,d} FIRST(R) = {b,e} FIRST(S) = {e,b,n,e} FIRST(T) = Example 2 Note: S  R n  n because R * e CS 540 Spring 2009 GMU

  49. P  i | c | n T S Q  P | a S | d S c S T R  b | e S  e | R n | e T  R S q FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,d} FIRST(R) = {b,e} FIRST(S) = {e,b,n,e} FIRST(T) = {b,c,n,q} Example 2 Note: T  R S q  S q q because both R and S * e CS 540 Spring 2009 GMU

  50. S  a S e | S T S T  R S e | Q R  r S r | e Q  S T | e FIRST(S) = FIRST(R) = FIRST(T) = FIRST(Q) = Example 3 CS 540 Spring 2009 GMU