# Lecture 3: Parsing

##### Presentation Transcript

1. Lecture 3: Parsing CS 540 George Mason University

2. Static Analysis - Parsing [Figure: compiler pipeline from source language to target language: Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → syntactic/semantic structure → Code Optimizer → Code Generator, with a shared Symbol Table.] • Syntax described formally • Tokens organized into syntax tree that describes structure • Error checking CS 540 Spring 2009 GMU

3. Static Analysis - Parsing We can use context free grammars to specify the syntax of programming languages.

4. Context Free Grammars Definition: A context free grammar is a formal model that consists of: • A set of tokens or terminal symbols Vt • A set of nonterminal symbols Vn • Start symbol S (in Vn) • Finite set of productions of the form A → R1 … Rn (n >= 0), where A in Vn and each Ri in Vn ∪ Vt

5. CFG Examples • Vt = {+, -, 0..9}, Vn = {L, D}, s = L: L → L + D | L - D | D; D → 0 | … | 9 • Vt = {(, )}, Vn = {L}, s = L: L → ( L ) L; L → ε. The arrow (→) indicates a production; the bar (|) is shorthand for multiple productions.
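The parenthesis grammar on this slide can be written down as plain data. The following is a minimal sketch, assuming a dictionary-of-tuples encoding and a helper named `generate`, neither of which comes from the lecture. It enumerates short strings of the language by repeatedly expanding the leftmost non-terminal, and it assumes every production that lengthens a sentential form also adds a terminal (true for both grammars on the slide).

```python
from collections import deque

# Each non-terminal maps to a list of right-hand sides; () is the ε production.
PAREN_GRAMMAR = {
    "L": [("(", "L", ")", "L"), ()],
}


def generate(grammar, start, max_len):
    """Return all terminal strings with at most max_len symbols derivable from start."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal in the sentential form, if any.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for sym in new_form if sym not in grammar)
            # Prune forms that already contain too many terminals.
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


print(sorted(generate(PAREN_GRAMMAR, "L", 4)))
```

The pruning step is what keeps the search finite; a grammar with a production such as S → S S would need a different bound.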

6. Languages

7. Any regular language can be expressed using a CFG. Starting with an NFA: • For each state Si in the NFA, create non-terminal Ai • If transition (Si, a) = Sk, create production Ai → a Ak • If transition (Si, ε) = Sk, create production Ai → Ak • If Si is a final state, create production Ai → ε • If Si is the NFA start state, s = Ai • What does the existence of this algorithm tell us about the relationship between regular and context free languages?

8. NFA to CFG Example: ab*a [Figure: NFA with states 1, 2, 3 and transitions 1 -a-> 2, 2 -b-> 2, 2 -a-> 3.] A1 → a A2; A2 → b A2; A2 → a A3; A3 → ε
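The NFA-to-CFG construction can be sketched in a few lines. This is an illustrative version only: the function name `nfa_to_cfg`, the triple-based transition encoding, and the use of `None` for an ε-transition are assumptions, not part of the lecture. It is applied to the ab*a automaton from the example, taking state 1 as the start state and state 3 as the final state.

```python
def nfa_to_cfg(transitions, start, finals):
    """Build CFG productions from an NFA.

    transitions: iterable of (state, symbol, next_state); symbol None means ε.
    Returns (start_symbol, productions), where productions maps each
    non-terminal to a list of right-hand-side tuples and () stands for ε.
    """
    productions = {}
    for state, symbol, nxt in transitions:
        # (Si, a) = Sk gives Ai -> a Ak; (Si, ε) = Sk gives Ai -> Ak.
        rhs = (f"A{nxt}",) if symbol is None else (symbol, f"A{nxt}")
        productions.setdefault(f"A{state}", []).append(rhs)
        productions.setdefault(f"A{nxt}", [])
    for state in finals:
        # A final state Si gives Ai -> ε.
        productions.setdefault(f"A{state}", []).append(())
    productions.setdefault(f"A{start}", [])
    return f"A{start}", productions


start_symbol, cfg = nfa_to_cfg(
    transitions=[(1, "a", 2), (2, "b", 2), (2, "a", 3)],
    start=1,
    finals={3},
)
for lhs, alternatives in cfg.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs) or "ε")
```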

9. Writing Grammars When writing a grammar (or RE) for some language, the following must be true: • All strings generated are in the language. • Your grammar produces all strings in the language.

10. Try these: • Integers divisible by 2 • Legal postfix expressions • Floating point numbers with no extra zeros • Strings of 0,1 where there are more 0 than 1

11. Parsing • The task of parsing is figuring out what the parse tree looks like for a given input and language. • If a string is in the given language, a parse tree must exist. • However, just because a parse tree exists for some string in a given language doesn’t mean a given algorithm can find it.

12. Parse Trees The parse tree for a string in the language defined by grammar G is defined as follows: • The root is the start symbol of G • The leaves are terminals or ε. When visited from left to right, the leaves form the input string • The interior nodes are non-terminals of G • For every non-terminal A in the tree with children B1 … Bk, there is some production A → B1 … Bk

13. Parse Tree for (())() Grammar: L → ( L ) L; L → ε. [Figure: parse tree for (())() with root L; read left to right, the leaves are ( ( ε ) ε ) ( ε ) ε.]

14. Parse Tree for (())() Grammar: L → ( L ) L; L → ε. [Figure: the same parse tree as on slide 13.]

15. Parsing • Parsing algorithms are based on the idea of derivations. • General Algorithms: • LL (top down) • LR (bottom up)

16. Single Step Derivation Definition: Given α A β (with α, β in (Vn ∪ Vt)*) and a production A → γ, α A β ⇒ α γ β is a single step derivation. Examples: • L + D ⇒ L - D + D, using L → L - D • ( L ) ( L ) ⇒ ( ( L ) L ) ( L ), using L → ( L ) L. Greek letters (α, β, γ, …) denote a (possibly empty) sequence of terminals and non-terminals.

17. Derivations Definition: A sequence of the form w0 ⇒ w1 ⇒ … ⇒ wn is a derivation of wn from w0 (written w0 ⇒* wn). Example: L ⇒ ( L ) L (production L → ( L ) L) ⇒ ( ) L (production L → ε) ⇒ ( ) (production L → ε), so L ⇒* (). If wi has non-terminal symbols, it is referred to as a sentential form.

18. L ⇒* (())()

19. L(G), the language generated by grammar G, is {w in Vt* : s ⇒* w, for start symbol s} • We’ve just shown that both () and (())() are in L(G) for the previous grammar.

20. Leftmost Derivations • A derivation where the leftmost nonterminal is always chosen • If a string is in a given language (i.e. a derivation exists), then a leftmost derivation must exist • Rightmost derivation defined as you would expect

21. Leftmost Derivation for (())()
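A leftmost derivation for the grammar L → ( L ) L | ε can be reproduced mechanically. The sketch below is an illustration under assumptions: the function name `leftmost_derivation` is made up for this example, and the rule for choosing a production (expand with ( L ) L when the next unmatched input character is "(", otherwise use ε) is specific to this grammar rather than a general parsing algorithm.

```python
def leftmost_derivation(text):
    """Return the sentential forms of a leftmost derivation of text from L."""
    form = ["L"]
    steps = ["L"]
    while "L" in form:
        i = form.index("L")  # leftmost non-terminal
        # Everything left of position i is terminal, so i input characters
        # are already accounted for; text[i] is the next one to produce.
        if i < len(text) and text[i] == "(":
            form[i:i + 1] = ["(", "L", ")", "L"]  # L -> ( L ) L
        else:
            form[i:i + 1] = []                    # L -> ε
        steps.append("".join(form) or "ε")
    if "".join(form) != text:
        raise ValueError(f"{text!r} is not in the language")
    return steps


print(" ⇒ ".join(leftmost_derivation("(())()")))
```

Every step rewrites the first L in the current form, which is exactly the condition that makes the derivation leftmost.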

22. Rightmost Derivation for (())()

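23. Ambiguity: two (or more) parse trees or leftmost derivations for some string in the language. Grammar: E → E + E; E → E - E; E → 0 | … | 9. [Figure: two parse trees for 2 - 3 + 4, one with E + E at the root, grouping the string as (2 - 3) + 4, and one with E - E at the root, grouping it as 2 - (3 + 4).]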

24. Two leftmost derivations • E ⇒ E + E ⇒ E - E + E ⇒ 2 - E + E ⇒ 2 - 3 + E ⇒ 2 - 3 + 4 • E ⇒ E - E ⇒ 2 - E ⇒ 2 - E + E ⇒ 2 - 3 + E ⇒ 2 - 3 + 4
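Why ambiguity matters becomes concrete once the two parse trees are evaluated. The sketch below encodes each tree as a nested tuple `(operator, left, right)`; this encoding and the names `frontier` and `evaluate` are assumptions made for illustration. Both trees have the same leaves, yet they compute different values.

```python
# Tree from the derivation that starts with E -> E + E: (2 - 3) + 4
TREE_PLUS_AT_ROOT = ("+", ("-", 2, 3), 4)
# Tree from the derivation that starts with E -> E - E: 2 - (3 + 4)
TREE_MINUS_AT_ROOT = ("-", 2, ("+", 3, 4))


def frontier(tree):
    """Return the leaves of the tree, read left to right."""
    if isinstance(tree, int):
        return [str(tree)]
    op, left, right = tree
    return frontier(left) + [op] + frontier(right)


def evaluate(tree):
    """Evaluate the arithmetic expression the tree represents."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == "+" else lhs - rhs


for tree in (TREE_PLUS_AT_ROOT, TREE_MINUS_AT_ROOT):
    print(" ".join(frontier(tree)), "=", evaluate(tree))
```

The first tree yields 3 and the second yields -5, so a compiler working from the ambiguous grammar has no way to know which meaning was intended.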

25. An ambiguous grammar can sometimes be made unambiguous: E → E + T | E - T | T; T → 0 | … | 9. The left-recursive form enforces the correct (left) associativity. • Precedence can be specified as well: E → E + T | E - T | T; T → T * F | T / F | F; F → ( E ) | 0 | … | 9

26. Input: begin simplestmt ; simplestmt ; end. Grammar: P → begin SS end; SS → S ; SS; SS → ε; S → simplestmt; S → begin SS end. [Figure: parse tree for the input, with root P, three SS nodes, two S nodes and an ε leaf.]

27. Top Down (LL) Parsing Grammar: P → begin SS end; SS → S ; SS; SS → ε; S → simplestmt; S → begin SS end. Input: begin simplestmt ; simplestmt ; end. [Figure: partial parse tree grown from the root; nodes so far: P, SS.]

28. Top Down (LL) Parsing (same grammar and input) [Figure: partial parse tree; nodes so far: P, SS, SS, S.]

29. Top Down (LL) Parsing (same grammar and input) [Figure: partial parse tree; nodes so far: P, SS, SS, S.]

30. Top Down (LL) Parsing (same grammar and input) [Figure: partial parse tree; nodes so far: P, SS, SS, S, S.]

31. Top Down (LL) Parsing (same grammar and input) [Figure: partial parse tree; nodes so far: P, SS, SS, S, S.]

32. Top Down (LL) Parsing (same grammar and input) [Figure: completed parse tree; its six non-terminal nodes (P, three SS, two S) are numbered 1 to 6 in the order the parser creates them, with the root P as 1, plus an ε leaf.]
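The top-down construction shown on these slides can be imitated with a small recursive-descent parser for the same grammar. This is a hedged sketch rather than the course's implementation: the names `parse_program` and `ParseError`, the tuple-based tree encoding, and the decision to choose a production by looking at one token of input are all assumptions made for illustration.

```python
class ParseError(Exception):
    """Raised when the token list does not match the grammar."""


def parse_program(tokens):
    """Parse tokens top-down and return the parse tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise ParseError(f"expected {token!r} at position {pos}, found {peek()!r}")
        pos += 1
        return token

    def parse_p():   # P -> begin SS end
        return ("P", expect("begin"), parse_ss(), expect("end"))

    def parse_ss():  # SS -> S ; SS | ε
        if peek() in ("simplestmt", "begin"):
            return ("SS", parse_s(), expect(";"), parse_ss())
        return ("SS", "ε")

    def parse_s():   # S -> simplestmt | begin SS end
        if peek() == "simplestmt":
            return ("S", expect("simplestmt"))
        return ("S", expect("begin"), parse_ss(), expect("end"))

    tree = parse_p()
    if pos != len(tokens):
        raise ParseError(f"unexpected trailing token {peek()!r}")
    return tree


print(parse_program("begin simplestmt ; simplestmt ; end".split()))
```

Each function corresponds to one non-terminal, and the call order (P, then SS, then S, and so on) mirrors the root-first order in which a top-down parser creates tree nodes.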

33. Bottom-up (LR) Parsing Grammar: P → begin SS end; SS → S ; SS; SS → ε; S → simplestmt; S → begin SS end. Input: begin simplestmt ; simplestmt ; end. [Figure: partial parse tree grown from the leaves; nodes so far: S.]

34. Bottom-up (LR) Parsing (same grammar and input) [Figure: partial parse tree; nodes so far: S, S.]

35. Bottom-up (LR) Parsing (same grammar and input) [Figure: partial parse tree; nodes so far: S, S, SS, with an ε leaf.]

36. Bottom-up (LR) Parsing (same grammar and input) [Figure: partial parse tree; nodes so far: S, S, SS, SS, with an ε leaf.]

37. Bottom-up (LR) Parsing (same grammar and input) [Figure: partial parse tree; nodes so far: S, S, SS, SS, SS, with an ε leaf.]

38. Bottom-up (LR) Parsing (same grammar and input) [Figure: completed parse tree; its six non-terminal nodes (P, three SS, two S) are numbered 1 to 6 in the order the parser creates them, with the root P as 6, plus an ε leaf.]
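The bottom-up order can be illustrated with a hand-written shift-reduce loop for the same grammar. To be clear about the assumptions: this is not a table-driven LR parser, the function name `shift_reduce` is invented here, and the reduce conditions (including the one-token lookahead used to decide when to apply SS → ε) were chosen by hand for this one grammar.

```python
def shift_reduce(tokens):
    """Return the reductions performed, in order; raise ValueError on bad input."""
    stack, pos, reductions = [], 0, []

    def reduce(count, lhs, rule):
        # Replace the top `count` stack symbols with the non-terminal lhs.
        if count:
            del stack[-count:]
        stack.append(lhs)
        reductions.append(rule)

    while True:
        lookahead = tokens[pos] if pos < len(tokens) else None
        if stack[-3:] == ["S", ";", "SS"]:
            reduce(3, "SS", "SS → S ; SS")
        elif stack[-1:] == ["simplestmt"]:
            reduce(1, "S", "S → simplestmt")
        elif stack[-3:] == ["begin", "SS", "end"]:
            # The outermost block with no input left is the whole program.
            if len(stack) == 3 and lookahead is None:
                reduce(3, "P", "P → begin SS end")
            else:
                reduce(3, "S", "S → begin SS end")
        elif lookahead == "end" and stack[-1:] in (["begin"], [";"]):
            reduce(0, "SS", "SS → ε")  # empty statement list before `end`
        elif lookahead is not None:
            stack.append(lookahead)    # shift
            pos += 1
        else:
            break
    if stack != ["P"]:
        raise ValueError(f"input rejected; final stack: {stack}")
    return reductions


for rule in shift_reduce("begin simplestmt ; simplestmt ; end".split()):
    print(rule)
```

The reductions come out leaves-first and end with P → begin SS end, the reverse of the root-first order a top-down parser follows.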

39. Parsing • General Algorithms: • LL (top down) • LR (bottom up) • Both algorithms are driven by the input grammar and the input to be parsed • Two important sets: FIRST and FOLLOW

40. FIRST Sets FIRST(α) is the set of all terminal symbols that can begin some sentential form in a derivation that starts with α: α ⇒ … ⇒ a β • FIRST(α) = {a in Vt | α ⇒* a β} ∪ {ε} if α ⇒* ε • Example: S → simple | begin S end; FIRST(S) = {simple, begin}

41. To compute FIRST across strings of terminals and non-terminals: FIRST(ε) = {ε}; FIRST(Aα) = • {A} if A is a terminal • (FIRST(A) - {ε}) ∪ FIRST(α) if A ⇒* ε • FIRST(A) otherwise

42. Computing FIRST sets Initially FIRST(A) (for all A) is empty • For productions A → c β, where c in Vt: add {c} to FIRST(A) • For productions A → ε: add {ε} to FIRST(A) • For productions A → α B β, where α ⇒* ε and NOT (B ⇒* ε): add FIRST(αB) to FIRST(A) • For productions A → α, where α ⇒* ε: add FIRST(α) and {ε} to FIRST(A). Annotation on the slide: "same thing as ε FIRST(B)"
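The rules for FIRST sets amount to a fixed-point computation: keep applying them to every production until no set grows. The sketch below is one common way to write that loop, not necessarily the formulation used in the course. The names `compute_first` and `first_of_sequence`, the dictionary encoding of the grammar, and the string "ε" as the marker for the empty string are all assumptions. It is run on a grammar with productions S → a S e | B, B → b B e | C, C → c C e | d, where e is an ordinary terminal.

```python
EPS = "ε"  # marker for the empty string inside FIRST sets


def first_of_sequence(symbols, first, nonterminals):
    """FIRST of a sequence of symbols, given the current FIRST sets."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in nonterminals else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPS)        # every symbol (or none at all) can derive ε
    return result


def compute_first(grammar):
    """Iterate over all productions until no FIRST set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


# Right-hand sides are tuples of symbols; () would denote an ε production.
EXAMPLE_GRAMMAR = {
    "S": [("a", "S", "e"), ("B",)],
    "B": [("b", "B", "e"), ("C",)],
    "C": [("c", "C", "e"), ("d",)],
}

for nt, symbols in compute_first(EXAMPLE_GRAMMAR).items():
    print(f"FIRST({nt}) =", sorted(symbols))
```

Iterating to a fixed point avoids having to pick the "simplest" non-terminal first by hand; the order in which productions are visited affects only how many passes are needed.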

43. Example 1 Grammar (e is a terminal here, not ε): S → a S e; S → B; B → b B e; B → C; C → c C e; C → d. Start with the ‘simplest’ non-terminal: FIRST(C) = {c, d}. FIRST(B) and FIRST(S) are not yet filled in.

44. Example 1 (same grammar) Now that we know FIRST(C) = {c, d}: FIRST(B) = {b, c, d}. FIRST(S) is not yet filled in.

45. Example 1 (same grammar) FIRST(C) = {c, d}; FIRST(B) = {b, c, d}; FIRST(S) = {a, b, c, d}

46. Example 2 Grammar: P → i | c | n T S; Q → P | a S | d S c S T; R → b | ε; S → e | R n | ε; T → R S q. FIRST(P) = {i, c, n}; FIRST(R) = {b, ε}. FIRST(Q), FIRST(S) and FIRST(T) are not yet filled in.

47. Example 2 (same grammar) FIRST(P) = {i, c, n}; FIRST(Q) = {i, c, n, a, d}; FIRST(R) = {b, ε}. FIRST(S) and FIRST(T) are not yet filled in.

48. Example 2 (same grammar) FIRST(P) = {i, c, n}; FIRST(Q) = {i, c, n, a, d}; FIRST(R) = {b, ε}; FIRST(S) = {e, b, n, ε}. FIRST(T) is not yet filled in. Note: S ⇒ R n ⇒ n because R ⇒* ε

49. Example 2 Grammar: P → i | c | n T S; Q → P | a S | d S c S T; R → b | ε; S → e | R n | ε; T → R S q. FIRST(P) = {i, c, n}; FIRST(Q) = {i, c, n, a, d}; FIRST(R) = {b, ε}; FIRST(S) = {e, b, n, ε}; FIRST(T) = {b, e, n, q}. Note: T ⇒ R S q ⇒ S q ⇒ q because both R and S ⇒* ε

50. Example 3 Grammar: S → a S e | S T S; T → R S e | Q; R → r S r | ε; Q → S T | ε. FIRST(S), FIRST(R), FIRST(T) and FIRST(Q) are left blank on the slide.