1 / 40

Parsing Computer Programs and LR(0) Grammars

Learn about parsing algorithms, LR(0) grammars, and the process of parsing computer programs using these techniques. Discover how to optimize parsing speed and understand the concept of valid items and parse tree construction.

taniar
Download Presentation

Parsing Computer Programs and LR(0) Grammars

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fall 2011 The Chinese University of Hong Kong CSCI 3130: Automata theory and formal languages LR(0) grammars Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130

  2. Parsing computer programs • First the javac compiler does a lexical analysis: if (n == 0) { return x; } if (ID == INT_LIT) { return ID; } ID = identifier (name of variable, procedure, class, ...) INT_LIT = integer literal (value) • The alphabet of java CFG consists of symbols like: S = {if, return, (, ){, }, ;, ==, ID, INT_LIT, ...}

  3. Parsing computer programs if (n == 0) { return x; } Statement if (ID == INT_LIT) { return ID; } ifParExpressionStatement (Expression) Block Expression ExpressionRest { BlockStatements } BlockStatement Primary Infixop Expression Statement Identifier == Primary return Expression ; ID Literal Primary INT_LIT Identifier the parse tree of a java statement ID

  4. CFG of the java programming language Identifier: ID QualifiedIdentifier: Identifier { . Identifier } Literal: IntegerLiteral FloatingPointLiteral CharacterLiteral StringLiteral BooleanLiteral NullLiteral Expression: Expression1 [AssignmentOperator Expression1]] AssignmentOperator: = += -= *= /= &= |= … from http://java.sun.com/docs/books/jls /second_edition/html/syntax.doc.html#52996

  5. Parsing java programs class Point2d { /* The X and Y coordinates of the point--instance variables */ private double x; private double y; private boolean debug; // A trick to help with debugging public Point2d (double px, double py) { // Constructor x = px; y = py; debug = false; // turn off debugging } public Point2d () { // Default constructor this (0.0, 0.0); // Invokes 2 parameter Point2D constructor } // Note that a this() invocation must be the BEGINNING of // statement body of constructor public Point2d (Point2d pt) { // Another consructor x = pt.getX(); y = pt.getY(); } } … Simple java program: about 500 symbols

  6. Parsing algorithms • How long would it take to parse this program? • Can we parse faster? • No! CYK is the fastest known general-purposeparsing algorithm for CFGs try all parse trees about 1080 years CYK algorithm about 1 week!

  7. Another way of thinking Scientist: Find an algorithm thatcan parse any CFG Engineer: Design your CFG so it can be parsedvery quickly

  8. Parsing left to right S  Tc(1) T  TA(2) | A(3) A  aTb(4) | ab(5) input: abaabbc S T A T a a b b c a b • • • • • • • • T A A Try to match to the left of •

  9. Items • An item is a production augmented with a • • The item is complete if the • is the last symbol A  aTb(4) S  Tc(1) T  TA(2) T  A(3) A  ab(5) A  •aTb A  a•Tb A  aT•b A  aTb• A  •ab A  a•b A  ab• T  •A T  A• S  •Tc S T•c S  Tc• T  •TA T  T•A T  TA•

  10. Meaning of items S  Tc(1) T  TA(2) | A(3) A  aTb(4) | ab(5) A  a•Tb A  a•b a a b b c a b • A  aT•b T a a b b c a b • T A A Items represent possibilities at various stages of the parsing process

  11. Meaning of items S  Tc(1) T  TA(2) | A(3) A  aTb(4) | ab(5) A  a•Ab A  a•b a a b b c a b • A  aT•b A  aTb• A T T a a b b c a b • a a b b c a b • T T A A A A When a complete item occurs, a part of the parse tree is discovered

  12. LR(0) parsing Move from left to right A  aAb | ab Keep track of all possible valid items Prune the invalid items When a complete item occurs, build part of parse tree valid items registry A  •ab A  •aAb a a b b •

  13. LR(0) parsing Move from left to right A  aAb | ab Keep track of all possible valid items Prune the invalid items When a complete item occurs, build part of parse tree valid items registry A  a•b A  a•Ab a a b b • A  •ab A  •aAb

  14. LR(0) parsing Move from left to right A  aAb | ab Keep track of all possible valid items Prune the invalid items When a complete item occurs, build part of parse tree A  a•Ab valid items registry A  a•b A  a•Ab a a b b • A  a•b A  a•Ab

  15. LR(0) parsing Move from left to right A  aAb | ab Keep track of all possible valid items Prune the invalid items When a complete item occurs, build part of parse tree A  a•Ab valid items registry A  a•b A  a•Ab a a b b • A  •ab A  •aAb

  16. LR(0) parsing Move from left to right A  aAb | ab Keep track of all possible valid items Prune the invalid items When a complete item occurs, build part of parse tree A  a•Ab A  a•Ab valid items registry A  ab• A  a•Ab a a b b • A  •ab A  •aAb

  17. LR(0) parsing Move from left to right A  aAb | ab Keep track of all possible valid items Prune the invalid items When a complete item occurs, build part of parse tree A A  a•Ab valid items registry A  aA•b A  ab• a a b b •

  18. LR(0) parsing Move from left to right A  aAb | ab Keep track of all possible valid items Prune the invalid items When a complete item occurs, build part of parse tree A A valid items registry A  aAb• a a b b •

  19. two kinds of actions A  a•b A  ab• A  a•Ab A  •aAb A  •ab a a b b • • A no complete itemin registry exactly one complete item valid items registry valid items registry a a b b • shift reduce

  20. LR(0) implementation: first take action valid items stack A  aAb | ab S  A  •aAb A  •ab A  a•Ab A  a•b A  •aAb A  •ab a S a a b b A A  a•Ab A  a•b A  •aAb A  •ab S aa aab R A  ab• A S aA A  aA•b R aAb A  aAb• • • • • • A

  21. valid item update rules a, b: terminals A, B, C: variables a, b, d: mixed strings notation initial valid items: S  •a read a A  aa•b A  a•ab shift updates: read b disappear A  a•ab read x disappear A  a•Bb A  aB•b A  a•Bb reduce updates: reduce B disappear A  a•Bb reduce C common updates: B  •d A  a•Bb

  22. valid item update rules a, b: terminals A, B, C: variables a, b, d: mixed strings X: terminal or variable notation initial valid items: S  •a read a A  aa•b A  a•ab shift updates: A  aB•b A  aX•b A  a•Bb A  a•Xb reduce updates: reduce B X common updates: B  •d A  a•Bb

  23. LR(0) parsing: NFA representation a, b: terminals A, B, C: variables a, b, d: mixed strings X: terminal or variable notation e q0 S  •a For every item S  •a X A  •X A  X• For every item A  •X e A  •C C  •d For every pair of items A  •C, C  •d

  24. NFA example A  •aAb A  a•Ab A  aA•b A  aAb•  q0 a b A   A  •ab A  a•b A  ab•  A  aAb | ab NFA alphabet is S = {a, b, A} a start state is q0 b other states are items

  25. NFA to DFA conversion A  •aAb A  a•Ab A  aA•b A  aAb•  q0 a a b b A A   A  •ab A  a•b A  ab•  a A  a•Ab A  a•b A  •aAb A  •ab A  aA•b A  aAb• a A  •aAb A  •ab b b a, A qdie A  ab• b, A qdie

  26. Shift states and reduce states 3 2 1 5 4 b a A a A  a•Ab A  a•b A  •aAb A  •ab are shift states 1 2 3 A  aA•b A  aAb• A  •aAb A  •ab b are reduce states 4 5 A  ab•

  27. LR(0) parsing: second take 4 3 2 1 5 b a A a a b b action state stack  1 S A  aAb | ab a a 2 S A  a•Ab A  a•b A  •aAb A  •ab A  aA•b A  aAb• How do we know what to reduce to? aa 2 S A  •aAb A  •ab ? b aab 5 R A  ab• • • • •

  28. remember state in stack! 5 3 2 1 4 b a A a a b b action state stack  1 S A  aAb | ab a A 1a 2 S A  a•Ab A  a•b A  •aAb A  •ab A  aA•b A  aAb• backtrack two steps 1a2a 2 S A  •aAb A  •ab b 1a2a2b 5 R A  ab• 1a2A • • • •

  29. remember state in stack! 4 5 3 2 1 a b A a a b b A action state stack  1 S A  aAb | ab a A 1a 2 S A  a•Ab A  a•b A  •aAb A  •ab A  aA•b A  aAb• 1a2a 2 S A  •aAb A  •ab b 1a2a2b 5 R A  ab• 3 S 1a2A • • 4 R 1a2A3b

  30. PDA for LR(0) parsing 4 5 3 2 1 a b A a a b b A action state stack  1 S A  aAb | ab a A 1 2 S A  a•Ab A  a•b A  •aAb A  •ab A  aA•b A  aAb• 12 2 S A  •aAb A  •ab b 122 5 R A  ab• 3 S 12 • • • • • 4 R 123

  31. PDA for LR(0) parsing •A e, e/$ 4 5 3 1 2 a b A A, $/e a A• A  a•Ab A  a•b A  •aAb A  •ab A  aA•b A  aAb• pop stateb, statea pop stateb, stateA, statea A  •aAb A  •ab b take transition A out of statea take transition A out of statea push statea push statea A  ab•

  32. Example 1 L = {w#wR : w ∈ {a, b}*} A  aAa | bAb | # a a A e A  •aAa A  a•Aa A  aA•a A  aAa• e e e e # q0 A  •# A  #• e e e b b A e A  b•Ab A  bA•b A  bAb• A  •bAb

  33. Example 1 A  aAa | bAb | e stack state action e 1 S A  •aAa 5 6 1 3 7 4 8 2 1 4 S # A  •bAb A  #• 14 3 S A  •# 143 2 R b # 143 5 S a A # 1435 7 R A  a•Aa A  b•Ab A 14 6 S a A  •aAa A  •aAa 146 8 R a b b A  •bAb A  •bAb A  •# A  •# A A A  aA•a A  bA•b A a a 4 4 A  aAa• A  bAb• # b a a b • • • • • •

  34. LR(0) grammars and deterministic PDAs • The PDA for LR(0) parsing is deterministic • Some CFLs require non-deterministic PDAs, e.g. • What goes wrong when we do LR(0) parsing on L? L = {wwR : w ∈ {a, b}*}

  35. Example 2 L = {wwR : w ∈ {a, b}*} A  aAa | bAb | e a a A e A  •aAa A  a•Aa A  aA•a A  aAa• e e e e q0 A  • e e e b b A e A  b•Ab A  bA•b A  bAb• A  •bAb

  36. Example 2 L = {wwR : w ∈ {a, b}*} A  aAa | bAb | e A  •aAa A  •bAb A  • b a A  a•Aa A  b•Ab a A  •aAa A  •aAa b A  •bAb A  •bAb A  • A  • A A A  aA•a A  bA•b input: abba a a shift-reduce conflict 4 4 A  aAa• A  bAb•

  37. Parsing computer programs if (n == 0) { return x; } else { return x + 1; } Statement ifParExpressionStatement (Expression) Block Expression ExpressionRest { BlockStatements } BlockStatement Primary Infixop Expression Statement Identifier == Primary return Expression ; ID Literal Primary INT_LIT Identifier ID

  38. Parsing computer programs if (n == 0) { return x; } else { return x + 1; } Statement ifParExpressionStatement elseStatement Block Block (Expression) ... ... ... LR(0) parsers cannot tell apart if ... then from if ... then ... else ID

  39. When you can’t LR(0) parse • LR(0) parser can perform two actions: • What if: no complete itemis valid there is one valid item,and it is complete reduce (R) shift (S) some valid itemscomplete, some not more than one validcomplete item R / R conflict S / R conflict

  40. Hierarchy of context-free grammars context-free grammars LR(∞) grammars … to be continued… java perl python … LR(1) grammars LR(0) grammars parse using LR(0) algorithm

More Related