Fall 2009. The Chinese University of Hong Kong. CSC 3130: Automata theory and formal languages. LR(k) grammars. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130
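LR(0) example from last time
Grammar: A → aAb | ab
State 1: A → •aAb, A → •ab
State 2: A → a•Ab, A → a•b, A → •aAb, A → •ab
State 3: A → ab•
State 4: A → aA•b
State 5: A → aAb•
Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5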
LR(0) parsing example revisited
Grammar: A → aAb | ab, input: aabb
Stack      Input   Action
1          aabb    S
1a2        abb     S
1a2a2      bb      S
1a2a2b3    b       R
1a2A4      b       S
1a2A4b5            R
1A
Derivation recovered: A ⇒ aAb ⇒ aabb
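The stack trace above can be reproduced with a short table-driven sketch. The GOTO and REDUCE tables are transcribed from the automaton for A → aAb | ab. The function name lr0_parse and the rule "accept when A sits on top of state 1 with no input left" are assumptions made for this illustration, not part of the slides.

```python
# LR(0) automaton for A -> aAb | ab, transcribed from the slide.
# GOTO[state][symbol] is the next state; REDUCE[state] is (lhs, length of rhs).
GOTO = {
    1: {"a": 2},
    2: {"a": 2, "b": 3, "A": 4},
    4: {"b": 5},
}
REDUCE = {
    3: ("A", 2),  # A -> ab
    5: ("A", 3),  # A -> aAb
}


def lr0_parse(text):
    """Return the (stack, remaining input, action) steps, or None on a parse error."""
    stack = [1]  # states and symbols alternate, starting with state 1
    pos = 0
    trace = []
    while True:
        state = stack[-1]
        shown = "".join(str(x) for x in stack)
        if state in REDUCE:
            lhs, length = REDUCE[state]
            trace.append((shown, text[pos:], "R"))
            del stack[-2 * length:]  # pop one (symbol, state) pair per rhs symbol
            stack.append(lhs)
            below = stack[-2]
            if below == 1:
                # Assumed acceptance rule: A on top of state 1, input consumed.
                if pos == len(text):
                    trace.append(("1A", "", ""))
                    return trace
                return None
            target = GOTO.get(below, {}).get(lhs)
            if target is None:
                return None
            stack.append(target)
        else:
            if pos == len(text):
                return None  # a shift state needs another input symbol
            target = GOTO.get(state, {}).get(text[pos])
            if target is None:
                return None
            trace.append((shown, text[pos:], "S"))
            stack.extend([text[pos], target])
            pos += 1


if __name__ == "__main__":
    for step in lr0_parse("aabb"):
        print(step)
```

No lookahead is consulted anywhere: the state on top of the stack alone decides between shift and reduce, which is what makes this an LR(0) parse.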
Meaning of LR(0) items
Item A → α•Xβ: the • marks the focus, and β is the undiscovered part.
eNFA transitions to:
X → •γ: shift focus to the subtree rooted at X (if X is a nonterminal)
A → αX•β: move past the subtree rooted at X
Outline of LR(0) parsing algorithm
• Algorithm can perform two actions:
  shift (S): no complete item is valid
  reduce (R): there is one valid item, and it is complete
• What if:
  some valid items complete, some not: S / R conflict
  more than one valid complete item: R / R conflict
Definition of LR(0) grammar
• A grammar is LR(0) if S/R, R/R conflicts never occur
• LR means parsing happens left to right and produces a rightmost derivation (in reverse)
• LR(0) grammars are unambiguous and have a fast parsing algorithm
• Unfortunately, they are not "expressive" enough to describe programming languages
Hierarchy of context-free grammars (outermost class first)
context-free grammars: parse using CYK algorithm (slow)
LR(∞) grammars
… java, perl, python …
LR(1) grammars
LR(0) grammars: parse using LR(0) algorithm
A grammar that is not LR(0)
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
input: a
A grammar that is not LR(0)
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
input: a
possibilities: shift (3), reduce (4), reduce (5), shift (6)
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
S/R, R/R conflicts!
(Figure: the candidate partial parse trees for the input read so far.)
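The conflict check on this slide can be written down directly from the two "What if" cases in the outline of the LR(0) algorithm. This is a minimal sketch: the item encoding (lhs, rhs, dot position) and the function names are assumptions chosen for the example.

```python
# An LR(0) item is encoded as (lhs, rhs, dot), e.g. ("A", "aA", 1) for A -> a•A.
VALID_AFTER_A = [
    ("A", "aA", 1),  # A -> a•A
    ("A", "a", 1),   # A -> a•
    ("B", "a", 1),   # B -> a•
    ("B", "ab", 1),  # B -> a•b
    ("A", "aA", 0),  # A -> •aA
    ("A", "a", 0),   # A -> •a
]


def is_complete(item):
    """An item is complete when the dot has reached the end of the right-hand side."""
    _lhs, rhs, dot = item
    return dot == len(rhs)


def lr0_conflicts(items):
    """Return the kinds of conflict present in a set of valid LR(0) items."""
    complete = [it for it in items if is_complete(it)]
    incomplete = [it for it in items if not is_complete(it)]
    found = set()
    if complete and incomplete:
        found.add("S/R")  # some valid items complete, some not
    if len(complete) > 1:
        found.add("R/R")  # more than one valid complete item
    return found


if __name__ == "__main__":
    print(sorted(lr0_conflicts(VALID_AFTER_A)))
```

For the six valid items listed on the slide, both kinds of conflict are reported, which is why this grammar is not LR(0).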
Lookahead
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
input: a (peek inside!)
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
(Figure: the candidate partial parse trees.)
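Lookahead
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
input: a a (peek inside: the last a is the symbol being peeked at)
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
action: shift
(Figure: the parse tree must look like this: S over A, with A → aA and the focus after the first a.)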
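Lookahead
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
input: a a a (peek inside: the last a is the symbol being peeked at)
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
action: shift
(Figure: the parse tree must look like this: a chain of A → aA nodes with the focus after the second a.)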
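Lookahead
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
input: a a a (no input left)
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
action: reduce
(Figure: the parse tree must look like this: a chain of A → aA nodes ending in A → a, with the focus after the third a.)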
LR(0) items vs. LR(1) items
Grammar: A → aAb | ab
LR(0) item: A → a•Ab
LR(1) item: [A → a•Ab, b]
(Figure: the partial parse tree described by each item.)
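LR(1) items
• LR(1) items are of the form [A → α•β, x] or [A → α•β, ε], to represent this state in the parsing
(Figure: partial parse trees for A with the focus between α and β; in the first, the symbol x follows the subtree of A.)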
Outline of LR(1) parsing algorithm
• Step 1: Build NFA that describes valid item updates
• Step 2: Convert NFA to DFA
  • As in LR(0), DFA will have shift and reduce states
• Step 3: Run DFA on input, using stack to remember sequence of states
  • Use lookahead to eliminate wrong reduce items
Recall eNFA transitions for LR(0)
• States of eNFA will be items (plus a start state q0)
• For every item S → •α we have a transition q0 --ε--> S → •α
• For every item A → α•Xβ we have a transition A → α•Xβ --X--> A → αX•β
• For every item A → α•Cβ and production C → δ we have a transition A → α•Cβ --ε--> C → •δ
eNFA transitions for LR(1)
• For every item [S → •α, ε] we have a transition q0 --ε--> [S → •α, ε]
• For every item [A → α•Xβ, x] we have a transition [A → α•Xβ, x] --X--> [A → αX•β, x]
• For every item [A → α•Cβ, x] and production C → δ, for every y in FIRST(βx), we have a transition [A → α•Cβ, x] --ε--> [C → •δ, y]
FIRST sets
FIRST(α) is the set of terminals that occur on the left in some derivation starting from α
• Example
Grammar: S → A (1) | cB (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
FIRST(a) = {a}
FIRST(A) = {a}
FIRST(S) = {a, c}
FIRST(bAc) = {b}
FIRST(BA) = {a}
FIRST(ε) = ∅
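The FIRST sets in this example can be computed by a standard fixed-point iteration. This is a sketch under stated assumptions: the grammar encoding, the names first_sets and first_of, and the handling of nullable variables (not needed for this grammar, which has no ε-productions) are choices made for the illustration.

```python
# Grammar from the FIRST-sets slide: S -> A | cB, A -> aA | a, B -> a | ab.
# Keys are variables; every other symbol is treated as a terminal.
GRAMMAR = {
    "S": [["A"], ["c", "B"]],
    "A": [["a", "A"], ["a"]],
    "B": [["a"], ["a", "b"]],
}


def first_sets(grammar):
    """Compute FIRST for every variable, plus the set of nullable variables."""
    first = {nt: set() for nt in grammar}
    nullable = set()
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                before = (len(first[nt]), nt in nullable)
                all_nullable = True
                for sym in rhs:
                    if sym in grammar:
                        first[nt] |= first[sym]
                        if sym not in nullable:
                            all_nullable = False
                            break
                    else:
                        first[nt].add(sym)
                        all_nullable = False
                        break
                if all_nullable:
                    nullable.add(nt)
                if before != (len(first[nt]), nt in nullable):
                    changed = True
    return first, nullable


def first_of(string, grammar, first, nullable):
    """FIRST of a string of symbols; the empty string gives the empty set."""
    out = set()
    for sym in string:
        if sym in grammar:
            out |= first[sym]
            if sym not in nullable:
                break
        else:
            out.add(sym)
            break
    return out


if __name__ == "__main__":
    FIRST, NULLABLE = first_sets(GRAMMAR)
    for s in (["a"], ["A"], ["S"], ["b", "A", "c"], ["B", "A"], []):
        print(s, sorted(first_of(s, GRAMMAR, FIRST, NULLABLE)))
```

Following the definition on the slide, first_of returns only terminals, so the empty string yields the empty set, matching FIRST(ε) = ∅.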
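Explaining the transitions
[A → α•Xβ, x] --X--> [A → αX•β, x]: the focus moves past the subtree rooted at X; the lookahead x is unchanged
[A → α•Cβ, x] --ε--> [C → •δ, y], y ∈ FIRST(βx): the focus moves into the subtree rooted at C, and y is a symbol that can follow that subtree
(Figure: partial parse trees illustrating both transitions.)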
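Following all ε-transitions from a set of LR(1) items amounts to a closure computation. The sketch below applies the rule y ∈ FIRST(βx) to the grammar S → A | Bc, A → aA | a, B → a | ab. Several things here are assumptions made for the illustration: the end marker "$" standing in for the slides' ε lookahead, the item encoding (lhs, rhs, dot, lookahead), and the shortcut that, with no ε-productions in the grammar, FIRST(βx) is the FIRST set of the first symbol of β when β is nonempty and {x} otherwise.

```python
GRAMMAR = {
    "S": [("A",), ("B", "c")],
    "A": [("a", "A"), ("a",)],
    "B": [("a",), ("a", "b")],
}
END = "$"  # assumed end-of-input marker (the slides write this lookahead as ε)


def first_of_symbol(sym, seen=frozenset()):
    """FIRST of one symbol, assuming the grammar has no ε-productions."""
    if sym not in GRAMMAR:
        return {sym}  # a terminal
    if sym in seen:
        return set()  # already being expanded on this path
    out = set()
    for rhs in GRAMMAR[sym]:
        out |= first_of_symbol(rhs[0], seen | {sym})
    return out


def closure(items):
    """Add every item reachable by ε-transitions [A -> α•Cβ, x] --ε--> [C -> •δ, y]."""
    items = set(items)
    work = list(items)
    while work:
        _lhs, rhs, dot, look = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue  # no variable right after the dot
        beta = rhs[dot + 1:]
        lookaheads = first_of_symbol(beta[0]) if beta else {look}
        for production in GRAMMAR[rhs[dot]]:
            for y in lookaheads:
                item = (rhs[dot], production, 0, y)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return items


if __name__ == "__main__":
    start = {("S", rhs, 0, END) for rhs in GRAMMAR["S"]}
    for item in sorted(closure(start)):
        print(item)
```

Starting from the two S-items, the closure contains six items: the A-items inherit the end marker, while the B-items get lookahead c because c follows B in S → Bc.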
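Example: Constructing the NFA
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
q0 --ε--> [S → •A, ε]
q0 --ε--> [S → •Bc, ε]
[S → •A, ε] --A--> [S → A•, ε]
[S → •A, ε] --ε--> [A → •aA, ε]
[S → •A, ε] --ε--> [A → •a, ε]
[S → •Bc, ε] --B--> [S → B•c, ε]
[S → •Bc, ε] --ε--> [B → •a, c]
[S → •Bc, ε] --ε--> [B → •ab, c]
. . .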
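Example: Constructing the NFA
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
q0 --ε--> [S → •A, ε]
q0 --ε--> [S → •Bc, ε]
[S → •A, ε] --A--> [S → A•, ε]
[S → •A, ε] --ε--> [A → •aA, ε]
[S → •A, ε] --ε--> [A → •a, ε]
[A → •aA, ε] --a--> [A → a•A, ε]
[A → a•A, ε] --A--> [A → aA•, ε]
[A → a•A, ε] --ε--> [A → •aA, ε]
[A → a•A, ε] --ε--> [A → •a, ε]
[A → •a, ε] --a--> [A → a•, ε]
[S → •Bc, ε] --B--> [S → B•c, ε]
[S → B•c, ε] --c--> [S → Bc•, ε]
[S → •Bc, ε] --ε--> [B → •a, c]
[S → •Bc, ε] --ε--> [B → •ab, c]
[B → •a, c] --a--> [B → a•, c]
[B → •ab, c] --a--> [B → a•b, c]
[B → a•b, c] --b--> [B → ab•, c]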
Example: Running the NFA
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
Look ahead at the next input symbol before choosing an action.
Stack   Input   Valid items                                                                   Action
        abc     [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]    S
a       bc      [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]    S
ab      c       [B → ab•, c]                                                                  R
B       c       [S → B•c, ε]                                                                  S
Bc              [S → Bc•, ε]                                                                  R
S
Convert NFA to DFA
• Each DFA state is a subset of LR(1) items, e.g. [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]
• States can contain S/R, R/R conflicts
• But for an LR(1) grammar, the lookahead always resolves such conflicts
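To see how the lookahead resolves the conflicts in the example state, one can list the actions each item allows for a given next symbol. This is a minimal sketch: the item encoding (lhs, rhs, dot, lookahead), the function name allowed_actions, and the end marker "$" standing in for the slides' ε lookahead are assumptions made for the illustration.

```python
END = "$"  # assumed end-of-input marker (the slides write this lookahead as ε)

# The example DFA state from the slide, reached after reading a single a.
STATE_AFTER_A = [
    ("A", "aA", 1, END),  # [A -> a•A, ε]
    ("A", "a", 1, END),   # [A -> a•, ε]
    ("B", "a", 1, "c"),   # [B -> a•, c]
    ("B", "ab", 1, "c"),  # [B -> a•b, c]
    ("A", "aA", 0, END),  # [A -> •aA, ε]
    ("A", "a", 0, END),   # [A -> •a, ε]
]


def allowed_actions(items, lookahead):
    """Return every action that some valid item permits for this lookahead."""
    found = set()
    for lhs, rhs, dot, look in items:
        if dot == len(rhs):
            # A complete item allows a reduce only if its lookahead matches.
            if look == lookahead:
                found.add(("reduce", lhs, rhs))
        elif rhs[dot] == lookahead:
            # The dot sits in front of the next input symbol: shift it.
            found.add(("shift",))
    return found


if __name__ == "__main__":
    for symbol in ("a", "b", "c", END):
        print(symbol, sorted(allowed_actions(STATE_AFTER_A, symbol)))
```

Each of the four possible next symbols leaves exactly one action, so the S/R and R/R conflicts that this state has as a set of LR(0) items disappear once the lookahead is consulted.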
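Example: Convert NFA to DFA
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
LEGEND: shift variable, shift terminal, reduce
The DFA has 8 states. The numbers 1, 2, 6, 7, 8 are the ones used in the parse on the next slide; the numbers 3, 4, 5 belong to the three states listed without a number.
State 1: [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]
State 2: [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]
State 6: [S → B•c, ε]
State 7: [S → Bc•, ε]
State 8: [B → ab•, c]
State reached on A from state 1: [S → A•, ε]
State reached on a from state 2: [A → a•A, ε] [A → a•, ε] [A → •aA, ε] [A → •a, ε]; it loops to itself on a
State reached on A from state 2 and from the previous state: [A → aA•, ε]
Numbered transitions: 1 --a--> 2, 1 --B--> 6, 2 --b--> 8, 6 --c--> 7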
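The DFA can also be built mechanically by combining the ε-closure with a "move the dot over one symbol" step, which is the subset construction applied to the item NFA. The sketch below rests on assumptions made for the illustration: the names closure, goto and build_dfa, the end marker "$" for the slides' ε lookahead, and the absence of ε-productions. States are numbered from 0 in order of discovery, so the indices do not match the numbers on the slide.

```python
GRAMMAR = {
    "S": [("A",), ("B", "c")],
    "A": [("a", "A"), ("a",)],
    "B": [("a",), ("a", "b")],
}
END = "$"  # assumed end-of-input marker (the slides write this lookahead as ε)


def first_of_symbol(sym, seen=frozenset()):
    """FIRST of one symbol, assuming the grammar has no ε-productions."""
    if sym not in GRAMMAR:
        return {sym}
    if sym in seen:
        return set()
    out = set()
    for rhs in GRAMMAR[sym]:
        out |= first_of_symbol(rhs[0], seen | {sym})
    return out


def closure(items):
    """Follow all ε-transitions of the item NFA from the given items."""
    items = set(items)
    work = list(items)
    while work:
        _lhs, rhs, dot, look = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue
        beta = rhs[dot + 1:]
        lookaheads = first_of_symbol(beta[0]) if beta else {look}
        for production in GRAMMAR[rhs[dot]]:
            for y in lookaheads:
                item = (rhs[dot], production, 0, y)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)


def goto(state, sym):
    """Move the dot over sym in every item that allows it, then close."""
    moved = {(l, r, d + 1, x) for (l, r, d, x) in state if d < len(r) and r[d] == sym}
    return closure(moved)


def build_dfa():
    """Return the list of DFA states and the transition map (index, symbol) -> index."""
    start = closure({("S", rhs, 0, END) for rhs in GRAMMAR["S"]})
    states, edges = [start], {}
    for state in states:  # the list grows while it is being scanned
        symbols = {r[d] for (_l, r, d, _x) in state if d < len(r)}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            edges[(states.index(state), sym)] = states.index(target)
    return states, edges


if __name__ == "__main__":
    dfa_states, dfa_edges = build_dfa()
    print(len(dfa_states), "states")
    for key in sorted(dfa_edges):
        print(key, "->", dfa_edges[key])
```

For this grammar the construction finds 8 states, the same count as on the slide.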
Example: Reconstruct the parse tree
Grammar: S → A (1) | Bc (2),  A → aA (3) | a (4),  B → a (5) | ab (6)
Stack   Input   Action
1       abc     S
12      bc      S
128     c       R
16      c       S
167             R
1
Parse tree: S → Bc at the root, with B → ab below it.
LR(k) grammars
• A context-free grammar is LR(1) if all S/R, R/R conflicts can be resolved with one lookahead
• More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols
  • Items have the form [A → α•β, c1...ck]
• LR(1) grammars describe the syntax of most programming languages