Compiler Structures

Compiler Structures 241-437, Semester 1, 2011-2012
6. Bottom-up (LR) Parsing
• Objective
• describe bottom-up (LR) parsing using shift-reduce and parse tables
• explain how LR parse tables are generated

Overview
1. What is an LR Parser?
2. Bottom-up using Shift-Reduce
3. Building an LR Parser
4. Generating the Parse Table
5. LR Conflicts
6. LL, SLR, LR, LALR Grammars

[Figure: compiler phases, grouped into a Front End and a Back End: Source Program, Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Int. Code Generator, Intermediate Code, Code Optimizer, Target Code Generator, Target Lang. Prog. This lecture covers the Syntax Analyzer, but concentrating on bottom-up parsing.]

1. What is an LR Parser? • An LR parser reads its input tokens from Left-to-right and produces a Rightmost derivation (in reverse). • The parse tree is built bottom-up, starting from the leaves and working upwards to the start symbol.

LR in Action
• parse "a b b c d e"
• Grammar:
    S => a A B e
    A => A b c | b
    B => d
• The tree corresponds to a rightmost derivation:
    S ⇒ a A B e ⇒ a A d e ⇒ a A b c d e ⇒ a b b c d e
• Reducing a sentence:
    a b b c d e
    a A b c d e
    a A d e
    a A B e
    S
• At each step, the symbols being replaced match a production's right-hand side.
[Figure: four snapshots of the parse tree growing bottom-up over the leaves a b b c d e]

LR(k) Parsing • The k is the number of input tokens that are looked at (the lookahead) when deciding which production to use. • e.g. LR(0), LR(1) • We'll be using a variation of LR(0) parsing in this chapter.

LR versus LL • LR can deal with more complex (powerful) grammars than LL (top-down parsers). • LR can detect errors quicker than LL. • LR parsers can be implemented very efficiently, but they're difficult to build by hand (unlike LL parsers).

2. Bottom-up using Shift-Reduce • The usual way of implementing bottom-up parsing is by using shift-reduce: • ‘shift’ means read in a new input token, and push it onto a stack • ‘reduce’ means to group several symbols into a single non-terminal • by choosing a production to use 'backwards' • the symbols are popped off the stack, and the production's non-terminal is pushed onto it

Shift-Reduce Parsing
Grammar:
    S => a A B e
    A => A b c | b
    B => d

Stack        Input            Action
$            a b b c d e $    Shift
$ a          b b c d e $      Shift
$ a b        b c d e $        Reduce A => b
$ a A        b c d e $        Shift
$ a A b      c d e $          Shift
$ a A b c    d e $            Reduce A => A b c
$ a A        d e $            Shift
$ a A d      e $              Reduce B => d
$ a A B      e $              Shift
$ a A B e    $                Reduce S => a A B e
$            $

3. Building an LR Parser • The standard way of writing a shift-reduce LR parser is to generate a parse table for the grammar, and 'plug' that into a standard LR parser framework. • The table has two main parts: actions and gotos.

3.1. Inside an LR Parser
[Figure: the LR parser reads the input tokens a1 a2 … ai … an $, pushes and pops a stack holding the pairs Xm sm, Xm-1 sm-1, …, X0 s0 (top of stack first), consults the parse table (actions and gotos), and outputs a parse tree.]
• The parse table is the bit you create.
• Each X is a terminal or non-terminal; each s is a state.
• possible actions are shift, reduce, accept, error
• gotos involve state changes

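Parse Table for the Example
Grammar rules:
    1: S => a A B e
    2: A => A b c
    3: A => b
    4: B => d

        Action part                        Goto part
State |  a    b    c    d    e    $   |  S    A    B
  0   |  s1   -    -    -    -    -   |  -    -    -
  1   |  -    s3   -    -    -    -   |  -    2    -
  2   |  -    s5   -    s6   -    -   |  -    -    4
  3   |  -    r3   -    r3   -    -   |  -    -    -
  4   |  -    -    -    -    s7   -   |  -    -    -
  5   |  -    -    s8   -    -    -   |  -    -    -
  6   |  -    -    -    -    r4   -   |  -    -    -
  7   |  -    -    -    -    -    acc |  -    -    -
  8   |  -    r2   -    r2   -    -   |  -    -    -

• s means shift to that state
• r means reduce by that numbered production
• - marks an empty cell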

3.2. Table Algorithm
• The loop has four branches, one for each of the four possible actions that can be in a table cell.

    push(<$,0>);                  /* push <symbol,state> pair */
    currToken = scanner();
    while (1) {
      <x,state> = pair on top of stack;
      if (action[state, currToken] == <shift newState>) {
        push(<currToken, newState>);
        currToken = scanner();
      }
      else if (action[state, currToken] == <reduce ruleNum>) {
        A --> β is rule number ruleNum;
        bodySize = numElements(β);
        pop bodySize pairs off stack;
        state' = state part of pair on top of stack;
        push(<A, goto[state', A]>);
      }
      else if (action[state, currToken] == accept) {
        S --> β is the start symbol production;
        bodySize = numElements(β);
        pop bodySize pairs off stack;
        state' = state part of pair on top of stack;
        if (state' == 0) break;   // success; can now stop
        else error();
      }
      else error();
    } // of while loop
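The table algorithm can be sketched in Python roughly as follows. This is a minimal illustration, not the course's own code: the names RULES, ACTION, GOTO and lr_parse are assumptions, the table is hard-coded from the example parse table (states 0 to 8), and errors are reported with a plain ValueError.

```python
# Rules of the example grammar, numbered as in the parse table.
RULES = {
    1: ("S", ["a", "A", "B", "e"]),
    2: ("A", ["A", "b", "c"]),
    3: ("A", ["b"]),
    4: ("B", ["d"]),
}

# ACTION[state][token] is ("s", newState), ("r", ruleNum) or ("acc",).
# A missing entry plays the role of an empty (error) cell.
ACTION = {
    0: {"a": ("s", 1)},
    1: {"b": ("s", 3)},
    2: {"b": ("s", 5), "d": ("s", 6)},
    3: {"b": ("r", 3), "d": ("r", 3)},
    4: {"e": ("s", 7)},
    5: {"c": ("s", 8)},
    6: {"e": ("r", 4)},
    7: {"$": ("acc",)},
    8: {"b": ("r", 2), "d": ("r", 2)},
}
GOTO = {1: {"A": 2}, 2: {"B": 4}}


def lr_parse(tokens):
    """Return the rule numbers used, in reduction order; raise ValueError on error."""
    stack = [("$", 0)]                    # <symbol, state> pairs
    tokens = list(tokens) + ["$"]         # append the end marker
    pos = 0
    used = []
    while True:
        state = stack[-1][1]
        tok = tokens[pos]
        act = ACTION.get(state, {}).get(tok)
        if act is None:                   # empty cell: error
            raise ValueError(f"unexpected {tok!r} in state {state}")
        if act[0] == "s":                 # shift: push token, read the next one
            stack.append((tok, act[1]))
            pos += 1
        elif act[0] == "r":               # reduce: pop the body, push the head
            head, body = RULES[act[1]]
            del stack[len(stack) - len(body):]
            prev_state = stack[-1][1]
            stack.append((head, GOTO[prev_state][head]))
            used.append(act[1])
        else:                             # accept: pop the start rule's body
            _, body = RULES[1]
            del stack[len(stack) - len(body):]
            if stack[-1][1] != 0:
                raise ValueError("accept reached with a non-empty stack")
            used.append(1)
            return used


print(lr_parse("a b b c d e".split()))    # [3, 2, 4, 1]
```

Read backwards, the returned rule numbers (1, 4, 2, 3) give the rightmost derivation of the input.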

3.3. Table Parsing Example
Grammar:
    S => a A B e
    A => A b c | b
    B => d

Stack             Input            Action
$0                a b b c d e $    Shift 1
$0,a1             b b c d e $      Shift 3
$0,a1,b3          b c d e $        Reduce A => b (pop 1 pair; state' == 1; push(A, goto(1, A)) = push(A,2))
$0,a1,A2          b c d e $        Shift 5
$0,a1,A2,b5       c d e $          Shift 8
$0,a1,A2,b5,c8    d e $            Reduce A => A b c (pop 3 pairs; state' == 1; push(A, goto(1, A)) = push(A,2))
$0,a1,A2          d e $            Shift 6
$0,a1,A2,d6       e $              Reduce B => d
$0,a1,A2,B4       e $              Shift 7
$0,a1,A2,B4,e7    $                Accept S => a A B e
$0                $

3.4. The LR Parse Stack
• The parse stack holds the branches of the tree being built bottom-up.
• For example (writing A(b) for a node A with child b):
• the stack $0,a1,A2,b5,c8 represents the branches: a, A(b), b, c
• the next stack, $0,a1,A2, represents: a, A(A(b) b c)
• later, $0,a1,A2,B4,e7 represents: a, A(A(b) b c), B(d), e

4. Generating the Parse Table • The example parse table was generated using the SLR (simple LR) algorithm • an extension of LR(0) which uses the grammar's FOLLOW() sets • Other LR algorithms can also be used to make a parse table: • e.g. LR(1), LALR(1)

Supporting Techniques • SLR table generation makes use of three techniques: • LR(0) items • the closure() function • the goto() function • I'll explain each one first, before the table generation algorithm.

4.1. LR(0) Items
• An LR(0) item is a grammar production with a • at some position of the right-hand side.
• So, a production A => X Y Z has four items:
    A => • X Y Z
    A => X • Y Z
    A => X Y • Z
    A => X Y Z •
• Production A => ε has one item: A => •
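As a small illustration, the items of a production can be enumerated by sliding the dot across the body. The representation used here, a (head, body, dot) triple, and the helper names lr0_items and show are assumptions made for this sketch.

```python
def lr0_items(head, body):
    """Return every LR(0) item of head => body as (head, body, dot) triples."""
    return [(head, tuple(body), dot) for dot in range(len(body) + 1)]


def show(item):
    """Format an item with the dot drawn at its position."""
    head, body, dot = item
    symbols = list(body[:dot]) + ["•"] + list(body[dot:])
    return f"{head} => " + " ".join(symbols)


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(show(item))

# An empty body (A => ε) yields the single item "A => •".
print(show(lr0_items("A", [])[0]))
```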

4.2. The closure() Function
• The closure() function generates a set of LR(0) items.
• Assume that the grammar only has one production for the start symbol S, S => β
• The initial closure set is: closure( { S => • β } )
• If A => α • B β is in the set, then for each production B => γ, add the item B => • γ to the set, if it's not already there.
• Repeat until no new items can be added to the set.

Example use of closure()
Grammar:
    S --> E
    E --> E+T | T
    T --> T*F | F
    F --> (E)
    F --> id
closure({ S --> • E }) is built in steps:
    start:         { S --> • E }
    add E --> •…:  { S --> • E, E --> • E+T, E --> • T }
    add T --> •…:  { S --> • E, E --> • E+T, E --> • T, T --> • T*F, T --> • F }
    add F --> •…:  { S --> • E, E --> • E+T, E --> • T, T --> • T*F, T --> • F, F --> • (E), F --> • id }
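The closure() steps for this expression grammar can be sketched in Python. The item representation (head, body, dot) and the worklist loop are illustrative assumptions; the slides only specify the rule "add B => • γ for every item with the dot in front of B".

```python
# The expression grammar from the example, one (head, body) pair per production.
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def closure(items, grammar):
    """Add B => • γ for every item whose dot sits in front of a non-terminal B."""
    result = set(items)
    work = list(items)                 # items whose dot symbol is still unexplored
    while work:
        _, body, dot = work.pop()
        if dot < len(body):
            for head, rhs in grammar:
                new_item = (head, rhs, 0)
                if head == body[dot] and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


start = closure({("S", ("E",), 0)}, GRAMMAR)
print(len(start))                      # 7 items, as in the final set above
```

Terminals such as + or id never match a production head, so a dot in front of a terminal adds nothing.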

4.3. The goto() Function
[Figure: closure set In with an arrow labelled X leading to closure set In+1]
• goto(In, X) takes as input an existing closure set In, and a terminal/non-terminal symbol X.
• The output is a new closure set In+1:
• for each item A => α • X β in In, add closure({ A => α X • β }) to In+1
• repeat until no more items can be added to In+1

goto() Example 1
• Grammar:
    S => A B    // rule 1, for start symbol
    A => a
    B => b
• Initial state I0 = closure( { S => • A B } ) = { S => • A B, A => • a }
• goto( I0, A ) = closure( { S => A • B } ) = { S => A • B, B => • b }    // call it I1
• goto( I0, a ) = closure( { A => a • } ) = { A => a • }    // call it I2
• goto( I1, B ) = closure( { S => A B • } ) = { S => A B • }    // call it I3
• this is the end of the S production, so I3 is the end state
• goto( I1, b ) = closure( { B => b • } ) = { B => b • }    // call it I4
• State graph: I0 --A--> I1, I0 --a--> I2, I1 --B--> I3 (end state), I1 --b--> I4
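The goto() computations of Example 1 can be reproduced with a short Python sketch. As before, the (head, body, dot) item representation and the function signatures are assumptions made for illustration.

```python
# Example 1 grammar: S => A B, A => a, B => b
GRAMMAR = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]


def closure(items, grammar):
    """Add B => • γ for every item whose dot sits in front of a non-terminal B."""
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body):
            for head, rhs in grammar:
                new_item = (head, rhs, 0)
                if head == body[dot] and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol, grammar):
    """Move the dot over `symbol` in every item that allows it, then close the set."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol}
    return closure(moved, grammar)


I0 = closure({("S", ("A", "B"), 0)}, GRAMMAR)
I1 = goto(I0, "A", GRAMMAR)   # { S => A • B, B => • b }
I2 = goto(I0, "a", GRAMMAR)   # { A => a • }
I3 = goto(I1, "B", GRAMMAR)   # { S => A B • }
I4 = goto(I1, "b", GRAMMAR)   # { B => b • }
```

A symbol that no item has after its dot gives the empty set, which corresponds to "no transition" in the state graph.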

goto() Example 2
• Grammar:
    S => a A B e    // rule 1, for start symbol
    A => A b c | b
    B => d
• Initial state I0 = closure( { S => • a A B e } ) = { S => • a A B e }
• goto( I0, a ) = closure( { S => a • A B e } ) = { S => a • A B e, A => • A b c, A => • b }    // call it I1
• goto( I1, A ) = closure( { S => a A • B e, A => A • b c } ) = { S => a A • B e, A => A • b c, B => • d }    // call it I2
• goto( I1, b ) = closure( { A => b • } ) = { A => b • }    // call it I3
• goto( I2, B ) = closure( { S => a A B • e } ) = { S => a A B • e }    // call it I4
• Others
• I5: { A => A b • c }
• I6: { B => d • }
• I7: { S => a A B e • }    // end of start symbol rule
• I8: { A => A b c • }
• State graph: I0 --a--> I1, I1 --A--> I2, I1 --b--> I3, I2 --B--> I4, I2 --b--> I5, I2 --d--> I6, I4 --e--> I7, I5 --c--> I8
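Repeating goto() from I0 until no new sets appear yields all the states. The sketch below does this for the Example 2 grammar; build_states and the breadth-first exploration order are assumptions for illustration. With symbols tried in sorted order, the numbering happens to coincide with I0 to I8 above, but in general the numbering depends on the exploration order.

```python
GRAMMAR = [
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]


def closure(items):
    """Add B => • γ for every item whose dot sits in front of a non-terminal B."""
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body):
            for head, rhs in GRAMMAR:
                new_item = (head, rhs, 0)
                if head == body[dot] and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` where possible, then close the set."""
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})


def build_states():
    """Return (states, transitions); transitions maps (state, symbol) to a state."""
    states = [closure({(*GRAMMAR[0], 0)})]   # I0 comes from the start rule
    transitions = {}
    i = 0
    while i < len(states):                   # states grows while we iterate
        symbols = sorted({b[d] for _, b, d in states[i] if d < len(b)})
        for sym in symbols:
            target = goto(states[i], sym)
            if target not in states:
                states.append(target)
            transitions[(i, sym)] = states.index(target)
        i += 1
    return states, transitions


states, transitions = build_states()
print(len(states))                           # 9 states: I0 .. I8
for (src, sym), dst in sorted(transitions.items()):
    print(f"I{src} --{sym}--> I{dst}")
```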

4.4. Using goto() to make a Table • The columns of the table should be the grammar's terminals, $, and non-terminals. • The rows should be the I0, I1, …, In numbers 0, 1, …, n. • what we've been calling states

Stage 1
• In stage 1, we add the shift, goto, and accept entries to the table.
• action[i, a] gets <shift j> if goto(Ii, a) == Ij, where a is a terminal
• goto[i, A] gets j if goto(Ii, A) == Ij, where A is a non-terminal
• action[i, $] gets accept if S => β • is in Ii (there must be only one S rule)

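Example Grammar 1
Grammar:
    S --> A B
    A --> a
    B --> b
State graph: I0 --A--> I1, I0 --a--> I2, I1 --B--> I3, I1 --b--> I4

        action[]         goto[]
State |  a    b    $   |  S    A    B
  0   |  s2   -    -   |  -    1    -
  1   |  -    s4   -   |  -    -    3
  2   |  -    -    -   |  -    -    -
  3   |  -    -    acc |  -    -    -
  4   |  -    -    -   |  -    -    -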

Stage 2 • In stage 2, we add the reduce and error entries to the table. • action[i, a] gets <reduce ruleNum> if [A => α •] is in Ii and A is not S and a is in FOLLOW(A) and A => α is rule number ruleNum

After filling the table cells with shift, goto, accept, and reduce actions, any remaining empty cells will trigger an error() call.

Finishing the Example Table
• The reduce states are the state boxes at the leaves of the closure graph.
• but exclude the end state
• For the example 1 grammar, there are two boxes at the leaves: I2 and I4.
• State graph: I0 --A--> I1, I0 --a--> I2, I1 --B--> I3 (end state), I1 --b--> I4

I2 Reduction
Grammar: S --> A B, A --> a, B --> b
• I2 = { A => a • }
• A => a is rule number 2
• FOLLOW(A) == FIRST(B) = { b }
• So action[ 2, b ] gets <reduce 2>

I4 Reduction
Grammar: S --> A B, A --> a, B --> b
• I4 = { B => b • }
• B => b is rule number 3
• FOLLOW(B) = { $ }
• So action[ 4, $ ] gets <reduce 3>

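Adding Reduce Entries
Grammar:
    S --> A B
    A --> a
    B --> b
State graph: I0 --A--> I1, I0 --a--> I2, I1 --B--> I3, I1 --b--> I4

        action[]         goto[]
State |  a    b    $   |  S    A    B
  0   |  s2   -    -   |  -    1    -
  1   |  -    s4   -   |  -    -    3
  2   |  -    r2   -   |  -    -    -
  3   |  -    -    acc |  -    -    -
  4   |  -    -    r3  |  -    -    -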

Using the Example 1 Table
Grammar: S --> A B, A --> a, B --> b

Stack        Input    Action
$0           a b $    Shift 2
$0,a2        b $      Reduce 2 (A --> a): pop 1 pair; state' = 0; push(A, goto(0,A)) == push(A,1)
$0,A1        b $      Shift 4
$0,A1,b4     $        Reduce 3 (B --> b): pop 1 pair; state' = 1; push(B, goto(1,B)) == push(B,3)
$0,A1,B3     $        Accept (S --> A B)
$0           $

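4.5. Example Grammar 2
Stage 1
Grammar:
    S --> a A B e
    A --> A b c | b
    B --> d
State graph: I0 --a--> I1, I1 --A--> I2, I1 --b--> I3, I2 --B--> I4, I2 --b--> I5, I2 --d--> I6, I4 --e--> I7, I5 --c--> I8

        action[]                           goto[]
State |  a    b    c    d    e    $   |  S    A    B
  0   |  s1   -    -    -    -    -   |  -    -    -
  1   |  -    s3   -    -    -    -   |  -    2    -
  2   |  -    s5   -    s6   -    -   |  -    -    4
  3   |  -    -    -    -    -    -   |  -    -    -
  4   |  -    -    -    -    s7   -   |  -    -    -
  5   |  -    -    s8   -    -    -   |  -    -    -
  6   |  -    -    -    -    -    -   |  -    -    -
  7   |  -    -    -    -    -    acc |  -    -    -
  8   |  -    -    -    -    -    -   |  -    -    -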

Reduce States • For the example 2 grammar, there are three boxes at the leaves: I3, I6, and I8.

I3 Reduction
Grammar: S --> a A B e, A --> A b c, A --> b, B --> d
• I3 = { A => b • }
• A => b is rule number 3
• FOLLOW(A) = {b} ∪ FIRST(B) = {b, d}
• So action[ 3, b ] and action[ 3, d ] get <reduce 3>

I6 Reduction
Grammar: S --> a A B e, A --> A b c, A --> b, B --> d
• I6 = { B => d • }
• B => d is rule number 4
• FOLLOW(B) = {e}
• So action[ 6, e ] gets <reduce 4>

I8 Reduction
Grammar: S --> a A B e, A --> A b c, A --> b, B --> d
• I8 = { A => A b c • }
• A => A b c is rule number 2
• FOLLOW(A) = {b, d}
• So action[ 8, b ] and action[ 8, d ] get <reduce 2>

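Adding Reduce Entries
Grammar:
    S --> a A B e
    A --> A b c | b
    B --> d
State graph: I0 --a--> I1, I1 --A--> I2, I1 --b--> I3, I2 --B--> I4, I2 --b--> I5, I2 --d--> I6, I4 --e--> I7, I5 --c--> I8

        action[]                           goto[]
State |  a    b    c    d    e    $   |  S    A    B
  0   |  s1   -    -    -    -    -   |  -    -    -
  1   |  -    s3   -    -    -    -   |  -    2    -
  2   |  -    s5   -    s6   -    -   |  -    -    4
  3   |  -    r3   -    r3   -    -   |  -    -    -
  4   |  -    -    -    -    s7   -   |  -    -    -
  5   |  -    -    s8   -    -    -   |  -    -    -
  6   |  -    -    -    -    r4   -   |  -    -    -
  7   |  -    -    -    -    -    acc |  -    -    -
  8   |  -    r2   -    r2   -    -   |  -    -    -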
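Both stages can be put together in one Python sketch that rebuilds this table for the Example 2 grammar. Everything here is an illustrative assumption rather than the course's code: the function names, the (head, body, dot) item representation, the "s1"/"r3"/"acc" strings, and a FOLLOW computation that only works for grammars without ε-productions (true of this grammar).

```python
GRAMMAR = [                        # rule numbers 1..4, as on the slides
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]
NONTERMS = {head for head, _ in GRAMMAR}


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body):
            for head, rhs in GRAMMAR:
                new_item = (head, rhs, 0)
                if head == body[dot] and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})


def build_states():
    states, trans, i = [closure({(*GRAMMAR[0], 0)})], {}, 0
    while i < len(states):
        for sym in sorted({b[d] for _, b, d in states[i] if d < len(b)}):
            target = goto(states[i], sym)
            if target not in states:
                states.append(target)
            trans[(i, sym)] = states.index(target)
        i += 1
    return states, trans


def follow_sets():
    """FIRST/FOLLOW by fixed-point iteration; assumes no ε-productions."""
    first = {n: set() for n in NONTERMS}
    follow = {n: set() for n in NONTERMS}
    follow[GRAMMAR[0][0]].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            lead = first[body[0]] if body[0] in NONTERMS else {body[0]}
            if not lead <= first[head]:
                first[head] |= lead
                changed = True
            for i, sym in enumerate(body):
                if sym not in NONTERMS:
                    continue
                if i + 1 < len(body):          # what can come right after sym
                    nxt = body[i + 1]
                    add = first[nxt] if nxt in NONTERMS else {nxt}
                else:                          # sym ends the body
                    add = follow[head]
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True
    return follow


def build_slr_table():
    states, trans = build_states()
    follow = follow_sets()
    action, goto_table = {}, {}
    for (i, sym), j in trans.items():          # stage 1: shift and goto entries
        if sym in NONTERMS:
            goto_table[(i, sym)] = j
        else:
            action[(i, sym)] = f"s{j}"
    for i, items in enumerate(states):         # accept, then stage 2: reduces
        for head, body, dot in items:
            if dot != len(body):
                continue
            rule = GRAMMAR.index((head, body)) + 1
            if rule == 1:
                action[(i, "$")] = "acc"
                continue
            for tok in follow[head]:
                if (i, tok) in action:         # two actions in one cell
                    raise ValueError(f"conflict in action[{i}, {tok}]")
                action[(i, tok)] = f"r{rule}"
    return action, goto_table


action, goto_table = build_slr_table()
print(action[(3, "b")], action[(6, "e")], action[(7, "$")])   # r3 r4 acc
```

The ValueError branch is where a shift/reduce or reduce/reduce conflict would show up; this grammar triggers none.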

5. LR Conflicts • An LR conflict occurs when a cell in the action part of the parse table contains more than one action. • There are two kinds of conflict: • shift/reduce and reduce/reduce • Conflicts appear because of: • grammar ambiguity • limitations of the SLR parsing method (even when the grammar is unambiguous)
