More SLR /LR(1)

More SLR /LR(1) CMSC 431 Shon Vick

Remember: Computing FIRST(N) • If N -> ε, First(N) includes ε • If N -> aABC, First(N) includes a • If N -> X1X2…, First(N) includes First(X1) minus ε • If N -> X1X2… and X1 =>* ε, First(N) also includes First(X2) minus ε • Obvious generalization to First(α), where α is X1X2...

Computing Follow(N) • Follow(N) is computed from productions in which N appears on the rhs • For the start symbol S, Follow(S) includes $ • If A -> α N β, Follow(N) includes First(β) minus ε • because an expansion of N will be followed by an expansion from β • If A -> α N, Follow(N) includes Follow(A) • because N will be expanded in the context in which A is expanded • If A -> α N B and B =>* ε, Follow(N) includes Follow(A)
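The FIRST and FOLLOW rules above can be applied mechanically by iterating until no set changes. The following is a minimal Python sketch, not the course's code; it assumes a grammar is a list of (lhs, rhs) pairs with rhs a tuple of symbols (an empty tuple encodes N -> ε), and the names EPS, first_of_seq, compute_first and compute_follow are illustrative.

```python
EPS = "ε"  # marker for the empty string (assumed encoding)

def first_of_seq(seq, first):
    """FIRST of a string X1 X2 ...; contains ε only if every Xi can derive ε."""
    result = set()
    for sym in seq:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result

def compute_first(grammar):
    nonterms = {lhs for lhs, _ in grammar}
    symbols = nonterms | {s for _, rhs in grammar for s in rhs}
    # A terminal's FIRST set is the terminal itself.
    first = {s: (set() if s in nonterms else {s}) for s in symbols}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first_of_seq(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def compute_follow(grammar, start, first):
    nonterms = {lhs for lhs, _ in grammar}
    follow = {n: set() for n in nonterms}
    follow[start].add("$")                         # Follow(S) includes $
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in nonterms:
                    continue
                rest = first_of_seq(rhs[i + 1:], first)  # FIRST(β) for A -> α N β
                new = rest - {EPS}
                if EPS in rest:                          # β is empty or β =>* ε
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

PALINDROME = [("S", ("P",)), ("P", ("a", "P", "a")), ("P", ("b", "P", "b")), ("P", ("c",))]
first = compute_first(PALINDROME)
print(sorted(compute_follow(PALINDROME, "S", first)["P"]))   # ['$', 'a', 'b']
```

On the palindrome grammar this gives First(P) = {a, b, c} and Follow(P) = {a, b, $}.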

Recall our Example • A grammar for palindromes over Σ = { a, b } with a center marker c (strings of the form w c w^R): 1) S -> P 2) P -> a P a 3) P -> b P b 4) P -> c • LR parsers work with an augmented grammar in which the start symbol never appears on the right side of a production. Here the original grammar was rules 2-4; rule 1 adds the new start symbol S.

Computing the Items • S0: S -> .P, P -> .aPa, P -> .bPb, P -> .c • S1: S -> P. • S2: P -> a.Pa, P -> .aPa, P -> .bPb, P -> .c • S3: P -> b.Pb, P -> .aPa, P -> .bPb, P -> .c • S4: P -> c. • S5: P -> aP.a • S6: P -> bP.b • S7: P -> aPa. • S8: P -> bPb.
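The item sets S0-S8 come from repeatedly applying closure and goto to [S -> .P]. The sketch below assumes each item is encoded as a (production index, dot position) pair. Visiting symbols in sorted order happens to reproduce the numbering used in the FSA (I0-I8).

```python
GRAMMAR = [
    ("S", ("P",)),            # 0: augmented start S -> P
    ("P", ("a", "P", "a")),   # 1
    ("P", ("b", "P", "b")),   # 2
    ("P", ("c",)),            # 3
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            # Dot before a non-terminal B: add B -> .γ for every B-production.
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(items, symbol):
    # Move the dot over `symbol` in every item where it is next, then close.
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else frozenset()

def canonical_collection():
    states, transitions = [closure({(0, 0)})], {}
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    i = 0
    while i < len(states):
        for sym in symbols:
            target = goto(states[i], sym)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, sym)] = states.index(target)
        i += 1
    return states, transitions

def show(item):
    lhs, rhs = GRAMMAR[item[0]]
    body = list(rhs)
    body.insert(item[1], ".")
    return f"{lhs} -> {''.join(body)}"

states, transitions = canonical_collection()
for n, state in enumerate(states):
    print(f"I{n}:", ", ".join(sorted(show(it) for it in state)))
```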

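Finite State Machine • Draw the FSA. The major difference is that transitions can be labeled with both terminal and non-terminal symbols. • The Goto and Action parts of the parsing table come from the FSA: transitions on terminals give the shift entries of Action, and transitions on non-terminals give the Goto entries.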

FSA (figure; states I0-I8 are the item sets S0-S8 above) • Transitions: goto(I0, P) = I1; goto(I0, a) = I2; goto(I0, b) = I3; goto(I0, c) = I4; goto(I2, P) = I5; goto(I3, P) = I6; from both I2 and I3, a goes to I2, b to I3, and c to I4; goto(I5, a) = I7; goto(I6, b) = I8 • Productions as numbered in the table: 1) P -> aPa 2) P -> bPb 3) P -> c

Parsing Table

Parsing Table Contd • Si means shift the input symbol and go to state i. • Rj means reduce by the jth production. Note that the table does not store the items of each state; states are just numbers. • Example input: abcba$ • Running the parsing algorithm on it gives:

Example Contd
Stack         Input     Action
$0            abcba$    shift
$0a2          bcba$     shift
$0a2b3        cba$      shift
$0a2b3c4      ba$       reduce by P -> c
$0a2b3P6      ba$       shift
$0a2b3P6b8    a$        reduce by P -> bPb
$0a2P5        a$        shift
$0a2P5a7      $         reduce by P -> aPa
$0P1          $         accept
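The trace above comes from the standard table-driven LR loop. This sketch hard-codes ACTION and GOTO entries read off the FSA, using the production numbers 1) P -> aPa, 2) P -> bPb, 3) P -> c. Two assumptions apply. First, each reduction is placed only under FOLLOW(P) = {a, b, $}, whereas an LR(0) table would fill the whole row. Second, the stack keeps only state numbers, because the grammar symbols are implied by the states.

```python
PRODUCTIONS = {1: ("P", 3), 2: ("P", 3), 3: ("P", 1)}   # rule -> (lhs, length of rhs)

SHIFTS = {"a": ("s", 2), "b": ("s", 3), "c": ("s", 4)}
ACTION = {
    0: dict(SHIFTS),
    1: {"$": ("acc", None)},
    2: dict(SHIFTS),
    3: dict(SHIFTS),
    4: {t: ("r", 3) for t in "ab$"},   # P -> c .
    5: {"a": ("s", 7)},
    6: {"b": ("s", 8)},
    7: {t: ("r", 1) for t in "ab$"},   # P -> aPa .
    8: {t: ("r", 2) for t in "ab$"},   # P -> bPb .
}
GOTO = {(0, "P"): 1, (2, "P"): 5, (3, "P"): 6}

def parse(text):
    tokens = list(text) + ["$"]
    stack, pos, trace = [0], 0, []
    while True:
        state, tok = stack[-1], tokens[pos]
        act = ACTION[state].get(tok)
        if act is None:                      # blank table entry: syntax error
            trace.append(f"error in state {state}")
            return False, trace
        kind, arg = act
        if kind == "s":
            stack.append(arg)
            pos += 1
            trace.append(f"shift {arg}")
        elif kind == "r":
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]              # pop one state per rhs symbol
            stack.append(GOTO[(stack[-1], lhs)])
            trace.append(f"reduce r{arg}")
        else:
            trace.append("accept")
            return True, trace

ok, steps = parse("abcba")
print(ok, steps)
```

parse("abcba") returns True with the steps shift 2, shift 3, shift 4, reduce r3, shift 8, reduce r2, shift 7, reduce r1, accept, matching the trace. An input such as abcab stops with an error in state 6.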

LR(0) Summary • LR(0) state: set of LR(0) items • LR(0) item: a production with a dot in RHS • Compute LR(0) states and build DFA • Use closure operation to compute states • Use goto operation to compute transitions • Build LR(0) parsing table from the DFA • Use LR(0) parsing table to determine whether to shift or reduce

LR(0) Limitations • An LR(0) machine works only if every state with a reduce action has that single reduce action and nothing else, because LR(0) reduces without looking at the next input symbol • With a more complex grammar, construction gives states with shift/reduce or reduce/reduce conflicts • Need to use lookahead to choose

A Non-LR(0) Grammar • Grammar for addition of numbers • S -> S + E | E • E -> num • Left-associative version is LR(0) • Right-associative version is not LR(0) • S -> E + S | E • E -> num

Shift/Reduce Conflicts • An LR(0) state contains a conflict if its canonical set has two items that recommend conflicting actions. • shift/reduce conflict: one item prompts a shift action and the other prompts a reduce action. • reduce/reduce conflict: two items prompt reduce actions by different productions. • A grammar is said to be an LR(0) grammar if its LR(0) parsing table does not have any conflicts.

Shift/Reduce Conflict • Grammar: S' -> S, S -> A b | d c | b A c, A -> d • A very simple language = {db, dc, bdc} • Follow(S) = {$}, Follow(A) = {b, c} • Part of the SLR(1) parser: I0 = {S' -> .S, S -> .A b, S -> .d c, S -> .b A c, A -> .d}; I1 = goto(I0, d) = {S -> d .c, A -> d .} • But since c is in Follow(A) (from S -> b A c), in I1 on input c we don't know whether to shift or reduce • Derivations: D1: S' => S => dc; D2: S' => S => bAc => bdc

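Reduce/Reduce Conflict • Grammar: S' -> S, S -> b A e | b B d | A c, A -> d, B -> E c, E -> d • Derivations: S' => S => Ac => dc; S' => S => bBd => bEcd => bdcd; S' => S => bAe => bde • I0 = {S' -> .S, S -> .b A e, S -> .b B d, S -> .A c, A -> .d} • I1 = goto(I0, b) = {S -> b .A e, S -> b .B d, A -> .d, B -> .E c, E -> .d} • I2 = goto(I1, d) = {A -> d ., E -> d .} • Which reduction should be taken? There is not enough context to decide! FOLLOW does not help either: Follow(A) = {e, c} and Follow(E) = {c}, so both reductions apply on c.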

SLR(1) Grammar • An LR parser using SLR(1) parsing tables for a grammar G is called the SLR(1) parser for G. • If a grammar G has an SLR(1) parsing table without conflicts, it is called an SLR(1) grammar (or SLR grammar for short). • Every SLR grammar is unambiguous, but not every unambiguous grammar is an SLR grammar.

SLR Summary • Uses a DFA to recognize viable prefixes of grammar G • Each state in the DFA: • is the set of LR(0) items valid for a viable prefix • “encodes” information about the symbols that have been shifted onto the stack • Valid LR(0) items are computed by applying the closure and goto functions to the initial valid item [S’ -> .S] (this is called the canonical collection of LR(0) items) • Uses FOLLOW to disambiguate actions

SLR(1) Summary • 1. If [A -> α . a β] is in Ik, a is a terminal, and goto(Ik, a) = Ij, then set action[k, a] to sj • 2. If [A -> α .] is in Ik and A is not S', then set action[k, b] to reduce by A -> α for all b ∈ FOLLOW(A) • 3. If [S’ -> S .] is in Ik, then set action[k, $] to accept • Rules 1-3 may define conflicting actions for an entry in the action table. In this case, the grammar is not SLR(1).
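Rules 1-3 translate almost directly into code. The sketch below is illustrative and makes three assumptions. It takes the FOLLOW sets as an argument instead of computing them. It numbers productions from 0, with production 0 being S' -> S. It collects every (state, terminal) entry that receives more than one action.

```python
from collections import defaultdict

def slr_table(grammar, follow):
    """Return (states, transitions, action, conflicts); grammar[0] must be S' -> S.

    action maps (state, terminal) to a set of entries such as 's8', 'r4' or 'acc'.
    """
    nonterms = {lhs for lhs, _ in grammar}

    def closure(items):
        result, work = set(items), list(items)
        while work:
            p, d = work.pop()
            rhs = grammar[p][1]
            if d < len(rhs) and rhs[d] in nonterms:
                for i, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[d] and (i, 0) not in result:
                        result.add((i, 0))
                        work.append((i, 0))
        return frozenset(result)

    def goto(items, sym):
        moved = {(p, d + 1) for p, d in items
                 if d < len(grammar[p][1]) and grammar[p][1][d] == sym}
        return closure(moved) if moved else None

    symbols = sorted({s for _, rhs in grammar for s in rhs})
    states, trans = [closure({(0, 0)})], {}
    i = 0
    while i < len(states):
        for sym in symbols:
            target = goto(states[i], sym)
            if target is not None:
                if target not in states:
                    states.append(target)
                trans[(i, sym)] = states.index(target)
        i += 1

    action = defaultdict(set)
    for k, state in enumerate(states):
        for p, d in state:
            lhs, rhs = grammar[p]
            if d < len(rhs) and rhs[d] not in nonterms:   # rule 1: shift
                action[(k, rhs[d])].add(f"s{trans[(k, rhs[d])]}")
            elif d == len(rhs) and p == 0:                # rule 3: accept on $
                action[(k, "$")].add("acc")
            elif d == len(rhs):                           # rule 2: reduce on FOLLOW(A)
                for b in follow[lhs]:
                    action[(k, b)].add(f"r{p}")
    conflicts = {key: acts for key, acts in action.items() if len(acts) > 1}
    return states, trans, dict(action), conflicts

# Grammar from the shift/reduce conflict slide, productions numbered from 0.
G = [("S'", ("S",)), ("S", ("A", "b")), ("S", ("d", "c")),
     ("S", ("b", "A", "c")), ("A", ("d",))]
FOLLOW = {"S'": {"$"}, "S": {"$"}, "A": {"b", "c"}}
states, trans, action, conflicts = slr_table(G, FOLLOW)
print(conflicts)
```

For the shift/reduce example grammar with Follow(A) = {b, c}, the only conflict is in state 4, {S -> d.c, A -> d.}, under c. That entry holds both s8 and r4 (reduce by A -> d).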

LR(0) Limitations, Example States • OK: L -> L , S . (a single reduce item) • shift/reduce: S -> S . , L together with L -> S . • reduce/reduce: L -> L , S . together with L -> S , L .


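LR(0) Parsing Table (figure) • Grammar: S -> E + S | E, E -> num, augmented with S' -> S $ • State 1 = {S' -> . S $, S -> . E + S, S -> . E, E -> . num} • State 2 = goto(1, E) = {S -> E . + S, S -> E .} • State 3 = goto(2, +) = {S -> E + . S, S -> . E + S, S -> . E, E -> . num} • State 4 = goto(1, num) = {E -> num .} • The remaining states hold S -> E + S ., S' -> S . $ and S' -> S $ . • Table fragment (columns num, +, $, E, S): state 1: num s4, E g2, S g6; state 2: reduce S -> E under num and $, and s3/reduce S -> E under + • Shift or reduce in state 2?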

Solve Conflict With Lookahead • 3 popular techniques for employing lookahead of 1 symbol with bottom-up parsing • SLR – Simple LR • LALR – LookAhead LR • LR(1) • Each has a different means of using the lookahead • This results in different parsing power

SLR Parsing • SLR parsing = easy extension of LR(0) • For each reduction X -> γ, look at the next symbol C • Apply the reduction only if C is in FOLLOW(X) • The SLR parsing table eliminates some conflicts • Same as the LR(0) table except for the reduction rows • Adds reductions X -> γ only in the columns of symbols in FOLLOW(X) • Example: FOLLOW(S) = {$}, so in state 2 the + column holds only s3 and the $ column holds reduce S -> E (table fragment: state 1: num s4, E g2, S g6; state 2: + s3, $ reduce S -> E)

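SLR Parsing Table (columns num, +, $ and E, S) • State 1: num s4; E g2; S g6 • State 2: + s3; $ reduce S -> E • State 3: num s4; E g2; S g5 • State 4: + and $ reduce E -> num • State 5: $ reduce S -> E + S • State 6: $ s7 • State 7: accept • Reductions do not fill entire rows as before • Otherwise, same as LR(0) • Grammar: S -> E + S | E, E -> num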

Class Problem • Think of L as l-value, R as r-value, and * as a pointer dereference • Consider: S -> L = R, S -> R, L -> *R, L -> ident, R -> L • When you create the states in the SLR(1) DFA, 2 of the states are the following: {S -> L . = R, R -> L .} and {S -> R .} • Do you have any shift/reduce conflicts?
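One way to check the class problem is to look at FOLLOW(R). SLR puts the reduction R -> L exactly in the columns of FOLLOW(R), so it is enough to compute FOLLOW(R) and see whether = is in it. This is a minimal sketch. It assumes the grammar is augmented with S' -> S and, like this one, has no ε-productions, so FIRST of a non-empty string is FIRST of its first symbol.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("ident",)),
    ("R", ("L",)),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]   # no ε-productions: FIRST(rhs) = FIRST(head)
            new = first[head] if head in NONTERMS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets(first):
    follow = {n: set() for n in NONTERMS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMS:
                    continue
                if i + 1 < len(rhs):                     # A -> α N β
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMS else {nxt}
                else:                                    # A -> α N
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

follow = follow_sets(first_sets())
# The state {S -> L . = R, R -> L .} shifts on '=' and reduces R -> L on FOLLOW(R).
print("FOLLOW(R) =", sorted(follow["R"]))
print("shift/reduce conflict on '=':", "=" in follow["R"])
```

It prints FOLLOW(R) = ['$', '='] and True. S -> L = R puts = in FOLLOW(L), L -> *R makes FOLLOW(R) include FOLLOW(L), and R -> L makes FOLLOW(L) include FOLLOW(R). As a result, the state {S -> L . = R, R -> L .} has both a shift and a reduce on =.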

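Another SLR(1) Example (UMBC) • Grammar: 1) S' -> S 2) S -> dca 3) S -> dAb 4) A -> c • States: S0 = {S' -> .S, S -> .dca, S -> .dAb}; S1 = goto(S0, S) = {S' -> S.}; S2 = goto(S0, d) = {S -> d.ca, S -> d.Ab, A -> .c}; S3 = goto(S2, c) = {S -> dc.a, A -> c.}; S4 = goto(S2, A) = {S -> dA.b}; S5 = goto(S3, a) = {S -> dca.}; S6 = goto(S4, b) = {S -> dAb.} • In S3 there is a shift/reduce conflict: it can be R4 (reduce by A -> c) or shift. Looking at the Follow set of A, Follow(A) = {b}, removes the conflict: reduce only on b, shift on a.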

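Non-SLR(1) Example • Grammar: S' -> S, S -> dca, S -> dAb, S -> Aa, A -> c • States: S0 = {S' -> .S, S -> .dca, S -> .dAb, S -> .Aa, A -> .c}; S1 = goto(S0, S) = {S' -> S.}; S2 = goto(S0, d) = {S -> d.ca, S -> d.Ab, A -> .c}; S3 = goto(S2, c) = {S -> dc.a, A -> c.}; S4 = goto(S2, A) = {S -> dA.b}; S5 = goto(S3, a) = {S -> dca.}; S6 = goto(S4, b) = {S -> dAb.}; S7 = goto(S0, A) = {S -> A.a}; S8 = goto(S7, a) = {S -> Aa.}; S9 = goto(S0, c) = {A -> c.} • S3 has a shift/reduce conflict. • Looking at Follow(A), both a and b are in the follow set. • So under column a we still don't know whether to reduce or shift.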

The conflict in the SLR parsing table: Follow(A) = {a, b}, so the entry for S3 under a holds both shift to S5 and reduce by A -> c.

LR(1) • Solution: keep more information about context; namely, keep track of what the next input symbol can be as part of the DFA state • Idea: keep an input look-ahead as part of each item; these are called LR(1) items • The look-ahead set of an item for a non-terminal A is always a subset of Follow(A) (not necessarily a proper subset) • Can give rise to larger parsers (i.e., many more states) than SLR, but recognizes a greater number of constructs

LR(k) Items • The table construction algorithm for an LR(k) parser uses LR(k) items to represent the set of possible states in a parse • An LR(k) item is a pair [α, β], where • α is a production from G with a “.” at some position in the rhs • β is a look-ahead string containing k (where k is typically 1) symbols that are terminals or $ • Example LR(1) item: [A -> X . Y Z , a], where α = A -> X . Y Z and β = a

LR(k) Items • What’s the point of the look-ahead symbols? • Carry them along to allow us to choose the correct reduction when there is any choice • Look-ahead symbols are just bookkeeping unless the item is reducing (i.e., has the “.” at the right end) • [A -> X . Y Z , a]: look-ahead not used • [A -> X Y Z . , a]: look-ahead used to guide the reduction • The point: for [A -> α . , a] and [B -> α . , b], we can decide between reducing to A or to B by looking at limited right context

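LR(1) DFA Construction • If S' = goto(S, x), then add an edge labeled x from state S to state S' • Grammar: S' -> S $, S -> E + S | E, E -> num • Start state: {S' -> . S , $; S -> . E + S , $; S -> . E , $; E -> . num , +/$} • On S: {S' -> S . , $} • On num: {E -> num . , +/$} • On E: {S -> E . + S , $; S -> E . , $} • From that state on +: {S -> E + . S , $; S -> . E + S , $; S -> . E , $; E -> . num , +/$}, which goes to the states above on E and num, and on S to {S -> E + S . , $}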
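Here is a sketch of LR(1) closure and goto for the grammar above. It omits the explicit $ symbol, because the look-ahead $ on the start item plays that role. Each item is an assumed (production index, dot, look-ahead) triple. For an item [A -> α . B β, a], closure adds [B -> . γ, b] for every b in FIRST(β a). This grammar has no ε-productions, so FIRST(β a) is FIRST of β's first symbol, or just a when β is empty.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("E", "+", "S")),
    ("S", ("E",)),
    ("E", ("num",)),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = first_sets()

def closure(items):
    result, work = set(items), list(items)
    while work:
        p, d, la = work.pop()
        rhs = GRAMMAR[p][1]
        if d < len(rhs) and rhs[d] in NONTERMS:
            beta = rhs[d + 1:]
            # Look-aheads for B -> .γ are FIRST(β la); no ε-productions assumed.
            if beta:
                las = FIRST[beta[0]] if beta[0] in NONTERMS else {beta[0]}
            else:
                las = {la}
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[d]:
                    for b in las:
                        if (i, 0, b) not in result:
                            result.add((i, 0, b))
                            work.append((i, 0, b))
    return frozenset(result)

def goto(items, sym):
    moved = {(p, d + 1, la) for p, d, la in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym}
    return closure(moved) if moved else None

def lr1_collection():
    states, edges = [closure({(0, 0, "$")})], {}
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    i = 0
    while i < len(states):
        for sym in symbols:
            target = goto(states[i], sym)
            if target is not None:
                if target not in states:
                    states.append(target)
                edges[(i, sym)] = states.index(target)   # edge labeled sym
        i += 1
    return states, edges

def show(item):
    p, d, la = item
    lhs, rhs = GRAMMAR[p]
    body = list(rhs)
    body.insert(d, ".")
    return f"[{lhs} -> {' '.join(body)}, {la}]"

states, edges = lr1_collection()
for n, state in enumerate(states):
    print(n, sorted(show(it) for it in state))
```

It finds 6 states. The start state contains E -> . num with both + and $ as look-aheads, as in the figure.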

LR(1) Reductions • Reductions correspond to LR(1) items of the form (X -> γ . , y) • In the DFA above, these are E -> num . , +/$ (reduce on + or $); S -> E . , $ (reduce on $ only, while S -> E . + S , $ in the same state shifts on +); S -> E + S . , $; and S' -> S . , $ (accept)

LR(1) Parsing Table Construction • Same as construction of LR(0), except for reductions • For a transition from S to S' on terminal x: • Table[S, x] += Shift(S') • For a transition from S to S' on non-terminal N: • Table[S, N] += Goto(S') • If I contains {(X -> γ . , y)} then: • Table[I, y] += Reduce(X -> γ)
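The construction rules above can be sketched as one function that builds the LR(1) states and fills the table in the same pass. This is an illustrative Python version, not taken from the slides. It assumes no ε-productions, marks shifts as sN, gotos as gN and reductions as rN by production index, and reports any entry with more than one action. On the non-SLR(1) grammar from earlier it finds no conflicts. The state reached on d c holds [S -> dc . a, $] and [A -> c . , b], so the reduction is entered only under b.

```python
from collections import defaultdict

def lr1_table(grammar):
    """LR(1) states and table; grammar[0] is S' -> S and no ε-productions are assumed."""
    nonterms = {lhs for lhs, _ in grammar}
    first = {n: set() for n in nonterms}
    changed = True
    while changed:                       # FIRST of each non-terminal (ε-free grammar)
        changed = False
        for lhs, rhs in grammar:
            new = first[rhs[0]] if rhs[0] in nonterms else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True

    def closure(items):
        result, work = set(items), list(items)
        while work:
            p, d, la = work.pop()
            rhs = grammar[p][1]
            if d < len(rhs) and rhs[d] in nonterms:
                nxt = rhs[d + 1] if d + 1 < len(rhs) else None
                # Look-aheads for the new items: FIRST(beta la).
                las = {la} if nxt is None else (first[nxt] if nxt in nonterms else {nxt})
                for i, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[d]:
                        for b in las:
                            if (i, 0, b) not in result:
                                result.add((i, 0, b))
                                work.append((i, 0, b))
        return frozenset(result)

    symbols = sorted({s for _, rhs in grammar for s in rhs})
    states, table = [closure({(0, 0, "$")})], defaultdict(set)
    i = 0
    while i < len(states):
        for sym in symbols:
            moved = {(p, d + 1, la) for p, d, la in states[i]
                     if d < len(grammar[p][1]) and grammar[p][1][d] == sym}
            if not moved:
                continue
            target = closure(moved)
            if target not in states:
                states.append(target)
            j = states.index(target)
            # Transition on a terminal: Shift(j); on a non-terminal: Goto(j).
            table[(i, sym)].add(f"g{j}" if sym in nonterms else f"s{j}")
        for p, d, la in states[i]:
            if d == len(grammar[p][1]):  # item (X -> gamma . , y): reduce only under y
                table[(i, la)].add("acc" if p == 0 else f"r{p}")
        i += 1
    conflicts = {key: acts for key, acts in table.items() if len(acts) > 1}
    return states, dict(table), conflicts

NON_SLR = [("S'", ("S",)), ("S", ("d", "c", "a")), ("S", ("d", "A", "b")),
           ("S", ("A", "a")), ("A", ("c",))]
states, table, conflicts = lr1_table(NON_SLR)
print(len(states), "states; conflicts:", conflicts)
```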

LR(1) Parsing Table Example • Grammar: S' -> S $, S -> E + S | E, E -> num • State 1 = {S' -> . S , $; S -> . E + S , $; S -> . E , $; E -> . num , +/$} • State 2 = goto(1, E) = {S -> E . + S , $; S -> E . , $} • State 3 = goto(2, +) = {S -> E + . S , $; S -> . E + S , $; S -> . E , $; E -> . num , +/$} • Fragment of the parsing table: state 1: E g2; state 2: + s3, $ reduce S -> E

LALR(1) Grammars • Problem with LR(1): too many states • LALR(1) parsing (aka LookAhead LR) • Constructs the LR(1) DFA and then merges any two LR(1) states whose items are identical except for look-ahead • Results in smaller parser tables • Theoretically less powerful than LR(1) • LALR(1) grammar = a grammar whose LALR(1) parsing table has no conflicts • Example: merging {S -> id . , $; S -> E . , +} with {S -> id . , +; S -> E . , $} gives a state where both reductions apply on + and on $, a reduce/reduce conflict that neither LR(1) state had
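Merging LR(1) states by core can be sketched as follows. The two test grammars are illustrative choices, not from the slides. The first, S -> C C, C -> c C | d, has 10 canonical LR(1) states that merge into 7 LALR(1) states. The second, S -> a A d | b B d | a B e | b A e, A -> c, B -> c, is LR(1) but gains reduce/reduce conflicts when merged, which shows why LALR(1) is theoretically less powerful than LR(1). The code assumes no ε-productions.

```python
def lr1_states(grammar):
    """Canonical LR(1) collection; grammar[0] is S' -> S, no ε-productions assumed."""
    nonterms = {lhs for lhs, _ in grammar}
    first = {n: set() for n in nonterms}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first[rhs[0]] if rhs[0] in nonterms else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True

    def closure(items):
        result, work = set(items), list(items)
        while work:
            p, d, la = work.pop()
            rhs = grammar[p][1]
            if d < len(rhs) and rhs[d] in nonterms:
                nxt = rhs[d + 1] if d + 1 < len(rhs) else None
                las = {la} if nxt is None else (first[nxt] if nxt in nonterms else {nxt})
                for i, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[d]:
                        for b in las:
                            if (i, 0, b) not in result:
                                result.add((i, 0, b))
                                work.append((i, 0, b))
        return frozenset(result)

    symbols = sorted({s for _, rhs in grammar for s in rhs})
    states = [closure({(0, 0, "$")})]
    i = 0
    while i < len(states):
        for sym in symbols:
            moved = {(p, d + 1, la) for p, d, la in states[i]
                     if d < len(grammar[p][1]) and grammar[p][1][d] == sym}
            if moved:
                target = closure(moved)
                if target not in states:
                    states.append(target)
        i += 1
    return states

def lalr_merge(states):
    """Merge LR(1) states whose cores (items without look-ahead) are identical."""
    merged = {}
    for state in states:
        core = frozenset((p, d) for p, d, _ in state)
        merged.setdefault(core, set()).update(state)
    return [frozenset(items) for items in merged.values()]

def reduce_reduce_conflicts(grammar, states):
    """Count (state, look-ahead) pairs where two different reductions apply."""
    count = 0
    for state in states:
        by_la = {}
        for p, d, la in state:
            if d == len(grammar[p][1]):
                by_la.setdefault(la, set()).add(p)
        count += sum(1 for prods in by_la.values() if len(prods) > 1)
    return count

# Illustrative grammars (not from the slides).
CC = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
RR = [("S'", ("S",)), ("S", ("a", "A", "d")), ("S", ("b", "B", "d")),
      ("S", ("a", "B", "e")), ("S", ("b", "A", "e")), ("A", ("c",)), ("B", ("c",))]
cc_states = lr1_states(CC)
print(len(cc_states), len(lalr_merge(cc_states)))
rr_states = lr1_states(RR)
print(reduce_reduce_conflicts(RR, rr_states), reduce_reduce_conflicts(RR, lalr_merge(rr_states)))
```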

LALR Parsers • LALR(1) • Generally the same number of states as SLR (many fewer than LR(1)) • But with look-ahead sets computed per state, which are much more precise than SLR's FOLLOW sets and close to the look-ahead capability of LR(1) • Example: for the Pascal programming language • SLR has several hundred states • LR(1) has several thousand states

LL/LR Grammar Summary • LL parsing tables • Non-terminals x terminals -> productions • Computed using FIRST/FOLLOW • LR parsing tables • LR states x terminals -> {shift/reduce} • LR states x non-terminals -> goto • Computed using closure/goto operations on LR states • A grammar is: • LL(1) if its LL(1) parsing table has no conflicts • same for LR(0), SLR, LALR(1), LR(1)

Classification of Grammars • LR(k) ⊆ LR(k+1) • LL(k) ⊆ LL(k+1) • LL(k) ⊆ LR(k) • LR(0) ⊆ SLR • LALR(1) ⊆ LR(1) • Figure (not to scale): LR(0), SLR, LALR(1) and LR(1) as nested regions, with LL(1) also shown

Automate the Parsing Process • Can automate: • The construction of LR parsing tables • The construction of shift-reduce parsers based on these parsing tables • LALR(1) parser generators • yacc, bison • Not much difference compared to LR(1) in practice • Smaller parsing tables than LR(1) • Augment LALR(1) grammar specification with declarations of precedence, associativity • Output: LALR(1) parser program

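Associativity • Ambiguous grammar: E -> E + E, E -> num • Unambiguous alternative: S -> S + E | E, E -> num • What happens if we run the ambiguous grammar through LALR construction? • One state contains E -> E + E . , + and E -> E . + E , +/$ • For 1 + 2 + 3, with 1 + 2 on the stack and + next: shift gives 1 + (2 + 3), reduce gives (1 + 2) + 3 • Result: a shift/reduce conflict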

Associativity (2) • If an operator is left-associative • assign its precedence a slightly higher value when it is on the parse stack than when it is in the input stream • since the stack precedence is higher, reduce takes priority (which is correct for left associativity) • If an operator is right-associative • assign a slightly higher value when it is in the input stream • since the input-stream precedence is higher, shift takes priority (which is correct for right associativity)

Precedence • Stratified grammar: E -> E + E | T, T -> T x T | num | (E) • Single-level grammar: E -> E + E | E x E | num | (E) • What happens if we run the single-level grammar through LALR construction? Shift/reduce conflicts result: one state has E -> E + E . , x together with E -> E . x E , ...; another has E -> E x E . , + together with E -> E . + E , ... • Precedence: attach precedence indicators to terminals • Shift/reduce conflicts are resolved by: 1. If the precedence of the input token is greater than that of the last terminal on the parse stack, favor shift over reduce 2. If the precedence of the input token is less than or equal to that of the last terminal on the parse stack, favor reduce over shift (the "equal" case assumes left associativity, as in Associativity (2))
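The sketch below shows how precedence and associativity resolve the shift/reduce conflicts of E -> E + E | E x E | num. It follows the spirit of yacc/bison declarations, but it is a small hand-rolled shift-reduce evaluator rather than a generated LALR parser. The precedence table is an assumption: x binds tighter than +, and both are left-associative. The ^ operator is hypothetical and right-associative, added only to exercise the right-associative case. Breaking ties between equal precedences by associativity plays the role of the "slightly higher value" trick.

```python
PREC = {"+": (1, "left"), "x": (2, "left"), "^": (3, "right")}  # assumed table
FUNCS = {"+": lambda a, b: a + b, "x": lambda a, b: a * b, "^": lambda a, b: a ** b}

def resolve(stack_op, input_op):
    """Decide a conflict between 'E stack_op E .' on the stack and input_op in the input."""
    stack_prec = PREC[stack_op][0]
    input_prec, assoc = PREC[input_op]
    if input_prec > stack_prec:
        return "shift"      # rule 1: the input operator binds tighter
    if input_prec < stack_prec:
        return "reduce"     # rule 2: the stack operator binds tighter
    # Same precedence level: left-associative reduces, right-associative shifts.
    return "reduce" if assoc == "left" else "shift"

def evaluate(tokens):
    """Evaluate a flat list such as [1, "+", 2, "x", 3] using shift/reduce decisions."""
    values, ops = [], []    # together they mirror the parse stack E op E op ... E

    def reduce_top():
        right, left = values.pop(), values.pop()
        values.append(FUNCS[ops.pop()](left, right))

    for tok in tokens:
        if isinstance(tok, int):
            values.append(tok)           # shift num and reduce E -> num at once
            continue
        while ops and resolve(ops[-1], tok) == "reduce":
            reduce_top()                 # reduce E -> E op E
        ops.append(tok)                  # shift the operator
    while ops:                           # at $, reduce whatever is left
        reduce_top()
    return values[0]

print(evaluate([1, "+", 2, "x", 3]))   # 7
print(evaluate([2, "x", 3, "+", 4]))   # 10
print(evaluate([2, "^", 3, "^", 2]))   # 512
```

evaluate([1, "+", 2, "x", 3]) gives 7 because + on the stack loses to x in the input, so the parser shifts. evaluate([2, "^", 3, "^", 2]) gives 512 because the equal-precedence, right-associative ^ shifts, grouping the expression as 2 ^ (3 ^ 2).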

