Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University

Today
• Shift-reduce parsing
• How does a shift-reduce parser work?
• How do we construct a shift-reduce parse table?
• LR(1), SLR(1), LALR(1)
• Meaning of shift-reduce / reduce-reduce conflicts
• Behavior on left-recursion/right-recursion
• Automatic parser generation (CUP)
• Handling ambiguity

Model of an LR parser (figure): Input, Stack (state, symbol), LR parsing program, Output

LR parser stack
Sequence made of state, symbol pairs
For instance, a possible stack for the grammar
S → E $
E → T
E → E + T
T → id
T → (E)
could be: 0 T 2 + 7 id 5

Form of LR parsing table
• Rows: states; columns: terminals (shift/reduce actions) and non-terminals (goto part)
• sn: shift state n
• rk: reduce by rule k
• gm: goto state m
• acc: accept
• error

LR parser table example
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → (E)

Shift move
• If action[q, a] = sn, where q is the state on top of the stack and a is the next input symbol

Result of shift
• a and then state n are pushed onto the stack, and the input advances past a

Reduce move
• If action[qn, a] = rk
• Production: (k) A → β
• If β = σ1 … σn, the top of the stack looks like q σ1 q1 … σn qn
• goto[q, A] = qm

Result of reduce move
• 2*|β| entries are popped from the stack, exposing state q
• A and then qm = goto[q, A] are pushed onto the stack

Accept move
• If action[q, a] = accept, parsing is completed

Error move
• If action[q, a] = error, parsing has discovered a syntactic error

Parsing id+id$
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → (E)
• Initialize with state 0
• Shift id and enter state 5
• Reduce by rule (4) T → id: pop id 5
• push T 6
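The moves above can be sketched as a small table-driven driver. This is a minimal sketch, not the table from the slides: the `SHIFT`, `REDUCE` and `GOTO` dictionaries are assumptions read off the LR(0) automaton for this grammar (state numbers chosen so that id leads to state 5 and T leads to state 6 from state 0, matching the trace), and the names `action` and `parse` are hypothetical.

```python
# Rule number -> (left-hand side, length of right-hand side), numbered as on the slides.
RULES = {1: ("S", 2), 2: ("E", 1), 3: ("E", 3), 4: ("T", 1), 5: ("T", 3)}

# ASSUMED tables, derived from the LR(0) automaton of the grammar.
SHIFT = {
    (0, "id"): 5, (0, "("): 7,
    (1, "+"): 3, (1, "$"): 2,
    (3, "id"): 5, (3, "("): 7,
    (7, "id"): 5, (7, "("): 7,
    (8, "+"): 3, (8, ")"): 9,
}
REDUCE = {4: 3, 5: 4, 6: 2, 9: 5}  # state -> rule number (LR(0): no lookahead needed)
GOTO = {(0, "E"): 1, (0, "T"): 6, (3, "T"): 4, (7, "E"): 8, (7, "T"): 6}
ACCEPT_STATE = 2  # reached after shifting $, i.e. S -> E $ is fully matched


def action(q, a):
    """Return the move for state q and lookahead a as a (kind, argument) pair."""
    if q == ACCEPT_STATE:
        return ("acc", None)
    if q in REDUCE:
        return ("r", REDUCE[q])
    if (q, a) in SHIFT:
        return ("s", SHIFT[(q, a)])
    return ("err", None)


def parse(tokens):
    """Parse a token list; return the rule numbers in the order they were reduced."""
    stack = [0]  # state, symbol, state, symbol, ..., state
    pos = 0
    reductions = []
    while True:
        a = tokens[pos] if pos < len(tokens) else None
        kind, arg = action(stack[-1], a)
        if kind == "s":                      # shift: push the symbol and the new state
            stack += [a, arg]
            pos += 1
        elif kind == "r":                    # reduce: pop 2*|beta| entries, then goto
            lhs, length = RULES[arg]
            del stack[len(stack) - 2 * length:]
            q = stack[-1]
            stack += [lhs, GOTO[(q, lhs)]]
            reductions.append(arg)
        elif kind == "acc":
            return reductions
        else:
            raise SyntaxError(f"unexpected {a!r} in state {stack[-1]}")


print(parse(["id", "+", "id", "$"]))  # [4, 2, 4, 3]
```

The returned list is the sequence of reductions, which is a rightmost derivation in reverse (without the final S → E $ step, which is treated as accept here).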

Constructing an LR parsing table
• Construct a (determinized) transition diagram from LR items
• If there are conflicts, stop
• Fill table entries from diagram

Form of LR(0) items
N → α●β (α: already matched; β: to be matched against the input)
Hypothesis about αβ being a possible handle: so far we’ve matched α, expecting to see β

Types of LR(0) items
• N → α●β: shift item
• N → αβ●: reduce item

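LR(0) items enumeration example
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → (E)
All items can be obtained by placing a dot at every position for every production:
S → ●E $, S → E●$, S → E $●
E → ●T, E → T●
E → ●E + T, E → E●+ T, E → E +●T, E → E + T●
T → ●id, T → id●
T → ●(E), T → (●E), T → (E●), T → (E)●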
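The enumeration can be sketched in a few lines. This is an illustrative sketch only: the representation of an item as a `(lhs, rhs, dot)` tuple and the helper names `lr0_items` and `show` are assumptions, and a plain `.` stands for the dot.

```python
# Each production is (left-hand side, tuple of right-hand-side symbols).
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]


def lr0_items(grammar):
    """Place the dot at every position of every production."""
    return [(lhs, rhs, dot)
            for lhs, rhs in grammar
            for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item such as ('S', ('E', '$'), 1) as 'S -> E . $'."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


for item in lr0_items(GRAMMAR):
    print(show(item))
```

A production with n right-hand-side symbols yields n + 1 items, so this grammar has 3 + 2 + 4 + 2 + 4 = 15 items.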

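Operations for transition diagram construction
• Initial = {S’ → ●S$}
• For an item set I, Closure(I) is the smallest set that contains I and satisfies Closure(I) = Closure(I) ∪ {X → ●µ | X → µ is in the grammar and N → α●Xβ is in Closure(I)}
• Goto(I, X) = {N → αX●β | N → α●Xβ in I}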

Initial example
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E )
Initial = {S → ●E $}

Closure example
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E )
Initial = {S → ●E $}
Closure({S → ●E $}) = {S → ●E $, E → ●T, E → ●E + T, T → ●id, T → ●( E )}

Goto example
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E )
Initial = {S → ●E $}
Closure({S → ●E $}) = {S → ●E $, E → ●T, E → ●E + T, T → ●id, T → ●( E )}
Goto({S → ●E $, E → ●E + T, T → ●id}, E) = {S → E●$, E → E●+ T}
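The Closure and Goto operations can be sketched as follows. This is a minimal sketch assuming items are `(lhs, rhs, dot)` tuples and item sets are `frozenset`s; the function names `closure` and `goto` are illustrative choices rather than anything prescribed by the slides.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> . mu for every non-terminal X that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                new = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item where `symbol` follows the dot."""
    return frozenset((lhs, rhs, dot + 1)
                     for lhs, rhs, dot in items
                     if dot < len(rhs) and rhs[dot] == symbol)


initial = closure({("S", ("E", "$"), 0)})
print(len(initial))               # 5 items, as in the closure example
print(sorted(goto(initial, "E")))
```

Note that `goto` alone does not apply the closure; when building the automaton the two are composed as Closure(Goto(I, X)).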

Constructing the transition diagram
• Start with state 0 containing the item set Closure({S → ●E $})
• Repeat until no new states are discovered
• For every state p containing item set Ip, and symbol N, compute state q containing item set Iq = Closure(goto(Ip, N))

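Automaton construction example
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E )
States (i abbreviates id):
q0: S → ●E$, E → ●T, E → ●E + T, T → ●i, T → ●(E)
q1: S → E●$, E → E●+ T
q2: S → E$●
q3: E → E+●T, T → ●i, T → ●(E)
q4: E → E + T●
q5: T → i●
q6: E → T●
q7: T → (●E), E → ●T, E → ●E + T, T → ●i, T → ●(E)
q8: T → (E●), E → E●+T
q9: T → (E)●
Transitions:
q0: E → q1, T → q6, i → q5, ( → q7
q1: + → q3, $ → q2
q3: T → q4, i → q5, ( → q7
q7: E → q8, T → q6, i → q5, ( → q7
q8: + → q3, ) → q9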

Automaton construction example
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E )
Initialize: q0 = {S → ●E$}

Automaton construction example
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E )
Apply Closure: q0 = {S → ●E$, E → ●T, E → ●E + T, T → ●i, T → ●(E)}

Automaton construction example
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E )
q0: S → ●E$, E → ●T, E → ●E + T, T → ●i, T → ●(E)
Successors of q0:
on T: q6 = {E → T●}
on (: {T → (●E), E → ●T, E → ●E + T, T → ●i, T → ●(E)}
on i: q5 = {T → i●}
on E: q1 = {S → E●$, E → E●+ T}

Automaton construction example (complete automaton, annotated)
Grammar: (1) S → E $ (2) E → T (3) E → E + T (4) T → id (5) T → ( E )
• A non-terminal transition corresponds to a goto action in the parse table, e.g. q0 on E to q1 = {S → E●$, E → E●+ T}
• A terminal transition corresponds to a shift action in the parse table, e.g. q1 on + to q3 = {E → E+●T, T → ●i, T → ●(E)}
• A single reduce item corresponds to a reduce action, e.g. q4 = {E → E + T●}
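The construction loop can be sketched as a worklist algorithm. This is a sketch under assumptions: items are `(lhs, rhs, dot)` tuples, `build_automaton` is a hypothetical name, and the state numbers it assigns follow discovery order, so they will generally differ from the q0…q9 labels in the diagram.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                new = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)


def goto(items, symbol):
    return frozenset((lhs, rhs, dot + 1) for lhs, rhs, dot in items
                     if dot < len(rhs) and rhs[dot] == symbol)


def build_automaton(start_item):
    """Return (list of item sets, {(state index, symbol): state index})."""
    start = closure({start_item})
    states, index, transitions = [start], {start: 0}, {}
    queue = [start]
    while queue:                      # repeat until no new states are discovered
        p = queue.pop(0)
        symbols = sorted({rhs[dot] for _, rhs, dot in p if dot < len(rhs)})
        for x in symbols:
            q = closure(goto(p, x))   # Iq = Closure(goto(Ip, x))
            if q not in index:
                index[q] = len(states)
                states.append(q)
                queue.append(q)
            transitions[(index[p], x)] = index[q]
    return states, transitions


states, transitions = build_automaton(("S", ("E", "$"), 0))
print(len(states), len(transitions))  # 10 15
```

Transitions on terminals become shift entries and transitions on non-terminals become goto entries when the table is filled in.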

Conflicts
• Can construct a diagram for every grammar, but some may introduce conflicts
• shift-reduce conflict: an item set contains at least one shift item and one reduce item
• reduce-reduce conflict: an item set contains two reduce items

LR(0) conflicts
Grammar: S → E $, E → T, E → E + T, T → i, T → ( E ), T → i[E]
q0: S → ●E$, E → ●T, E → ●E + T, T → ●i, T → ●(E), T → ●i[E]
q0 on i goes to q5: T → i●, T → i●[E]
Shift/reduce conflict in q5

LR(0) conflicts
Grammar: S → E $, E → T, E → E + T, T → i, V → i, T → ( E )
The state reached on i, q5, contains: T → i●, V → i●
Reduce/reduce conflict in q5

LR(0) conflicts
• Any grammar with an ε-rule cannot be LR(0)
• Inherent shift/reduce conflict
• A → ε● is a reduce item
• P → α●Aβ is a shift item
• A → ε● can always be predicted from P → α●Aβ
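The conflict definitions can be checked mechanically on a single item set. The sketch below assumes items are `(lhs, rhs, dot)` tuples and uses a hypothetical helper `conflicts`; the three item sets are the conflicted states from the examples above, written out by hand.

```python
def conflicts(item_set):
    """Classify an LR(0) item set using the definitions from the slides."""
    shift_items = [it for it in item_set if it[2] < len(it[1])]
    reduce_items = [it for it in item_set if it[2] == len(it[1])]
    found = []
    if shift_items and reduce_items:
        found.append("shift/reduce")
    if len(reduce_items) >= 2:
        found.append("reduce/reduce")
    return found


# T -> i .   together with   T -> i . [ E ]
q5_array = {("T", ("i",), 1), ("T", ("i", "[", "E", "]"), 1)}
# T -> i .   together with   V -> i .
q5_two_rules = {("T", ("i",), 1), ("V", ("i",), 1)}
# P -> a . A b   predicts the epsilon rule, giving the reduce item A -> .
with_epsilon = {("P", ("a", "A", "b"), 1), ("A", (), 0)}

print(conflicts(q5_array))      # ['shift/reduce']
print(conflicts(q5_two_rules))  # ['reduce/reduce']
print(conflicts(with_epsilon))  # ['shift/reduce']
```

An ε-rule has an empty right-hand side, so its only item has the dot at the end and counts as a reduce item, which is why it always clashes with the shift item that predicted it.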

LR variants
• LR(0): what we’ve seen so far
• SLR(1): removes infeasible reduce actions via FOLLOW set reasoning
• LR(1): LR(0) with one lookahead token in items
• LALR(1): LR(1) with merging of states with the same LR(0) component

SLR parsing
• A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N
• A reduce item N → α● is applicable only when the lookahead is in FOLLOW(N)
• If b is not in FOLLOW(N), there is no derivation S ⇒* βNb, and thus it is safe to remove the reduce action on lookahead b from the conflicted state
• Differs from LR(0) only in the ACTION table
• Now a row in the parsing table may contain both shift actions and reduce actions, and we need to consult the current token to decide which one to take

SLR action table
• The action is selected by the state and the lookahead token from the input
• SLR uses 1 token of look-ahead, vs. LR(0), which uses no look-ahead
• Grammar as before, with T → i and T → i[E]
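The FOLLOW-based filtering can be sketched for the grammar with T → i and T → i[E]. This is a sketch under assumptions: the grammar has no ε-rules (which keeps FIRST and FOLLOW simple), and the names `first_sets`, `follow_sets` and `slr_actions` are hypothetical.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
    ("T", ("i", "[", "E", "]")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for an epsilon-free grammar: only the first RHS symbol matters."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, x in enumerate(rhs):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):          # x is followed by another symbol
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                         # x ends the rule: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


def slr_actions(item_set, follow):
    """Map each lookahead token to the set of actions SLR allows in this state."""
    actions = {}
    for lhs, rhs, dot in item_set:
        if dot < len(rhs):
            if rhs[dot] not in NONTERMINALS:
                actions.setdefault(rhs[dot], set()).add("shift")
        else:
            for a in follow[lhs]:
                actions.setdefault(a, set()).add(f"reduce {lhs} -> {' '.join(rhs)}")
    return actions


FOLLOW = follow_sets()
q5 = {("T", ("i",), 1), ("T", ("i", "[", "E", "]"), 1)}
print(sorted(FOLLOW["T"]))      # ['$', ')', '+', ']']
print(slr_actions(q5, FOLLOW))
```

Because `[` is not in FOLLOW(T), the state that was conflicted under LR(0) gets exactly one action per lookahead token: shift on `[`, reduce by T → i on the tokens in FOLLOW(T).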

Going beyond SLR(1)
Grammar: (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L
Some common language constructs introduce conflicts even for SLR

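LR(0) automaton for this grammar
States:
q0: S’ → ●S, S → ●L = R, S → ●R, L → ●* R, L → ●id, R → ●L
q1: S’ → S●
q2: S → L●= R, R → L●
q3: S → R●
q4: L → *●R, R → ●L, L → ●* R, L → ●id
q5: L → id●
q6: S → L =●R, R → ●L, L → ●* R, L → ●id
q7: L → * R●
q8: R → L●
q9: S → L = R●
Transitions:
q0: S → q1, L → q2, R → q3, * → q4, id → q5
q2: = → q6
q4: R → q7, L → q8, * → q4, id → q5
q6: R → q9, L → q8, * → q4, id → q5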

shift/reduce conflict
Grammar: (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L
q2: S → L●= R, R → L●
q2 on = goes to q6: S → L =●R, R → ●L, L → ●* R, L → ●id
• S → L●= R vs. R → L●
• FOLLOW(R) contains =
• S ⇒ L = R ⇒ * R = R
• SLR cannot resolve the conflict

Inputs requiring shift/reduce
Grammar: (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L
q2: S → L●= R, R → L●
q2 on = goes to q6: S → L =●R, R → ●L, L → ●* R, L → ●id
• For the input id, the rightmost derivation S’ ⇒ S ⇒ R ⇒ L ⇒ id requires reducing in q2
• For the input id = id, the rightmost derivation S’ ⇒ S ⇒ L = R ⇒ L = L ⇒ L = id ⇒ id = id requires shifting in q2

LR(1) grammars
• In SLR: a reduce item N → α● is applicable only when the lookahead is in FOLLOW(N)
• But FOLLOW(N) merges lookahead for all alternatives for N
• Insensitive to the context of a given production
• LR(1) keeps lookahead with each LR item
• Idea: a more refined notion of follows computed per item

LR(1) items
Grammar: (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L
• An LR(1) item is a pair: an LR(0) item and a lookahead token
• Meaning: we matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token
• Example: the production L → id yields the LR(0) items [L → ● id], [L → id ●] and the following LR(1) items:
[L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $]
[L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $]
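To see how per-item lookahead helps, the closure can be extended to LR(1) items. The sketch below uses the standard LR(1) closure rule (for [A → α●Bβ, a], add [B → ●γ, b] for every b in FIRST(βa)), which goes slightly beyond what the slide states. It assumes an ε-free grammar, items as `(lhs, rhs, dot, lookahead)` tuples, and hypothetical function names.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for an epsilon-free grammar."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            beta = rhs[dot + 1:]
            if beta:   # FIRST(beta a) is FIRST of beta's first symbol (no epsilon rules)
                lookaheads = FIRST[beta[0]] if beta[0] in NONTERMINALS else {beta[0]}
            else:      # nothing after B: the item's own lookahead carries over
                lookaheads = {la}
            for lhs2, rhs2 in GRAMMAR:
                if lhs2 != rhs[dot]:
                    continue
                for b in lookaheads:
                    new = (lhs2, rhs2, 0, b)
                    if new not in result:
                        result.add(new)
                        work.append(new)
    return frozenset(result)


def goto(items, symbol):
    return frozenset((lhs, rhs, dot + 1, la) for lhs, rhs, dot, la in items
                     if dot < len(rhs) and rhs[dot] == symbol)


start = closure({("S'", ("S",), 0, "$")})
after_L = closure(goto(start, "L"))
for item in sorted(after_L):
    print(item)
```

In the state reached on L, the reduce item R → L● carries only the lookahead $, while the shift item S → L●= R shifts on =. The two no longer compete for the same token, which is how LR(1) removes the conflict that SLR could not.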
