Chapter 4 - Part 3: Bottom-Up Parsing

Prof. Steven A. Demurjian
Computer Science & Engineering Department, The University of Connecticut
371 Fairfield Way, Unit 2155, Storrs, CT 06269-3155
steve@engr.uconn.edu, http://www.engr.uconn.edu/~steve, (860) 486-4818
Material for course thanks to: Laurent Michel, Aggelos Kiayias, Robert LeBarre

Basic Intuition
• Recall that LL(k) works:
  • TOP-DOWN
  • With a LEFTMOST derivation
  • Predicts the right production to select based on lookahead
• Our new motto: LR(k) works:
  • BOTTOM-UP
  • With a RIGHTMOST derivation
  • Commits to the production choice only after seeing the whole body (the right-hand side), working in “reverse”

Bottom-Up Parsing
• The inverse, or complement, of top-down parsing
• Top-down parsing starts from the start symbol and attempts to derive the input string using productions
• Bottom-up parsing starts from the input string and repeatedly rewrites (reduces) it until only the start symbol is left
• For example, consider the grammar S → a A B e, A → A b c | b, B → d and these two sequences:
  • Bottom-up: abbcde, aAbcde, aAde, aABe, S
  • Top-down: S ⇒ aABe ⇒ aAbcBe ⇒ abbcBe ⇒ abbcde
• What does each sequence represent?
  • Top-down: a leftmost derivation
  • Bottom-up: a rightmost derivation in reverse!

Type of Derivation
• Grammar: S → a A B e, A → A b c | b, B → d
• Key issues:
  • How do we determine which substring to “reduce”?
  • How do we know which production rule to use?
  • What is the general processing for BUP?
  • How are conflicts resolved?
  • What types of BUP are considered?
• TDP (leftmost): S ⇒ aABe ⇒ aAbcBe ⇒ abbcBe ⇒ abbcde
• BUP (rightmost): S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
• The BUP sequence is a rightmost derivation that the parser traces in reverse, from abbcde back to S!
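The two derivations above can be replayed mechanically. The Python sketch below is our own illustration: the function `derive` and its parameters are hypothetical, and it assumes single-character grammar symbols so that sentential forms are plain strings. It applies a given list of productions to the leftmost or rightmost non-terminal, then reverses the rightmost derivation to obtain the bottom-up sequence.

```python
def derive(start, steps, nonterminals, rightmost=True):
    """Apply (lhs, rhs) steps, always expanding the rightmost (or leftmost) non-terminal.

    Returns every sentential form, from the start symbol to the final string.
    """
    forms = [start]
    form = start
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in nonterminals]
        if not positions:
            raise ValueError(f"{form!r} has no non-terminal left to expand")
        i = positions[-1] if rightmost else positions[0]
        if form[i] != lhs:
            raise ValueError(f"step {lhs} -> {rhs} cannot expand {form[i]!r} in {form!r}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms


NONTERMINALS = {"S", "A", "B"}

# Leftmost derivation: the top-down view.
lm = derive("S", [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")],
            NONTERMINALS, rightmost=False)
print(" => ".join(lm))

# Rightmost derivation; read backwards, it is the sequence of reductions
# a bottom-up parser performs ("<=" marks one derivation step undone).
rm = derive("S", [("S", "aABe"), ("B", "d"), ("A", "Abc"), ("A", "b")], NONTERMINALS)
print(" <= ".join(reversed(rm)))
```

Note that the same grammar rules are used in both cases; only the order in which non-terminals are expanded differs.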

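What is a Handle?
• Definition: a right-sentential form is a sentential form that occurs in some rightmost derivation
  • S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde: every form in this derivation is a right-sentential form
• A handle is a substring of a right-sentential form that:
  • Matches the right-hand side of a production rule, and
  • Can be reduced (replaced by that rule’s left-hand side) to give the previous right-sentential form, i.e., it undoes one step of the rightmost derivation
• Formally, a handle of a right-sentential form γ is a rule A → β together with a position in γ such that $S \Rightarrow_{rm}^{*} \alpha A w \Rightarrow_{rm} \alpha \beta w = \gamma$, where β occurs at that position (just after α) and w is a string of terminals
• Example: the handles of the forms in S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde are:
  • abbcde: b at position 2 (A → b)
  • aAbcde: Abc at position 2 (A → Abc)
  • aAde: d at position 3 (B → d)
  • aABe: aABe at position 1 (S → aABe)
• E.g., Abc is the right-hand side of rule A → Abc, at position 2 in the right-sentential form γ = aAbcde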

Consider again: S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
• What is the handle of each right-sentential form?
• Grammar: S → aABe, A → Abc | b, B → d

What Bottom-Up Really Means: Handle Pruning
• abbcde reduces to aAbcde (handle b, rule A → b)

Handle Pruning
• aAbcde reduces to aAde (handle Abc, rule A → Abc)

Handle Pruning
• aAde reduces to aABe (handle d, rule B → d)

Handle Pruning
• aABe reduces to S (handle aABe, rule S → aABe)

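What’s Going on in the Parse Tree?
• Consider the right-sentential form αβw and the rule A → β (in the parse tree, S is the root and A is the parent of β)
• What does α signify? Input already processed, still on the parsing stack
• What does β represent? The candidate handle to be reduced (to A)
• What does w contain? Input yet to be consumed (only terminals, since the derivation is rightmost)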

Bottom-Up Parsing …
• Recognize the body of the last production applied in the rightmost derivation (the handle)
• Replace that body by the left-hand side (the non-terminal) of the production rule, guided by the “current” input
• Repeat
• At the end, either:
  • We are left with the start symbol: Success!
  • Or we get “stuck” somewhere: Syntax error!
• Key issue: if some right-sentential form has more than one handle, then the grammar G is ambiguous

General Processing of BUP
• Basic mechanisms: “Shift” and “Reduce”
• Basic data structure: a stack of grammar symbols (terminals and non-terminals)
• Basic idea:
  • Shift input symbols onto the stack until the entire handle (the body of the last step of the rightmost derivation) is on top of the stack
  • When that body is on the stack, reduce it by replacing the body by the left-hand side of the production rule
  • When only the start symbol is left (and the input is consumed), we are done
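The shift/reduce loop just described can be sketched in Python for the running grammar S → aABe, A → Abc | b, B → d. This is a sketch under stated assumptions: deciding *when* the handle is on top of the stack is delegated to a small hand-built LR(0) automaton (the state numbers and the names `SHIFT`, `GOTO`, `REDUCE`, `shift_reduce` are our own), anticipating how such automata are constructed later in this part. Each reduction records the resulting right-sentential form, so the output is exactly the handle-pruning sequence.

```python
# Hand-built LR(0) automaton for S -> a A B e, A -> A b c | b, B -> d.
# State numbers are our own labels; a state in REDUCE holds a completed item.
SHIFT = {0: {"a": 2}, 2: {"b": 4}, 3: {"b": 6, "d": 8}, 5: {"e": 7}, 6: {"c": 9}}
GOTO = {0: {"S": 1}, 2: {"A": 3}, 3: {"B": 5}}
REDUCE = {4: ("A", "b"), 7: ("S", "aABe"), 8: ("B", "d"), 9: ("A", "Abc")}
ACCEPT_STATE = 1  # reached after reducing to S


def shift_reduce(text):
    """Parse `text`; return the right-sentential form produced by each reduction."""
    states, symbols = [0], []
    rest = list(text)
    forms = [text]
    while True:
        state = states[-1]
        if state == ACCEPT_STATE and not rest:
            return forms                              # only S left, input consumed
        if state in REDUCE:                           # the handle is on top of the stack
            lhs, rhs = REDUCE[state]
            del states[-len(rhs):]                    # pop the handle (one state per symbol)
            del symbols[-len(rhs):]
            symbols.append(lhs)                       # push the left-hand side
            states.append(GOTO[states[-1]][lhs])
            forms.append("".join(symbols) + "".join(rest))
        elif rest and rest[0] in SHIFT.get(state, {}):
            states.append(SHIFT[state][rest[0]])      # shift the next input symbol
            symbols.append(rest.pop(0))
        else:
            raise SyntaxError(f"stuck: stack {''.join(symbols)!r}, input {''.join(rest)!r}")


for form in shift_reduce("abbcde"):
    print(form)
```

Because this particular grammar has no conflicting states, the automaton can reduce as soon as it reaches a completed item, without looking at the next input symbol.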

Examples (three slides; figures not reproduced: each marks the handle on the stack and the rule to reduce with)

Key Observation
• At any point in time, the content of the stack is a prefix of a right-sentential form
• This prefix is called a viable prefix
• Check again! Below are all the right-sentential forms of a rightmost derivation:
  • S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

What is General Processing for BUP?
• Utilize a stack implementation:
  • The stack contains grammar symbols: terminals shifted from the input and non-terminals produced by reductions
  • Input is examined with respect to the stack/current state
• General operation: options to process the stack include:
  • Shift symbols from the input onto the stack
  • When a handle β is on top of the stack, reduce by using rule A → β:
    • Pop all symbols of the handle β
    • Push the non-terminal A onto the stack
  • When the configuration is ($S, $), i.e., stack $S and remaining input $, ACCEPT
  • An error occurs when a handle can’t be found, or when S is on the stack with non-empty input

Example (figure not reproduced)

What are Possible Grammar Conflicts?
• Shift-Reduce (S/R) conflict:
  • Given the content of the stack and the current input token, more than one action is possible next
• Example grammar: stmt → if expr then stmt | if expr then stmt else stmt | other
• Consider the stack below, with input token else:
  • $ … if expr then stmt
  • Do we reduce if expr then stmt to stmt?
  • Do we shift “else” onto the stack?

What are Possible Grammar Conflicts?
• Reduce-Reduce (R/R) conflict, example grammar:
  • stmt → id ( parameter_list )
  • parameter_list → parameter_list , parameter | parameter
  • parameter → id
  • expr → id ( expression_list ) | id
  • expression_list → expression_list , expr | expr
• Consider the stack below:
  • $ … id ( id , … , id ) …
  • Do we reduce to stmt?
  • Do we reduce to expr?
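Both kinds of conflict can be spotted by inspecting a single parser state (item set). The Python sketch below is an illustration under stated assumptions: `lr0_conflicts` is a hypothetical helper, it ignores lookahead entirely (an LR(0)-style check), and the two item sets are hypothetical states matching the two slides: one after “if expr then stmt”, and one in which both parameter → id . and expr → id . are complete. A completed item (“.” at the end) calls for a reduce; an item with a terminal right after the “.” calls for a shift.

```python
def lr0_conflicts(items, terminals):
    """Classify conflicts in one item set; an item is (lhs, rhs_tuple, dot)."""
    completed = [(lhs, rhs) for lhs, rhs, dot in items if dot == len(rhs)]
    shifts = {rhs[dot] for lhs, rhs, dot in items
              if dot < len(rhs) and rhs[dot] in terminals}
    found = []
    if completed and shifts:
        found.append("shift-reduce")    # could reduce, could also shift a terminal
    if len(completed) > 1:
        found.append("reduce-reduce")   # two different rules could be reduced
    return found


STMT_TERMINALS = {"if", "then", "else", "other"}
IF_THEN = ("if", "expr", "then", "stmt")
dangling_else = {
    ("stmt", IF_THEN, 4),                     # stmt -> if expr then stmt .
    ("stmt", IF_THEN + ("else", "stmt"), 4),  # stmt -> if expr then stmt . else stmt
}
print(lr0_conflicts(dangling_else, STMT_TERMINALS))

CALL_TERMINALS = {"id", "(", ")", ","}
after_id = {("parameter", ("id",), 1), ("expr", ("id",), 1)}  # parameter -> id . / expr -> id .
print(lr0_conflicts(after_id, CALL_TERMINALS))
```

A parser that uses lookahead would report a conflict only when the lookahead sets of the competing actions overlap. Yacc and Bison report such conflicts and, by default, resolve a shift-reduce conflict in favor of shifting, which for the dangling else attaches each else to the nearest if.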

Bottom-Up Parsing Techniques
• LR(k) parsers:
  • Left-to-right input scanning (L)
  • Construct a rightmost derivation in reverse (R)
  • Use k lookahead symbols for decisions
• Advantages:
  • Well suited to almost all programming languages
  • Most general non-backtracking shift-reduce approach, and efficiently implemented
  • Detect syntax errors very quickly
• Disadvantages:
  • Difficult to build by hand
  • Tools assist parser construction (Yacc, Bison)

Components of an LR Parser
• Grammar → Parser Table Generator → Parsing Table
• Input Tokens → Driver Routines (consulting the Parsing Table) → Output Parse Tree
• The parsing table differs based on the grammar/lookaheads; the driver routines are common to all LR parsers

Three Classes of LR Parsers
• Simple LR (SLR), built from LR(0) items:
  • Easiest, but limited in grammar applicability
  • Many grammars lead to S/R and R/R conflicts
• Canonical LR:
  • Powerful but expensive
  • LR(k), usually LR(1)
• Lookahead LR (LALR): in between the two
• Two-fold focus:
  • Parser table construction: items and item sets
  • Examination of the LR parsing algorithm

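LR Parser Structure
• action[$s_m$, $a_i$] is the parsing table, with four options:
  1. Shift a state s onto the stack (together with the input symbol)
  2. Reduce by a rule
  3. Accept
  4. Report an error
• goto[s, A], for a non-terminal A, determines the next state after a reduction
• The LR parsing program reads the input $a_1 \ldots a_i a_{i+1} \ldots a_n \$$, consults the action and goto tables, maintains a stack of states and grammar symbols (terminals or non-terminals), and produces the output
• Question: what does the following configuration represent?
  $(s_0 X_1 s_1 X_2 \ldots X_{m-1} s_{m-1} X_m s_m,\; a_i a_{i+1} \ldots a_n \$)$
• Answer: the right-sentential form $X_1 X_2 \ldots X_m a_i a_{i+1} \ldots a_n$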

What is the Parsing Table?
• A combination of state, action, and goto
• Shift s5 means shift the input symbol and push state 5
• Reduce r2 means reduce using rule 2
• A goto entry for (state, non-terminal) indicates the next state after a reduction

Actions Against Configuration
• Configuration: $(s_0 X_1 s_1 X_2 \ldots X_{m-1} s_{m-1} X_m s_m,\; a_i a_{i+1} \ldots a_n \$)$
• action[$s_m$, $a_i$] is one of:
  • Shift s: move $a_i$ and $s_{m+1} = s$ onto the stack, giving $(s_0 X_1 s_1 \ldots X_m s_m a_i s_{m+1},\; a_{i+1} \ldots a_n \$)$
  • Reduce A → β: remove 2×|β| entries from the stack (r = |β| symbols and their states), exposing state $s_{m-r}$, then push A along with state s = goto[$s_{m-r}$, A]
    • The goto uses the prior state exposed after popping
  • Accept: parsing is complete
  • Error: call an error recovery routine
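The four actions can be sketched as a table-driven loop in Python. Assumptions to flag: the ACTION/GOTO entries below are our reconstruction of the usual SLR(1) table for the expression grammar (the slides’ table image is not reproduced), reductions are numbered with the slides’ rule numbers 2 to 7, acceptance happens on $ in state 1, and `lr_parse` is a hypothetical name. The stack interleaves states and symbols exactly like the configuration above, so a reduction pops 2×|β| entries before consulting goto with the exposed state.

```python
# Rules numbered as on the slides (rule 1, E' -> E $, is handled by "accept").
PRODUCTIONS = {
    2: ("E", ["E", "+", "T"]),
    3: ("E", ["T"]),
    4: ("T", ["T", "*", "F"]),
    5: ("T", ["F"]),
    6: ("F", ["(", "E", ")"]),
    7: ("F", ["id"]),
}
ENDS = ("+", "*", ")", "$")  # lookaheads on which a complete F or T is reduced

# ACTION[state][terminal]: ("s", n) shift to n, ("r", k) reduce by rule k, ("acc",).
ACTION = {
    0: {"id": ("s", 5), "(": ("s", 4)},
    1: {"+": ("s", 6), "$": ("acc",)},
    2: {"+": ("r", 3), "*": ("s", 7), ")": ("r", 3), "$": ("r", 3)},
    3: {t: ("r", 5) for t in ENDS},
    4: {"id": ("s", 5), "(": ("s", 4)},
    5: {t: ("r", 7) for t in ENDS},
    6: {"id": ("s", 5), "(": ("s", 4)},
    7: {"id": ("s", 5), "(": ("s", 4)},
    8: {"+": ("s", 6), ")": ("s", 11)},
    9: {"+": ("r", 2), "*": ("s", 7), ")": ("r", 2), "$": ("r", 2)},
    10: {t: ("r", 4) for t in ENDS},
    11: {t: ("r", 6) for t in ENDS},
}
# GOTO[state][non-terminal]: next state after a reduction.
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens):
    """Return the rule numbers used, i.e., a rightmost derivation in reverse."""
    stack = [0]                          # s0 X1 s1 ... Xm sm
    tokens = list(tokens) + ["$"]
    pos, reductions = 0, []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        act = ACTION[state].get(lookahead)
        if act is None:
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")
        if act[0] == "s":                # shift: push a_i and the new state
            stack += [lookahead, act[1]]
            pos += 1
        elif act[0] == "r":              # reduce A -> beta
            lhs, rhs = PRODUCTIONS[act[1]]
            del stack[len(stack) - 2 * len(rhs):]   # pop 2*|beta| entries
            exposed = stack[-1]                     # prior state after popping
            stack += [lhs, GOTO[exposed][lhs]]
            reductions.append(act[1])
        else:                            # accept
            return reductions


print(lr_parse(["id", "+", "id", "*", "id"]))  # [7, 5, 3, 7, 5, 7, 4, 2]
```

Reading the printed rule numbers backwards gives the rightmost derivation of id + id * id, starting with rule 2, E → E + T.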

How Does BUP Work? (trace table not reproduced; columns: Stack, Input, Action)

Another Detailed Example (not reproduced)

Constructing Parsing Tables
• The three types of parsers (SLR, canonical LR, LALR) all share one concept for parsing table construction: the item
• An item characterizes, for each grammar rule:
  • What we’ve seen or derived (left of the “.”)
  • What we’ve yet to see or derive (right of the “.”)
• Consider the grammar rule E → E + T. There are four items for this rule:
  • E → . E + T
  • E → E . + T
  • E → E + . T
  • E → E + T .
• E → E . + T means we’ve derived E and have yet to derive + T, so we are expecting “+” next
• Note: A → ε has the single item A → .
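Listing the items of a rule is just a matter of placing the “.” at each of the |β| + 1 positions in the body β. A minimal Python sketch, with a hypothetical helper name `items_of`:

```python
def items_of(lhs, rhs):
    """Return every item of lhs -> rhs, with the dot at each position of the body."""
    rhs = list(rhs)
    return [" ".join([lhs, "→", *rhs[:dot], ".", *rhs[dot:]])
            for dot in range(len(rhs) + 1)]


for item in items_of("E", ["E", "+", "T"]):
    print(item)
print(items_of("A", []))  # an ε-production has the single item A → .
```

A body of length 3 yields the four items listed above; an empty body yields exactly one.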

Another Characterization of Items
• Consider the item E → E + . T of the grammar rule E → E + T
• An item represents a summary of the history of the parse
• Each item refers to:
  • What’s been placed on the stack (left of the “.”)
  • What remains to be derived/reduced for the rule (right of the “.”)
• For E → E + . T:
  • On the stack: E +, i.e., we have seen a string derived from E followed by “+” (input consumed through the “+”)
  • Left to derive: we are looking for a string derivable from T (input for T yet to be processed)

Start with SLR Parsing Table Construction
• Step 1: Construct an augmented grammar, whose new start symbol E′ has a single production rule, E′ → E $
  • Now every derivation starts with the production rule E′ → E $
• Original grammar: E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id
• Augmented grammar: E′ → E $, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

Start with SLR Parsing Table Construction
• Step 2: Construct the closure of all items
  • Intuitively, if A → α . B β is in the closure, we expect to see a substring derivable from B β at some point in the derivation
  • If B → γ is a production rule, we also expect to see a substring derivable from γ next
• Step 3: Compute GOTO(Item_Set, X), where X is a grammar symbol
  • Intuitively, if Item_Set is the set of items valid for viable prefix γ, then GOTO(Item_Set, X) is the set of items valid for viable prefix γX
  • Utilized to determine the next action (state) for the parser
  • Note: this GOTO on item sets is different from the goto table discussed previously (although the goto table is built from it)!

Calculating Closure
• Grammar: 1: E′ → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id
• Closure(I), where I is a set of items:
  1. Every item in I is in Closure(I)
  2. If A → α . B β is in Closure(I) and B → γ is a production rule, add B → . γ to Closure(I)
  3. Repeat step 2 until no new items are added
• I0 = Closure({E′ → . E $}) adds the following items:
  • E′ → . E $ (rule 1); any rules E → γ? Yes:
  • E → . E + T (rule 2)
  • E → . T (rule 3); any rules T → γ? Yes:
  • T → . T * F (rule 4)
  • T → . F (rule 5); any rules F → γ? Yes:
  • F → . ( E ) (rule 6)
  • F → . id (rule 7)
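The closure steps above translate directly into a worklist loop. In this Python sketch (function names `closure` and `show` are hypothetical), an item is a tuple (lhs, rhs, dot) where dot is the position of the “.” in the body, and the grammar is the slide’s augmented grammar with rules 1 to 7 in order.

```python
GRAMMAR = [                      # rules 1..7 as on the slide
    ("E'", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Closure of a set of items; an item is (lhs, rhs, dot)."""
    result = set(items)
    work = list(items)
    while work:                                      # repeat until nothing new is added
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:  # item A -> α . B β
            for head, body in GRAMMAR:
                new_item = (head, body, 0)               # add B -> . γ
                if head == rhs[dot] and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def show(item):
    lhs, rhs, dot = item
    return " ".join([lhs, "→", *rhs[:dot], ".", *rhs[dot:]])


I0 = closure({("E'", ("E", "$"), 0)})
for item in sorted(I0, key=lambda it: GRAMMAR.index(it[:2])):
    print(show(item))
```

The worklist only revisits newly added items, so each item is expanded once; returning a `frozenset` lets item sets be compared and stored later.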

What’s the Next Step?
• Recall the parsing table:
  • States are 0, 1, 2, … 11, which correspond to item sets
  • Actions are based on the input and the current state
  • goto gives the state to transition to next
• This is a pushdown automaton (PDA)!
• What are the three critical functions to calculate?
  • State closure: to compute the set of items (productions) in a given state
  • Transition function: to compute the states reachable from a given state
  • Items: to compute the set of states in the PDA

What is an Important Part of the Process?
• Viable prefix definition:
  (1) A string that equals a prefix of a right-sentential form up to (and including) its unique handle
  (2) Any prefix of a string that satisfies (1)
• Essentially a prefix of a right-sentential form that does not extend past the handle
  • It may include the entire handle (the right-hand side of a production rule)
• For the grammar S → aABe, A → Abc | b, B → d:
  • Examples of viable prefixes: a, aA, aAd, aAbc, ab, aAb, …
  • Not viable prefixes: aAde, Abc, aAA, …
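The examples above can be checked mechanically. This Python sketch relies on a standard result that we assume here rather than prove: a string is a viable prefix exactly when the LR(0) automaton built from Closure and GOTO (developed in the following slides) can read it from its start state without getting stuck. The augmenting rule S′ → S and all function names are our additions.

```python
GRAMMAR = [
    ("S'", ("S",)),                 # augmenting rule (our addition)
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B -> . γ for every item A -> α . B β until nothing changes."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for head, body in GRAMMAR:
                item = (head, body, 0)
                if head == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Advance the dot over `symbol` wherever possible, then close."""
    return closure({(lhs, rhs, dot + 1) for lhs, rhs, dot in items
                    if dot < len(rhs) and rhs[dot] == symbol})


def is_viable_prefix(symbols):
    """True iff GOTO stays non-empty along `symbols` from the start state."""
    state = closure({("S'", ("S",), 0)})
    for sym in symbols:
        state = goto(state, sym)
        if not state:
            return False
    return True


for s in ["a", "aA", "aAd", "aAbc", "ab", "aAb", "aAde", "Abc", "aAA"]:
    print(s, is_viable_prefix(s))
```

For instance, aAde fails because the state reached after aAd contains only the completed item B → d ., which has no transition on e.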

What is The Big Deal?
• Consider the stack again:
  • Each stack content corresponds to a prefix of a right-sentential form
  • They are all viable prefixes
• When parsing, there are two alternatives; we are either:
  • Lengthening a viable prefix (shift), or
  • Pruning a handle (reduce)
• In other words:
  • States represent viable prefixes
  • We transition between viable prefixes!

Intuition for this Process
• Objective: turn a grammar into a PDA
• We want a PDA with states that capture viable prefixes
• We have a grammar with production rules
• We know that:
  • Production rules are used to derive handles
  • Viable prefixes are prefixes of right-sentential forms that do not extend past the handle

Example
• Consider the augmented grammar given below:
  • 1: E′ → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id
• Assume that:
  • We start the parsing (with E′), and therefore we are at the initial state of the PDA
  • We have some input (e.g., id + id * id)
• Questions:
  • Which productions are activated at this point?
  • In other words, which productions could be used to match the rest of the input?

Example II
• Consider the derivation given below:
  • E′ ⇒ E $ (by 1) ⇒ E + T $ (by 2) ⇒ T + T $ (by 3) ⇒ F + T $ (by 5) ⇒ id + T $ (by 7) ⇒ …
• In this example, production rules 1, 2, 3, 5, 7 are active and utilized to “lead” to the viable prefix “id”
• Grammar: 1: E′ → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id

PDA State (Closure({E′ → . E $}))
• A PDA state is the set of productions (items) that are active in the state
• Question: how do we compute that from G?
• Grammar: 1: E′ → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id
• State I0, built up one item at a time: E′ → . E $, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id

PDA Transition
• From state I0 = {E′ → . E $, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id}, there are transitions on E, T, F, ( and id
• How can we leave state I0? What does it mean to leave I0?
  • On a terminal: we’ve consumed that terminal from the input stream
  • On a non-terminal: input derived from that non-terminal has been reduced to it, so the stack now holds the non-terminal (with the states that allow a future reduction)
• This defines the GOTO function!

The GOTO Function
• GOTO(I, X) is defined for:
  • An item set I
  • A grammar symbol X (non-terminal or terminal)
• GOTO(I, X) = Closure({[A → α X . β] | [A → α . X β] is in I})
• Algorithmically:
  • Look for items of the form A → α . X β in I
  • Identify the grammar symbols X directly to the right of the “.”
  • Group all A → α . X β with the same X, moving the “.” past X, to form a new state
  • Compute the closure of the new state, for all X
• This leads to …
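The GOTO definition can be sketched in a few lines of Python on top of closure (items are (lhs, rhs, dot) tuples; `closure` and `goto` are hypothetical names). The sketch recomputes the destination states I1 and I4 of I0 discussed on the next slides.

```python
GRAMMAR = [
    ("E'", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B -> . γ for every item A -> α . B β until nothing changes."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for head, body in GRAMMAR:
                item = (head, body, 0)
                if head == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, x):
    """GOTO(I, X): move the dot past X in every A -> α . X β of I, then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved)


I0 = closure({("E'", ("E", "$"), 0)})
I1 = goto(I0, "E")     # {E' -> E . $, E -> E . + T}
I4 = goto(I0, "(")     # F -> ( . E ) plus the closure items
print(len(I1), len(I4))
```

An empty result (for example GOTO(I0, +)) means the automaton has no transition on that symbol from that state.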

Destination States
• From State I0 = {E′ → . E $, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id}:
  • GOTO(I0, E) = State I1: E′ → E . $, E → E . + T
  • GOTO(I0, T) = State I2: E → T ., T → T . * F
  • GOTO(I0, F) = State I3: T → F .
  • GOTO(I0, ( ) = State I4: F → ( . E ), E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id
  • GOTO(I0, id) = State I5: F → id .

Destination States: GOTO(I0, ( ) = State I4
• For GOTO(I0, ( ) we compute Closure({F → ( . E )}):
  • Since E → E + T and E → T, include E → . E + T and E → . T
  • Since T → T * F and T → F, include T → . T * F and T → . F
  • Since F → ( E ) and F → id, include F → . ( E ) and F → . id
• State I4 = {F → ( . E ), E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id}
• Now, compute GOTO(I4, X) for X = E, T, F, ( , id

What Does it Mean when the “.” is at the End of a Rule?
• In three of the states above, the “.” occurs at the end of an item: E → T . (State I2), T → F . (State I3), and F → id . (State I5)
• Each of these is a “reduction” to:
  • Replace T by E on the stack
  • Replace F by T on the stack
  • Replace id by F on the stack
• State I2 also contains T → T . * F, which still expects “*” next

How is GOTO(I0, E) Interpreted?
• The items represent the possible next steps in a derivation
• Consider the symbol directly to the right of the “.”: that is what we expect to see next in a derivation
• In I0, two rules expect to see “E”: E′ → . E $ and E → . E + T
• Move the “.” to the right to consume “E” in both production rules: we’ve seen “E”, and we expect to see what follows the “.” next
• Now compute Closure({E′ → E . $, E → E . + T}) = State I1 = {E′ → E . $, E → E . + T} (nothing is added, since $ and + are terminals)

Continue the Process to Yield the Complete State Machine … (figure not reproduced)
• The state machine also represents the viable prefixes
• These are the possible combinations of symbols that appear on the parsing stack
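Continuing the process mechanically means applying GOTO to every item set, for every symbol after a “.”, until no new item set appears. The Python sketch below (hypothetical names `closure`, `goto`, `canonical_collection`) does this for the slides’ augmented grammar. Because rule 1 keeps $ as an explicit grammar symbol, it finds 13 item sets: 12 corresponding to the table’s states 0 to 11, plus one extra set {E′ → E $ .} reached from I1 on $. State numbers come out in discovery order and need not match the slides’ numbering.

```python
GRAMMAR = [
    ("E'", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B -> . γ for every item A -> α . B β until nothing changes."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for head, body in GRAMMAR:
                item = (head, body, 0)
                if head == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, x):
    """Advance the dot over x wherever possible, then take the closure."""
    return closure({(lhs, rhs, dot + 1) for lhs, rhs, dot in items
                    if dot < len(rhs) and rhs[dot] == x})


def canonical_collection(start_item):
    """Return (states, transitions): all item sets reachable from Closure({start_item})."""
    states = [closure({start_item})]
    transitions = {}
    i = 0
    while i < len(states):                  # `states` grows while we scan it
        symbols = {rhs[dot] for _, rhs, dot in states[i] if dot < len(rhs)}
        for x in sorted(symbols):
            target = goto(states[i], x)
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions


states, transitions = canonical_collection(("E'", ("E", "$"), 0))
print(len(states))  # 13
```

The `transitions` dictionary is the state machine itself: its terminal-labeled edges become shift entries, and its non-terminal-labeled edges become the goto table.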
