Compiler construction

Compiler Construction

Syntax Analysis

Top-down parsing

Syntax Analysis, continued

Syntax analysis

  • Last week we covered

    • The goal of syntax analysis

    • Context-free grammars

    • Top-down parsing (a simple but weak parsing method)

  • Today, we will

    • Wrap up top-down parsing, including LL(1)

    • Start on bottom-up parsing

      • Shift-reduce parsers

      • LR parsers: SLR(1), LR(1), LALR(1)

Top-Down Parsing

Recursive descent (Last Week)

  • Recursive descent parsers simply try to build a parse tree, top-down, and BACKTRACK on failure.

  • Recursion and backtracking are inefficient.

  • It would be better if we always knew the correct action to take.

  • It would be better if we could avoid recursive procedure calls during parsing.

  • PREDICTIVE PARSERS can solve both problems.

Predictive parsers

  • A predictive parser always knows which production to use, so backtracking is not necessary.

  • Example: for the productions stmt -> if ( expr ) stmt else stmt | while ( expr ) stmt | for ( stmt expr stmt ) stmt

  • a recursive descent parser would always know which production to use, depending on the input token.
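The choice described above can be sketched as a lookup keyed on the next token. This is a minimal illustration only: the dictionary and function names are hypothetical, and the grammar symbols are kept as plain strings.

```python
# Each alternative of stmt begins with a different keyword, so one token of
# lookahead is enough to pick the production (names here are illustrative).
STMT_PRODUCTIONS = {
    "if": ["if", "(", "expr", ")", "stmt", "else", "stmt"],
    "while": ["while", "(", "expr", ")", "stmt"],
    "for": ["for", "(", "stmt", "expr", "stmt", ")", "stmt"],
}


def choose_stmt_production(lookahead):
    """Return the right-hand side selected by the next input token."""
    try:
        return STMT_PRODUCTIONS[lookahead]
    except KeyError:
        raise SyntaxError(f"no stmt production starts with {lookahead!r}")
```

Because the lookup never returns more than one candidate, a parser built this way has nothing to backtrack over.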

Transition diagrams

  • Transition diagrams can describe recursive parsers, just like they can describe lexical analyzers, but the diagrams are slightly different.

  • Construction:

  • Eliminate left recursion from G

  • Left factor G

  • For each non-terminal A, do

    • Create an initial and final (return) state

    • For each production A -> X1 X2 … Xn, create a path from the initial to the final state with edges X1 X2 … Xn.

Using transition diagrams

  • Begin in the start state for the start symbol

  • When we are in state s with edge labeled by terminal a to state t, if the next input symbol is a, move to state t and advance the input pointer.

  • For an edge to state t labeled with non-terminal A, jump to the transition diagram for A, and when finished, return to state t

  • For an edge labeled ε, move immediately to t.

  • Example (4.15 in text): parse the string “id + id * id”

Example transition diagrams

  • An expression grammar with left recursion and ambiguity removed:

  • E -> T E’

  • E’ -> + T E’ | ε

  • T -> F T’

  • T’ -> * F T’ | ε

  • F -> ( E ) | id

Corresponding transition diagrams:
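One way to make the diagrams concrete is to store, for each nonterminal, the edge labels along each path from its initial state to its final state, and then follow the traversal rules from the previous slide. The sketch below assumes that representation; `DIAGRAMS`, `walk`, and `accepts` are illustrative names, and `E'` is written with a plain apostrophe.

```python
# Each nonterminal maps to its paths from the initial to the final state.
# A path is the list of edge labels; [] stands for a single ε edge.
DIAGRAMS = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def walk(nonterminal, tokens, pos):
    """Traverse the diagram for `nonterminal`; return the new input position."""
    paths = DIAGRAMS[nonterminal]
    lookahead = tokens[pos] if pos < len(tokens) else "$"
    # Prefer a path whose first edge is the lookahead terminal, then a path
    # that starts with a nonterminal edge, and only then the ε path.
    chosen = next((p for p in paths if p and p[0] == lookahead), None)
    if chosen is None:
        chosen = next((p for p in paths if p and p[0] in DIAGRAMS), None)
    if chosen is None:
        if [] not in paths:
            raise SyntaxError(f"unexpected {lookahead!r} at position {pos}")
        return pos  # ε edge: move on without consuming input
    for label in chosen:
        if label in DIAGRAMS:
            # Nonterminal edge: jump to its diagram, then come back here.
            pos = walk(label, tokens, pos)
        elif pos < len(tokens) and tokens[pos] == label:
            pos += 1  # terminal edge: match and advance the input pointer
        else:
            raise SyntaxError(f"expected {label!r} at position {pos}")
    return pos


def accepts(tokens):
    """True if the whole token list is derived from the start symbol E."""
    try:
        return walk("E", tokens, 0) == len(tokens)
    except SyntaxError:
        return False
```

For example, `accepts("id + id * id".split())` follows the E diagram through T and E' and consumes all five tokens. The path-selection order in `walk` is a simplification that happens to be enough for this grammar.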

Predictive parsing without recursion

  • To get rid of the recursive procedure calls, we maintain our own stack.

The parsing table and parsing program

  • The table is a 2D array M[A,a] where A is a nonterminal symbol and a is a terminal or $.

  • At each step, the parser considers the top-of-stack symbol X and input symbol a:

    • If both are $, accept

    • If they are the same terminal, pop X, advance input

    • If X is a nonterminal, consult M[X,a]. If M[X,a] is “ERROR”, call an error recovery routine. Otherwise, if M[X,a] is a production of the grammar X -> UVW, replace X on the stack with WVU (U on top)


  • Use the table-driven predictive parser to parse id + id * id

  • Assuming parsing table

Initial stack is $E

Initial input is id + id * id $
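The parsing program can be sketched in a few lines. The table below is an assumption: it is the table that the FIRST/FOLLOW construction yields for the expression grammar E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id, written out by hand. The names `TABLE` and `predictive_parse` are illustrative.

```python
EPS = ()  # empty right-hand side

# M[nonterminal][lookahead] -> right-hand side; missing entries mean ERROR.
TABLE = {
    "E":  {"id": ("T", "E'"), "(": ("T", "E'")},
    "E'": {"+": ("+", "T", "E'"), ")": EPS, "$": EPS},
    "T":  {"id": ("F", "T'"), "(": ("F", "T'")},
    "T'": {"+": EPS, "*": ("*", "F", "T'"), ")": EPS, "$": EPS},
    "F":  {"id": ("id",), "(": ("(", "E", ")")},
}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in order (a leftmost derivation)."""
    stack = ["$", start]  # the top of the stack is the end of the list
    buf = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        top, a = stack[-1], buf[pos]
        if top == "$" and a == "$":
            return output  # accept
        if top not in TABLE:
            # A terminal (or $) is on top: it must match the input symbol.
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()
            pos += 1
        else:
            rhs = TABLE[top].get(a)
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
            output.append((top, rhs))
```

Calling `predictive_parse("id + id * id".split())` starts from stack `$E` and input `id + id * id $`, as on the slide, and returns the productions in the order they were applied.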

Building a predictive parse table

  • We still don’t know how to create M, the parse table.

  • The construction requires two functions: FIRST and FOLLOW.

  • For a string of grammar symbols α, FIRST(α) is the set of terminals that can begin a string derived from α. If α =*> ε, then ε is also in FIRST(α).

  • FOLLOW(A) for nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form. If A can be the last symbol in a sentential form, then $ is also in FOLLOW(A).

How to compute FIRST(α)

  • If X is a terminal, FIRST(X) = {X}.

  • Otherwise (X is a nonterminal),

    • 1. If X -> ε is a production, add ε to FIRST(X)

    • 2. If X -> Y1… Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and Y1…Yi-1 =*> ε.

  • Given FIRST(X) for all single symbols X,

  • Let FIRST(X1…Xn) = FIRST(X1)

  • If ε ∈ FIRST(X1), then add FIRST(X2), and so on…
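The rules above amount to a fixed-point computation: keep applying them until no FIRST set grows. A minimal sketch follows, assuming the grammar is stored as a dictionary from nonterminal to a list of right-hand sides; the representation and the function names are illustrative, not part of the slides.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],  # [] is the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST(X1...Xn), given FIRST for every nonterminal."""
    result = set()
    for sym in symbols:
        # For a terminal a, FIRST(a) = {a}.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every Xi derives ε (or the string is empty)
    return result


def compute_first(grammar):
    """Apply the FIRST rules repeatedly until no set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

For the expression grammar, `compute_first(GRAMMAR)["E"]` comes out as the set containing `(` and `id`.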

How to compute FOLLOW(A)

  • Place $ in FOLLOW(S) (for S the start symbol)

  • If A -> α B β, then everything in FIRST(β) except ε is placed in FOLLOW(B)

  • If there is a production A -> α B or a production A -> α B β where β =*> ε, then everything in FOLLOW(A) is in FOLLOW(B).

  • Repeatedly apply these rules until no FOLLOW set changes.
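The FOLLOW rules can be sketched the same way, as a loop that runs until nothing changes. To keep the example short, the FIRST sets of the expression grammar are taken as given (worked out by hand from the FIRST rules); `GRAMMAR`, `FIRST`, and the function names are illustrative assumptions.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# FIRST sets for the nonterminals, assumed to be computed beforehand.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols; a terminal a has FIRST {a}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            for rhs in productions:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}  # rule 2
                    if EPS in beta_first:  # rule 3: β is empty or derives ε
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

Running `compute_follow(GRAMMAR, "E")` gives one FOLLOW set per nonterminal, which is the input the table construction needs.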

Example FIRST and FOLLOW

  • For our favorite grammar:

    • E -> T E’

    • E’ -> + T E’ | ε

    • T -> F T’

    • T’ -> * F T’ | ε

    • F -> ( E ) | id

  • What is FIRST() and FOLLOW() for all nonterminals?

Parse table construction with FIRST/FOLLOW

  • Basic idea: if A -> α and a is in FIRST(α), then we expand A to α any time the current input is a and the top of stack is A.

  • Algorithm:

  • For each production A -> α in G, do:

  • For each terminal a in FIRST(α) add A -> α to M[A,a]

  • If ε ∈ FIRST(α), for each terminal b in FOLLOW(A), do:

  • add A -> α to M[A,b]

  • If ε ∈ FIRST(α) and $ is in FOLLOW(A), add A -> α to M[A,$]

  • Make each undefined entry in M[ ] an ERROR

Example predictive parse table construction

  • For our favorite grammar:

    • E -> T E’

    • E’ -> + T E’ | ε

    • T -> F T’

    • T’ -> * F T’ | ε

    • F -> ( E ) | id

  • What is the predictive parsing table?
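The construction can be tried out on this grammar with a short sketch. The FIRST and FOLLOW sets are assumed to be known already and are written out by hand here; every entry is stored as a list so that a multiply defined entry would show up as a list with more than one production. All names are illustrative.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# Assumed inputs: FIRST and FOLLOW for the nonterminals of this grammar.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    """FIRST of a right-hand side; a terminal a has FIRST {a}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar):
    """Return M as {(A, a): [right-hand sides]}; absent keys are ERROR."""
    table = {}
    for a_nt, productions in grammar.items():
        for rhs in productions:
            alpha_first = first_of_string(rhs)
            targets = alpha_first - {EPS}
            if EPS in alpha_first:
                # FOLLOW(A) already contains $ when $ can follow A.
                targets |= FOLLOW[a_nt]
            for terminal in targets:
                table.setdefault((a_nt, terminal), []).append(rhs)
    return table
```

With these inputs, `build_table(GRAMMAR)` has exactly one production in every defined entry.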

LL(1) grammars

  • The predictive parse table construction can be applied to ANY grammar.

  • But sometimes, M[ ] might have multiply defined entries.

  • Example: for if-else statements and left factoring:

    • stmt -> if ( expr ) stmt optelse

    • optelse -> else stmt | ε

  • When we have “optelse” on the stack and “else” in the input, we have a choice of how to expand optelse (“else” is in FOLLOW(optelse) so either rule is possible)

LL(1) grammars

  • If the predictive parsing construction for G leads to a parse table M[ ] WITHOUT multiply defined entries, we say “G is LL(1)”

    • First “L”: left-to-right scan of the input

    • Second “L”: leftmost derivation

    • “1”: 1 symbol of lookahead

LL(1) grammars

  • Necessary and sufficient conditions for G to be LL(1):

  • If A -> α | β

    • There does not exist a terminal a such that a ∈ FIRST(α) and a ∈ FIRST(β)

    • At most one of α and β derives ε

    • If β =*> ε, then FIRST(α) does not intersect FOLLOW(A).

This is the same as saying the predictive parser always knows what to do!
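The three conditions can be checked pairwise over the alternatives of each nonterminal. The sketch below assumes the FIRST set of each alternative and the FOLLOW set of each nonterminal are supplied as inputs; the sets for the if-else grammar and the expression grammar are written out by hand, and all names are illustrative.

```python
from itertools import combinations

EPS = "ε"


def ll1_violations(alt_firsts, follow):
    """alt_firsts: {A: [FIRST(α1), FIRST(α2), ...]}; follow: {A: FOLLOW(A)}."""
    problems = []
    for nt, firsts in alt_firsts.items():
        for (i, fa), (j, fb) in combinations(enumerate(firsts), 2):
            if (fa & fb) - {EPS}:
                problems.append((nt, i, j, "FIRST/FIRST overlap"))
            if EPS in fa and EPS in fb:
                problems.append((nt, i, j, "both alternatives derive ε"))
            if EPS in fb and fa & follow[nt]:
                problems.append((nt, i, j, "FIRST/FOLLOW overlap"))
            if EPS in fa and fb & follow[nt]:
                problems.append((nt, i, j, "FIRST/FOLLOW overlap"))
    return problems


# optelse -> else stmt | ε, with "else" in FOLLOW(optelse).
OPTELSE_FIRSTS = {"optelse": [{"else"}, {EPS}]}
OPTELSE_FOLLOW = {"optelse": {"else", "$"}}

# The expression grammar with left recursion removed.
EXPR_FIRSTS = {
    "E'": [{"+"}, {EPS}],
    "T'": [{"*"}, {EPS}],
    "F": [{"("}, {"id"}],
}
EXPR_FOLLOW = {
    "E'": {")", "$"},
    "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}
```

`ll1_violations(OPTELSE_FIRSTS, OPTELSE_FOLLOW)` reports a FIRST/FOLLOW overlap for optelse, which is the multiply defined entry M[optelse, else]; the expression grammar produces an empty list.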

Top-down parsing summary

  • RECURSIVE DESCENT parsers are easy to build, but inefficient, and might require backtracking.

  • TRANSITION DIAGRAMS help us build recursive descent parsers.

  • For LL(1) grammars, it is possible to build PREDICTIVE PARSERS with no recursion automatically.

    • Compute FIRST() and FOLLOW() for all nonterminals

    • Fill in the predictive parsing table

    • Use the table-driven predictive parsing algorithm

Bottom-Up Parsing

Bottom-up parsing

  • Now, instead of starting with the start symbol and working our way down, we will start at the bottom of the parse tree and work our way up.

  • The style of parsing is called SHIFT-REDUCE

  • SHIFT refers to pushing input symbols onto a stack.

  • REDUCE refers to “reduction steps” during a parse:

    • We take a substring matching the RHS of a rule

    • Then replace it with the symbol on the LHS of the rule

  • If you can reduce until you have just the start symbol, you have succeeded in parsing the input string.

Reduction example

  • Grammar:

    • S -> aABe

    • A -> Abc | b

    • B -> d

  • Input: abbcbcde

  • Reduction steps: abbcbcde

  • aAbcbcde

  • aAbcde

  • aAde

  • aABe

  • S <-- SUCCESS!

In reverse, the reduction traces out a rightmost derivation.
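The reduction steps can be replayed mechanically. This is a small sketch that assumes every grammar symbol is a single character, so a sentential form can be an ordinary string; the step list records which production is used and where the reduced substring starts, and the names are illustrative.

```python
def reduce_at(form, lhs, rhs, start):
    """Replace the occurrence of `rhs` beginning at index `start` with `lhs`."""
    if form[start:start + len(rhs)] != rhs:
        raise ValueError(f"{rhs!r} does not occur at index {start} of {form!r}")
    return form[:start] + lhs + form[start + len(rhs):]


# (LHS, RHS, start index of the reduced substring) for each step.
STEPS = [
    ("A", "b", 1),
    ("A", "Abc", 1),
    ("A", "Abc", 1),
    ("B", "d", 2),
    ("S", "aABe", 0),
]


def replay(sentence, steps):
    """Return every sentential form produced along the way."""
    forms = [sentence]
    for lhs, rhs, start in steps:
        forms.append(reduce_at(forms[-1], lhs, rhs, start))
    return forms
```

Reading `replay("abbcbcde", STEPS)` from the last form back to the first gives S, aABe, aAde, aAbcde, aAbcbcde, abbcbcde, in which the rightmost nonterminal is expanded at each step.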



  • The HANDLE is the part of a sentential form that gets reduced in a backwards rightmost derivation.

  • Sometimes part of a sentential form will match a RHS in G, but if that string is NOT reduced in the backwards rightmost derivation, it is NOT a handle.

  • Shift-reduce parsing, then, is really all about finding the handle at each step then reducing the handle.

  • If we can always find the handle, we never have to backtrack.

  • Repeatedly finding and reducing the handle is called HANDLE PRUNING.

Shift-reduce parsing with a stack

  • A stack helps us find the handle for each reduction step.

  • The stack holds grammar symbols.

  • An input buffer holds the input string.

  • $ marks the bottom of the stack and the end of input.

  • Algorithm:

  • Shift 0 or more input symbols onto the stack, until a handle β is on top of the stack.

  • Reduce β to the LHS of the appropriate production.

  • Repeat until we see $S on stack and $ in input.

Shift-reduce example

  • Grammar:

    • E -> E + E

    • E -> E * E

    • E -> ( E )

    • E -> id

  • Input: w = id + id * id


  • 1. Stack: $   Input: id+id*id$   Action: shift

Shift-reduce parsing actions

  • SHIFT: The next input symbol is pushed onto the stack.

  • REDUCE: When the parser knows the right end of a handle is on the stack, the handle is replaced with the corresponding LHS.

  • ACCEPT: Announce success (input is $, stack is $S)

  • ERROR: The input contained a syntax error; call an error recovery routine.
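The four actions can be exercised with a small driver that executes a scripted list of moves and records each configuration. The script below is one possible sequence for id + id * id under the grammar E -> E + E | E * E | ( E ) | id; since that grammar is ambiguous, other sequences are possible too. The driver does not decide which action to take, and all names are illustrative.

```python
def run(tokens, actions, start="E"):
    """Execute scripted shift-reduce actions; return (stack, input, action) rows."""
    stack = ["$"]
    buf = list(tokens) + ["$"]
    trace = []
    for action in actions:
        trace.append((" ".join(stack), " ".join(buf), action))
        if action == "shift":
            if buf[0] == "$":
                raise SyntaxError("nothing left to shift")
            stack.append(buf.pop(0))
        elif action == "accept":
            if stack != ["$", start] or buf != ["$"]:
                raise SyntaxError("accept in a non-final configuration")
            return trace
        else:
            # ("reduce", lhs, rhs); rhs is assumed non-empty (no ε-productions).
            _, lhs, rhs = action
            if stack[-len(rhs):] != list(rhs):
                raise SyntaxError(f"{rhs} is not on top of the stack")
            del stack[-len(rhs):]
            stack.append(lhs)
    raise SyntaxError("ran out of actions before accepting")


R_ID = ("reduce", "E", ("id",))
ACTIONS = [
    "shift", R_ID,
    "shift", "shift", R_ID,
    "shift", "shift", R_ID,
    ("reduce", "E", ("E", "*", "E")),
    ("reduce", "E", ("E", "+", "E")),
    "accept",
]
```

`run("id + id * id".split(), ACTIONS)` starts from the configuration with stack `$` and input `id + id * id $` and ends with `$ E` on the stack and only `$` left in the input.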

Conflicts during shift/reduce parsing

  • Like predictive parsers, sometimes a shift-reduce parser won’t know what to do.

  • A SHIFT/REDUCE conflict occurs when the parser can’t decide whether to shift the input symbol or reduce the current top of stack.

  • A REDUCE/REDUCE conflict occurs when the parser doesn’t know which of two or more rules to use for reduction.

  • A grammar whose shift-reduce parser contains conflicts is said to be “Not LR”

Example shift/reduce conflict

  • Ambiguous grammars are NEVER LR.

    • stmt -> if ( expr ) stmt

    • | if ( expr ) stmt else stmt

    • | other

  • If we have a shift-reduce parser in configuration

    • Stack: … if ( expr ) stmt

    • Input: else … $

  • what to do?

    • We could reduce “if ( expr ) stmt” to “stmt” (assuming the else is part of a different surrounding if-else statement)

    • We could also shift the “else” (assuming this else goes with the current if)

Example reduce/reduce conflict

  • Some languages use () for function calls AND array refs.

    • stmt -> id ( parameter_list )

    • stmt -> expr := expr

    • parameter_list -> parameter_list , parameter

    • parameter_list -> parameter

    • parameter -> id

    • expr -> id ( expr_list )

    • expr -> id

    • expr_list -> expr_list , expr

    • expr_list -> expr

Example reduce/reduce conflict

  • For input A(I,J) we would get token stream id(id,id)

  • The first three tokens would certainly be shifted:

    • Stack: … id ( id

    • Input: , id ) …

  • The id on top of the stack needs to be reduced, but we have two choices: parameter -> id OR expr -> id

  • The stack gives no clues. To know which rule to use, we need to look up the first ID in the symbol table to see if it is a procedure name or an array name.

  • One solution is to have the lexer return “procid” for procedure names. Then the shift-reduce parser can look into the stack to decide which reduction to use.
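A lexer of this kind can be sketched as follows. The symbol table contents, the regular expression, and the function name are all hypothetical; the only point is that an identifier is looked up before its token kind is chosen.

```python
import re

# Hypothetical symbol table recording how each name was declared.
SYMBOL_TABLE = {"A": "array", "P": "procedure"}

# Identifiers, the punctuation used by the grammar, and the := operator.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|[(),]|:=)")


def tokenize(text, symtab):
    """Return token kinds, using `procid` for names declared as procedures."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        lexeme = m.group(1)
        pos = m.end()
        if lexeme[0].isalpha() or lexeme[0] == "_":
            kind = "procid" if symtab.get(lexeme) == "procedure" else "id"
            tokens.append(kind)
        else:
            tokens.append(lexeme)
    return tokens
```

With this table, `A(I,J)` still begins with `id`, while `P(I,J)` begins with `procid`, so the symbol below the parenthesis on the stack tells the parser which reduction applies.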

LR (Bottom-Up) Parsers

Relationship between parser types

LR parsing

  • A major type of shift-reduce parsing is called LR(k).

  • “L” means left-to-right scanning of the input

  • “R” means rightmost derivation (constructed in reverse)

  • “k” means lookahead of k input symbols (if omitted, assume k=1)

  • LR parsers have very nice properties:

    • They can recognize almost all programming language constructs for which we can write a CFG

    • They are the most powerful type of shift-reduce parser, but they never backtrack, and are very efficient

    • They can parse a proper superset of the languages parsable by predictive parsers

    • They tell you as soon as possible when there’s a syntax error.

  • DISADVANTAGE: hard to build by hand (we need something like yacc)

LR parsing

LR parsing

  • The parser’s structure is similar to predictive parsing.

  • The STACK now stores pairs (Xi, si).

    • Xi is a grammar symbol.

    • si is a STATE.

  • The parse table now has two parts: ACTION and GOTO.

  • The action table specifies whether to SHIFT, REDUCE, ACCEPT, or flag an ERROR given the state on the stack and the current input.

  • The goto table specifies what state to go to after a reduction is performed.

Parser configurations

  • A CONFIGURATION of the LR parser is a pair (STACK, INPUT): ( s0 X1 s1… Xm sm, ai ai+1… an $ )

  • The stack configuration is just a list of the states and grammar symbols currently on the stack.

  • The input configuration is the list of unprocessed input symbols.

  • Together, the configuration represents a right-sentential form X1… Xm ai ai+1… an (some intermediate step in a rightmost derivation of the input from the start symbol)

The LR parsing algorithm

  • At each step, the parser is in some configuration.

  • The next move depends on reading ai from the input and sm from the top of the stack.

    • If action[sm,ai] = shift s, we execute a SHIFT move, entering the configuration ( s0 X1 s1… Xm sm ai s, ai+1… an $ ).

    • If action[sm,ai] = reduce A -> β, then we enter the configuration ( s0 X1 s1… Xm-r sm-r A s, ai ai+1… an $ ), where r = | β | and s = goto[sm-r,A]. The input pointer does not advance on a reduce.

    • If action[sm,ai] = accept, we’re done.

    • If action[sm,ai] = error, we call an error recovery routine.

LR parsing example

  • Grammar:

  • 1. E -> E + T

  • 2. E -> T

  • 3. T -> T * F

  • 4. T -> F

  • 5. F -> ( E )

  • 6. F -> id

LR parsing example



  • Stack: 0   Input: id * id + id $   Action: shift 5
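The LR parsing algorithm can be sketched as a short driver. The ACTION and GOTO tables below are an assumption: they are the SLR table commonly given for this six-production grammar, written out by hand, and they agree with the first move shown above (shift 5 in state 0 on id). For brevity the stack holds only states, since each state already implies its grammar symbol. Names are illustrative.

```python
# Production number -> (LHS, length of RHS), numbered as in the grammar.
PRODUCTIONS = {
    1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
    4: ("T", 1), 5: ("F", 3), 6: ("F", 1),
}

# "sN" = shift and go to state N, "rN" = reduce by production N.
ACTION = {
    0:  {"id": "s5", "(": "s4"},
    1:  {"+": "s6", "$": "acc"},
    2:  {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3:  {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4:  {"id": "s5", "(": "s4"},
    5:  {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6:  {"id": "s5", "(": "s4"},
    7:  {"id": "s5", "(": "s4"},
    8:  {"+": "s6", ")": "s11"},
    9:  {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the production numbers in the order they are used to reduce."""
    states = [0]
    buf = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        entry = ACTION[states[-1]].get(buf[pos])
        if entry is None:
            raise SyntaxError(f"unexpected {buf[pos]!r} in state {states[-1]}")
        if entry == "acc":
            return reductions
        if entry[0] == "s":
            states.append(int(entry[1:]))
            pos += 1  # a shift consumes one input symbol
        else:
            number = int(entry[1:])
            lhs, length = PRODUCTIONS[number]
            del states[-length:]  # pop one state per RHS symbol
            states.append(GOTO[states[-1]][lhs])
            reductions.append(number)  # a reduce leaves the input untouched
```

For `id * id + id` the reductions come out in the order 6, 4, 6, 3, 2, 6, 4, 1, which is a rightmost derivation read backwards.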

LR grammars

  • If it is possible to construct an LR parse table for G, we say “G is an LR grammar”.

  • LR parsers DO NOT need to scan the entire stack to decide what to do (other shift-reduce parsers might).

  • Instead, the STATE symbol summarizes all the information needed to make the decision of what to do next.

  • The GOTO function corresponds to a DFA that knows how to find the HANDLE by reading the top of the stack downwards.

  • In the example, we only looked at 1 input symbol at a time. This means the grammar is LR(1).

How to construct an LR parse table?

  • We will look at 3 methods:

    • Simple LR (SLR): simple but not very powerful

    • Canonical LR: very powerful but too many states

    • LALR: almost as powerful with many fewer states

  • yacc uses the LALR algorithm.
