
Chapter 5 Bottom-Up Parsing

Chapter 5 Bottom-Up Parsing. Zhang Jing, Wang HaiLing College of Computer Science & Technology Harbin Engineering University.



  1. Chapter 5 Bottom-Up Parsing. Zhang Jing, Wang HaiLing, College of Computer Science & Technology, Harbin Engineering University

  2. Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top). We can think of this process as one of "reducing" a string to the start symbol: at each reduction step, a particular substring that matches the right side of a production is replaced by the symbol on the left of that production. An easy-to-implement form of shift-reduce parsing is operator-precedence parsing. Much more general methods of shift-reduce parsing are LR(0) and SLR(1) parsing. The position of the bottom-up syntax analyzer in the compiler is shown in Fig. 5.1. zhangjing@hrbeu.edu.cn

  3. [Fig. 5.1: the position of the bottom-up syntax analyzer in the compiler]

  4. 5.1 Operator-precedence Parsing • If a grammar has the property that no production right side is ε or has two adjacent nonterminals, we can easily construct an efficient shift-reduce parser by hand; this easy-to-implement parsing technique is called operator-precedence parsing. The technique is described as a manipulation on tokens without reference to any grammar. Once we finish building an operator-precedence parser from a grammar, we may effectively ignore the grammar, using the nonterminals on the stack only as placeholders for attributes associated with them.

  5. 5.1.1 Precedence Relations between Pairs of Terminals • There are three precedence relations between pairs of terminals. Let "a" and "b" belong to VT, and let U, V and R belong to VN; • then the operator precedence relations are defined as follows.

  6. 1. a ≐ b means there is a rule U∷=…ab… or U∷=…aVb… 2. a ⋖ b means there is a rule U∷=…aR… with R⇒+b… or R⇒+Vb… 3. a ⋗ b means there is a rule U∷=…Rb… with R⇒+…a or R⇒+…aV • Note: the precedence relations between a and b differ from the arithmetic relations "less than", "equal to" and "greater than"; that is, • a ⋖ b does not imply b ⋗ a, and a ≐ b does not imply b ≐ a

  7. Example 5.1 • Grammar G〔E〕: E∷=E+T|T T∷=T*F|F F∷=(E)|i • From rule F∷=(E), we can obtain the precedence relation between "(" and ")": ( ≐ ) • From rule E∷=E+T, we know that after "+" there is T, and T⇒T*…, so the precedence relation between "+" and "*" is: • + ⋖ * • From rule F∷=(E), and E⇒+…+T, we can obtain the precedence relation between "+" and ")": + ⋗ )

  8. 5.1.2 Constructing the Operator-precedence Relations • In this section, we give a general method of constructing the operator precedence relations. First, we define two new sets: FIRSTTERM(U) and LASTTERM(U). • b∈FIRSTTERM(U) when U⇒+b… or U⇒+Vb… • b∈LASTTERM(U) when U⇒+…b or U⇒+…bV • where b∈VT, V∈VN.

  9. • The algorithm for constructing the operator precedence relations is: • Step 1. Construct the set FIRSTTERM and the set LASTTERM for each nonterminal, where a, b∈VT and U, R∈VN. • Step 2. If grammar G has a rule like U∷=…ab… or U∷=…aVb…, then a ≐ b. • If grammar G has a rule like U∷=…aR… and b∈FIRSTTERM(R), then a ⋖ b. If grammar G has a rule like U∷=…Rb… and a∈LASTTERM(R), then a ⋗ b.

  10. • Step 3 • Construct the relations between the string delimiter "#" and the other terminals, where S is the start symbol: • # ⋖ b for every b∈FIRSTTERM(S) • a ⋗ # for every a∈LASTTERM(S) • # ≐ # • According to the algorithm, we construct the operator precedence relations of Example 5.1.
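As a concrete illustration, the three steps above can be sketched in Python for the grammar of Example 5.1. This is a sketch, not the book's code: the names `term_sets` and `build_relations` are illustrative, and the ASCII symbols "<", "=", ">" stand for ⋖, ≐, ⋗.

```python
# Grammar G[E] of Example 5.1: E::=E+T|T  T::=T*F|F  F::=(E)|i
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["i"]],
}
NONTERMS = set(GRAMMAR)

def term_sets(leading):
    """FIRSTTERM (leading=True) or LASTTERM (leading=False) of every
    nonterminal, computed by fixed-point iteration (Step 1)."""
    sets = {u: set() for u in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for u, prods in GRAMMAR.items():
            for rhs in prods:
                seq = rhs if leading else rhs[::-1]
                add = set()
                if seq[0] in NONTERMS:
                    add |= sets[seq[0]]            # U => V...  inherits V's set
                    if len(seq) > 1 and seq[1] not in NONTERMS:
                        add.add(seq[1])            # U => Vb... contributes b
                else:
                    add.add(seq[0])                # U => b...  contributes b
                if not add <= sets[u]:
                    sets[u] |= add
                    changed = True
    return sets

def build_relations(start="E"):
    """Steps 2 and 3: fill the relation table from the productions."""
    ft, lt = term_sets(True), term_sets(False)
    rel = {}
    def put(a, r, b):
        # an operator precedence grammar gives each pair at most one relation
        assert rel.get((a, b), r) == r, f"conflict at ({a},{b})"
        rel[(a, b)] = r
    for prods in GRAMMAR.values():
        for rhs in prods:
            for i in range(len(rhs) - 1):
                x, y = rhs[i], rhs[i + 1]
                if x not in NONTERMS and y not in NONTERMS:
                    put(x, "=", y)                         # ...ab...
                elif x not in NONTERMS:                    # x terminal, y nonterminal
                    if i + 2 < len(rhs) and rhs[i + 2] not in NONTERMS:
                        put(x, "=", rhs[i + 2])            # ...aVb...
                    for b in ft[y]:
                        put(x, "<", b)                     # a < FIRSTTERM(R)
                elif y not in NONTERMS:                    # x nonterminal, y terminal
                    for a in lt[x]:
                        put(a, ">", y)                     # LASTTERM(R) > b
    for b in ft[start]:                                    # Step 3
        put("#", "<", b)
    for a in lt[start]:
        put(a, ">", "#")
    put("#", "=", "#")
    return rel
```

Running `build_relations()` reproduces the relations derived in Example 5.1, e.g. ( ≐ ), + ⋖ * and + ⋗ ).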

  11. [Construction of the precedence relations for Example 5.1]

  12. So the operator precedence matrix of Example 5.1 is shown in Table 5.1.

  13. 5.1.3 Operator-precedence Grammar • Operator-precedence parsing has three disadvantages: • It is hard to handle tokens like the minus sign, which has two different precedences (unary and binary). • One cannot always be sure the parser accepts exactly the desired language. • Only a small class of grammars can be parsed using operator-precedence techniques. • In an operator grammar, no production rule can have

  14. • ε at the right side, or • two adjacent nonterminals at the right side. For example: E∷=AB, A∷=a, B∷=b is not an operator grammar; E∷=EOE, E∷=id, O∷=+|*|/ is not an operator grammar; E∷=E+E | E*E | E/E | i is an operator grammar.

  15. • An operator grammar is also called an OG. There are three disjoint precedence relations between pairs of terminals: ≐, ⋖ and ⋗. If every pair of terminals has at most one of these relations, the OG is an operator precedence grammar, namely, an OPG. • For example, the grammar E∷=E+E|E*E|E/E|i is not an operator-precedence grammar. From Fig. 5.2, we know there are two syntax trees for the sentential form E+E/E; correspondingly, there are two precedence relations between "+" and "/", namely, • + ⋖ / and + ⋗ /

  16. Fig. 5.2 Two syntax trees of the sentential form E+E/E

  17. 5.1.4 Leftmost Phrase • The syntax tree for the sentence #T+T*F+i# in grammar G[E] of Example 5.1 is shown in Fig. 5.3. [Fig. 5.3: syntax tree of the sentence #T+T*F+i#]

  18. From Fig. 5.3, we can see that there are several phrases: • T (for nonterminal E) • T * F (for nonterminal T) • T + T * F (for nonterminal E) • i (for nonterminal F) • T + T * F + i (for nonterminal E)

  19. The simple phrases are T, T * F and i; the handle is T, and T*F is the leftmost phrase. So the definition of the leftmost phrase is: a phrase that includes at least one terminal and does not include any other phrase. • For example, for the sentence #F*i+i#, whose syntax tree is shown in Fig. 5.4, the two occurrences of i are phrases, but F*i is not a leftmost phrase, because it includes the other phrase i.

  20. [Fig. 5.4: syntax tree of the sentence #F*i+i#]

  21. • Next, we give a general method to obtain the leftmost phrase of an operator precedence sentence. Write the sentence of an operator grammar as #V1a1V2a2…Viai…VnanVn+1# • where each Vi is a nonterminal (possibly absent) and each ai is a terminal; that means there is at most one nonterminal between two adjacent terminals. The leftmost phrase has the property ai ⋖ ai+1, ai+1 ≐ ai+2, …, aj-1 ≐ aj, aj ⋗ aj+1

  22. • and the leftmost phrase is Vi+1ai+1…VjajVj+1 • For example, the sentence #T+T*F+i# of G[E] has three nonterminals (V1=T, V2=T, V3=F) and four terminals (a1=+, a2=*, a3=+, a4=i), and a1, a2, a3 have the property a1 ⋖ a2, a2 ⋗ a3 • So T*F (namely, V2a2V3) is the leftmost phrase of the sentence #T+T*F+i#.

  23. 5.1.5 The Algorithm and Program of Operator Precedence Parsing • In this section, we introduce a bottom-up parsing algorithm: the operator precedence parsing algorithm. In this algorithm, every reduction finds the leftmost phrase. • Step 1. Construct the operator precedence relation matrix. • Step 2. Create a symbol stack to store the reduced string, and an input stack to store the input string. At the beginning, there is only the symbol "#" in the symbol stack, and the first terminal of the input is at the top of the input stack.

  24. • Step 3. From the top terminal xn of the symbol stack, move towards the bottom, comparing each terminal with the closest terminal below it: if xn-1 ≐ xn, go on comparing xn-2 and xn-1, until xi-1 ⋖ xi; now we obtain the leftmost phrase: Nixi Ni+1xi+1…Nnxn Nn+1 (if Ni is empty, xi is the beginning symbol of the phrase)

  25. Step 4. In grammar G, choose a rule whose right side matches Nixi Ni+1xi+1…Nnxn Nn+1 (the nonterminals need not be the same) to reduce by; that is, pop the leftmost phrase from the top of the symbol stack, and push the left side of the rule onto the stack. When there is only "#" (or one nonterminal and "#") in the symbol stack and "#" in the input stack, the analysis succeeds: the input string is a sentence of the grammar, and we exit from the program. Otherwise, return to Step 3.

  26. The program of operator precedence parsing is as follows.
set p to point to the first symbol of w# ;
repeat forever
  if ( # is on top of the stack and p points to # ) then return
  else {
    let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
    if ( a ⋖ b or a ≐ b ) then { /* SHIFT */ push b onto the stack; advance p to the next input symbol; }
    else if ( a ⋗ b ) then /* REDUCE */
      repeat pop the stack
      until ( the top-of-stack terminal is related by ⋖ to the terminal most recently popped );
    else error(); }
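The pseudocode above can be turned into a runnable sketch for grammar G[E] of Example 5.1. The assumptions here are mine, not the book's: the table is Table 5.1 written out by hand with "<", "=", ">" standing for ⋖, ≐, ⋗; "N" is a placeholder nonterminal; and `RULES` lists the right sides of G[E] with every nonterminal replaced by N (Step 4: the nonterminals need not be the same).

```python
PREC = {  # PREC[a][b]: relation between stack-top terminal a and input b
    "+": {"+": ">", "*": "<", "(": "<", ")": ">", "i": "<", "#": ">"},
    "*": {"+": ">", "*": ">", "(": "<", ")": ">", "i": "<", "#": ">"},
    "(": {"+": "<", "*": "<", "(": "<", ")": "=", "i": "<"},
    ")": {"+": ">", "*": ">", ")": ">", "#": ">"},
    "i": {"+": ">", "*": ">", ")": ">", "#": ">"},
    "#": {"+": "<", "*": "<", "(": "<", "i": "<", "#": "="},
}
RULES = {"N+N", "N*N", "(N)", "i"}   # right sides of G[E], nonterminals as N

def parse(w):
    """Return True iff the token string w is a sentence of G[E]."""
    stack, inp, p = ["#"], list(w) + ["#"], 0
    while True:
        a = next(s for s in reversed(stack) if s in PREC)  # topmost terminal
        b = inp[p]
        if a == "#" and b == "#":
            return stack == ["#", "N"]                     # accept
        rel = PREC[a].get(b)
        if rel in ("<", "="):                              # SHIFT
            stack.append(b)
            p += 1
        elif rel == ">":                                   # REDUCE
            phrase = []
            while True:                                    # pop the leftmost phrase
                phrase.append(stack.pop())
                if phrase[-1] in PREC:                     # popped a terminal
                    below = next(s for s in reversed(stack) if s in PREC)
                    if PREC[below].get(phrase[-1]) == "<":
                        break
            if stack[-1] == "N":                           # leading nonterminal
                phrase.append(stack.pop())
            # Step 4: the phrase must match the right side of some rule
            if "".join(reversed(phrase)) not in RULES:
                return False
            stack.append("N")                              # push the left side
        else:
            return False                                   # no relation: error
```

`parse("i*(i+i)")` returns True, matching the analysis of Table 5.2, while ill-formed strings such as "i+" are rejected.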

  27. So for the grammar G〔E〕of Example 5.1: E∷=E+T|T T∷=T*F|F F∷=(E)|i • the string i*(i+i) is recognized by the operator precedence algorithm; the analysis process is shown in Table 5.2.

  28. [Table 5.2: operator-precedence analysis of the string i*(i+i)]

  29. Example 5.2 • Consider the following grammar • S ∷= ( L ) | a • L ∷= L , S | S • and the following operator-precedence relations

  30. Use these precedence relations to parse the sentence (a, (a, a)).

  31. 5.2 LR(0) Parser • We have seen that the operator precedence method places some limitations on the grammar: for example, rules of the form U∷=ε must not appear, and no two adjacent nonterminals may appear in an operator precedence grammar. The LR(0) parser has no such limits, so it is an efficient bottom-up syntax analysis technique that can be used to parse a large class of context-free grammars. The "L" in LR parsing means left-to-right scanning of the input, the "R" stands for constructing a rightmost derivation in reverse, and the "0" means that no lookahead symbols are needed in making parsing decisions.

  32. 5.2.1 Viable Prefix • In order to explain how to derive from the bottom up, we first discuss the concept of the viable prefix by an example. • There is grammar G〔S〕: S∷=aABe A∷=Abc|b B∷=d • We label the four rules in G[S] by numbers: S∷=aABe〔1〕 A∷=Abc〔2〕 A∷=b〔3〕 B∷=d〔4〕

  33. So the rightmost derivation of "abbcde" is S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde, using rules 〔1〕, 〔4〕, 〔2〕, 〔3〕. • The reduction of the input string "abbcde" is shown below. • A prefix of a right sentential form that does not extend past the right end of its handle is called a viable prefix.

  34. 5.2.2 Constructing an FA from Viable Prefixes • It is a remarkable fact that if it is possible to recognize a viable prefix knowing only the grammar symbols on the stack, then there is a finite automaton that can determine what the handle is. • In addition, we can define the items of the grammar to be the states of the finite automaton.

An item of a grammar is a production of the grammar with a dot at some position of the right side. For example, production A∷=XYZ yields the four items A∷=·XYZ A∷=X·YZ A∷=XY·Z A∷=XYZ·

  36. The first item above indicates that we hope to see a string derivable from XYZ next on the input. The second item indicates that we have just seen on the input a string derivable from X, and we hope next to see a string derivable from YZ. The production U∷=ε generates only one item, U∷=·. • After defining the item, we know the states of the finite automaton, and then we can design the automaton. For example, from the rules X::=aAc and A::=d, consider the three items: (h) X::=·aAc (i) X::=a·Ac (k) A::=·d
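The item construction can be sketched in a few lines of Python; the (lhs, before-dot, after-dot) tuple representation is an assumption of this sketch, not the book's notation.

```python
def items(lhs, rhs):
    """All LR(0) items of the production lhs ::= rhs: one item per dot
    position, so a right side of n symbols yields n+1 items."""
    return [(lhs, rhs[:i], rhs[i:]) for i in range(len(rhs) + 1)]
```

For instance, `items("A", "XYZ")` yields the four items of A∷=XYZ shown above, and `items("U", "")` yields the single item U∷=· of the ε-production.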

  37. h, i, k are items (states) of the finite automaton. The dot in item i is one position after the dot in item h, so we can draw an arc from state h to state i, labeled by a. In addition, the symbol after the dot in item i is the nonterminal A, and item k has A as its left side, so we can draw an arc from i to k and label the arc by ε.

  38. Example 5.3 • Grammar G[S]: S∷=E〔1〕 E∷=aA〔2〕 E∷=bB〔3〕 A∷=cA〔4〕 A∷=d〔5〕 B∷=cB〔6〕 B∷=d〔7〕

  39. From the item definition above, we know there are 18 items, and the finite automaton is shown in Fig. 5.5: 1. S∷=·E 2. S∷=E· 3. E∷=·aA 4. E∷=a·A 5. E∷=aA· 6. A∷=·cA 7. A∷=c·A 8. A∷=cA· 9. A∷=·d 10. A∷=d· 11. E∷=·bB 12. E∷=b·B 13. E∷=bB· 14. B∷=·cB 15. B∷=c·B 16. B∷=cB· 17. B∷=·d 18. B∷=d·

  40. [Fig. 5.5: the finite automaton built from the 18 items]

  41. We divide items into several types according to the position of the dot and whether the symbol after the dot is a nonterminal or a terminal. • (1) Shift item, of the form A::=α·aβ: it means push "a" onto the stack, and the state changes from the before-dot state to the after-dot state, where α,β∈V* and a∈VT. • (2) Waiting-reduce item, of the form A::=α·Bβ: it means that A can be reduced only after B has been reduced, where α,β∈V* and B∈VN.

  42. (3) Reduce item, of the form A::=α·, where α∈V*; namely, it is a reduce item when the dot is at the rightmost position. It means that the right side of a production has been fully analyzed and the handle has been recognized. • (4) Accept item, of the form S∷=α·, where α∈V+ and S is the start symbol. • In Example 5.3, states 3 and 17 are shift items, states 4 and 15 are waiting-reduce items, and states 2 and 5 are reduce items; in addition, state 2 is an accept item. The labels on the arcs along a path from the start state to a reduce state form a viable prefix of the sentence; for example, bccB is a viable prefix.

  43. From Fig. 5.5, we see that it is a nondeterministic finite automaton. The central idea of the LR method is to construct a deterministic finite automaton from the grammar. So we group items together into sets, from which a deterministic finite automaton can be constructed. We use the closure operation to construct the item sets.

  44. 5.2.3 The Closure of a Set of Items • If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by two rules: 1. Initially, every item in I is added to closure(I). 2. If U∷=x·Vy is in closure(I) and V∷=z is a production, then add the item V∷=·z to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I). • For example, the item S∷=·E is in closure I0, and E∷=aA|bB, so the items E∷=·aA and E∷=·bB are in closure I0 too, that is, • I0={S∷=·E, E∷=·aA, E∷=·bB}

Intuitively, U∷=x·Vy in closure(I) indicates that, at some point in the parsing, we might next see a substring derivable from Vy as input. If V∷=z is a production, we also expect we might see a substring derivable from z. For this reason, V∷=·z is included in closure(I). • A useful application of closure is the function GOTO(I, X), where I is a set of items and X is a grammar symbol. GOTO(I, X) is defined to be the closure of the set of all items U∷=xX·y such that U∷=x·Xy is in I.

  46. The algorithm for constructing the collection C of sets of LR(0) items is:
C := { closure({S′∷=·S}) }
repeat the following until no more sets of LR(0) items can be added to C:
  for each I in C and each grammar symbol X
    if GOTO(I, X) is not empty and not in C
      add GOTO(I, X) to C
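The closure rules of 5.2.3, the GOTO function and the collection-building loop above can be sketched in Python for the grammar of Example 5.3. This is a sketch: items are represented as (lhs, rhs, dot) triples, which is an assumption of this code, not the book's notation.

```python
# Grammar G[S] of Example 5.3, as (lhs, rhs) pairs in rule order
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("a", "A")), ("E", ("b", "B")),
    ("A", ("c", "A")), ("A", ("d",)),
    ("B", ("c", "B")), ("B", ("d",)),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def closure(I):
    """Rule 2: for U::=x·Vy in the set, add V::=·z for every V-production."""
    I, work = set(I), list(I)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            for l2, r2 in GRAMMAR:
                if l2 == rhs[dot] and (l2, r2, 0) not in I:
                    I.add((l2, r2, 0))
                    work.append((l2, r2, 0))
    return frozenset(I)

def goto(I, X):
    """Closure of all items U::=xX·y such that U::=x·Xy is in I."""
    return closure({(l, r, d + 1) for l, r, d in I
                    if d < len(r) and r[d] == X})

def collection():
    """The repeat-until loop above: all reachable sets of LR(0) items."""
    C = [closure({("S", ("E",), 0)})]
    symbols = {s for _, rhs in GRAMMAR for s in rhs}
    for I in C:                    # C grows while we iterate over it
        for X in symbols:
            J = goto(I, X)
            if J and J not in C:
                C.append(J)
    return C
```

Here `closure({("S", ("E",), 0)})` reproduces the three-item set I0 of the text, and `collection()` yields 12 sets of items, i.e. 12 DFA states, versus the 18 NFA states of Fig. 5.5.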

  47. With the closure and GOTO functions, we can easily convert the NFA to a DFA; Fig. 5.6 is an example of it.

  48. 5.2.4 LR(0) Parsing Table • An LR(0) parser consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (ACTION and GOTO). The driver program is the same for all LR parsers; only the parsing table changes from one parser to another. The stack stores a string of the form s0X1s1X2s2…Xmsm, where each Xi is a grammar symbol and each si is a symbol called a state. The parsing table includes two parts, a parsing action function ACTION and a goto function GOTO. Together, the ACTION and GOTO functions recognize viable prefixes via the deterministic finite automaton.

  49. The LR(0) parsing table has three parts: the first represents the states (Ii); the second is ACTION, which tells what action to perform next; the third is GOTO, which tells which state to go to next. We explain ACTION and GOTO as follows, where x, y∈V* and a∈VT. Construct C={I0,I1,…,In}, the collection of sets of LR(0) items for the grammar. • (1) If U∷=x·ay is in Ii and GOTO(Ii,a)=Ij, then set ACTION[i,a]="Sj"; here "a" must be a terminal. • (2) If U∷=x· is in Ii, then set ACTION[i,a]="rj" for every terminal a and ACTION[i,#]="rj", meaning reduce by rule j: U∷=x; here "a" and "#" together stand for every input symbol.

(3) If Z∷=x· is in Ii, where Z is the start symbol of the grammar, then set ACTION[i,#]="acc"; "acc" means accept. • (4) The GOTO transitions for state i are constructed for all nonterminals U: if GOTO(Ii,U)=Ij, then GOTO[i,U]="j". • (5) All entries not defined by the above rules are set to "error". • Note: if any conflicting action is generated by the above rules, we say the grammar is not LR(0); the algorithm fails to produce a parser in this case.
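Rules (1) through (5), including the conflict check of the note, can be sketched in Python for the grammar of Example 5.3. The representation is an assumption of this sketch: ACTION entries are tuples like ("S", j), ("r", j) and ("acc",), an absent key plays the role of "error", and a conflict raises an exception, meaning the grammar is not LR(0).

```python
GRAMMAR = [("S", ("E",)), ("E", ("a", "A")), ("E", ("b", "B")),
           ("A", ("c", "A")), ("A", ("d",)), ("B", ("c", "B")), ("B", ("d",))]
NONTERMS = {l for l, _ in GRAMMAR}
TERMS = {s for _, r in GRAMMAR for s in r if s not in NONTERMS} | {"#"}

def closure(I):
    I, work = set(I), list(I)
    while work:
        l, r, d = work.pop()
        if d < len(r) and r[d] in NONTERMS:
            for l2, r2 in GRAMMAR:
                if l2 == r[d] and (l2, r2, 0) not in I:
                    I.add((l2, r2, 0))
                    work.append((l2, r2, 0))
    return frozenset(I)

def goto(I, X):
    return closure({(l, r, d + 1) for l, r, d in I
                    if d < len(r) and r[d] == X})

def tables():
    states = [closure({("S", ("E",), 0)})]
    for I in states:                              # collect all states first
        for X in NONTERMS | (TERMS - {"#"}):
            J = goto(I, X)
            if J and J not in states:
                states.append(J)
    action, goto_t = {}, {}
    def put(key, entry):
        if action.setdefault(key, entry) != entry:
            raise ValueError("conflict: grammar is not LR(0)")   # the Note
    for i, I in enumerate(states):
        for l, r, d in I:
            if d < len(r) and r[d] in TERMS:      # rule (1): shift
                put((i, r[d]), ("S", states.index(goto(I, r[d]))))
            elif d == len(r):
                if l == "S":                      # rule (3): accept
                    put((i, "#"), ("acc",))
                else:                             # rule (2): reduce everywhere
                    j = GRAMMAR.index((l, r)) + 1
                    for a in TERMS:
                        put((i, a), ("r", j))
        for U in NONTERMS:                        # rule (4): goto
            J = goto(I, U)
            if J:
                goto_t[(i, U)] = states.index(J)
    return action, goto_t                         # rule (5): absent key = error
```

For Example 5.3 no conflicts arise, so `tables()` returns a complete LR(0) parsing table; a grammar with a shift-reduce or reduce-reduce conflict would raise the error instead.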
