## PowerPoint Slideshow about 'Compiler Structures' - uttara


Presentation Transcript

Compiler Structures

241-437, Semester 1, 2011-2012

- Objective
- describe general syntax analysis, grammars, parse trees, FIRST and FOLLOW sets

4. Syntax Analysis

Overview

1. What is a Syntax Analyzer?

2. What is a Grammar?

3. Parse Trees

4. Types of CFG Parsing

5. Syntax Analysis Sets

[Figure: the compiler phases. Front end: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Int. Code Generator, producing Intermediate Code. Back end: Code Optimizer, Target Code Generator, producing the Target Lang. Prog. This lecture covers the Syntax Analyzer.]

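1. What is a Syntax Analyzer?

if (a == 0) a = b;

[Figure: the tokens if ( a == 0 ) a = b ; are passed to the Syntax Analyzer, which builds a parse tree with IF at the root and two subtrees: EQ (over a and 0) and ASSIGN (over a and b).]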

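Syntax Analyses that we do

- Identify the function of each word
- Recognize if a sentence is grammatically correct

[Figure: the sentence "I gave Jim the card" annotated with grammar types / categories. "I" is a pronoun (the subject); "gave" is a verb (the action); "Jim" is a proper noun (the indirect object); "the card" is an article and a noun forming a noun phrase (the object); "gave Jim the card" is the verb phrase.]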

Languages

- We use a natural language to communicate
  - its grammar rules are very complex
  - the rules don’t cover important things
- We use a formal language to define a programming language
  - its grammar rules are fairly simple
  - the rules cover almost everything

2. What is a Grammar?

- A grammar is a notation for defining a language, and is made from 4 parts:
  - the terminal symbols
  - the syntactic categories (nonterminal symbols)
    - e.g. statement, expression, noun, verb
  - the grammar rules (productions)
    - e.g. A => B1 B2 ... Bn
  - the starting nonterminal
    - the top-most syntactic category for this grammar


We define a grammar G as a 4-tuple:

G = (T, N, P, S)

- T = terminal symbols
- N = nonterminal symbols
- P = productions/rules
- S = starting nonterminal

2.1. Example 1

- Consider the grammar:

T = {0, 1}

N = {S, R}

P =
- S => 0
- S => 0 R
- R => 1 S

S is the starting nonterminal

Note: the right-hand sides of productions usually use a mix of terminals and nonterminals.

Is “01010” in the language?

- Start with an S rule:

| Rule | String Generated |
| --- | --- |
| -- | S |
| S => 0 R | 0 R |
| R => 1 S | 0 1 S |
| S => 0 R | 0 1 0 R |
| R => 1 S | 0 1 0 1 S |
| S => 0 | 0 1 0 1 0 |

- No more rules can be applied since there are no more nonterminals left in the string.

Yes, it is in the language.
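The derivation above can also be found mechanically. The following is a minimal Python sketch, not part of the original slides: it searches leftmost derivations breadth-first for the Example 1 grammar. The names `GRAMMAR` and `derive` are hypothetical, and the pruning step assumes a grammar with no ε rules, which holds for Example 1.

```python
from collections import deque

# Example 1 grammar: S => 0 | 0 R, R => 1 S (the dict keys are the nonterminals).
GRAMMAR = {"S": [["0"], ["0", "R"]], "R": [["1", "S"]]}

def derive(target, start="S", grammar=GRAMMAR):
    """Return a leftmost derivation of target as a list of strings, or None."""
    goal = tuple(target)
    queue = deque([((start,), [start])])
    seen = {(start,)}
    while queue:
        form, steps = queue.popleft()
        if form == goal:
            return steps
        # Locate the leftmost nonterminal; a form without one is a dead end.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            continue
        for body in grammar[form[idx]]:
            new = form[:idx] + tuple(body) + form[idx + 1:]
            # Prune: with no ε rules a form never shrinks, and the
            # terminals before the expanded symbol must match the target.
            if len(new) > len(goal) or new[:idx] != goal[:idx] or new in seen:
                continue
            seen.add(new)
            queue.append((new, steps + [" ".join(new)]))
    return None

if __name__ == "__main__":
    for step in derive("01010") or []:
        print(step)
```

Calling `derive("0110")` returns `None`, because no sequence of rule applications produces that string.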

Example 2

- Consider the grammar:

T = {a, b, c, d, z}

N = {S, R, U, V}

P =
- S => R U z | z
- R => a | b R
- U => d V U | c
- V => b | c

S is the starting nonterminal

Is “adbdbcz” in the language?

| Rule | String Generated |
| --- | --- |
| -- | S |
| S => R U z | R U z |
| R => a | a U z |
| U => d V U | a d V U z |
| V => b | a d b U z |
| U => d V U | a d b d V U z |
| V => b | a d b d b U z |
| U => c | a d b d b c z |

Yes!

This grammar has choices about how to rewrite the string.

Example 3: Sums

e.g. 5 + 6 - 2

- The grammar:

T = {+, -, 0, 1, 2, 3, ..., 9}

N = {L, D}

P =
- L => L + D | L - D | D
- D => 0 | 1 | 2 | ... | 9

L is the starting nonterminal

Example 4: Brackets

- The grammar:

T = { '(', ')' }

N = {L}

P =
- L => '(' L ')' L
- L => ε

L is the starting nonterminal

ε means 'nothing'

2.2. Derivations

A sequence of the form:

w0 => w1 => … => wn

is a derivation of wn from w0 (or w0 =>* wn)

Example:

| String | Rule applied next |
| --- | --- |
| L | L => ( L ) L |
| ( L ) L | L => ε |
| ( ) L | L => ε |
| ( ) | |

L =>* ( )

This means that the sentence ( ) can be derived from L.

2.3. Kinds of Grammars

- There are 4 main kinds of grammar, of increasing expressive power:
- regular (type 3) grammars
- context-free (type 2) grammars
- context-sensitive (type 1) grammars
- unrestricted (type 0) grammars
- They vary in the kinds of productions they allow.

Regular Grammars

Example rules:
- S => wT
- T => xT
- T => a

- Every production is of the form:

A => a | a B | ε

- A, B are nonterminals, a is a terminal
- These are sometimes called right linear rules because if a nonterminal appears in the rule body, then it must appear last.
- Regular grammars are equivalent to REs.

Example

- Integer => + UInt | - UInt | 0 Digits | 1 Digits | ... | 9 Digits
- UInt => 0 Digits | 1 Digits | ... | 9 Digits
- Digits => 0 Digits | 1 Digits | ... | 9 Digits | ε
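Since regular grammars are equivalent to REs, the Integer grammar can be restated as a regular expression. The sketch below is an illustration rather than part of the original slides: an optional sign followed by one or more digits, checked with Python's standard `re` module. The names `INTEGER` and `is_integer` are hypothetical.

```python
import re

# Integer => sign UInt | digit Digits, UInt => digit Digits,
# Digits => digit Digits | ε   is the same language as:  [+-]?[0-9]+
INTEGER = re.compile(r"[+-]?[0-9]+")

def is_integer(text):
    """True if the whole of text is generated by the Integer grammar."""
    return INTEGER.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ["+42", "-7", "0", "+", "4-2"]:
        print(sample, is_integer(sample))
```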

Context-Free Grammars (CFGs)

Example rules:
- A => a
- A => aBcd
- B => ae

- Every production is of the form:

A => d

- A is a nonterminal, d can be any number of nonterminals or terminals
- The Syntax Analyzer uses CFGs.

2.4. REs for Syntax Analysis?

- Why not use REs to describe the syntax of a programming language?
- they don’t have enough power
- Examples:
- nested blocks, if statements, balanced braces
- We need the ability to 'count', which can be implemented with CFGs but not REs.
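To illustrate the 'counting' that a CFG provides, here is a minimal Python sketch of a recognizer for the brackets grammar of Example 4 (L => '(' L ')' L | ε). It is an assumption-laden illustration, not taken from the slides: the function names `parse_L` and `balanced` are hypothetical, and the recursion depth plays the role of the counter that a regular expression lacks.

```python
def parse_L(text, pos=0):
    """Match L => '(' L ')' L | ε from pos; return the position after the match."""
    if pos < len(text) and text[pos] == "(":
        inner = parse_L(text, pos + 1)          # the L inside the brackets
        if inner < len(text) and text[inner] == ")":
            return parse_L(text, inner + 1)     # the L after the brackets
        raise SyntaxError(f"expected ')' at position {inner}")
    return pos                                  # L => ε consumes nothing

def balanced(text):
    """True if the whole of text is generated by the brackets grammar."""
    try:
        return parse_L(text) == len(text)
    except SyntaxError:
        return False

if __name__ == "__main__":
    for sample in ["", "()", "(()())", "(()", ")("]:
        print(repr(sample), balanced(sample))
```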

3. Parse Trees

- A parse tree is a graphical way of showing how productions are used to generate a string.
- The syntax analyzer creates a parse tree to store information about the program being compiled.

Example

- The grammar:

T = { a, b }

N = { S }

P = { S => S S | a S b | a b | b a }

S is the starting nonterminal

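Parse Tree for “aabbba”

The root of the tree is the start symbol S. At each step, expand a nonterminal in a leaf position:

- Expand using S => S S
- Expand using S => a S b
- Stop when there are no more nonterminals in leaf positions.
- Read off the string by reading the leaves left to right.

[Figure: the finished parse tree. The root S has two S children. The left S expands to a S b, and its inner S expands to a b. The right S expands to b a. The leaves, read left to right, spell a a b b b a.]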
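A parse tree is straightforward to hold as a nested data structure. The following Python sketch is an illustration only: the tuple representation and the names `TREE`, `leaves` and `is_valid` are assumptions, not something the slides prescribe. It stores the tree for “aabbba”, checks that every expansion is a production of the grammar, and reads the string off the leaves.

```python
# Bodies of the productions S => S S | a S b | a b | b a.
PRODUCTIONS = {("S", "S"), ("a", "S", "b"), ("a", "b"), ("b", "a")}
TERMINALS = {"a", "b"}

# A node is either a terminal string or a tuple (nonterminal, children).
TREE = ("S", [
    ("S", ["a", ("S", ["a", "b"]), "b"]),
    ("S", ["b", "a"]),
])

def label(node):
    """The grammar symbol stored at a node."""
    return node if isinstance(node, str) else node[0]

def leaves(node):
    """The leaves of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node[1]:
        out.extend(leaves(child))
    return out

def is_valid(node):
    """True if every interior node was expanded with a production of S."""
    if isinstance(node, str):
        return node in TERMINALS
    body = tuple(label(child) for child in node[1])
    return body in PRODUCTIONS and all(is_valid(child) for child in node[1])

if __name__ == "__main__":
    print(is_valid(TREE), "".join(leaves(TREE)))
```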

3.1. Ambiguity

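Two (or more) parse trees for the same string

- E => E + E
- E => E – E
- E => 0 | … | 9

[Figure: two parse trees for 2 – 3 + 4. In one tree the root is E + E, with the left E expanded to E – E (over 2 and 3) and the right E to 4. In the other tree the root is E – E, with the left E expanded to 2 and the right E to E + E (over 3 and 4).]

The two derivations:

| Root E + E | Root E – E |
| --- | --- |
| E | E |
| E + E | E – E |
| E – E + E | 2 – E |
| 2 – E + E | 2 – E + E |
| 2 – 3 + E | 2 – 3 + E |
| 2 – 3 + 4 | 2 – 3 + 4 |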
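The ambiguity can be measured by counting parse trees. The Python sketch below is an illustration under stated assumptions: it uses the ASCII '-' in place of the '–' in the slides, treats each character as one token, and the name `count_trees` is hypothetical. For each operator position it multiplies the number of trees of the left and right operands, which is every way the rules E => E + E and E => E - E can split the string.

```python
from functools import lru_cache

def count_trees(expr):
    """Number of parse trees for expr under E => E + E | E - E | 0 | ... | 9."""
    tokens = tuple(expr)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "-"):
                # tokens[mid] is the operator at the root of this subtree.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

if __name__ == "__main__":
    print(count_trees("2-3+4"))
```

A grammar is unambiguous only if this count is at most 1 for every string; here `count_trees("2-3+4")` is 2, matching the two trees shown.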

Fixing Ambiguity

- An ambiguous grammar can sometimes be made unambiguous:

E => E + T | E – T | T

T => 0 | … | 9

- We'll look at some techniques in chapter 5.

4. Types of CFG Parsing

- Top-down (chapter 5)
  - recursive descent (predictive) parsing
  - LL methods
- Bottom-up (chapter 6)
  - operator precedence parsing
  - LR methods
    - SLR, canonical LR, LALR

4.1. A Statement Block Grammar

- The grammar:

T = {begin, end, simplestmt, ;}

N = {B, SS, S}

P =
- B => begin SS end
- SS => S ; SS | ε
- S => simplestmt | begin SS end

B is the starting nonterminal

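Parse Tree

begin simplestmt ; simplestmt ; end

[Figure: the parse tree for this input. The root B expands to begin SS end. The top SS expands to S ; SS, with its S expanded to simplestmt. The second SS also expands to S ; SS, with its S expanded to simplestmt. The last SS expands to ε.]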

4.2. Top Down (LL) Parsing

Input: begin simplestmt ; simplestmt ; end

- B => begin SS end
- SS => S ; SS
- SS => ε
- S => simplestmt
- S => begin SS end

[Figure: a sequence of slides grows the parse tree from the root B downwards, expanding the leftmost nonterminal each time. The productions are applied in this order:]

1. B => begin SS end
2. SS => S ; SS
3. S => simplestmt
4. SS => S ; SS
5. S => simplestmt
6. SS => ε
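The top-down order can be reproduced with a small recursive descent parser. This is a minimal Python sketch, not the slides' own code: there is one function per nonterminal, and the choice between SS => S ; SS and SS => ε is made by looking at the next token. The function name `parse_block` and the convention of returning the list of productions used are assumptions made for illustration.

```python
def parse_block(tokens):
    """Parse tokens top-down; return the productions in the order applied."""
    pos = 0
    used = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_B():
        used.append("B => begin SS end")
        expect("begin")
        parse_SS()
        expect("end")

    def parse_SS():
        # A statement can only start with simplestmt or begin.
        if peek() in ("simplestmt", "begin"):
            used.append("SS => S ; SS")
            parse_S()
            expect(";")
            parse_SS()
        else:
            used.append("SS => ε")

    def parse_S():
        if peek() == "simplestmt":
            used.append("S => simplestmt")
            expect("simplestmt")
        else:
            used.append("S => begin SS end")
            expect("begin")
            parse_SS()
            expect("end")

    parse_B()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} after the block")
    return used

if __name__ == "__main__":
    for rule in parse_block("begin simplestmt ; simplestmt ; end".split()):
        print(rule)
```

For the input used in this section, the returned list has six entries, one per expansion of the parse tree.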

4.3. Bottom-up (LR) Parsing

Input: begin simplestmt ; simplestmt ; end

- B => begin SS end
- SS => S ; SS
- SS => ε
- S => simplestmt
- S => begin SS end

[Figure: a sequence of slides grows the parse tree from the leaves upwards until the root B is reached. The productions are used for reductions in this order:]

1. S => simplestmt (the first simplestmt)
2. S => simplestmt (the second simplestmt)
3. SS => ε
4. SS => S ; SS
5. SS => S ; SS
6. B => begin SS end
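The bottom-up order can be imitated with a hand-written shift-reduce loop. The Python sketch below is only an illustration: a real LR parser takes its shift and reduce decisions from a generated table, whereas this version hard-codes them for this one grammar. The function name `shift_reduce` is hypothetical.

```python
def shift_reduce(tokens):
    """Parse tokens bottom-up; return the productions in the order reduced."""
    stack, pos, used = [], 0, []
    while True:
        look = tokens[pos] if pos < len(tokens) else None
        if stack[-1:] == ["simplestmt"]:
            stack[-1:] = ["S"]
            used.append("S => simplestmt")
        elif stack[-3:] == ["S", ";", "SS"]:
            stack[-3:] = ["SS"]
            used.append("SS => S ; SS")
        elif stack[-3:] == ["begin", "SS", "end"]:
            # The outermost block is B; any nested block is a statement S.
            if len(stack) == 3 and look is None:
                stack[-3:] = ["B"]
                used.append("B => begin SS end")
            else:
                stack[-3:] = ["S"]
                used.append("S => begin SS end")
        elif look == "end" and stack[-1:] in (["begin"], [";"]):
            # An empty statement list sits just before 'end'.
            stack.append("SS")
            used.append("SS => ε")
        elif look is not None:
            stack.append(look)      # shift the next token
            pos += 1
        elif stack == ["B"]:
            return used             # accept
        else:
            raise SyntaxError(f"cannot reduce stack {stack}")

if __name__ == "__main__":
    for rule in shift_reduce("begin simplestmt ; simplestmt ; end".split()):
        print(rule)
```

The same six productions appear as in the top-down parse, but in a different order: the leaves are reduced first and B => begin SS end comes last.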

5. Syntax Analysis Sets

- Syntax analyzers for top-down (LL) and bottom-up (LR) parsing utilize two types of sets:
- FIRST sets
- FOLLOW sets
- These sets are generated from the programming language CFG.

5.1. The FIRST Sets

- FIRST( <non-terminal> ) = set of all terminals that start productions for that non-terminal
- Example:

S => ping

S => begin S end

FIRST(S) = { ping, begin }

More Mathematically

- A is a non-terminal.
- FIRST(A) = { c | A =>* c w, c is a terminal } ∪ { ε }, where ε is included only if A =>* ε
- w is the rest of the terminals and nonterminals after 'c'

Building FIRST Sets

- For each non-terminal A, FIRST(A) = FIRST_SEQ(α) ∪ FIRST_SEQ(β) ∪ ... for all productions A => α, A => β, ...
- α, β are the bodies of the productions

FIRST_SEQ()

- FIRST_SEQ(ε) = { ε }
- FIRST_SEQ(c w) = { c }, if c is a terminal
- FIRST_SEQ(A w) = FIRST(A), if ε ∉ FIRST(A)
- FIRST_SEQ(A w) = (FIRST(A) - {ε}) ∪ FIRST_SEQ(w), if ε ∈ FIRST(A)
- w is a sequence of terminals and non-terminals, and possibly empty

FIRST() Example 1

- S => a S e
- S => B
- B => b B e
- B => C
- C => c C e
- C => d

1. Start with FIRST(C) since its rules only start with terminals: FIRST(C) = {c,d}
2. Do FIRST(B) now that we know FIRST(C): FIRST(B) = {b,c,d}
3. Do FIRST(S) now that we know FIRST(B): FIRST(S) = {a,b,c,d}

FIRST() Example 2

- P => i | c | n T S
- Q => P | a S | b S c S T
- R => b | ε
- S => c | R n | ε
- T => R S q

1. Start with P and R since their rules only start with terminals or ε: FIRST(P) = {i,c,n} and FIRST(R) = {b,ε}
2. Do FIRST(Q) now that we know FIRST(P): FIRST(Q) = {i,c,n,a,b}
3. Do FIRST(S) now that we know FIRST(R): FIRST(S) = {c,b,n,ε}
   - Note: S => R n => n because R =>* ε
4. Do FIRST(T) now that we know FIRST(R) and FIRST(S): FIRST(T) = {b,c,n,q}
   - Note: T => R S q => S q => q because both R and S =>* ε

FIRST() Example 3

- S => a S e | S T S
- T => R S e | Q
- R => r S r | ε
- Q => S T | ε

Order: 1) R, S 2) Q 3) T

FIRST(S) = {a}

FIRST(R) = {r, ε}

FIRST(Q) = {a, ε}

FIRST(T) = {r, a, ε}
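The FIRST_SEQ rules can be turned into a short program. The following Python sketch is an illustration under stated assumptions: rather than ordering the nonterminals by hand as in the examples, it simply repeats the rules until no FIRST set grows. A production body is a list of symbols, an ε body is the empty list, and the terminal e is kept distinct from ε. The names `EPS`, `first_seq`, `first_sets` and `EXAMPLE_2` are hypothetical.

```python
EPS = "ε"

# FIRST() Example 2; the dict keys are the nonterminals.
EXAMPLE_2 = {
    "P": [["i"], ["c"], ["n", "T", "S"]],
    "Q": [["P"], ["a", "S"], ["b", "S", "c", "S", "T"]],
    "R": [["b"], []],
    "S": [["c"], ["R", "n"], []],
    "T": [["R", "S", "q"]],
}

def first_seq(symbols, first):
    """FIRST_SEQ of a symbol sequence, using the FIRST sets found so far."""
    result = set()
    for sym in symbols:
        if sym not in first:               # a terminal ends the scan
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:          # sym cannot vanish, so stop here
            return result
    result.add(EPS)                        # empty, or every symbol can vanish
    return result

def first_sets(grammar):
    """Compute FIRST for every nonterminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_seq(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

if __name__ == "__main__":
    for nt, terminals in first_sets(EXAMPLE_2).items():
        print(nt, sorted(terminals))
```

Iterating to a fixed point also copes with left-recursive rules such as S => S T S in Example 3, where no hand-picked order avoids the recursion.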

5.2. The FOLLOW Sets

- FOLLOW( <non-terminal> ) =
- set of all the terminals that follow <non-terminal> in productions
- the set includes $ if nothing follows <non-terminal>

More Mathematically

- A is a non-terminal.
- FOLLOW(A) = { c in terminals | S =>+ . . . A c . . . } ∪ { $ } if S =>+ . . . A
- . . . is a sequence of terminals and non-terminals
- =>+ is any number of => expansions

Building FOLLOW() Sets

- To make the FOLLOW(A) set, apply rules 1-4:
  1. for all productions (B => . . . A β), add FIRST_SEQ(β) - {ε}
  2. for all (B => . . . A β) with ε ∈ FIRST_SEQ(β), add FOLLOW(B)
  3. for all (B => . . . A), add FOLLOW(B)
  4. if A is the start symbol, then add { $ }
- β is a sequence of terminals and non-terminals

Small Examples

- What is in FOLLOW(A) for the productions:

B => A C

C => s

- FOLLOW(A) gets FIRST_SEQ(C) == FIRST(C) == { s }
- uses rule 1

continued

What is in FOLLOW(A) for the productions:

C => B r

B => t A

- FOLLOW(A) gets FOLLOW(B) == { r }
- uses rule 3

FOLLOW() Example 1

- S => a S e | B
- B => b B C f | C
- C => c C g | d | ε

S is the start symbol. In the FOLLOW sets below, e is the terminal from S => a S e, not ε.

FIRST(C) = {c,d,ε}

FIRST(B) = {b,c,d,ε}

FIRST(S) = {a,b,c,d,ε}

FOLLOW(S) = {$,e}

FOLLOW(B) = (FIRST_SEQ(C f) - {ε}) ∪ FOLLOW(S) = {c,d,f,$,e}

FOLLOW(C) = {f,g} ∪ FOLLOW(B) = {f,g,c,d,$,e}

FOLLOW() Example 2

- S => ( A ) | ε
- A => T E
- E => & T E | ε
- T => ( A ) | a | b | c

FIRST(T) = {(,a,b,c}

FIRST(E) = {&,ε}

FIRST(A) = {(,a,b,c}

FIRST(S) = {(,ε}

FOLLOW(S) = {$}

FOLLOW(A) = {)}

FOLLOW(E) = FOLLOW(A) ∪ FOLLOW(E) = {)}

FOLLOW(T) = (FIRST_SEQ(E) - {ε}) ∪ FOLLOW(A) ∪ FOLLOW(E) = {&,)}

FOLLOW() Example 3

- S => T E1
- E1 => + T E1 | ε
- T => F T1
- T1 => * F T1 | ε
- F => ( S ) | id

FIRST(F) = FIRST(T) = FIRST(S) = {(,id}

FIRST(T1) = {*,ε}

FIRST(E1) = {+,ε}

FOLLOW(S) = {$,)}

FOLLOW(E1) = FOLLOW(S) ∪ FOLLOW(E1) = {$,)}

FOLLOW(T) = (FIRST(E1) - {ε}) ∪ FOLLOW(S) ∪ FOLLOW(E1) = {+,$,)}

FOLLOW(T1) = FOLLOW(T) = {+,$,)}

FOLLOW(F) = (FIRST(T1) - {ε}) ∪ FOLLOW(T) ∪ FOLLOW(T1) = {*,+,$,)}

FOLLOW() Example 4

- S => A B C | A D
- A => a | a A
- B => b | c | ε
- C => D a C
- D => b b | c c

FIRST(D) = FIRST(C) = {b,c}

FIRST(B) = {b,c,ε}

FIRST(A) = FIRST(S) = {a}

FOLLOW(S) = {$}

FOLLOW(D) = {a,$}

FOLLOW(A) = {b,c}

FOLLOW(B) = {b,c}

FOLLOW(C) = {$}
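Rules 1-4 for FOLLOW can be checked with a short program. This is a minimal Python sketch, not the slides' own method: it computes FIRST and then FOLLOW by repeating the rules until no set grows. It assumes that an ε alternative is written as an empty list (so the last alternative of a rule such as T1 => * F T1 | e is read as ε), and the names `EPS`, `END`, `first_seq`, `fixpoint`, `first_sets`, `follow_sets` and `EXAMPLE_3` are hypothetical.

```python
EPS, END = "ε", "$"

# FOLLOW() Example 3; the dict keys are the nonterminals.
EXAMPLE_3 = {
    "S": [["T", "E1"]],
    "E1": [["+", "T", "E1"], []],
    "T": [["F", "T1"]],
    "T1": [["*", "F", "T1"], []],
    "F": [["(", "S", ")"], ["id"]],
}

def first_seq(symbols, first):
    """FIRST_SEQ of a symbol sequence, given the FIRST sets."""
    out = set()
    for sym in symbols:
        if sym not in first:               # terminal
            return out | {sym}
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    return out | {EPS}

def fixpoint(sets, step):
    """Call step() repeatedly until none of the sets grows."""
    while True:
        before = sum(len(s) for s in sets.values())
        step()
        if sum(len(s) for s in sets.values()) == before:
            return sets

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    def step():
        for nt, bodies in grammar.items():
            for body in bodies:
                first[nt] |= first_seq(body, first)
    return fixpoint(first, step)

def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                          # rule 4
    def step():
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                    # only nonterminals have FOLLOW
                    rest = first_seq(body[i + 1:], first)
                    follow[sym] |= rest - {EPS}     # rule 1
                    if EPS in rest:                 # rules 2 and 3
                        follow[sym] |= follow[head]
    return fixpoint(follow, step)

if __name__ == "__main__":
    for nt, terminals in follow_sets(EXAMPLE_3, "S").items():
        print(nt, sorted(terminals))
```

Rule 3 needs no separate case here: when nothing follows the nonterminal, the rest of the body is empty and its FIRST_SEQ is { ε }.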
