
CSE P501 – Compilers

Parsing

Context Free Grammars (CFG)

Ambiguous Grammars

Next

Jim Hogg - UW - CSE - P501

Parsing

[Figure: compiler pipeline. Source → Front End (chars → Scan → tokens → Parse → AST → Convert → IR) → ‘Middle End’ (IR → Optimize → IR) → Back End (IR → Select Instructions → IR → Allocate Registers → IR → Emit) → Target (Machine Code)]

AST = Abstract Syntax Tree

IR = Intermediate Representation

Valid Tokens != Valid Program

MiniJava includes the following tokens (among many others):

  • class int [ ( . true < this ) + * ; while = if id ilit ! / new {

So a MiniJavaScanner would happily accept the following program:

  • int ; = true { while ( x < true * if { or 123 ) goto count_99

We rely on a MiniJavaParser to reject this kind of gibberish
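
The point can be seen with a minimal Python sketch of a scanner. The keyword set, the identifier rule and the `scan` helper are assumptions made for illustration; this is not the course's MiniJavaScanner. The scanner only checks that each lexeme is a known token, so it accepts the gibberish above without complaint.

```python
import re

# Assumed subset of MiniJava keywords, taken from the token list on the slide.
KEYWORDS = {"class", "int", "true", "this", "while", "if", "new"}

# One token per match: integer literal, identifier/keyword, or punctuation.
TOKEN = re.compile(
    r"\s*(?:(?P<ilit>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<punct>[\[\](){}.<+*;=!/]))"
)

def scan(text):
    """Return a list of (kind, lexeme) pairs; raise ValueError on an unknown character."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unknown character at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        lexeme = m.group(kind)
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme          # keywords are their own token kind
        elif kind == "punct":
            kind = lexeme          # so is each punctuation mark
        tokens.append((kind, lexeme))
    return tokens

gibberish = "int ; = true { while ( x < true * if { or 123 ) goto count_99"
for kind, lexeme in scan(gibberish):
    print(kind, lexeme)
```

Every lexeme is classified, yet nothing here knows that `int ; =` cannot start a program. That knowledge belongs to the parser.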

But how do we specify what makes a valid MiniJava program?

What is Parsing?
  • Analogous to parsing an English sentence
    • Analyze words into subject, verb, object, etc
  • How to parse a program?
    • Analyze tokens into language constructs: assignment, if-clause, function-call, while-loop, etc

Parsing – so what’s the problem?
  • The set of valid programs is infinite
  • The set of invalid programs is infinite
  • Q: How to specify all valid programs, succinctly?
  • A: Define a Grammar
    • More specifically, a Context-Free Grammar (CFG)


Context-Free Grammars (CFG)

Grammar for the Hokum Language
  • Prog → Stm ; Prog | Stm
  • Stm → AsStm | IfStm
  • AsStm → Var = Exp
  • IfStm → if Exp then AsStm
  • VorC → Var | Const
  • Exp → VorC | VorC + VorC
  • Var → [a-z]
  • Const → [0-9]
  • Context-Free Grammar ~ CFG ~ Grammar ~ Backus-Naur Form ~ BNF
  • Productions, or Rules
  • Terminals & Non-Terminals; Start (Symbol)
  • Multiple languages present in the description

Example Hokum Programs

Legal

a = 1; b = a + 4;
z = 1;
if b + 3 then z = 2

Illegal

a = x < 20
b = a + 4 + 5 ;
z = 1
if (a == 33) z < 2 ;

But how do we know which programs are legal or illegal, in Hokum?
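
One way to answer the question mechanically is a small recognizer that follows the Hokum grammar rule by rule. The sketch below is an illustration in Python, not part of the course material; the function names (`tokenize`, `is_hokum` and the per-rule helpers) are invented here, and each helper returns the position after the construct it matched, or None if it did not match.

```python
import re

TOKEN = re.compile(r"\s*(if|then|[a-z]|[0-9]|[=;+])")

def tokenize(text):
    """Split text into Hokum tokens; return None on any character outside the language."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            return None
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def vorc(t, i):                      # VorC -> Var | Const
    return i + 1 if i < len(t) and re.fullmatch(r"[a-z0-9]", t[i]) else None

def exp(t, i):                       # Exp -> VorC | VorC + VorC
    i = vorc(t, i)
    if i is not None and i < len(t) and t[i] == "+":
        return vorc(t, i + 1)
    return i

def as_stm(t, i):                    # AsStm -> Var = Exp
    if i + 1 < len(t) and re.fullmatch(r"[a-z]", t[i]) and t[i + 1] == "=":
        return exp(t, i + 2)
    return None

def stm(t, i):                       # Stm -> AsStm | IfStm
    if i < len(t) and t[i] == "if":  # IfStm -> if Exp then AsStm
        j = exp(t, i + 1)
        if j is None or j >= len(t) or t[j] != "then":
            return None
        return as_stm(t, j + 1)
    return as_stm(t, i)

def prog(t, i):                      # Prog -> Stm ; Prog | Stm
    j = stm(t, i)
    if j is not None and j < len(t) and t[j] == ";":
        return prog(t, j + 1)
    return j

def is_hokum(text):
    t = tokenize(text)
    return t is not None and prog(t, 0) == len(t)

print(is_hokum("a = 1; b = a + 4; z = 1; if b + 3 then z = 2"))  # legal
print(is_hokum("b = a + 4 + 5 ;"))                               # illegal
```

The recognizer only answers yes or no. Recording which rule was applied at each step would yield the derivation discussed next.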


Derivation

Prog
=> Stm ; Prog
=> AsStm ; Prog
=> Var = Exp ; Prog
=> a = Exp ; Prog
=> a = VorC ; Prog
=> a = Const ; Prog
=> a = 1 ; Prog
=> a = 1 ; Stm
=> a = 1 ; IfStm
=> a = 1 ; if Exp then AsStm
=> a = 1 ; if VorC + VorC then AsStm
=> a = 1 ; if Var + VorC then AsStm
=> a = 1 ; if a + VorC then AsStm
=> a = 1 ; if a + Const then AsStm
=> a = 1 ; if a + 1 then AsStm
=> a = 1 ; if a + 1 then Var = Exp
=> a = 1 ; if a + 1 then b = Exp
=> a = 1 ; if a + 1 then b = VorC
=> a = 1 ; if a + 1 then b = Const
=> a = 1 ; if a + 1 then b = 2

  • => versus →
  • Leftmost, rightmost, middlemost
  • Sentential Form & Sentence
  • What is a Context-Sensitive Grammar?


Parse Tree

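[Figure: parse tree for a = 1 ; if a + 1 then b = 2. The root Prog has children Stm, ; and Prog. The first Stm is an AsStm (Var = Exp) deriving a = 1; the nested Prog is a Stm that is an IfStm (if Exp then AsStm) deriving if a + 1 then b = 2.]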

Junk Nodes in the Parse Tree

[Figure: the parse tree for a = 1 ; if a + 1 then b = 2, repeated to point out its junk nodes]

AST (Abstract Syntax Tree)

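[Figure: AST for a = 1 ; if a + 1 then b = 2. Prog has two children: an = node over Var:a and Const:1, and an IfStm node over a + node (Var:a, Const:1) and an = node (Var:b, Const:2).]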

Why Not Just RegEx?

Try to invent a regex description for arithmetic expressions: single-character variable names; operators +, - and *. [Note: the slides mark terminals in red; below, the terminal parentheses are quoted instead]

v = [a-z] // variable

o = + | - | * // operator

v ( o v )* // derives a + b * c: ok, but now add ( )

'('? v ( o v ')'? )* // derives (a + b) * c

// but also gibberish like: a + b) * c (

Almost every programming language includes such balanced pairs: ( ), { }, begin end. Conclusion: regex won’t work.

More generally, regex correspond to DFAs. They can only ‘count’ pairs up to a finite limit.
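
The failure is easy to reproduce. The Python sketch below encodes the second attempt with the standard `re` module, assuming the operators are +, - and * and that whitespace has been stripped. It then contrasts it with a check that counts nesting depth, which a regular expression of this kind cannot do for unbounded depth. The names `ATTEMPT` and `balanced` are illustrative.

```python
import re

# Second attempt from the slide: optional "(" up front, optional ")" after each operand.
ATTEMPT = re.compile(r"\(?[a-z](?:[+\-*][a-z]\)?)*")

def balanced(text):
    """Return True when every ")" closes an earlier "(" and none is left open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ")" with no matching "("
                return False
    return depth == 0

for candidate in ["(a+b)*c", "a+b)*c"]:
    accepted = ATTEMPT.fullmatch(candidate) is not None
    print(candidate, "regex accepts:", accepted, "balanced:", balanced(candidate))
```

The regex accepts both strings; only the depth counter rejects the unbalanced one. The counter needs unbounded memory in principle, which is exactly what a DFA lacks.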

Context-Sensitive Grammar?
  • All compiler work uses Context-Free Grammars, or CFGs
  • Why so-called? Alternatively:
    • What is a non-context-free grammar? (ie, a Context-Sensitive Grammar)
  • Suppose production B → β
    • CFG: we can replace B by β, no matter what
    • eg: α B γ => α β γ
    • CSG: we can replace B by β only in certain contexts, ie only when B is preceded and/or followed by certain strings
    • eg: c B  d 

Example CSG

The following CSG generates the language a^n b^n c^n for n >= 1

  • S → a b c
  •   | a S B c
  • c B → W B
  • W B → W X
  • W X → B X
  • B X → B c
  • b B → b b
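
To see the context-dependent rules at work, the grammar can be run as a string-rewriting system. The Python sketch below is illustrative only; `RULES` and `sentences` are names invented here. It explores every sentential form up to a length limit, breadth-first, and collects the forms made only of terminals. Because no rule shortens a string, pruning by length is safe.

```python
from collections import deque

# Each rule rewrites its left-hand side wherever it occurs in a sentential form.
RULES = [
    ("S", "abc"), ("S", "aSBc"),
    ("cB", "WB"), ("WB", "WX"), ("WX", "BX"), ("BX", "Bc"),
    ("bB", "bb"),
]

def sentences(max_len):
    """Return all terminal strings of length <= max_len derivable from S."""
    seen = {"S"}
    queue = deque(["S"])
    found = set()
    while queue:
        form = queue.popleft()
        if form.islower():               # only terminals left: a sentence
            found.add(form)
            continue
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:           # try the rule at every position
                nxt = form[:start] + rhs + form[start + len(lhs):]
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                start = form.find(lhs, start + 1)
    return sorted(found, key=len)

print(sentences(9))
```

Note that B may only become b when a b stands immediately to its left, which is the "context" a CFG production cannot express.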

Note: CSGs will not be discussed further, nor examined, as part of P501

Parsing
  • The syntax of most programming languages can be specified by a Context-Free Grammar or CFG
  • Parsing = "How to fill the gap between Start Symbol and Sentence"
  • L(G) = the language generated by G = the set of sentences generated by G
  • Parsing: Given G and a sentence w in L(G), construct the derivation (parse tree) for w in some order
  • As we parse, do something useful at each node in the tree

"in some order"
  • Top-down
    • Start with the root
    • Traverse the parse tree depth-first; scan tokens, Left-to-right; create a Leftmost derivation
    • LL(k)
  • Bottom-up
    • Start at leaves and build up to the root; scan tokens, Left-to-right; create a Rightmost derivation (in reverse)
    • LR(k) and subsets: LALR(k) and SLR(k)

"do something useful at each node"
  • Perform some semantic action:
    • Construct nodes of full parse tree (rare)
    • Construct abstract syntax tree (common)
    • Construct linear, lower-level representation
      • like assembler code
    • Generate target code on the fly
      • 1-pass compiler
      • not common in production compilers: poor code quality

Context-Free Grammars – Formal Description
  • A grammar G is a tuple <N, T, P, S> where
    • N a finite set of Non-terminal symbols
    • T a finite set of Terminal symbols
    • P a finite set of Productions
      • A subset of N × (N ∪ T)*
    • S the start symbol, a distinguished element of N
      • If not specified otherwise, this is taken as the Non-Terminal on the LHS of the first production

Standard Notations
  • a, b, c elements of T
  • A, B, C elements of N
  • w, x, y, z elements of T*
  • X, Y, Z elements of N ∪ T
  • α, β, γ elements of (N ∪ T)*
  • (A, α) ∈ P is written A → α

Derivation Relations (1)
  • if B → β ∈ P then α B γ => α β γ
    • simply affirms G is context-free
  • A =>* α
    • denotes there is a chain of zero-or-more productions, starting with A, that generates α
    • reflexive, transitive closure of =>

Derivation Relations (2)
  • if B → β ∈ P then w B γ =>lm w β γ
    • derives leftmost
    • the prefix w before B is all terminals (by construction)
  • if B → β ∈ P then α B w =>rm α β w
    • derives rightmost
    • the prefix α before B may include terminals and non-terminals
  • We will only be interested in leftmost and rightmost derivations – not random orderings

Languages
  • All the sentences (strings of Terminals) I can generate from Non-Terminal A:
  • For A ∈ N, L(A) = { w | A =>* w }
  • All the sentences (strings of Terminals) I can generate from start symbol S:
  • If S is the start symbol of grammar G, define L(G) = L(S)

Reduced Grammars
  • Grammar G is reduced iff for every production A → α in P there is some derivation

S =>* x A z => x α z =>* x y z

    • ie, no production is useless
  • Convention: we will use only reduced grammars

Ambiguous Grammars
  • Grammar G is unambiguous iff every w in L(G ) has a unique leftmost (or rightmost) derivation
    • Fact: unique leftmost or unique rightmost implies the other
  • A grammar lacking this property is ambiguous
    • Note: other grammars that generate the same language may be unambiguous
    • So, "ambiguous" applies to a grammar – not a language
  • We need unambiguous grammars for parsing (well mostly: see later)

Example: Ambiguous Grammar

Exp → Exp Op Exp | Dig
Op → + | - | * | /
Dig → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  • Exercise: show that this is ambiguous
    • How? Show two different leftmost or rightmost derivations for the same string
    • Equivalently: show two different parse trees for the same string
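
For this particular grammar the exercise can also be checked by counting. The Python sketch below is an illustration; `count_parse_trees` is a name invented here. It counts the parse trees of a string of single digits and operators: a span is an Exp either because it is one digit, or because some operator inside it splits it into two smaller Exp spans. Any count above 1 demonstrates ambiguity.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under Exp -> Exp Op Exp | Dig."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from Exp.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0      # Exp -> Dig
        total = 0
        for mid in range(lo + 1, hi - 1):                # choose the top-level Op
            if tokens[mid] in "+-*/":
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("2+3*4"))
print(count_parse_trees("5-6-7-8"))
```

With two operators there are two trees; with three operators the count grows to five, so the ambiguity gets worse as expressions get longer.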

2+3*4

Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
Dig → [0-9]

Give a leftmost derivation of 2+3*4; show the parse tree

Exp => Exp + Exp
    => Dig + Exp
    => 2 + Exp
    => 2 + Exp * Exp
    => 2 + Dig * Exp
    => 2 + 3 * Exp
    => 2 + 3 * Dig
    => 2 + 3 * 4

[Figure: parse tree grown one step at a time alongside the derivation. The root Exp has children Exp, + and Exp; the left Exp derives 2 via Dig, and the right Exp expands to Exp * Exp, deriving 3 * 4.]


2+3*4: the same leftmost derivation with Exp → Exp Op Exp | Dig

Exp => Exp Op Exp
    => Dig Op Exp
    => 2 Op Exp
    => 2 + Exp
    => 2 + Exp Op Exp
    => 2 + Dig Op Exp
    => 2 + 3 Op Exp
    => 2 + 3 * Exp
    => 2 + 3 * Dig
    => 2 + 3 * 4

[Figure: the corresponding parse tree, with + at the top level and 3 * 4 as its right operand]


Give a different leftmost derivation of 2+3*4

Exp => Exp * Exp
    => Exp + Exp * Exp
    => 2 + Exp * Exp
    => 2 + 3 * Exp
    => 2 + 3 * 4

[Figure: parse tree with * at the root; its left child derives 2 + 3 and its right child derives 4]


Are derivations equivalent?

[Figure: the two trees for 2+3*4, side by side]

With + at the root: Result = 2 + (3 * 4) = 14

With * at the root: Result = (2 + 3) * 4 = 20


Another example
  • Give two different derivations of 5 – 6 – 7

Give two different rightmost derivations of 5 - 6 - 7

Exp => Exp - Exp
    => Exp - Exp - Exp
    => Exp - Exp - 7
    => Exp - 6 - 7
    => 5 - 6 - 7

[Figure: tree for 5 - (6 - 7)] result = 6

Exp => Exp - Exp
    => Exp - 7
    => Exp - Exp - 7
    => Exp - 6 - 7
    => 5 - 6 - 7

[Figure: tree for (5 - 6) - 7] result = -8

Give two different leftmost derivations of 5 - 6 - 7

Exp => Exp - Exp
    => 5 - Exp
    => 5 - Exp - Exp
    => 5 - 6 - Exp
    => 5 - 6 - 7

[Figure: tree for 5 - (6 - 7)] result = 6

Exp => Exp - Exp
    => Exp - Exp - Exp
    => 5 - Exp - Exp
    => 5 - 6 - Exp
    => 5 - 6 - 7

[Figure: tree for (5 - 6) - 7] result = -8

What went wrong?
  • Grammar did not capture precedence or associativity
    • Eg: 2 + (3 * 4) = 14 versus (2 + 3) * 4 = 20
    • Eg: 5 - (6 - 7) = 6 versus (5 - 6) - 7 = -8
  • Solution
    • Create a non-terminal for each level of precedence
    • Isolate the corresponding part of the grammar
    • Force the parser to recognize higher precedence sub-expressions first

Classic Expression Grammar

exp → exp + term | exp - term | term
term → term * factor | term / factor | factor
factor → int | ( exp )
int → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7

E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | D
D → [0-9]

Derive 2 + 3 * 4

E => E + T
  => E + T * F
  => E + T * D
  => E + T * 4
  => E + F * 4
  => E + D * 4
  => E + 3 * 4
  => T + 3 * 4
  => F + 3 * 4
  => D + 3 * 4
  => 2 + 3 * 4

[Figure: tree with + at the root; its children are 2 and a * node over 3 and 4]

Result = 2 + (3 * 4) = 14

This grammar yields the correct, expected (school algebra) result

Derive 5 - 6 - 7

E => E - T
  => E - F
  => E - D
  => E - 7
  => E - T - 7
  => E - F - 7
  => E - D - 7
  => E - 6 - 7
  => T - 6 - 7
  => F - 6 - 7
  => D - 6 - 7
  => 5 - 6 - 7

[Figure: tree with - at the root; its left child is a - node over 5 and 6, its right child is 7] result = -8

  • This grammar yields the correct, expected (school algebra) result
  • Note how left-recursive rules yield left-associativity
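
Both results can be reproduced with a small evaluator that mirrors the E / T / F grammar. The Python sketch below is an illustration, not the course's parser. It replaces each left-recursive rule with a loop, a common hand-written equivalent that keeps left-associativity. It assumes single-digit operands and uses true division for /. The name `evaluate` is invented here.

```python
def evaluate(text):
    """Evaluate an expression over single digits, + - * / and parentheses."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                       # F -> ( E ) | D
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected )")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected {tok!r}")

    def term():                         # T -> T * F | T / F | F
        value = factor()
        while peek() in ("*", "/"):     # loop folds operands left to right
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():                         # E -> E + T | E - T | T
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result

print(evaluate("2 + 3 * 4"))
print(evaluate("5 - 6 - 7"))
```

Precedence comes from the nesting of the functions (`expr` calls `term`, which calls `factor`); associativity comes from the direction in which each loop accumulates its result.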

Classic Example of Ambiguous Grammar
  • Grammar for conditional statements

stm → if ( cond ) stm
    | if ( cond ) stm else stm

    • Exercise: show that this is ambiguous
      • How?

“The Dangling Else” - a 'weakness' in C, Pascal, etc

Two Derivations

if (cond) if (cond) stm else stm

[Figure: two parse trees for this sentence. In the first, the outer statement is if ( cond ) stm and the else belongs to the inner if. In the second, the outer statement is if ( cond ) stm else stm, so the else belongs to the outer if.]


Solving the Dangling Else
  • Fix the grammar to separate if statements with else clause from those without
    • Done in Java reference grammar
    • Adds lots of non-terminals
  • Use some ad-hoc rule in parser
    • “else matches closest unpaired if”
  • Change the language
    • Only possible if you 'own' the language
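
The ad-hoc rule falls out naturally of a greedy hand-written parser. The Python sketch below is illustrative; `parse_stm` is a name invented here, and the tokens `cond` and `stm` stand in for real conditions and statements, as on the slides. After parsing the body of an if, it consumes an else immediately if one follows, so each else attaches to the closest unpaired if.

```python
def parse_stm(tokens, pos=0):
    """Parse one statement; return (tree, next_pos).

    An if is represented as ("if", then_branch, else_branch_or_None).
    """
    if pos < len(tokens) and tokens[pos] == "if":
        if tokens[pos + 1 : pos + 4] != ["(", "cond", ")"]:
            raise SyntaxError("expected ( cond )")
        then_branch, pos = parse_stm(tokens, pos + 4)
        else_branch = None
        # Greedy choice: take the else now, for the innermost open if.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stm(tokens, pos + 1)
        return ("if", then_branch, else_branch), pos
    if pos < len(tokens) and tokens[pos] == "stm":
        return "stm", pos + 1
    raise SyntaxError(f"unexpected input at token {pos}")

tokens = "if ( cond ) if ( cond ) stm else stm".split()
tree, end = parse_stm(tokens)
print(tree)
```

The outer if ends up with no else branch, and the inner if owns the else.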

Resolving Ambiguity with Grammar (1)

Stm → IfElse | IfNoElse
IfElse → if ( Exp ) IfElse else IfElse
IfNoElse → if ( Exp ) Stm
         | if ( Exp ) IfElse else IfNoElse

  • formal, no additional rules beyond syntax
  • sometimes obscures original grammar

Resolving Ambiguity with Grammar (2)
  • If you can (re-)design the language, avoid the problem entirely

IfStm → if Exp then Stm end
      | if Exp then Stm else Stm end

    • formal, clear, elegant
    • allows sequence of Stms in then and else branches, no { } needed
    • extra end required for every if

Parser Tools and Operators
  • Most parser tools cope with ambiguous grammars
    • Earlier productions chosen before later ones
    • Longest match used if there is a choice
    • Makes life simpler if used with discipline
    • But be sure the tool does what you really want
  • Specify operator precedence & associativity
    • Allows simpler, ambiguous grammar with fewer non-terminals
    • Used in CUP
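
The idea behind precedence and associativity declarations can be sketched without a parser generator. The Python example below is an illustration of the general technique (precedence climbing), not of CUP's internals or syntax. It keeps the ambiguous grammar Exp → Exp Op Exp | Dig and resolves every choice from a table. The names `PRECEDENCE` and `parse` are invented here.

```python
# Higher number binds tighter; all four operators are left-associative.
PRECEDENCE = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
}

def parse(tokens, min_prec=1):
    """Consume tokens (a list, mutated in place) and return a nested-tuple AST."""
    left = tokens.pop(0)                                  # a single digit
    while (tokens and tokens[0] in PRECEDENCE
           and PRECEDENCE[tokens[0]][0] >= min_prec):
        op = tokens.pop(0)
        prec, assoc = PRECEDENCE[op]
        # For a left-associative operator, the right operand may only
        # contain operators that bind strictly tighter.
        next_min = prec + 1 if assoc == "left" else prec
        right = parse(tokens, next_min)
        left = (op, left, right)
    return left

print(parse(list("2+3*4")))
print(parse(list("5-6-7")))
```

Changing an entry in the table changes the resulting tree without touching the grammar, which is the convenience the tools offer.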

Next
  • Next
    • LR (bottom-up / shift-reduce) parsing
  • Reading
    • Continue Cooper&Torczon chapter 3
  • Note
    • LR parsing is the toughest session in P501
