Compiler construction in4020 – lecture 5

Koen Langendoen

Delft University of Technology

The Netherlands

Summary of lecture 4
  • syntax analysis: tokens → AST
  • bottom-up parsing
    • push-down automaton
    • ACTION/GOTO tables
      • LR(0): no look-ahead
      • SLR(1): one-token look-ahead, FOLLOW sets to solve shift-reduce conflicts
      • LR(1): like SLR(1), but with a FOLLOW set per item
      • LALR(1): like LR(1), but “equal” states are merged
Quiz

2.50 Is the following grammar LR(0), SLR(1), LALR(1), or LR(1) ?

(a) S → x S x | y

(b) S → x S x | x

Overview

[Figure: compiler front end. program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a parser generator produces the syntax analysis phase from the language grammar]
  • semantic analysis
    • identification – symbol tables
    • type checking
  • assignment
    • yacc
    • LLgen

se-man-tic: of or relating to meaning in language.

Webster’s Dictionary

Semantic analysis
  • information is scattered throughout the program
  • identifiers serve as connectors
  • find defining occurrence of each applied occurrence of an identifier in the AST
    • undefined identifiers → error
    • unused identifiers → warning
  • check rules in the language definition
    • type checking
    • control flow (dead code)
Symbol table
  • global storage used by all compiler phases
  • holds information about identifiers:
    • type
    • location
    • size

[Figure: the symbol table stands beside the pipeline program text → lexical analysis → syntax analysis → context handling → annotated AST → optimizations → code generation → executable, and is shared by all phases]

Symbol table implementation
  • extensible string-indexable array
    • linear list
    • tree
    • hash table

[Figure: a hash function (string → int) maps the identifiers "aap", "noot" and "mies" to buckets 0 to 3; each bucket heads a linked list of entries with name, type and next fields]
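The hash-table variant can be sketched in C as follows. This is a minimal illustration, not the lecture's implementation: the names `entry`, `hash`, `lookup` and `insert`, the four buckets, and the representation of a type as a string are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4            /* the figure draws buckets 0..3 */

struct entry {
    char *name;               /* identifier text (private copy) */
    const char *type;         /* simplified: the type as a string */
    struct entry *next;       /* next entry in the same bucket */
};

static struct entry *bucket[NBUCKETS];

/* hash function: string -> int (a simple multiplicative hash) */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Return the entry for name, or NULL if it is not in the table. */
struct entry *lookup(const char *name)
{
    struct entry *e;
    for (e = bucket[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Add name with the given type at the front of its bucket chain. */
struct entry *insert(const char *name, const char *type)
{
    unsigned h = hash(name);
    struct entry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->name = malloc(strlen(name) + 1);
    assert(e->name != NULL);
    strcpy(e->name, name);
    e->type = type;
    e->next = bucket[h];
    bucket[h] = e;
    return e;
}
```

Because collisions are chained, the table keeps working when more identifiers are entered than there are buckets; only the chains grow longer.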

Identification
  • different kinds of identifiers
    • variables
    • type names
    • field selectors
  • name spaces
  • scopes

typedef int i;
int j;

void foo(int j)
{
    struct i {i i;} i;
  i: i.i = 3;
    printf( "%d\n", i.i);
}


Handling scopes
  • stack of scope elements
    • when entering a scope a new element is pushed on the stack
    • declared identifiers are entered in the top scope element
    • applied identifiers are looked up in the scope elements from top to bottom
    • the top element is removed upon scope exit
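The four stack operations listed above might look like this in C. It is a sketch under simplifying assumptions: each scope element is a plain linked list, names are not copied, and the function names `enter_scope`, `declare`, `find` and `exit_scope` are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 32

struct decl {
    const char *name;       /* identifier (not copied in this sketch) */
    const char *type;
    struct decl *next;
};

static struct decl *scope[MAX_DEPTH];  /* scope[i]: declarations at level i */
static int depth = 0;                  /* number of open scopes */

/* Entering a scope pushes a new, empty element. */
void enter_scope(void)
{
    assert(depth < MAX_DEPTH);
    scope[depth++] = NULL;
}

/* Leaving a scope removes the top element and its declarations. */
void exit_scope(void)
{
    struct decl *d, *next;
    assert(depth > 0);
    depth--;
    for (d = scope[depth]; d != NULL; d = next) {
        next = d->next;
        free(d);
    }
}

/* Declared identifiers are entered in the top scope element. */
void declare(const char *name, const char *type)
{
    struct decl *d;
    assert(depth > 0);
    d = malloc(sizeof *d);
    assert(d != NULL);
    d->name = name;
    d->type = type;
    d->next = scope[depth - 1];
    scope[depth - 1] = d;
}

/* Applied identifiers are looked up from top to bottom. */
const char *find(const char *name)
{
    int level;
    struct decl *d;
    for (level = depth - 1; level >= 0; level--)
        for (d = scope[level]; d != NULL; d = d->next)
            if (strcmp(d->name, name) == 0)
                return d->type;
    return NULL;            /* undefined identifier */
}
```

Searching top to bottom is what makes an inner declaration hide an outer one with the same name.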
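A scoped hash-based symbol table

aap( int noot)
{ int mies, aap;
  ....
}

[Figure: the hash table buckets 0 to 3 hold the names "mies", "aap" and "noot"; each name heads a list of declarations that record a scope level and properties (aap at levels 2 and 0, noot at level 1, mies at level 2); the scope stack, with elements 0 to 2, links the declarations made in each scope]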

Identification: complications
  • overloading
    • operators: N*2 versus prijs*2.20371
    • functions: PUT(s:STRING) versus PUT(i:INTEGER)
    • solution: yield set of possibilities (to be constrained by type checking)
  • imported scopes
    • C++ scope resolution operator x::
    • Modula FROM module IMPORT ...
    • solution: stack (or merge) the new scope
Type checking
  • operators and functions impose restrictions on the types of the arguments
  • types
    • basic types
    • structured types
    • type names

typedef struct {
    double re;
    double im;
} complex;

Forward declarations
  • recursive data structures
  • type information must be stored
    • type table
  • type information must be resolved
    • undefined types
    • circularities

TYPE Tree = POINTER TO Node;
TYPE Node = RECORD
              element : Integer;
              left, right : Tree;
            END RECORD;
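One way to store and later resolve such type information is sketched below in C. It is only an illustration: the table layout, the names `type_index`, `define_type` and `check_types`, and the return codes are assumptions, and record fields are left out to keep it short. A name used before its definition gets an UNDEFINED placeholder entry that is filled in when the definition arrives.

```c
#include <assert.h>
#include <string.h>

enum def_kind { UNDEFINED, BASIC, ALIAS, POINTER_TO, RECORD };

#define MAX_TYPES 32

struct type_entry {
    const char *name;
    enum def_kind kind;
    int ref;                /* ALIAS, POINTER_TO: index of the referenced type */
};

static struct type_entry table[MAX_TYPES];
static int ntypes;

/* Return the index of name, adding an UNDEFINED entry on first use. */
int type_index(const char *name)
{
    int i;
    for (i = 0; i < ntypes; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;
    assert(ntypes < MAX_TYPES);
    table[ntypes].name = name;
    table[ntypes].kind = UNDEFINED;
    table[ntypes].ref = -1;
    return ntypes++;
}

/* Record "TYPE name = kind [target]"; target may be NULL. */
void define_type(const char *name, enum def_kind kind, const char *target)
{
    int i = type_index(name);
    int ref = target != NULL ? type_index(target) : -1;
    table[i].kind = kind;
    table[i].ref = ref;
}

/* Run after all declarations have been read.
   Returns 0 if all is well, 1 for an undefined type,
   2 for a circular chain of aliases (TYPE a = b; TYPE b = a). */
int check_types(void)
{
    int i, j, steps;
    for (i = 0; i < ntypes; i++) {
        if (table[i].kind == UNDEFINED)
            return 1;
        j = i;
        for (steps = 0; table[j].kind == ALIAS; steps++) {
            if (steps > ntypes)
                return 2;   /* chain longer than the table: a cycle */
            j = table[j].ref;
        }
    }
    return 0;
}
```

A reference through POINTER TO ends the alias chain, which is why the Tree/Node pair is accepted while two types that merely rename each other are not.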

Type equivalence
  • name equivalence [all types get a unique name]

VAR a : ARRAY [Integer 1..10] OF Real;

VAR b : ARRAY [Integer 1..10] OF Real;

  • structural equivalence [difficult to check]

TYPE c = RECORD i : Integer; p : POINTER TO c; END RECORD;

TYPE d = RECORD
           i : Integer;
           p : POINTER TO
                 RECORD
                   i : Integer;
                   p : POINTER TO c;
                 END RECORD;
         END RECORD;
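Structural equivalence is hard to check mainly because types can refer to themselves. A common approach, sketched here in C, compares two types recursively while treating every pair that is already under comparison as equal, so the recursion terminates. The representation (`struct type`, a fixed-size pair list) and all names are assumptions for the example, and field names are ignored.

```c
#include <assert.h>

enum kind { T_INTEGER, T_POINTER, T_RECORD };

#define MAX_FIELDS 4

struct type {
    enum kind kind;
    struct type *target;              /* T_POINTER: pointed-to type */
    int nfields;                      /* T_RECORD: number of fields */
    struct type *field[MAX_FIELDS];   /* T_RECORD: field types, in order */
};

#define MAX_PAIRS 64
static struct { const struct type *a, *b; } assumed[MAX_PAIRS];
static int npairs;

static int equiv(const struct type *a, const struct type *b)
{
    int i;
    if (a == b)
        return 1;
    if (a->kind != b->kind)
        return 0;
    /* A pair that is already being compared is assumed equal;
       this is what stops the recursion on recursive types. */
    for (i = 0; i < npairs; i++)
        if (assumed[i].a == a && assumed[i].b == b)
            return 1;
    assert(npairs < MAX_PAIRS);
    assumed[npairs].a = a;
    assumed[npairs].b = b;
    npairs++;

    switch (a->kind) {
    case T_INTEGER:
        return 1;
    case T_POINTER:
        return equiv(a->target, b->target);
    case T_RECORD:
        if (a->nfields != b->nfields)
            return 0;
        for (i = 0; i < a->nfields; i++)
            if (!equiv(a->field[i], b->field[i]))
                return 0;
        return 1;
    }
    return 0;
}

/* Entry point: start each comparison with an empty assumption list. */
int structurally_equivalent(const struct type *a, const struct type *b)
{
    npairs = 0;
    return equiv(a, b);
}
```

Applied to the types c and d above, the check unfolds d one level, reaches c on both sides, and reports the two as equivalent.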

Coercions
  • implicit data and type conversion to match operand (argument) type
  • coercions complicate identification (ambiguity)
  • two-phase approach
    • expand a type to a set by applying coercions
    • reduce type sets based on constraints imposed by (overloaded) operators and language semantics

VAR a : Real;
...
a := 5;

3.14 + 7

8 + 9
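The two-phase approach can be illustrated with type sets stored as bitmasks. The sketch below assumes a language with only Integer and Real, one coercion (Integer to Real), and a '+' overloaded for Integer+Integer and Real+Real; the function names and the rule "prefer the variant that needs no coercion" are assumptions, not taken from the lecture.

```c
#include <assert.h>

enum { T_INT = 1, T_REAL = 2 };   /* one bit per type; a set is a bitmask */

/* Phase 1: expand a type to the set of types it can be coerced to.
   Assumed rule: Integer may be coerced to Real, not the other way round. */
unsigned expand(unsigned type)
{
    assert(type == T_INT || type == T_REAL);
    return type == T_INT ? (T_INT | T_REAL) : type;
}

/* Phase 2 for '+': keep the operand types that both sides can supply,
   then prefer Integer (no coercion) when it is still in the set.
   Returns the chosen operand/result type, or 0 for a type error. */
unsigned resolve_plus(unsigned left, unsigned right)
{
    unsigned common = expand(left) & expand(right);
    if (common & T_INT)
        return T_INT;
    if (common & T_REAL)
        return T_REAL;
    return 0;
}

/* Phase 2 for ':=': the source must be coercible to the target type. */
int assignable(unsigned target, unsigned source)
{
    return (expand(source) & target) != 0;
}
```

With these rules 3.14 + 7 resolves to Real addition, 8 + 9 stays Integer addition even though Real addition would also fit, and a := 5 is accepted with a coercion of 5 to Real.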

Variable: value or location?
  • two usages of variables
    • rvalue: value
    • lvalue: location
  • insert coercion to dereference variable
  • checking rules:

VAR p : Real;
VAR q : Real;
...
p := q;

[Figure: AST for p := q; the := node has two children, (location of) p and a deref node applied to (location of) q]
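Inserting the dereference coercion can be sketched on a toy AST in C. The node kinds and the helper names `new_node`, `is_lvalue`, `as_rvalue` and `make_assign` are invented for this example, and only variables count as locations here.

```c
#include <assert.h>
#include <stdlib.h>

enum node_kind { N_VAR, N_CONST, N_DEREF, N_ASSIGN };

struct node {
    enum node_kind kind;
    struct node *left, *right;   /* N_DEREF uses left only */
};

struct node *new_node(enum node_kind kind, struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->left = left;
    n->right = right;
    return n;
}

/* In this sketch a variable denotes a location (lvalue);
   every other node denotes a value (rvalue). */
int is_lvalue(const struct node *n)
{
    return n->kind == N_VAR;
}

/* Where a value is required but a location is given, insert a deref. */
struct node *as_rvalue(struct node *n)
{
    return is_lvalue(n) ? new_node(N_DEREF, n, NULL) : n;
}

/* Build dst := src. Returns NULL if dst is not a location. */
struct node *make_assign(struct node *dst, struct node *src)
{
    if (!is_lvalue(dst))
        return NULL;
    return new_node(N_ASSIGN, dst, as_rvalue(src));
}
```

For p := q this yields an assignment whose left child is the location of p and whose right child is a deref of the location of q.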

Exercise (5 min.)

complete the table

Answers

complete table

Assignment (practicum)

Asterix compiler

1) replace yacc by LLgen

2) make Asterix object-oriented

  • classes and objects
  • inheritance and dynamic binding

[Figure: structure of the Asterix compiler. lex turns the token description into the lexical analysis phase and yacc turns the Asterix grammar into the syntax analysis phase; an Asterix program passes through lexical analysis, syntax analysis, context handling and code generation, yielding C-code]

Yet another compiler compiler
  • yacc (bison): parser generator for UNIX
    • LALR(1) grammar → C code
  • format of the yacc input file:

definitions      (tokens + properties)
%%
rules            (grammar rules + actions)
%%
user code        (auxiliary C-code)

Yacc-based expression interpreter
  • input file
  • yacc maintains a stack of “values” that may be referenced ($i) in the semantic actions
  • each rule pairs grammar (the production) with semantics (the action in braces)

%token DIGIT
%%
line : expr '\n'         { printf("%d\n", $1);}
     ;
expr : expr '+' expr     { $$ = $1 + $3;}
     | expr '*' expr     { $$ = $1 * $3;}
     | '(' expr ')'      { $$ = $2;}
     | DIGIT
     ;
%%

Yacc interface to lexical analyzer
  • yacc invokes yylex() to get the next token
  • the “value” of a token must be stored in the global variable yylval
  • the default value type is int, but can be changed

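%%
/* Lexical analyzer, called by yyparse() for each token.
   Needs <stdio.h> and <ctype.h>, e.g. in a %{ ... %} block
   in the definitions section. */
int yylex(void)
{
    int c;

    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';    /* token value: the digit's numeric value */
        return DIGIT;
    }
    return c;                /* any other character is its own token */
}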

Yacc interface to back-end
  • yacc generates a function named yyparse()
  • syntax errors are reported by invoking a callback function yyerror()

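%%
int yylex(void)
{
    ...
}

int main(void)
{
    return yyparse();        /* 0 on success, non-zero on failure */
}

/* Callback invoked by the parser on a syntax error; needs <stdlib.h>. */
void yyerror(const char *msg)
{
    printf("syntax error\n");
    exit(1);
}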

Yacc-based expression interpreter
  • input file (desk0): the grammar shown earlier
  • run yacc

> make desk0
bison -v desk0.y
desk0.y contains 4 shift/reduce conflicts.
gcc -o desk0 desk0.tab.c
>

Conflict resolution in Yacc
  • shift-reduce: prefer shift
  • reduce-reduce: prefer the rule that comes first
Yacc-based expression interpreter
  • input file (desk0)
  • run yacc
  • run desk0, is it correct? NO

> desk0
2*3+4
14

The expected answer is 10. Because yacc prefers shift in every shift-reduce conflict, the input is parsed as 2*(3+4).

Operator precedence in Yacc
  • priority from top (low) to bottom (high)

%token DIGIT
%left '+'
%left '*'
%%
line : expr '\n'         { printf("%d\n", $1);}
     ;
expr : expr '+' expr     { $$ = $1 + $3;}
     | expr '*' expr     { $$ = $1 * $3;}
     | '(' expr ')'      { $$ = $2;}
     | DIGIT
     ;
%%
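The difference between "always shift" and the precedence declarations can be imitated by two small hand-written evaluators in C. They are not generated parsers, only a sketch of the grouping each strategy produces for single-digit operands without parentheses; the function names are invented for the example.

```c
#include <assert.h>
#include <ctype.h>

/* Always shift: every operator is stacked before anything is reduced,
   so the operators group to the right regardless of their kind. */
int eval_prefer_shift(const char *s)
{
    int left;
    assert(isdigit((unsigned char)s[0]));
    left = s[0] - '0';
    if (s[1] == '+')
        return left + eval_prefer_shift(s + 2);
    if (s[1] == '*')
        return left * eval_prefer_shift(s + 2);
    return left;                 /* end of input: start reducing */
}

/* Conventional precedence: '*' binds tighter than '+',
   and both operators group to the left. */
int eval_with_precedence(const char *s)
{
    int sum = 0, product;
    for (;;) {
        assert(isdigit((unsigned char)*s));
        product = *s++ - '0';
        while (*s == '*') {
            s++;
            assert(isdigit((unsigned char)*s));
            product *= *s++ - '0';
        }
        sum += product;
        if (*s != '+')
            return sum;
        s++;
    }
}
```

For the input 2*3+4 the first function computes 2*(3+4) = 14, the value printed by desk0, while the second computes (2*3)+4 = 10.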

Exercise (7 min.)

multiple lines:

%%
lines : line
      | lines line
      ;
line : expr '\n'         { printf("%d\n", $1);}
     ;
expr : expr '+' expr     { $$ = $1 + $3;}
     | expr '*' expr     { $$ = $1 * $3;}
     | '(' expr ')'      { $$ = $2;}
     | DIGIT
     ;
%%

Extend the interpreter to a desk calculator with registers named a – z. Example input: v=3*(w+4)

Answers

%{
int reg[26];
%}
%token DIGIT
%token REG
%right '='
%left '+'
%left '*'
%%
expr : REG '=' expr      { $$ = reg[$1] = $3;}
     | expr '+' expr     { $$ = $1 + $3;}
     | expr '*' expr     { $$ = $1 * $3;}
     | '(' expr ')'      { $$ = $2;}
     | REG               { $$ = reg[$1];}
     | DIGIT
     ;
%%

Answers

%%
int yylex(void)
{
    int c = getchar();

    if (isdigit(c)) {
        yylval = c - '0';    /* value of the digit */
        return DIGIT;
    } else if ('a' <= c && c <= 'z') {
        yylval = c - 'a';    /* register index 0..25 */
        return REG;
    }
    return c;
}

LLgen: LL(1) parser generator
  • LLgen is part of the Amsterdam Compiler Kit
  • takes LL(1) grammar + semantic actions in C and generates a recursive descent parser
  • LLgen features:
    • repetition operators
    • advanced error handling
    • parameter passing
    • control over semantic actions
    • dynamic conflict resolvers
LLgen example: expression interpreter
  • start from LR(1) grammar
  • make grammar LL(1)
  • use repetition operators

yacc grammar; two things must be dealt with to make it LL(1):
  • left recursion
  • operator precedence

lines : line
      | lines line
      ;
line  : expr '\n'
      ;
expr  : expr '+' expr
      | expr '*' expr
      | '(' expr ')'
      | DIGIT
      ;

LLgen grammar, with left recursion removed and operator precedence expressed by the term and factor rules:

%token DIGIT;
main   : [line]+
       ;
line   : expr '\n'
       ;
expr   : term [ '+' term ]*
       ;
term   : factor [ '*' factor ]*
       ;
factor : '(' expr ')'
       | DIGIT
       ;

  • add semantic actions
    • attach parameters to grammar rules
    • insert C-code between the symbols

grammar (left) with semantics (right):

main : [line]+
     ;
line {int e;}
     : expr(&e) '\n'        { printf("%d\n", e);}
     ;
expr(int *e) {int t;}
     : term(e)
       [ '+' term(&t)       { *e += t;}
       ]*
     ;
term(int *t) {int f;}
     : factor(t)
       [ '*' factor(&f)     { *t *= f;}
       ]*
     ;
factor(int *f)
     : '(' expr(f) ')'
     | DIGIT                { *f = yylval;}
     ;

  • values/results passed as parameters
  • semantic actions: C code between {}
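A hand-written recursive descent parser with the same shape as the LLgen rules might look as follows in C. It is not LLgen output, only a sketch of the technique: each rule becomes a function, results travel through pointer parameters, and each repetition becomes a while loop. Reading from a string instead of stdin and the helpers `next_token`, `match` and `evaluate` are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <ctype.h>

#define DIGIT 256              /* token class; other tokens are the character itself */

static const char *input;      /* remaining input (assumption: a string, not stdin) */
static int token;              /* current look-ahead token */
static int yylval;             /* value of the current DIGIT token */

static void next_token(void)
{
    int c = (unsigned char)*input;
    if (c != '\0')
        input++;
    if (isdigit(c)) {
        yylval = c - '0';
        token = DIGIT;
    } else {
        token = c;             /* '\0' signals end of input */
    }
}

static void match(int expected)
{
    assert(token == expected); /* a real parser would report or repair the error */
    next_token();
}

static void expr(int *e);

/* factor : '(' expr ')' | DIGIT */
static void factor(int *f)
{
    if (token == '(') {
        match('(');
        expr(f);
        match(')');
    } else {
        *f = yylval;           /* read the value before advancing */
        match(DIGIT);
    }
}

/* term : factor [ '*' factor ]* */
static void term(int *t)
{
    int f;
    factor(t);
    while (token == '*') {
        match('*');
        factor(&f);
        *t *= f;
    }
}

/* expr : term [ '+' term ]* */
static void expr(int *e)
{
    int t;
    term(e);
    while (token == '+') {
        match('+');
        term(&t);
        *e += t;
    }
}

/* Parse and evaluate one complete expression held in text. */
int evaluate(const char *text)
{
    int e;
    input = text;
    next_token();
    expr(&e);
    match('\0');
    return e;
}
```

Because precedence is encoded in the expr/term/factor layering, 2*3+4 evaluates to 10 here without any precedence declarations.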

LLgen interface to lexical analyzer
  • by default LLgen invokes yylex() to get the next token
  • the “value” of a token can be stored in any global variable (yylval) of any type (int)

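/* Needs <stdio.h> and <ctype.h>. */
int yylex(void)
{
    int c;

    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';    /* token value: the digit's numeric value */
        return DIGIT;
    }
    return c;                /* any other character is its own token */
}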

LLgen interface to back-end
  • LLgen generates a user-named function (parse)
  • LLgen handles syntax errors by inserting missing tokens and deleting unexpected tokens
  • LLmessage() is invoked to notify the lexical analyzer

%start parse, main;

void LLmessage(int class)
{
    switch (class) {
    case -1:
        printf("expecting EOF, ");
        /* fall through: also report the deleted token */
    case 0:
        printf("deleting token (%d)\n", LLsymb);
        break;
    default:
        /* push back token LLsymb */
        printf("inserting token (%d)\n", class);
        break;
    }
}

Exercise (5 min.)
  • extend LLgen-based interpreter to a desk calculator with registers named a – z. Example input: v=3*(w+4)
Answers
  • dynamic conflict resolution: the %if (ahead() == '=') clause

%token REG;

expr(int *e) {int r,t;}
     : %if (ahead() == '=')
       reg(&r) '=' expr(e)  { reg[r] = *e;}
     | term(e)
       [ '+' term(&t)       { *e += t;}
       ]*
     ;
factor(int *f) {int r;}
     : '(' expr(f) ')'
     | DIGIT                { *f = yylval;}
     | reg(&r)              { *f = reg[r];}
     ;
reg(int *r)
     : REG                  { *r = yylval;}
     ;

Homework
  • study sections:
    • 2.2.4.6 LLgen
    • 2.2.5.9 yacc
  • assignment 1:
    • replace yacc with LLgen
    • deadline April 9 08:59
  • print handout for next week [blackboard]