
One pass compiler Compiler Design



  1. One-pass compiler (Compiler Design, CSC532)

  2. Symbol Table • Stores the symbols of the source program as the compiler encounters them. • Each entry contains the symbol name plus a number of parameters describing what is known about the symbol. • Reserved words (if, then, else, etc.) may be stored in the symbol table as well.

  3. Symbol Table • At a minimum we must be able to • INSERT a new symbol into the table, • RETRIEVE a symbol so that its parameters may be retrieved and/or modified, • query to find out if a symbol is already in the table. • Each entry can be implemented as a record. Records can have different formats (variant records in Pascal).
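
A minimal sketch of these three operations, assuming a Python dictionary keyed by the symbol name (the class and attribute names here are illustrative, not part of the course material):

class SymbolTable:
    """Maps each symbol name to a record of attributes describing what is known about it."""

    def __init__(self):
        self.entries = {}                      # name -> dict of attributes

    def insert(self, name, **attrs):
        """INSERT a new symbol into the table."""
        if name in self.entries:
            raise KeyError(f"duplicate symbol: {name}")
        self.entries[name] = dict(attrs)

    def retrieve(self, name):
        """RETRIEVE the record so its parameters may be read or modified."""
        return self.entries[name]

    def contains(self, name):
        """Query whether a symbol is already in the table."""
        return name in self.entries

table = SymbolTable()
table.insert("rate", kind="variable", type="real")
if table.contains("rate"):
    table.retrieve("rate")["type"] = "integer"   # modify a parameter of an existing entry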

  4. Storing characters • Method 1: a fixed-size space within each entry, large enough to hold the largest possible name. Most names will be much shorter than this, so there will be a lot of wasted storage. • Method 2: store all symbols in one large separate array. Each symbol is terminated with an end-of-symbol mark (EOS). Each symbol table record contains a pointer to the first character of the symbol. • Method n: modern languages (e.g. Java, the C++ standard library) have efficient data structures such as string or vector.
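
A rough illustration of Method 2, assuming a single character array, an end-of-symbol (EOS) marker, and per-entry start indices (the names and the EOS value are made up for the example):

EOS = "\0"                       # end-of-symbol mark
pool = []                        # one large array holding the characters of all symbols
starts = []                      # each symbol-table entry points to its first character

def store(name):
    """Append a name to the pool and remember where it begins."""
    starts.append(len(pool))
    pool.extend(name)
    pool.append(EOS)
    return len(starts) - 1       # index of the new symbol-table entry

def fetch(entry):
    """Read characters from the start index until the EOS mark."""
    i = starts[entry]
    chars = []
    while pool[i] != EOS:
        chars.append(pool[i])
        i += 1
    return "".join(chars)

store("position"); store("rate")
assert fetch(1) == "rate"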

  5. Symbol Table Data Structure • One linear list: • Easy to implement. • Search time will be long if the source has many symbols.

  6. Symbol Table Data Structure Hash table: • Run the symbol name through a hash function to create an index into a table. • If some other symbol has already claimed that slot, then rehash with another hash function to get another index, and so on. • The hash table must be large enough to accommodate the largest expected number of symbols.
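
A minimal sketch of that rehash-on-collision idea; the second hash function here is derived from the first purely for illustration, and the table is assumed large enough that a free slot always exists:

TABLE_SIZE = 211                               # prime table size
slots = [None] * TABLE_SIZE                    # one entry per slot

def insert(name, info):
    index = hash(name) % TABLE_SIZE            # first hash function
    step = 1 + hash(name) % (TABLE_SIZE - 1)   # "another hash function" used for rehashing
    while slots[index] is not None:            # slot already claimed by some other symbol
        index = (index + step) % TABLE_SIZE    # rehash to get another index
    slots[index] = (name, info)

insert("position", {"type": "real"})           # lookup would probe the same index sequence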

  7. Symbol Table Data Structure • Open hash: • Store the entries in a number of linear lists (called buckets). • Use a hash function on the symbol name to determine which list to use. • A good hash function will spread the symbols across the buckets, so each linear list will be short.
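
A small open-hashing sketch along those lines, using Python lists as the buckets and the built-in hash in place of a hand-written hash function:

NUM_BUCKETS = 211                                    # a prime spreads symbols more evenly

buckets = [[] for _ in range(NUM_BUCKETS)]           # each bucket is a short linear list

def insert(name, info):
    buckets[hash(name) % NUM_BUCKETS].append((name, info))

def lookup(name):
    """Only the one bucket the name hashes to has to be scanned."""
    for entry_name, info in buckets[hash(name) % NUM_BUCKETS]:
        if entry_name == name:
            return info
    return None

insert("position", {"type": "real"})
assert lookup("position")["type"] == "real"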

  8. Hash Functions • The goal is to get a hash function that generates a different index for each symbol name in the source: index = f(string) • Some programmers use symbols like tmp1, tmp2, tmp3, ... so the hash function should use the last character of the name.

  9. Hash Functions (continued) • Other programmers use symbols like xvel, yvel, zvel, ... so the hash function should use the first character of the name. • It is best if all characters in the name are used. • Characters should be given different weights so that x2y2z, y2x2z, z2y2x, ... are hashed differently. • Modern languages provide built-in hash functions/objects.
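
One common way to use all characters with different weights is to fold them in with a multiplier, so that permutations of the same characters generally hash to different indices. This is only a sketch; the multiplier and table size are conventional choices, not values prescribed by the course:

def hash_name(name, table_size=211):
    """Combine every character, weighted by its position through repeated multiplication."""
    h = 0
    for ch in name:
        h = h * 65599 + ord(ch)
    return h % table_size

# Permutations of the same characters typically land in different slots:
print(hash_name("x2y2z"), hash_name("y2x2z"), hash_name("z2y2x"))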

  10. Phases of A Compiler

  11. CONTD. • Example source statement: • position := initial + rate * 60 • After lexical analysis: • id1 := id2 + id3 * 60 and three symbols are entered in the symbol table: • position • initial • rate • After syntax analysis:

  12. CONTD. • After syntax analysis the result is a syntax tree with := at the root, id1 as its left child and + as its right child; the + node has children id2 and *, and the * node has children id3 and 60. Written linearly: :=(id1, +(id2, *(id3, 60)))

  13. CONTD. • After semantic analysis an inttoreal conversion is inserted around 60: :=(id1, +(id2, *(id3, inttoreal(60))))

  14. CONTD. • After intermediate code generation: temp1 := inttoreal(60); temp2 := id3 * temp1; temp3 := id2 + temp2; id1 := temp3 • After code optimization: temp1 := id3 * 60.0; id1 := id2 + temp1 • After final code generation: MOVF id3, R2; MULF #60.0, R2; MOVF id2, R1; ADDF R2, R1; MOVF R1, id1

  15. Some Definitions • Lexeme: the character sequence forming a token. Examples: :=, *, +, rate, 60 • Syntax: what programs look like. • Semantics: what programs mean.

  16. Context Free Grammar • A notation for specifying the syntax of a language. • Also known as Backus-Naur Form or BNF. • A single production such as list → list + digit is not, by itself, a CFG.

  17. Context Free Grammar • Example: In C an if-else statement looks like: if (expression) statement else statement The statement is a concatenation of 7 elements: • the keyword if, • opening parenthesis, • an expression, • a closing parenthesis, • a statement, • the keyword else, • a statement.

  18. CFG • We write this as a production: stmt → if (expr) stmt else stmt where • stmt denotes a statement, • expr denotes an expression, • the arrow “→” is read as “can have the form”. • The tokens (terminals) in this production are: if, else, (, ) • The variables stmt and expr denote sequences of tokens and are called non-terminals.

  19. CFG: Notation • A context-free grammar has 4 components: • A set of tokens known as terminal symbols • A set of non-terminals • A set of productions • A non-terminal designated as the start symbol.

  20. CFG: Example • The productions are: • list → list + digit • list → list - digit • list → digit • digit → 0|1|2|3|4|5|6|7|8|9

  21. CFG: Example • The vertical lines in the last production mean “or”. A digit can have the form of 0 or 1 or 2, etc. The first three productions can be combined: • list → list + digit | list - digit | digit • The tokens (terminals) of this grammar are: + - 0 1 2 3 4 5 6 7 8 9 • The non-terminals are list and digit, with list being the starting non-terminal because its productions are written first. • What is 9-5+2?

  22. This is the parse tree for 9-5+2: the root list derives list + digit, that list derives list - digit, and the innermost list derives digit. Reading the leaves left to right gives 9 - 5 + 2. Written linearly: list(list(list(digit(9)) - digit(5)) + digit(2))

  23. CFG: Another example • block → begin opt_stmts end • opt_stmts → stmts_list | ε • stmts_list → stmts_list ; stmt | stmt • where ε = the empty string of symbols.

  24. CFG: Another example • Ambiguity. • Consider a grammar with a single production: string → string + string | string - string | 0|1|2|3|4|5|6|7|8|9 • A string like 9-5+2 will have two parse trees:

  25. 9-5+2 will have two parse trees • Left tree: the root string derives string + string, the left string derives string - string covering 9 and 5, and the right string derives 2; linearly, string(string(string(9) - string(5)) + string(2)). • Right tree: the root string derives string - string, the left string derives 9, and the right string derives string + string covering 5 and 2; linearly, string(string(9) - string(string(5) + string(2))).

  26. Ambiguity • The left parse tree parses the expression as though it were written (9-5)+2, which equals 6. • The right parse tree parses the expression as though it were written 9-(5+2), which equals 2. • It is important to have only one parse tree for any string of symbols. The grammar should be unambiguous.

  27. Ambiguity Reduction • Associativity of operators: • Precedence of operators: • Syntax for arithmetic expressions: assume the basic units are digits and parenthesized expressions. • factor → digit | (expr)

  28. Associativity of operators: • In most languages addition, subtraction, multiplication and division are left associative. • Exponentiation is usually right associative. • In C the assignment operator, =, is right associative: a = b = c is treated as a = (b = c).

  29. Precedence of operators: • Usually multiplication and division have higher precedence than addition and subtraction. • An expression like 9+5*2 is parsed as 9+(5*2), not (9+5)*2.

  30. Syntax for arithmetic expressions • The binary operators * and / have the highest precedence. They are left associative: term → term * factor | term / factor | factor • Terms are combined with + and -. Therefore the resulting grammar is: • expr → expr + term | expr - term | term • term → term * factor | term / factor | factor • factor → digit | (expr) • digit → 0|1|2|3|4|5|6|7|8|9
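
As a rough illustration of how this grammar fixes both precedence and left associativity, here is a sketch of an evaluator for it; the left-recursive productions are implemented as loops, and the input is assumed to be a string of single-digit numbers, operators, and parentheses with no whitespace:

def parse_expr(s, i=0):
    """expr -> expr + term | expr - term | term   (left recursion handled as a loop)"""
    value, i = parse_term(s, i)
    while i < len(s) and s[i] in "+-":
        op = s[i]
        right, i = parse_term(s, i + 1)
        value = value + right if op == "+" else value - right
    return value, i

def parse_term(s, i):
    """term -> term * factor | term / factor | factor"""
    value, i = parse_factor(s, i)
    while i < len(s) and s[i] in "*/":
        op = s[i]
        right, i = parse_factor(s, i + 1)
        value = value * right if op == "*" else value / right
    return value, i

def parse_factor(s, i):
    """factor -> digit | ( expr )"""
    if s[i] == "(":
        value, i = parse_expr(s, i + 1)
        return value, i + 1          # skip the closing ')'
    return int(s[i]), i + 1          # a single digit

print(parse_expr("9+5*2")[0])        # 19, i.e. 9+(5*2)
print(parse_expr("9-5+2")[0])        # 6, i.e. (9-5)+2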

  31. STOP here

  32. Syntax of our Source Language • program → program id (identifier_list) ; declarations subprogram_declarations compound_statement • identifier_list → id | identifier_list , id • declarations → declarations var identifier_list : type ; | ε • type → standard_type | array [num .. num] of standard_type • standard_type → integer | real

  33. subprogram_declarations → subprogram_declarations subprogram_declaration ; | ε • subprogram_declaration → subprogram_head declarations compound_statement • subprogram_head → function id arguments : standard_type ; | procedure id arguments ; • arguments → (parameter_list) | ε • parameter_list → identifier_list : type | parameter_list ; identifier_list : type • compound_statement → begin optional_statements end • optional_statements → statement_list | ε • statement_list → statement | statement_list ; statement

  34. statement variable assignop expression | procedure_statement | compound_statement | if expression then statement else statement | while expression do statement • variable  id | id [expression] • procedure_statement  id | id (expression_list) • expression_list  expression | expression_list, expression • expression  simple_expression | simple_expression relop simple_expression

  35. simple_expression → term | sign term | simple_expression addop term • term → factor | term mulop factor • factor → id | id (expression_list) | num | (expression) | not factor • sign → + | -

  36. Syntax-Directed Translation • Associate a set of attributes with each grammar symbol. With each production, associate a set of semantic rules for computing values of the attributes. • Synthesized attribute: the value of the attribute at any node of a parse tree can be computed from the attribute values of the children of that node. • Can be evaluated by a single bottom-up traversal of the parse tree.

  37. SDT (continued) • Example: translating infix notation to postfix notation. If a node in the parse tree is labeled with X, then let X.t be a string-valued attribute associated with the node. X.t || Y.t means concatenate X.t with Y.t.

  38. Syntax Directed Definition • PRODUCTION : SEMANTIC RULE • expr → expr1 + term : expr.t := expr1.t || term.t || ‘+’ • expr → expr1 - term : expr.t := expr1.t || term.t || ‘-’ • expr → term : expr.t := term.t • term → 0 : term.t := ‘0’ • term → 1 : term.t := ‘1’ • …… • term → 9 : term.t := ‘9’

  39. Attribute Values at Nodes in Parse Tree • For the input 9-5+2, the root has expr.t = 95-2+; its children are the subtree with expr.t = 95-, the leaf +, and term.t = 2; the expr.t = 95- node has children expr.t = 9, the leaf -, and term.t = 5; term.t = 9 sits below expr.t = 9, above the digit 9.
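
A minimal sketch of how the synthesized attribute t can be computed by a single bottom-up (post-order) pass over such a tree; writing the parse tree as nested Python tuples is purely an illustration, not part of the slides:

def postfix(node):
    """Return the t attribute of a node written as ('+', left, right), ('-', left, right),
    or a single digit string such as '9'."""
    if isinstance(node, str):                     # term -> digit:  term.t := digit
        return node
    op, left, right = node                        # expr -> expr1 op term
    return postfix(left) + postfix(right) + op    # expr.t := expr1.t || term.t || op

tree = ("+", ("-", "9", "5"), "2")                # parse tree for 9-5+2, grouped as (9-5)+2
print(postfix(tree))                              # prints 95-2+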

  40. Example: Robot • PRODUCTION : SEMANTIC RULES • seq → begin : seq.x := 0, seq.y := 0 • seq → seq1 instr : seq.x := seq1.x + instr.dx, seq.y := seq1.y + instr.dy • instr → east : instr.dx := 1, instr.dy := 0 • instr → north : instr.dx := 0, instr.dy := 1 • instr → west : instr.dx := -1, instr.dy := 0 • instr → south : instr.dx := 0, instr.dy := -1

  41. • Annotated parse tree for the input begin west south: the innermost seq (from begin) has seq.x = 0, seq.y = 0; combining it with instr west (instr.dx = -1, instr.dy = 0) gives seq.x = -1, seq.y = 0; combining that with instr south (instr.dx = 0, instr.dy = -1) gives seq.x = -1, seq.y = -1 at the root.
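
The same semantic rules can be read as a simple left-to-right computation over the instruction sequence; this sketch (with direction names as plain strings) accumulates seq.x and seq.y:

# instr.dx and instr.dy for each instruction, taken from the semantic rules above
MOVES = {"east": (1, 0), "north": (0, 1), "west": (-1, 0), "south": (0, -1)}

def robot_position(instructions):
    """seq -> begin sets (0, 0); each further instr adds (instr.dx, instr.dy)."""
    x, y = 0, 0                            # seq -> begin
    for instr in instructions:             # seq -> seq1 instr
        dx, dy = MOVES[instr]
        x, y = x + dx, y + dy
    return x, y

print(robot_position(["west", "south"]))   # (-1, -1), matching the annotated tree above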

  42. Translation Schemes • Translation scheme: a context-free grammar with semantic actions embedded within the right sides of the productions. • Example: rest → + term {print (‘+’)} rest1 • The semantic action is enclosed within braces. The production itself is: rest → + term rest1 • Parse tree: do a postorder traversal of the tree. After the + and term leaves are traversed, the {print (‘+’)} leaf is traversed and the semantic action is performed, then the rest1 leaf is traversed, and finally the root, rest, is visited.

  43. In a simple syntax-directed definition, the non-terminals on the right side are translated in the same order as they appear in the production. Such definitions can be implemented with translation schemes. In the example, the parse tree node for rest has the children +, term, {print (‘+’)}, and rest1, visited in that order by the postorder traversal.

  44. Example: Translating into Post-fix Form • expr → expr + term {print (‘+’)} • expr → expr - term {print (‘-’)} • expr → term • term → 0 {print (‘0’)} • term → 1 {print (‘1’)} • …… • term → 9 {print (‘9’)}

  45. Parsing • Determines whether a string of tokens can be generated by a grammar. • A parser can be constructed for any grammar. • For any context-free grammar there is a parser that takes at most O(n³) time to parse a string of n tokens. • Almost all programming languages that arise in practice can be parsed in O(n) time, making a single left-to-right scan of the input and looking ahead one token at a time. • Two classes of parsing methods: Top-down – construct the parse tree starting at the root and working down towards the leaves. Bottom-up – construct the parse tree starting at the leaves and working up toward the root. • Efficient top-down parsers are easier to construct. • Bottom-up parsers handle a larger class of grammars and translation schemes.

  46. Top-Down Parsing • Recursive-descent parsing is a top-down method where we execute a set of recursive procedures to process the input. • Predictive parsing – a special case of recursive-descent parsing – can be used if the scanned input symbol unambiguously determines the production selected for each non-terminal. • Example grammar: type → simple | ↑ id | array [simple] of type simple → integer | char | num .. num

  47. Pseudo Code for Predictive Parser procedure match (t: token); begin if lookahead = t then lookahead := nexttoken else error end; procedure type; begin if lookahead is in {integer, char, num} then simple else if lookahead = ‘↑’ then begin match (‘↑’); match (id) end else if lookahead = array then begin match (array); match (‘[’); simple; match (‘]’); match (of); type end else error end;

  48. procedure simple; begin if lookahead = integer then match (integer) else if lookahead = char then match (char) else if lookahead = num then begin match (num); match (..); match (num) end else error end;
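
A rough Python rendering of the same predictive parser, assuming the lexer has already produced a list of token spellings, and using "^" in place of the pointer symbol ↑; the token names are illustrative:

tokens = []                  # token stream produced by the lexer (assumed)
pos = 0

def lookahead():
    return tokens[pos] if pos < len(tokens) else None

def match(t):
    """Advance only if the next token is the one the production expects."""
    global pos
    if lookahead() == t:
        pos += 1
    else:
        raise SyntaxError(f"expected {t!r}, found {lookahead()!r}")

def parse_type():
    # type -> simple | ^ id | array [ simple ] of type
    if lookahead() in ("integer", "char", "num"):
        parse_simple()
    elif lookahead() == "^":
        match("^"); match("id")
    elif lookahead() == "array":
        match("array"); match("["); parse_simple(); match("]"); match("of"); parse_type()
    else:
        raise SyntaxError("error in type")

def parse_simple():
    # simple -> integer | char | num .. num
    if lookahead() == "integer":
        match("integer")
    elif lookahead() == "char":
        match("char")
    elif lookahead() == "num":
        match("num"); match(".."); match("num")
    else:
        raise SyntaxError("error in simple")

tokens = ["array", "[", "num", "..", "num", "]", "of", "integer"]
parse_type()                 # accepts: array [ num .. num ] of integer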

  49. No need to backtrack as long as the first tokens on the right sides of the productions are disjoint. • ε-productions: if a non-terminal has an ε-production, try that production last. There is no “else error” at the end of its procedure. • Left recursion requires special handling. A production like expr → expr + term is left-recursive. If the expr procedure calls itself at the beginning, the parser will loop forever. Usually the production can be rewritten to make it right-recursive. • Example: expr → expr + term | term produces sequences like:

  50. term, term + term, term + term + term, ….. • The same sequences can be produced with the following grammar: expr → term rest rest → + term rest | ε
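
A brief sketch of this right-recursive grammar in code, with the print actions of the translation scheme from slide 44 attached, so the procedures terminate and still emit postfix; single-digit operands are assumed:

def expr(s, i=0):
    """expr -> term rest"""
    i = term(s, i)
    return rest(s, i)

def rest(s, i):
    """rest -> + term {print('+')} rest | - term {print('-')} rest | e"""
    if i < len(s) and s[i] in "+-":
        op = s[i]
        i = term(s, i + 1)
        print(op, end="")            # semantic action, performed after the term is parsed
        return rest(s, i)
    return i                         # the e-production emits nothing

def term(s, i):
    """term -> digit {print(digit)}"""
    print(s[i], end="")
    return i + 1

expr("9-5+2")                        # prints 95-2+
print()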
