Syntax Directed Translation

Syntax Directed Translation CMSC 431 Shon Vick

Phases of a Compiler • 1. Lexical Analyzer (Scanner) • Takes source Program and Converts into tokens • 2. Syntax Analyzer (Parser) • Takes tokens and constructs a parse tree. • 3. Semantic Analyzer • Takes a parse tree and constructs an abstract syntax tree with attributes.

Phases of a Compiler- Contd Syntax Directed Translation • 4. Takes an abstract syntax tree and produces an Interpreter code (Translation output) • 5. Intermediate-code Generator • Takes an abstract syntax tree and produces un- optimized Intermediate code.

Motivation: Parser as Translator Syntax-directed translation Parser ASTs, byte code assembly code, etc Stream of tokens Syntax + translation rules (often hardcoded in the parser)

Important • Syntax directed translation: attaching actions to the grammar rules (productions). • The actions are executed during the compilation (not during the generation of the compiler, not during run time of the program!). Either when replacing a nonterminal with its rhs (LL, top-down) or a handle with a nonterminal (LR, bottom-up). • The compiler-compiler generates a parser which knows how to parse the program (LR,LL). The actions are “implanted” in the parser and are executed according to the parsing mechanism.

Example :Expressions • E  E + T • E  T • T  T * F • T  F • F  ( E ) • F  num

Synthesized Attributes • The attribute value of the terminal at the left hand side of a grammar rule depends on the values of the attributes on the right hand side. • Typical for LR (bottom up) parsing. • Example: TT*F {$$.val=$1.val$3.val}. T.val T.val F.val

Example :Expressions In LEX • E  E + T {$$.val:=$1.val+$3.val;} • E  T {$$.val:=$1.val;} • T  T * F {$$.val:=$1.val*$3.val;} • T  F {$$.val:=$1.val;} • F  ( E ) {$$.val:=$2.val;} • F  num {$1.val:=$1.val;}

Example 2:Type definitions • D  T L • T  int • T  real • L  id , L • L  id

Inherited attributes • The value of the attributes of one of the symbols to the right of the grammar rule depends on the attributes of the other symbols (left or right). • Typical for LL parsing (top down). • D  T {$2.type:=$1.type} L • L  id , {$3.type:=$1.type} L D.type L.type T.type L.type id , L.type

Type definitions • D  T {$2.type:=$1.type} L • T  int {$$.type:=int;} • T  real {$$:=real;} • L  id , L {gen(id.name,$$.type); • $3.type:=$$.type;} • L  id {gen(id.name,$$.type); } T.type int

Type definitions: LL(1) • D  T {$2.type:=$1.type} L • T  int {$$.type:=int;} • T  real {$$:=real;} • L  id {gen(id.name,$$.type); • $2.type:=$$.type;} R • R  , id {gen(id.name,$$.type); } • R   T.type int

How to arrange things for LL(1) on stack? • Include on the stack, except for the grammar symbol also the actions, and a shadow copy for each nonterminal. • Each time one sees an action on the stack, execute it. • Shadow copies are used to get synthesized values and pass them further to the right of the rule.

UMBC LR parser • Given the current state on top and current token, consult the action table. • Either, shift, i.e., read a new token, put in stack, and push new state, or • or Reduce, i.e., remove some elements from stack, and given the newly exposed top of stack and current token, to be put on top of stack, consult the goto table about new state on top of stack. a + b $ LR(k) parser sn X0 sn-1 Xn-1 s0 action goto

LR parser adapted. a + b $ Attributes • Same as before, plus: • Whenever reduce step, execute the action associated with grammar rule.If left-to right inherited attributes exist, can also execute actions in middle of rule. • Can put record of attributes, associated with a grammar symbol, on stack. LR(k) parser sn X0 sn-1 Xn-1 s0 action goto

a + b $ LL(k) parser Y X Z $ Parsing table LL parser • If top symbol X a terminal, must match current token m. • If not, pop top of stack. Then look at table T[X, m] and push grammar rule there in reverse order.

UMBC LL parser Adapted • If top symbol X a terminal, must match current token m. • Put actions into stack as part of rules. Hold for each nonterminal a record with attributes. • If nonterminal, replace top of stack with shadow copy. Then look at table T[X, m] and push grammar rule there in reverse order. • If shadow copy, remove. This way nonterminal can deliver values down and up. a + b $ Attributes LL(k) parser Y X Z $ Actions Parsing table

“Programming ideas” How to annotate? • D  L : T • T  int | char • L  L , id | id D L : T int L , id id

“Programming ideas” How to annotate (LR, i.e., bottom-up style)? • D  L : T {forall i in $1.list do allocate($3.type, i);} • T  int {$$.type:=int;} • T  char {$$.type:=char;} • L  L , id {$$.list:=$1.list{$3.name};} • L  id {$$.list:={$1.name};}

“Programming ideas” Change the grammar • D  id L • T  int | char • L  , id L | : T D id L , id L : T int

“Programming ideas” Change the grammar • D  id L {allocate($2.type,$1.name);} • T  int {$$.type:=int;} • T  char {$$.type:=char;} • L  , id L {$$.type:=$3.type; allocate($3.type,$2.name);} • L  : T {$$.type:=$2.type;}

E  E + T E  T T  T * F T  F F  ( E ) F  num E  T E’ E’  + T E’ E’   T  F T’ T’  * F T’ T’   F  ( E ) F  num Expressions in LL:Eliminating left recursion

E T E’  F T’ ( E F T’ ) *  4 T E’ + T E’ F T’   2 F T’  UMBC (2+3)*4 • E  T E’ • E’  + T E’ • E’   • T  F T’ • T’  * F T’ • T’   • F  ( E ) • F  num 3

Actions in LL E • E  T {$2.down:=$1.up;} E’ {$$.up:=$2.up;} • E’  + T {$3.down:=$$.down+$2.up;} E’ {$$.up:=$3.up;} • E’   {$$.up:=$$.down;} • T  F {$2.down:=$1.up;} T’ {$$.up:=$2.up;} • T’  * F {$3.down:=$$.down+$2.up;} T’ {$$.up:=$3.down;} • T’   {$$.up:=$$.down;} • F  ( E ) {$$.up:=$2.up;} • F  num {$$.up:=$1.up;} T E’  F T’ ( E F T’ ) * 4 T E’  + T E’ F T’ 2 F T’   3 

Syntax Directed Translation Scheme • A syntax directed translation scheme is a syntax directed definition in which the net effect of semantic actions is to print out a translation of the input to a desired output form. • This is accomplished by including “emit” statements in semantic actions that write out text fragments of the output, as well as string-valued attributes that compute text fragments to be fed into emit statements.

Examples • 1. Postfix and Prefix notations: • We have already seen how to generate them. • Let us generate Java Byte code. • E -> E’+’ E { emit(“iadd”);} • E-> E ‘* ‘ E { emit(“imul”);} • E-> T • T -> ICONST { emit(“sipush ICONST.string);} • T-> ‘(‘ E ‘)’

Abstract Stack Machine • We now present (from Java Virtual Machine Spec see http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html) a simple stack machine and illustrate how to generate code for it via syntax-directed translations. • The abstract machine code for an expression simulates a stack evaluation of the postfix representation for the expression. Expression evaluation proceeds by processing the postfix representation from left to right.

Evaluation • 1. Pushing each operand onto the stack when encountered. • 2. Evaluating a k-ary operator by using the value located k-1 positions below the top of the stack as the leftmost operand, and so on, till the value on the top of the stack is used as the rightmost operand. • 3. After the evaluation, all k operands are popped from the stack, and the result is pushed onto the stack (or there could be a side-effect)

Example Stmt -> ID ‘=‘ expr { stmt.t = expr.t || ‘istore a’} Applied to a = 3*b –c bipush 3 iload b imul iload c isub istore a

Java Virtual Machine • Analogous to the abstract stack machine, the Java Virtual machine is an abstract processor architecture that defines the behavior of Java Bytecode programs. • The stack (in JVM) is referred to as the operand stack or value stack. Operands are fetched from the stack and the result is pushed back on to the stack. • Advantages: VM code is compact as the operands need not be explicitly named.

Data Types • The int data type can hold 32 bit signed integers in the range -2^31 to 2^(31) -1. • The long data type can hold 64 bit signed integers. • Integer instructions in the Java VM are also used to operate on Boolean values. • Other data types that Java VM supports are byte, short, float, double. (Your project should handle at least three data types).

Selected Java VM Instructions Java VM instructions are typed i.e., the operator explicitly specifies what operand types it expects. Expression Evaluation • sipush n push a 2 byte signed int on to stack • iload v load/push a local variable v • istore v store top of stack onto local var v • iadd pop two elements and push their sum • isub pop two elements and push their difference

Selected Java VM Instructions • imul pop two elements and push their product • iand pop two elements and push their bitwise and • ior pop two elements and push their bitwise or • ineg pop top element and push its negation • lcmp pop two elements (64 bit integers), push the comparison result. 1 if Vs[0]<vs[1], 0 if vs[0]=vs[1] otherwise -1. • i2l convert integers to long • l2i convert long to integer

Selected Java VM Instructions • Branches: • GOTO L unconditional transfer to label l • ifeq L transfer to label L if top of stack is 0 • ifne L transfer to label L if top of stack !=0 • Call/Return: Each method/procedure has memory space allocated to hold local variables (vars register), an operand stack (optop register) and an execution environment (frame register)

Selected Java VM Instructions • Invokestatic p invoke method p. pop args from stack as initial values of formal parameters (actual parameters are pushed before calling). • Return return from current procedure • ireturn return from current procedure with integer value on top of stack. • Areturn return from current procedure with object reference return value on top of stack.

Selected Java VM Instructions Array Manipulation: Java VM has an object data type reference to arrays and objects • newarray int create a new array of integers using the top of the stack as the size. Pop the stack and push a reference to the newly created array. • Iaload pop array subscript expression on top of stack and array pointer (next stack element). Push value contained in this array element. • iastore

Selected Java VM Instructions Object Manipulation • new c create a new instance of class C (using heap) and push the reference onto stack. • getfield f push value from object field f of object pointed by object reference at the top of stack. • putfield f store value from vs[1] into field f of object pointed by the object reference vs[0]

Selected Java VM Instructions Simplifying Instructions: • ldc constant is a macro which will generate either bipush or sipush depending on c.

Byte Code (JVM Instructions) • No-arg operand: (instructions needing no arguments hence take only one byte.) • examples: aaload, aastore,aconsta_null, aload_0, aload_1, areturn, arraylength, astore_0, athrow, baload, iaload, imul etc • One-arg operand: bipush, sipush,ldc etc methodref op: invokestatic, invokenonvirtual, invokevirtual

Byte Code (JVM Instructions) • Fieldref_arg_op: • getfield, getstaic, putfield, pustatic. • Class_arg_op: • checkcast, instanceof, new • labelarg_op (instructions that use labels) • goto, ifeq, ifne, jsr, jsr_w etc • Localvar_arg_op: • iload, fload, aload, istore

Translating an If statement Stmt -> if expr then stmt1 { out = newlabel(); stmt.t = expr.t|| ‘ifnnonnull’ || out || stmt1.t || ‘label’ out: ‘nop’ } example: if ( a +90==7) { x = x+1; x = x+3;}

Translating a while statement Stmt -> WHILE (expr) stmt1 { in =newlabel(); out= newlabel(); emit( stmt.t = ‘label’ || in|| ‘nop’ || expr.t || ‘ifnonnull’|| out|| stmt1.t || ‘goto’ || in|| ‘label’ || out) }

References • Compilers Principles, Techniques and Tools, Aho, Sethi, and Ullman , Chapter 5 • http://www.cs.rpi.edu/~moorthy/Courses/compiler98/Lectures/lecturesinppt/lecture15.ppt • http://www.cs.wisc.edu/~bodik/cs536/NOTES/lecture14.ppt • http://www.ece.utexas.edu/~doron/yacc.ppt • Appel, Chapter 4 and 5

Syntax Directed Translation