Download Presentation
## Compiler

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Compiler**Chapter# 5 Intermediate code generation**Intermediate Code Generation**• In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back-end generates target code. • Intermediate codes are machine independent codes, but they are close to machine instructions. • The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator. • Intermediate language can be many different languages, and the designer of the compiler decides this intermediate language.**Intermediate Code Generation (continue……..)**• Syntax trees can be used as an intermediate language. • Postfix notation can be used as an intermediate language. • Three-address code (Quadruples) can be used as an intermediate language • We will use quadruples to discuss intermediate code generation • Quadruples are close to machine instructions, but they are not actual machine instructions. • some programming languages have well defined intermediate languages, for example: Java: Java virtual machine (JVM) • In fact, there are byte-code to execute instructions in these intermediate languages.**Variants of syntax trees:**• Nodes in a syntax tree represent constructs in the source program. • The children of a node represent the meaningful components of a construct. • A directed acyclic graph (DAG) for an expression identifies the common sub-expressions (sub-expression that occur more than once) of the expression. • DAG’s can be constructed by using the same techniques that construct syntax tree.**5**Cyclic Graphs for Expressions a + a * ( b – c ) + ( b – c ) * d + + * * d a - b c**Directed acyclic graphs for expressions:**• Like the syntax tree for an expression, a DAG has leaves corresponding to atomic operands and interior codes corresponding to operators. • The difference is that a node N in a DAG has more than one parent if N represents a common sub-expression. • In a syntax tree, the tree for the common sub-expression would be replicated as many times as the sub-expression appears in the original expression. • Thus DAG not only represents expressions more succinctly, it gives the compiler important clues regarding the generation of efficient code to evaluate the expressions.**Directed acyclic graphs for expressions:**(continue….) • The leaf for a has two parents, because a appears twice in the expression. • The two components of the common sub-expression b-c are represented by one node, the node labeled -. • That node has two parents, representing its two uses in the sub-expression a*(b-c) and (b-c)*d.**Three-address code**• In three-address code, there is at most one operator on the right side of an instruction. • Thus a source language expression like x+y*z might be translated into the sequence of three-address instructions t1 = y * z t2 = x + t1 • Where t1 and t2 are compiler-generated temporary names.**Addresses and instructions:**• Three-address code is built from two concepts: address and instructions. • Three-address code can be implemented using records with fields for the addresses, records are called quadruples and triples. • An address can be one of the following: • A name: For convenience, we allow source-program names to appear as addresses in three-address code. • A constant: In practice, a compiler must deal with many different types of constants. • A compiler-generated temporary: It is useful, especially in optimizing compilers, to create a distinct name each time a temporary is needed.**Addresses and instructions:(continue……….)**• We now consider the common three-address instructions. • Assignment instructions of the form x = y op z where op is a binary arithmetic or logical operation, and x,y and z are addresses. • Assignment of the form x = op y, where op is a unary operation. • Copy instruction of the form x=y, where x is assigned the value of y.**Quadruples:**• The description of three-address instruction specifies the components of each type of instruction. • In a compiler, these instructions can be implemented as objects or as records with fields for the operator and the operands. • Three such representations are called “quadruples”, “triples” and “indirect triples.” • A quadruple has four fields, which we call op, arg1, arg2, and result. • The op field contains an internal code for the operator. • For instance the three-address instruction x=y+z is represented by placing + in op, y in arg1, z in arg2, and x in result.**Quadruples: (continue…….)**• The following are some exceptions to this rule: • Instructions with unary operators like x= minus y or z=y do not use arg2. • Conditional and unconditional jumps put the target label in result.**Three address code for the assignmenta = b * -c + b * -c**• The special operator minus is used to distinguish the unary operator, as in –c, from the binary minus operator, as in b-c. • Note that the unary-minus “three-address” statement has only two addresses, as does the copy statement a=t5. • The quadruples in figure(b) implement the three-address code in (a) as follows:**Three address code for the assignmenta = b * -c + b ***-ccontinue……… t1 = minus c t2 = b * t1 t3 = minus c t4 = b * t3 t5 = t2 + t4 a = t5 (a) Three-address code (b) Quadruples**Triples:**• A triple has only three fields, which we call op, arg1, arg2. • Note that the result field in previous figure is used primarily for temporary names. • Using triples we refer the result of an operation x op y by its position, rather than by an explicit temporary name. • Thus, instead of the temporary t1 in Figure((b)previous slide), a triple representation would refer to position (0). • Parenthesized numbers represent pointers into the triple structure itself. • Positions or pointers to positions were called value numbers.**Representations ofa = b * -c + b * -c**= a + * * b minusb minus c c Syntax tree (b) Triples**Types and declarations:**• Types: • The applications of types can be grouped under checking and translation: • Type checking uses logical rules to reason about the behavior of a program at runtime. Specially, it ensures that the types of operands match the type expected by an operator. For example, the && operator in Java expects its two operands to be Boolean, the result is also type Boolean. • Translation: From the type of a name, a compiler can determine the storage that will be needed for that name at run time. Type information is also needed to calculate the address denoted by an array reference.**Translation of expressions:**• We begin in this section with the translation of expressions into three-address code. • An expression with more than one operator, like a+b*c, will translate into instructions with at most one operator per instruction. • An array reference A[i][j] will expend into a sequence of three-address instructions that calculate an address for the reference.**Addressing array elements:**• Array elements can be accessed quickly if they are stored in a block of consecutive locations. • In C and Java, array elements are numbered 0,1,2,……,n-1 for an array with n elements. • If the width of each array element is w, then the ith element of array ‘A’ begins in location base + i x w • Base is the relative address of A[0].**Type checking:**• To do type checking a compiler needs to assign a type expression to each component of the source program. • The compiler must then determine that these type expression conform to a collection of logical rules that is called the type system for the source program. • Type checking has the potential for catching errors in programs.**Type conversions:**• Consider expressions like x + i, where x is of type float and i is of type integer. • Since the representation of integers and floating-point numbers is different within a computer and different machine instructions are used for operations on integers and floats. • The compiler may need to convert one of the operands of + to ensure that both operands are of the same type when addition occurs. • Suppose that integers are converted to floats when necessary, using a unary operator (float). • For example, the integer 2 is converted to a float in the code for the expression 2 * 2.14 t1 = (float) 2 t2 = t1 * 3.14**Control flow:**• The translation of statements such as if-else-statements and while-statements is tied to the translation of Boolean expressions. • In programming languages, Boolean expressions are often used to • Alter the flow of control: Boolean expressions are used as conditional expressions in statements that alter the flow of control. • Compute logical values. A Boolean expression can represent true or false as values.**Boolean expressions:**• Boolean expressions are composed of the Boolean operators (which we denote &&, ||, and !, using the C convention for the operators AND, OR, NOT, respectively) applied to elements that are Boolean variables or relational expressions. • Relational expressions are of the form E1relE2, where E1 and E2 are arithmetic expressions. B B || B | B && B | !B | (B) | E rel E | true | false • We use the attribute rel.op to indicate which of the six comparison operators <, <=, ==, !=, >, >= is represented by rel.**Boolean expressions:(continue……)**• Given the expression B1 || B2, if we determine that B1 is true, then we can conclude that the entire expressions is true without having to evaluate B2. • Similarly, given B1 && B2, if B1 is false, then the entire expression is false.**Switch statements:**• The “switch” or “case” statement is available in a variety of languages. • Our switch statement syntax is given bellow: switch( E ) { case V1 : S1 case V2 : S2 ……….. case Vn-1 : Sn-1 default : Sn } • There is a selector expression E, switch is to be evaluated, followed by n constant values V1,V2,…Vn that the expression might take, perhaps including a default “value”, which always matches the expression if no other value does.**Translation of switch-statement**• The intended translation of a switch is code to: • Evaluate the expression E. • Find the value Vj in the list of cases that is the same as the expression. • Execute the statement Sj associated with the value found.