
4. Processing the intermediate code



  1. 4. Processing the intermediate code From: Chapter 4, Modern Compiler Design, by Dick Grune et al.

  2. 4.0 Background • The AST still bears very much the traces of the source language and the programming paradigm it belongs to: • higher-level constructs are still represented by nodes and subtrees. • The next step in processing the AST: • transformation to intermediate code • IC generation • serves to reduce the set of the specific node types to a small set of general concepts that can be implemented easily on actual machines. • IC generation • finds the language-characteristic nodes and subtrees in the AST and rewrites them into subtrees that employ only a small number of features, each of which corresponds rather closely to a set of machine instructions. • The resulting tree should probably be called an intermediate code tree. Intermediate code

  3. 4.0 Background • The standard IC tree features • expressions, including assignments, routine calls, procedure headings, and return statements, and conditional and unconditional jumps. • Administrative features • memory allocation for global variables, • activation record allocation, and • module linkage information. • IC generation • increases the size of the AST, but • reduces the conceptual complexity Intermediate code

  4. [Figure: roadmap of intermediate-code processing — some parts deferred to Chs. 6 through 9, the rest covered in this chapter] Intermediate code

  5. 4.0 Background • Roadmap 4. Processing the intermediate code 4.1 Interpretation 4.2 Code generation 4.3 Assemblers, linkers, and loaders • A sobering thought • whatever the processing method, writing the run-time system and library routines used by the programs will be a substantial part of the work. • Little advice can be given on this; most of it is just coding, and usually there is much of it. Intermediate code

  6. 4.1 Interpretation • The simplest way to have the actions expressed by the source program performed is to • process the AST using an ‘interpreter’. • An interpreter is • a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language. • Two varieties of interpreter • Recursive: works directly on the AST and requires less preprocessing • Iterative: works on a linearized version of the AST but requires more preprocessing. Intermediate code

  7. 4.1.1 Recursive interpretation • A recursive interpreter has an interpreting routine for each node type in the AST. • Such an interpreting routine calls other similar routines, depending on its children; • it essentially does what it says in the language definition manual. • This architecture is possible because the meaning of a given language construct is defined as a function of the meanings of its components. • For example, if-statement = condition + then part + else part Intermediate code
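The routine-per-node-type architecture can be sketched as follows — a minimal, illustrative C fragment (the node layout and names are assumptions, not the book's code), showing how the routine for an if-statement is defined in terms of the routines for its components:

```c
enum node_type { N_CONST, N_ADD, N_IF };

struct node {
    enum node_type type;
    int value;                        /* used by N_CONST */
    struct node *cond, *left, *right; /* children, null where unused */
};

/* One interpreting routine per node type; each calls the routines for
 * its children, just as the language manual defines a construct in
 * terms of its components. */
static int evaluate(const struct node *n);

static int evaluate_if(const struct node *n) {
    /* if-statement = condition + then part + else part */
    return evaluate(n->cond) ? evaluate(n->left) : evaluate(n->right);
}

static int evaluate(const struct node *n) {
    switch (n->type) {
    case N_CONST: return n->value;
    case N_ADD:   return evaluate(n->left) + evaluate(n->right);
    case N_IF:    return evaluate_if(n);
    }
    return 0;
}
```

Note how no preprocessing of the AST is needed: the recursion follows the tree structure directly.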

  8. Intermediate code

  9. 4.1.1 Recursive interpretation • An important ingredient in a recursive interpreter is the uniform self-identifying data representation. • The interpreter has to manipulate data values defined in the program being interpreted, but the types and sizes of these values are not known at the time the interpreter is written. • This makes it necessary to implement these values in the interpreter as variable-size records that specify the type of the run-time value, its size, and the run-time value itself. Intermediate code
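Such a variable-size, self-identifying record might look like the following sketch (field names are illustrative; a real interpreter would carry more type information):

```c
#include <stdlib.h>
#include <string.h>

enum vtype { T_INT, T_REAL, T_STRING };

/* Each run-time value carries its own type tag and size, because the
 * types and sizes in the interpreted program are unknown at the time
 * the interpreter is written. */
struct value {
    enum vtype type;        /* the type of the run-time value */
    size_t size;            /* its size in bytes */
    unsigned char data[];   /* the run-time value itself, variable-size */
};

/* Wrap a host int in a self-identifying record. */
static struct value *make_int_value(int i) {
    struct value *v = malloc(sizeof *v + sizeof i);
    v->type = T_INT;
    v->size = sizeof i;
    memcpy(v->data, &i, sizeof i);
    return v;
}
```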

  10. Intermediate code

  11. 4.1.1 Recursive interpretation • Another important feature is the status indicator. • It is used to direct the flow of control. • Its primary component: the mode of operation of the interpreter. • An enumeration value, like Normal mode, indicating sequential flow of control, but • other values are available, to indicate jumps, exceptions, function returns, etc. • Its second component: a value to supply information about non-sequential flow of control. • Return mode, Exception mode, Jump mode Intermediate code

  12. 4.1.1 Recursive interpretation • Each interpreting routine checks the status indicator after each call to another routine, to see how to carry on. • If Normal mode, the routine carries on normally. • Otherwise, it checks to see if the mode is one it should handle; • If it is, it does so, but • If it is not, the routine returns immediately, to let one of the parent routines handle the mode.
      PROCEDURE Elaborate return with expression statement (Rwe node):
          SET Result TO Evaluate expression (Rwe node .expression);
          IF Status .mode /= Normal mode: RETURN;
          SET Status .mode TO Return mode;
          SET Status .value TO Result;
      Intermediate code
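The same idiom can be rendered in C — a sketch only, with illustrative names and a stand-in expression evaluator:

```c
enum mode { NORMAL_MODE, RETURN_MODE, JUMP_MODE, EXCEPTION_MODE };

/* The status indicator: the mode of operation plus a value supplying
 * information about non-sequential flow of control. */
static struct { enum mode mode; int value; } Status = { NORMAL_MODE, 0 };

/* Stand-in for real expression evaluation; a real one might itself
 * set a non-Normal mode, e.g. on an exception. */
static int evaluate_expression(int operand) { return operand * 2; }

/* C rendering of "Elaborate return with expression statement". */
static void elaborate_return_with_expression(int expr) {
    int result = evaluate_expression(expr);
    if (Status.mode != NORMAL_MODE)
        return;                 /* let a parent routine handle the mode */
    Status.mode = RETURN_MODE;  /* signal non-sequential flow of control */
    Status.value = result;
}
```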

  13. Intermediate code

  14. 4.1.1 Recursive interpretation • Variables, named constants, and other named entities are handled by entering them into the symbol table, in the way they are described in the manual. • It is useful to attach additional data to the entry. • E.g., if in the manual the entry for ‘declaration of a variable V of type T’ states that room should be allocated for it on the stack, • we allocate the required room on the heap and enter into the symbol table under the name V a record with the following fields: • A pointer to the name V, • The file name and line number of its declaration, • An indication of the kind of declarable (variable, constant, field selector, etc.), • A pointer to the type T, • A pointer to newly allocated room for the value of V, • A bit telling whether or not V has been initialized, if known, • One or more scope- and stack-related pointers, depending on the language, • Perhaps other data, depending on the language. Intermediate code
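The record entered into the symbol table under the name V might be declared as follows — a sketch with illustrative field names, mirroring the list above:

```c
#include <stdbool.h>

struct type;   /* type descriptor, defined elsewhere in the interpreter */

struct declarable {
    const char *name;              /* pointer to the name V */
    const char *file;              /* file name of its declaration */
    int line;                      /* line number of its declaration */
    enum { DECL_VARIABLE, DECL_CONSTANT,
           DECL_FIELD_SELECTOR } kind;  /* kind of declarable */
    struct type *type;             /* pointer to the type T */
    void *value;                   /* newly allocated room for the value */
    bool initialized;              /* whether V has been initialized */
    struct declarable *scope_link; /* scope-/stack-related pointer */
};
```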

  15. 4.1.1 Recursive interpretation • A recursive interpreter can be written relatively quickly, and is useful for rapid prototyping; • It is not the architecture of choice for heavy-duty interpreters. • A secondary advantage: it can help the language designer to debug the design of the language and its description. • Disadvantages: • Speed of execution • May be a factor of 1000 or more lower than what could be achieved with a compiler • Can be improved by doing judicious memoization. • Lack of static context checking • If needed, full static context checking can be achieved by doing attribute evaluation before starting the interpretation. Intermediate code

  16. 4.1.2 Iterative interpretation • The structure of an iterative interpreter consists of • a flat loop over a case statement which contains a code segment for each node type; • the code segment of a given node type implements the semantics of the node type, as described in the language definition manual. • It requires • A fully annotated and threaded AST, and • Maintains an active-node pointer, which points to the node to be interpreted, the active node. • It repeatedly runs the code segment for the node pointed at by the active-node pointer; • This code sets the active-node pointer to another node, its successor, thus leading the interpreter to that node. Intermediate code
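The flat loop over a case statement, with the active-node pointer moving along the threaded AST, can be sketched as follows (node layout and names are illustrative assumptions):

```c
enum ntype { N_PUSH_LOCAL, N_PUSH_CONST, N_ADD, N_HALT };

/* A node in the threaded AST: each node knows its successor, so the
 * interpreter needs no recursion. */
struct tnode {
    enum ntype type;
    int operand;
    struct tnode *successor;   /* the thread: next node to interpret */
};

static int stack[64], sp;      /* working stack of the interpreter */
static int locals[8];          /* memory for local variables */

/* Repeatedly run the code segment for the active node; each segment
 * then directs the active-node pointer to its successor. */
static void interpret(const struct tnode *active) {
    while (active) {
        switch (active->type) {   /* one code segment per node type */
        case N_PUSH_LOCAL: stack[sp++] = locals[active->operand]; break;
        case N_PUSH_CONST: stack[sp++] = active->operand; break;
        case N_ADD:        sp--; stack[sp-1] += stack[sp]; break;
        case N_HALT:       return;
        }
        active = active->successor;
    }
}
```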

  17. Intermediate code

  18. Intermediate code

  19. 4.1.2 Iterative interpretation • The iterative interpreter possesses much more information about run-time events inside a program than a compiled program does, but less than a recursive interpreter. • A recursive interpreter can maintain arbitrary information for a variable by storing it in the symbol table, whereas an iterative interpreter only has a value at a given address. • Remedy: a shadow memory parallel to the memory array maintained by the interpreter. • Each byte in the shadow memory has 256 possibilities, for example, ‘This byte is uninitialized’, ‘This byte is a non-first byte of a pointer’, ‘This byte belongs to a read-only array’, ‘This byte is part of the routine call linkage’, etc. Intermediate code

  20. 4.1.2 Iterative interpretation • The shadow data can be used for interpreter-time checking, for example, • To detect the use of uninitialized memory, • Incorrectly aligned data access, • Overwriting read-only and system data, etc. Intermediate code
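A shadow memory and two such interpreter-time checks can be sketched as follows; the status values shown are just examples of the 256 available:

```c
#include <stdbool.h>

#define MEM_SIZE 1024

enum shadow_status { SH_UNINITIALIZED, SH_INITIALIZED, SH_READ_ONLY };

static unsigned char memory[MEM_SIZE];   /* the interpreted program's memory */
static unsigned char shadow[MEM_SIZE];   /* parallel array of status bytes */

/* Detect use of uninitialized memory. */
static bool check_read(int addr) {
    return shadow[addr] != SH_UNINITIALIZED;
}

/* Detect overwriting of read-only data; on success, mark the byte
 * as initialized. */
static bool write_byte(int addr, unsigned char b) {
    if (shadow[addr] == SH_READ_ONLY) return false;
    memory[addr] = b;
    shadow[addr] = SH_INITIALIZED;
    return true;
}
```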

  21. 4.1.2 Iterative interpretation • Some iterative interpreters store the AST in a single array, for several reasons: • It is easier to write it to a file • It is a more compact representation • Historical and conceptual reasons Intermediate code

  22. Intermediate code

  23. Intermediate code

  24. 4.1.2 Iterative interpretation • Iterative interpreters are usually somewhat easier to construct than recursive interpreters; • They are much faster but yield less extensive run-time diagnostics. • Compared with compilers, iterative interpreters are much easier to construct and • They yield far superior run-time diagnostics, • But they are much slower than compiled code: • Between 100 and 1000 times slower, although an optimized interpreter may reduce the loss to perhaps a factor of 30 or less. • Advantages: • Increased portability • Increased security, for example, in Java Intermediate code

  25. 4.2 Code generation • Compilation produces object code from the intermediate code tree through a process called code generation. • Basic concept • The systematic replacement of nodes and subtrees of the AST by target code segments, in a way that preserves the semantics • A linearization phase, producing a linear sequence of instructions from the rewritten AST • The replacement process is called tree rewriting • The linearization is controlled by the data-flow and flow-of-control requirements of the target code segments. Intermediate code

  26. Intermediate code

  27. [Figure: tree-rewriting example — an AST subtree is rewritten into target instructions such as Load_Byte (b+Rd)[Rc],4,Rt and Load_Address 9[Rt],2,Ra] Intermediate code

  28. 4.2 Code generation • Three main issues in code generation • Code selection • Which part of the AST will be rewritten with which template, using which substitutions for instruction parameters? • Register allocation • What computational results are kept in registers? Note that it is not certain that there will be enough registers for all values used and results obtained. • Instruction ordering • Which part of the code is produced first and which later? Intermediate code

  29. 4.2 Code generation • Optimal code generation is NP-complete • Compromising by restricting the problem • Consider only small parts of the AST at a time; • Assume that the target machine is simpler than it actually is, by disregarding some of its complicated features; • Limit the possibilities in the three issues by having conventions for their use. Intermediate code

  30. 4.2 Code generation • Preprocessing: AST node patterns are replaced by other (better) AST node patterns • Code generation proper: AST node patterns are replaced by target code sequences, and • Postprocessing: target code sequences are replaced by other (better) target code sequences, using peephole optimization Intermediate code
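The postprocessing phase can be illustrated by one tiny peephole rule — a sketch with assumed opcode names: a load of a location directly after a store to the same location is redundant, since the value is still in the register:

```c
#include <stddef.h>

enum op { OP_STORE, OP_LOAD, OP_ADD };
struct instr { enum op op; int addr; };

/* Peephole optimization over a linear instruction sequence: drop
 * each Load x that immediately follows a Store x; returns the new
 * length of the sequence, compacted in place. */
static size_t peephole(struct instr *code, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out > 0 && code[i].op == OP_LOAD &&
            code[out-1].op == OP_STORE && code[out-1].addr == code[i].addr)
            continue;               /* redundant load: skip it */
        code[out++] = code[i];      /* keep everything else */
    }
    return out;
}
```

A real peephole optimizer is driven by a table of many such replacement patterns rather than one hard-coded rule.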

  31. 4.2.1 Avoiding code generation altogether • Combining the AST of source program P with the interpreter yields an executable program, like a compiled program. • A good way to do rapid prototyping, if the interpreter is available. Intermediate code

  32. 4.2.2 The starting point • Classes of the nodes in an intermediate code tree • Administration • For example, declarations, module structure indications, etc. • Code needed is minimal and almost trivial. • Flow-of-control • For example, if-then, multi-way choice from case statements, computed gotos, function calls, exception handling, method application, Prolog rule selection, RPC, etc. • Expressions • Many of the nodes to be generated belong to expressions. • Techniques for code generation • Trivial • Simple, and • Advanced Intermediate code

  33. 4.2.3 Trivial code generation • There is a strong relationship between iterative interpretation (II) and code generation (CG): • An II contains code segments performing the actions required by the nodes in the AST; • A CG generates code segments performing the actions required by the nodes in the AST • The active-node pointer is replaced by the machine instruction pointer Intermediate code

  34. Intermediate code

  35. Intermediate code

  36. 4.2.3 Trivial code generation • At first sight it may seem pointless to compile an expression in C to code in C, and the code obtained is inefficient, but still several points have been made: • Compilation has taken place in a real sense • The code generator was obtained with minimal effort • The process can be repeated for much more complicated source languages • Two improvements • threaded code • partial evaluation Intermediate code

  37. 4.2.3.1 Threaded code • The code of Fig. 4.13 is very repetitive, and the idea is to pack the code segments into routines, possibly with parameters. • This is called threaded code Intermediate code

  38. 4.2.3.1 Threaded code • The advantage of threaded code is that it is small. • It is mainly used in process control and embedded systems, to control hardware with limited processing power, for example palmtops and telephones. • If the ultimate in code size reduction is desired, the routines can be numbered and the list of calls can be replaced by an array of routine numbers. Intermediate code
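The routine-number variant can be sketched as follows — the routines and the one-register "machine" are illustrative assumptions; only the dispatch scheme matters:

```c
static int acc;                       /* a one-register machine state */

/* The code segments, packed into routines. */
static void load_const_5(void) { acc = 5; }
static void double_acc(void)   { acc *= 2; }

/* Routine number -> routine: the table the numbers index into. */
static void (*routines[])(void) = { load_const_5, double_acc };

/* A program is an array of routine numbers, terminated by -1;
 * a tiny loop dispatches through the table. */
static void run(const int *program) {
    while (*program >= 0)
        routines[*program++]();
}
```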

  39. 4.2.3.2 Partial evaluation • The process of performing part of a computation while generating code for the rest of the computation is called partial evaluation. • It is a very general and powerful technique for program simplification and optimization. • Many researchers believe that • many of the existing optimization techniques are special cases of partial evaluation • and that better knowledge of it would allow us to obtain very powerful optimizers, • thus simplifying compilation, program generation, and even program design. Intermediate code
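A classic small illustration (not from the slides): when the exponent of a power routine is known at code-generation time, that part of the computation can be performed immediately, leaving a residual program with the loop and the test evaluated away:

```c
/* The general routine: both arguments known only at run time. */
static int power(int x, int n) {
    int r = 1;
    while (n-- > 0)
        r *= x;
    return r;
}

/* The residual program obtained by partially evaluating power()
 * with n == 3: the loop has been unrolled and the counter removed. */
static int power_3(int x) {
    return x * x * x;
}
```

The specialized version computes the same results as the general one for n = 3, but with all exponent-handling work done once, at "compile" time.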

  40. Intermediate code

  41. Intermediate code

  42. 4.2.4 Simple code generation • Two machine types are considered: • Pure stack machine and pure register machine • A pure stack machine • uses a stack to store and manipulate values; • it has no registers. • It has two types of instructions • those that move or copy values between the top of the stack and elsewhere and • those that do operations on the top element or elements of the stack. • Two important data administration pointers: • the stack pointer, SP, and • the base pointer, BP. Intermediate code

  43. Intermediate code

  44. Intermediate code

  45. 4.2.4 Simple code generation • The code for p := p + 5 is:
      Push_Local #p     // Push value of #p-th local onto the stack
      Push_Const 5      // Push the value 5 onto the stack
      Add_Top2          // Add the top two elements
      Store_Local #p    // Pop and store the result back in the #p-th local
      Intermediate code
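The semantics of these four instructions can be traced with a small C simulation — a sketch in which BP-relative addressing of locals is simplified to a plain array:

```c
static int stk[16], SP;      /* the stack and its stack pointer */
static int locals[8];        /* the locals of the current frame */

/* One routine per stack-machine instruction. */
static void Push_Local(int p)  { stk[SP++] = locals[p]; }
static void Push_Const(int c)  { stk[SP++] = c; }
static void Add_Top2(void)     { SP--; stk[SP-1] += stk[SP]; }
static void Store_Local(int p) { locals[p] = stk[--SP]; }
```

Running the sequence for p := p + 5 with p initially 7 leaves 12 in the local and an empty stack.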

  46. 4.2.4 Simple code generation • A pure register machine has • a memory to store values in, • a set of registers to perform operations on, and • two sets of instructions. • One set contains instructions to copy values between the memory and a register. • The other performs operations on the values in two registers and leaves the result in one of them. Intermediate code

  47. 4.2.4 Simple code generation • The code for p := p + 5 on a register-memory machine would be:
      Load_Mem p,R1
      Load_Const 5,R2
      Add_Reg R2,R1
      Store_Reg R1,p
      Intermediate code
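A simple code generator producing such register-machine code can be sketched as a recursive routine over the expression tree — illustrative names throughout, and no handling of register shortage: the result of a node goes into register `target`, and the right operand uses `target + 1`:

```c
#include <stdio.h>
#include <string.h>

enum etype { E_CONST, E_MEM, E_ADD };

struct expr {
    enum etype type;
    int value;               /* for E_CONST */
    const char *name;        /* for E_MEM */
    struct expr *left, *right;
};

static char code[256];       /* the generated instruction sequence */

/* Emit code that leaves the value of e in register target. */
static void generate_code(const struct expr *e, int target) {
    char buf[64];
    switch (e->type) {
    case E_CONST:
        snprintf(buf, sizeof buf, "Load_Const %d,R%d\n", e->value, target);
        break;
    case E_MEM:
        snprintf(buf, sizeof buf, "Load_Mem %s,R%d\n", e->name, target);
        break;
    case E_ADD:
        generate_code(e->left, target);       /* left into target */
        generate_code(e->right, target + 1);  /* right into target+1 */
        snprintf(buf, sizeof buf, "Add_Reg R%d,R%d\n", target + 1, target);
        break;
    }
    strcat(code, buf);
}
```

For the tree of p + 5 with target register R1, this sketch reproduces the first three instructions of the sequence above.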

  48. 4.2.4.1 Simple code generation for a stack machine Intermediate code

  49. 4.2.4.1 Simple code generation for a stack machine Intermediate code

  50. 4.2.4.1 Simple code generation for a stack machine • The code for b*b − 4*(a*c):
      Push_Local #b
      Push_Local #b
      Mult_Top2
      Push_Const 4
      Push_Local #a
      Push_Local #c
      Mult_Top2
      Mult_Top2
      Subtr_Top2
      Intermediate code
