What is a Compiler? • A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language). • As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.
Compiling Process • SOURCE PROGRAM → COMPILER → TARGET PROGRAM • The COMPILER also produces ERROR MESSAGES.
Contents of the topic • Overview and history of compilers • Comparison with an interpreter • Parts of a compiler • Phases of a compiler • Compilation of a sample program • Compiler construction tools • Brief description of the lexical analysis phase
Overview and history of compilers • The first compiler was produced by IBM in the 1950s for the FORTRAN language. • It took about 18 staff-years to build that first compiler. • Today a compiler can be built in a few months. • Designing an efficient and reliable compiler is still challenging.
Comparison with Interpreter • Interpreter: program + data → Interpreter → output • Compiler: program → Compiler → binary; binary + data → output
Comparison with interpreter • In this model, the data and the source program are both input to the interpreter. Instead of producing an object module as in the compilation model, the interpreter produces the results by performing the operations of the source program on its data. • Interpreted execution is usually less efficient than running compiled code. • An interpreter can handle certain language features that cannot easily be compiled; languages like APL, for example, are normally interpreted. • Interpreters can be portable because they do not produce machine code. • An interpreter gives us an improved debugging environment because it can check for errors such as out-of-bounds array indexing at run time.
Parts of a compiler • Analysis of the source program • Synthesis of the target program
Phases of a compiler: analysis phase • Source program → Lexical analyzer → tokens → Syntax analyzer → abstract syntax tree → Semantic analyzer → Intermediate code generator → intermediate code • The symbol table and the error handler are shared by all of these phases.
Phases of a compiler: synthesis phase • Intermediate code → Code optimizer → optimized code → Code generator → target program • The symbol table and the error handler are shared by these phases as well.
Lexical Analysis • Stream of characters → Lexical analyzer → tokens • Groups characters into tokens. • Eliminates comments and white space. • Processes compiler directives. • Enters information into the symbol table. • The scanner may be hand coded or generated from regular expressions. • Example: X = y * z + 10 → Lexical analyzer → id1 assign-op id2 mult-op id3 add-op num
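A scanner driven by regular expressions can be sketched in a few lines of Python. This is only an illustration of the idea on the slide: the token names follow the slide's example, while the regular expressions, the function name `tokenize`, and the list used as a symbol table are assumptions made for the sketch.

```python
import re

# (token kind, regular expression); the patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("assign_op", r"="),
    ("mult_op", r"\*"),
    ("add_op", r"\+"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for one line of source text."""
    symbol_table = []  # position i holds the name behind token id(i+1)
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # white space is dropped, as on the slide
        if kind == "id":
            if text not in symbol_table:
                symbol_table.append(text)  # enter the identifier on first sight
            tokens.append(f"id{symbol_table.index(text) + 1}")
        elif kind == "num":
            tokens.append("num")
        else:
            tokens.append(kind.replace("_", "-"))
    return tokens, symbol_table


tokens, table = tokenize("X = y * z + 10")
print(" ".join(tokens))  # id1 assign-op id2 mult-op id3 add-op num
print(table)             # ['X', 'y', 'z']
```

Combining all patterns into one alternation keeps the scanner to a single loop; a generated scanner would typically compile the same patterns into a finite automaton instead.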
Syntax Analysis • Tokens → Syntax analyzer → syntax tree • Hierarchical analysis is called parsing or syntax analysis. • It combines tokens into grammatical phrases. • The grammatical phrases of the source program are represented by a parse tree.
Example: id1 assign-op id2 mult-op id3 add-op num → Syntax analyzer → assign-op(id1, add-op(mult-op(id2, id3), num))
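The tree in this example can be produced by a small recursive-descent parser. The sketch below is one possible way to do it, not the method the slides prescribe: the grammar (an assignment whose right-hand side gives mult-op higher precedence than add-op), the nested-tuple tree representation, and the function name `parse` are all assumptions.

```python
def parse(tokens):
    """Parse `id assign-op expr` into nested tuples (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        # Consume one token, optionally checking its kind first.
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {token!r}")
        pos += 1
        return token

    def factor():
        token = take()
        if token != "num" and not token.startswith("id"):
            raise SyntaxError(f"expected identifier or number, found {token!r}")
        return token

    def term():
        # term -> factor (mult-op factor)*
        node = factor()
        while peek() == "mult-op":
            take("mult-op")
            node = ("mult-op", node, factor())
        return node

    def expr():
        # expr -> term (add-op term)*
        node = term()
        while peek() == "add-op":
            take("add-op")
            node = ("add-op", node, term())
        return node

    target = take()
    if not target.startswith("id"):
        raise SyntaxError(f"expected identifier, found {target!r}")
    take("assign-op")
    tree = ("assign-op", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} after expression")
    return tree


print(parse(["id1", "assign-op", "id2", "mult-op", "id3", "add-op", "num"]))
# ('assign-op', 'id1', ('add-op', ('mult-op', 'id2', 'id3'), 'num'))
```

Each grammar rule becomes one function, so operator precedence falls out of which function calls which.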
Semantic Analysis • Syntax tree → Semantic analyzer → annotated parse tree • Determines the meaning of the source string. • Gathers type information for the subsequent code generation phase. • Uses the hierarchical structure determined by the syntax analysis phase to identify the operators and operands of expressions and statements. • Checks that each operator has operands that are permitted by the source language specification.
Example: assign-op(id1, add-op(mult-op(id2, id3), num)) → Semantic analyzer → assign-op(id1, add-op(mult-op(id2, id3), inttoreal(num)))
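The type check that inserts inttoreal can be sketched as a recursive walk over the tree. The sketch assumes that the three identifiers are declared real and that num is an integer literal, which is what the example implies but does not state; the tuple representation and the name `annotate` are likewise assumptions.

```python
def annotate(node, id_types):
    """Return (annotated_node, type), inserting inttoreal where int meets real."""
    if isinstance(node, str):
        if node == "num":
            return node, "int"  # assumption: num is an integer literal
        if node not in id_types:
            raise NameError(f"undeclared identifier {node}")
        return node, id_types[node]

    op, left, right = node
    left, left_type = annotate(left, id_types)
    right, right_type = annotate(right, id_types)

    if op == "assign-op":
        if left_type == "real" and right_type == "int":
            right = ("inttoreal", right)  # widen the value being stored
        elif left_type != right_type:
            raise TypeError(f"cannot assign {right_type} to {left_type}")
        return (op, left, right), left_type

    # Arithmetic operator: convert the int operand when the types differ.
    if left_type != right_type:
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "real"
    return (op, left, right), left_type


tree = ("assign-op", "id1", ("add-op", ("mult-op", "id2", "id3"), "num"))
declared = {"id1": "real", "id2": "real", "id3": "real"}  # assumed declarations
print(annotate(tree, declared)[0])
# ('assign-op', 'id1', ('add-op', ('mult-op', 'id2', 'id3'), ('inttoreal', 'num')))
```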
Intermediate code generation • Annotated parse tree → Intermediate code generator → intermediate code • Generates an explicit intermediate representation of the source program. • This intermediate representation should have two important properties: 1. easy to produce 2. easy to translate into the target program. • We can think of the intermediate representation as a program for an abstract machine.
Example: assign-op(id1, add-op(mult-op(id2, id3), inttoreal(num))) → Intermediate code generator →
  temp1 := id2 * id3
  temp2 := inttoreal(num)
  temp3 := temp1 + temp2
  id1 := temp3
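One way to obtain this kind of three-address code is a post-order walk of the annotated tree that allocates a fresh temporary for every operator. The sketch below assumes the tree is stored as nested tuples; the names `generate_intermediate` and `OPERATOR` are hypothetical.

```python
OPERATOR = {"mult-op": "*", "add-op": "+"}


def generate_intermediate(tree):
    """Translate an annotated assignment tree into three-address statements."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        # Leaves (identifiers, num) are already valid operands.
        if isinstance(node, str):
            return node
        if node[0] == "inttoreal":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({operand})")
            return temp
        op, left, right = node
        left_name = walk(left)    # operands first (post-order) ...
        right_name = walk(right)
        temp = new_temp()         # ... then the operator itself
        code.append(f"{temp} := {left_name} {OPERATOR[op]} {right_name}")
        return temp

    _, target, value = tree
    code.append(f"{target} := {walk(value)}")
    return code


tree = ("assign-op", "id1",
        ("add-op", ("mult-op", "id2", "id3"), ("inttoreal", "num")))
print("\n".join(generate_intermediate(tree)))
# temp1 := id2 * id3
# temp2 := inttoreal(num)
# temp3 := temp1 + temp2
# id1 := temp3
```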
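Code optimization • Intermediate code → Code optimizer → optimized code • The optimizer tries to improve the intermediate code so that faster-running machine code results. • Example:
  temp1 := id2 * id3
  temp2 := inttoreal(num)
  temp3 := temp1 + temp2
  id1 := temp3
→ Code optimizer →
  temp1 := id2 * id3
  id1 := temp1 + rnum
• Here the conversion of num to real is done once at compile time, giving the real constant written rnum, and temp3 disappears because its value was only copied into id1.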
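The two improvements in this example can be sketched as a single pass over the intermediate code. This is a deliberately narrow illustration rather than a general optimizer: it assumes each temporary is used exactly once, represents statements as hypothetical (destination, operation, operands) tuples, and uses the name rnum from the slide for the converted constant.

```python
def optimize(code):
    """Fold inttoreal(num) into rnum and remove a trailing copy of a temporary.

    Assumption: every temporary is defined once and used once, which holds
    for code generated directly from an expression tree.
    """
    replacements = {}  # temporary -> operand that replaces it
    result = []
    for dest, op, args in code:
        args = tuple(replacements.get(arg, arg) for arg in args)
        if op == "inttoreal" and args == ("num",):
            # The literal can be converted at compile time.
            replacements[dest] = "rnum"
            continue
        if (op == "copy" and result and args[0].startswith("temp")
                and result[-1][0] == args[0]):
            # The copied temporary was computed by the previous statement:
            # let that statement write to the final destination directly.
            _, prev_op, prev_args = result.pop()
            result.append((dest, prev_op, prev_args))
            continue
        result.append((dest, op, args))
    return result


intermediate = [
    ("temp1", "*", ("id2", "id3")),
    ("temp2", "inttoreal", ("num",)),
    ("temp3", "+", ("temp1", "temp2")),
    ("id1", "copy", ("temp3",)),
]
for dest, op, args in optimize(intermediate):
    print(dest, ":=", f" {op} ".join(args))
# temp1 := id2 * id3
# id1 := temp1 + rnum
```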
Code generation • Optimized code → Code generator → target program • Generates target code from the optimized code. • Storage must be allocated, or registers assigned, for each variable. • The addressing modes used to access the data must be decided before the code is generated.
Example:
  temp1 := id2 * id3
  id1 := temp1 + rnum
→ Code generator →
  Movf id3, r2
  Mulf id2, r2
  Movf rnum, r1
  Addf r2, r1
  Movf r1, id1
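The translation in this example follows a simple pattern: load the right operand into a register, apply the operation with the left operand, and store the register only when the destination is a real variable. The sketch below reproduces that pattern under several assumptions: statements are hypothetical (destination, operation, (left, right)) tuples, registers are handed out in the order r2, r1 purely to mirror the listing, only commutative operations are handled, and there is no spilling when registers run out.

```python
MNEMONIC = {"*": "Mulf", "+": "Addf"}


def generate_target(code, registers=("r2", "r1")):
    """Emit two-address instructions for a list of binary statements."""
    free = list(registers)
    location = {}  # temporary -> register currently holding its value
    asm = []
    for dest, op, (left, right) in code:
        reg = free.pop(0)  # IndexError if we run out: no spilling in this sketch
        # Load the right operand, then combine it with the left operand in place.
        asm.append(f"Movf {location.get(right, right)}, {reg}")
        asm.append(f"{MNEMONIC[op]} {location.get(left, left)}, {reg}")
        if dest.startswith("temp"):
            location[dest] = reg  # temporaries live only in registers
        else:
            asm.append(f"Movf {reg}, {dest}")  # variables are stored to memory
    return asm


optimized = [
    ("temp1", "*", ("id2", "id3")),
    ("id1", "+", ("temp1", "rnum")),
]
print("\n".join(generate_target(optimized)))
# Movf id3, r2
# Mulf id2, r2
# Movf rnum, r1
# Addf r2, r1
# Movf r1, id1
```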
Compilation of a sample program • Source: X = y * z + 10 • Lexical analysis: id1 assign-op id2 mult-op id3 add-op num • Syntax analysis: assign-op(id1, add-op(mult-op(id2, id3), num)) • Semantic analysis: assign-op(id1, add-op(mult-op(id2, id3), inttoreal(num))) • Intermediate code generation:
  temp1 := id2 * id3
  temp2 := inttoreal(num)
  temp3 := temp1 + temp2
  id1 := temp3
• Code optimization:
  temp1 := id2 * id3
  id1 := temp1 + rnum
• Code generation:
  Movf id3, r2
  Mulf id2, r2
  Movf rnum, r1
  Addf r2, r1
  Movf r1, id1
Symbol table management • A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. • The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. • When the lexical analyzer detects an identifier in the source program, the identifier is entered into the symbol table.
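A hash-based dictionary is one common way to get the quick lookup described here. The sketch below is a minimal illustration; the class name `SymbolTable`, its method names, and the particular attribute fields are assumptions, not something the slides define.

```python
class SymbolTable:
    """One record per identifier, found in expected constant time via a dict."""

    def __init__(self):
        self._records = {}

    def enter(self, name):
        # Called by the lexical analyzer; a repeated identifier reuses its record.
        if name not in self._records:
            self._records[name] = {
                "name": name,
                "token": f"id{len(self._records) + 1}",
                "type": None,  # filled in later, e.g. by semantic analysis
            }
        return self._records[name]

    def lookup(self, name):
        # Returns None when the identifier has never been entered.
        return self._records.get(name)

    def set_attribute(self, name, key, value):
        self._records[name][key] = value


table = SymbolTable()
for identifier in ["X", "y", "z", "y"]:
    table.enter(identifier)
table.set_attribute("X", "type", "real")
print(table.lookup("X"))  # {'name': 'X', 'token': 'id1', 'type': 'real'}
print(table.lookup("w"))  # None
```

Later phases add attributes such as the type to the same record, which is why the record is created once and then shared.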
Error handler • Each phase can encounter errors. • After detecting an error, a phase must somehow deal with it so that compilation can proceed, allowing further errors in the source program to be detected. • The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler. • Characters that do not form any token of the language are detected by the lexical analysis phase; token streams that violate the structure rules of the language are detected by the syntax analysis phase.
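The idea of continuing after an error so that later errors are still reported can be illustrated at the lexical level. The sketch below records each illegal character and then skips it, which is only one simple recovery strategy; the token patterns and the name `scan_with_recovery` are assumptions.

```python
import re

# Assumed token patterns for the sketch: numbers, identifiers, three operators.
TOKEN = re.compile(r"(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[=*+])")


def scan_with_recovery(source):
    """Return (tokens, errors); scanning continues past illegal characters."""
    tokens, errors = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(source, pos)
        if match is None:
            errors.append(f"offset {pos}: illegal character {source[pos]!r}")
            pos += 1  # recover by deleting the offending character
            continue
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens, errors


tokens, errors = scan_with_recovery("x = y @ z $ 10")
print(tokens)
# [('id', 'x'), ('op', '='), ('id', 'y'), ('id', 'z'), ('num', '10')]
print(errors)
# ["offset 6: illegal character '@'", "offset 10: illegal character '$'"]
```

Both errors are reported in one run, instead of stopping at the first one.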
Compiler construction tools • Parser generators produce syntax analyzers, normally from input based on a context-free grammar. • Yacc is an LALR parser generator and is available as a command on UNIX. • Lexical-analyzer generators produce lexical analyzers, normally from a specification based on regular expressions. • For example, Lex is a lexical-analyzer generator available on UNIX. The basic organization of the resulting lexical analyzer is a finite automaton.
Compiler construction tools (continued) • Syntax-directed translation engines produce collections of routines that walk the parse tree and generate intermediate code. • Automatic code generators take a collection of rules that define the translation of each operation of the intermediate language into the machine language of the target machine.