Intermediate Code Generation Chapter 6
Back-end and Front-end of a Compiler
Back-end and Front-end of a Compiler • Close to source language (e.g. syntax tree) • Close to machine language • m × n compilers can be built by writing just m front ends and n back ends
Back-end and Front-end of a Compiler • Includes: • Type checking • Any syntactic checks that remain after parsing • (e.g. ensure that a break statement is enclosed within a while, for, or switch statement).
Component-Based Approach to Building Compilers [Figure: source programs in Language-1 and Language-2 → non-optimized intermediate code → optimized intermediate code → Target-1 and Target-2 machine code]
Intermediate Representation (IR) • A kind of abstract machine language that can express the target machine operations without committing to too much machine detail. • Why IR?
Without IR [Figure: each source language (C, Pascal, FORTRAN, C++) connected directly to each target (SPARC, HPPA, x86, IBM PPC)]
With IR [Figure: C, Pascal, FORTRAN, C++ → IR → SPARC, HPPA, x86, IBM PPC]
With IR [Figure labels: C, Pascal, FORTRAN, C++, IR, Common Backend, ?]
Advantages of Using an Intermediate Language • Retargeting: build a compiler for a new machine by attaching a new code generator to an existing front-end. • Optimization: reuse intermediate code optimizers in compilers for different languages and different machines. • Note: the terms “intermediate code”, “intermediate language”, and “intermediate representation” are all used interchangeably.
Issues in Designing an IR • Whether to use an existing IR • if the target machine architecture is similar • if the new language is similar • Whether the IR is appropriate for the kind of optimizations to be performed • e.g. speculation and predication • some transformations may take much longer than they would on a different IR
Issues in Designing an IR • Designing a new IR needs to consider • Level (how machine dependent it is) • Structure • Expressiveness • Appropriateness for general and special optimizations • Appropriateness for code generation • Whether multiple IRs should be used
Multiple-Level IR [Figure labels: Source Program, Semantic Check, High-level IR, High-level Optimization, …, Low-level IR, Low-level Optimization, Target code]
Using Multiple-Level IR • Translating from one level to another in the compilation process • Preserving an existing technology investment • Some representations may be more appropriate for a particular task.
Commonly Used IR • Possible IR forms • Graphical representations, such as syntax trees, AST (Abstract Syntax Trees), DAG • Postfix notation • Three-address code • SSA (Static Single Assignment) form • An IR should have individual components that describe simple things
Syntax Tree [Figure: syntax tree with operator nodes +, +, *, *, -, - and leaves a, a, b, c, b, c, d]
Syntax Tree • Directed Acyclic Graph (DAG): a node can have more than one parent • More compact representation • Gives clues regarding the generation of efficient code
Example Construct the DAG for:
How to Generate a DAG from a Syntax-Directed Definition? • All that is needed is that functions such as Node and Leaf check whether a node already exists. If such a node exists, a pointer to that node is returned.
Data Structure: Array • Leaves • Left and right children of an interior node • Operation code • Scanning the array each time a new node is needed is not efficient.
Data Structure: Hash Table • Hash function = h(op, L, R)
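A minimal sketch of the idea on the two slides above: nodes are kept in an array, and a hash table keyed on (op, L, R) lets Node and Leaf return an existing node instead of scanning the array. The class name `DagBuilder`, its methods, and the sample expression are assumptions made for illustration; they do not come from the slides.

```python
class DagBuilder:
    """Builds expression DAG nodes, reusing any node that already exists."""

    def __init__(self):
        self.nodes = []   # array of nodes: index -> (op, left, right)
        self.index = {}   # hash table: (op, left, right) -> index

    def _intern(self, signature):
        # Return the existing node for this signature, or create a new one.
        if signature in self.index:
            return self.index[signature]
        node_id = len(self.nodes)
        self.nodes.append(signature)
        self.index[signature] = node_id
        return node_id

    def leaf(self, name):
        return self._intern(("leaf", name, None))

    def node(self, op, left, right=None):
        return self._intern((op, left, right))


# Illustrative expression (an assumption): a + a * (b - c) + (b - c) * d
dag = DagBuilder()
a = dag.leaf("a")
b_minus_c = dag.node("-", dag.leaf("b"), dag.leaf("c"))
left = dag.node("+", a, dag.node("*", dag.leaf("a"), b_minus_c))
again = dag.node("-", dag.leaf("b"), dag.leaf("c"))   # reuses the earlier node
root = dag.node("+", left, dag.node("*", again, dag.leaf("d")))
```

Here Python's built-in dictionary stands in for the hash function h(op, L, R); a hand-written table would hash the same triple.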
Three-Address Code • Another option for intermediate representation • Built from two concepts: • addresses and instructions • At most one operator on the right-hand side of an instruction
Address • Can be one of the following: • A name: a source program name • A constant • A compiler-generated temporary
Instructions • A procedure call such as p(x1, x2, …, xn) is implemented as:
Choice of Operator Set • Rich enough to implement the operations of the source language • Close enough to machine instructions to simplify code generation
Data Structure • How to represent these instructions in a data structure? • Quadruples • Triples • Indirect triples
Data Structure: Quadruples • Has four fields: op, arg1, arg2, result • Exceptions: • Unary operators: no arg2 • Operators like param: no arg2, no result • (Un)conditional jumps: the target label is the result
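Three-address code: t1 = minus c • t2 = b * t1 • t3 = minus c • t4 = b * t3 • t5 = t2 + t4 • a = t5
Quadruples:
     op     arg1  arg2  result
0    minus  c           t1
1    *      b     t1    t2
2    minus  c           t3
3    *      b     t3    t4
4    +      t2    t4    t5
5    =      t5          a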
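The four-field layout can be sketched as a small record type. This is only an illustration: the names `Quad` and `to_text` are hypothetical, and unused fields are represented by `None`, which is one possible convention among several.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: Optional[str]


def to_text(q: Quad) -> str:
    """Render a quadruple back into three-address form."""
    if q.op == "minus":               # unary operator: no arg2
        return f"{q.result} = minus {q.arg1}"
    if q.op == "=":                   # copy: no arg2
        return f"{q.result} = {q.arg1}"
    if q.op == "param":               # no arg2, no result
        return f"param {q.arg1}"
    if q.op == "goto":                # the target label is stored in result
        return f"goto {q.result}"
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"


# The six instructions t1 ... t5, a from the slide, as quadruples.
quads = [
    Quad("minus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]

listing = [to_text(q) for q in quads]
```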
Data Structure: Triples • Only three fields: no result field • A result is referred to by its position
Data Structure: Indirect Triples • When instructions are moved around during optimization, quadruples are better than triples. • Indirect triples solve this problem • List of pointers to triples • An optimizing compiler can reorder the instruction list instead of affecting the triples themselves
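A rough sketch of why the extra list helps. The encoding is an assumption made for this example: a string operand is a name, and an integer operand refers to the result of the triple at that position. The tiny interpreter `run` is hypothetical and exists only to show that reordering the instruction list leaves the triples, and the computed value, unchanged.

```python
# Triples for the six instructions above; no result field.
triples = [
    ("minus", "c", None),   # (0)
    ("*", "b", 0),          # (1) uses the result of triple 0
    ("minus", "c", None),   # (2)
    ("*", "b", 2),          # (3) uses the result of triple 2
    ("+", 1, 3),            # (4)
    ("=", "a", 4),          # (5) a = result of triple 4
]


def run(order, triples, env):
    """Execute triples in the order given by the instruction list."""
    env = dict(env)
    values = {}             # triple position -> computed value

    def val(x):
        return values[x] if isinstance(x, int) else env[x]

    for i in order:
        op, arg1, arg2 = triples[i]
        if op == "minus":
            values[i] = -val(arg1)
        elif op == "*":
            values[i] = val(arg1) * val(arg2)
        elif op == "+":
            values[i] = val(arg1) + val(arg2)
        elif op == "=":
            env[arg1] = val(arg2)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env


original_order = [0, 1, 2, 3, 4, 5]     # indirect triples: list of positions
reordered = [2, 3, 0, 1, 4, 5]          # only the list changes, not the triples

env = {"b": 3, "c": 2}
result_original = run(original_order, triples, env)["a"]
result_reordered = run(reordered, triples, env)["a"]
```

With plain triples, moving instruction (2) ahead of (0) would change its position and force every reference to it to be rewritten; with the separate list, the references stay valid.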
Static Single Assignment (SSA) • Is an intermediate representation • Facilitates certain code optimizations • All assignments are to variables with distinct names
Static Single Assignment (SSA) • Example: if we use different names for x in the true part and the false part, then which name shall we use in the assignment y = x * a? • The answer is the φ-function • It returns the value of its argument that corresponds to the control-flow path that was taken to get to the assignment statement containing the φ-function
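The behaviour of the φ-function can be imitated in a few lines. This is only a simulation under assumed details: the branch bodies (x = -1 and x = 1), the names x1, x2, x3, y1, and the helper `phi` are all illustrative, and a real compiler resolves φ when it leaves SSA form rather than calling a function at run time.

```python
def phi(path, candidates):
    """Return the argument that belongs to the control-flow path actually taken."""
    return candidates[path]


def run_ssa(flag, a):
    # Source form (assumed):  if flag: x = -1  else: x = 1 ;  y = x * a
    defs = {}
    if flag:
        defs["x1"] = -1          # definition in the true part
        path = "true"
    else:
        defs["x2"] = 1           # definition in the false part
        path = "false"
    # x3 = phi(x1, x2): pick the definition that reached this point.
    x3 = phi(path, {"true": defs.get("x1"), "false": defs.get("x2")})
    y1 = x3 * a
    return y1
```

Every name is assigned exactly once, and the use in y1 = x3 * a refers to a single, unambiguous definition.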
Types and Declarations • Type checking: ensures that the types of the operands match the type expected by an operator • Determine the storage needed • Calculate the address of an array reference • Insert explicit type conversions • Choose the right version of an operator • …
Storage Layout • From the type, we can determine the amount of storage needed at run time. • At compile time, we use this amount to assign each name a relative address. • The type and relative address are saved in the symbol table entry of the name. • For data whose length is determined only at run time, a pointer is saved in the symbol table.
Storage Layout • Multibyte objects are stored in consecutive bytes and given the address of the first byte • Storage for aggregates (e.g. arrays and classes) is allocated in one contiguous block of bytes.
• To keep track of the next available relative address • Create a symbol table entry
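The bookkeeping described on the storage layout slides can be sketched as follows. The type widths, the tuple encoding of array types, and the names `type_width` and `lay_out` are assumptions for illustration; alignment padding is deliberately ignored.

```python
# Assumed widths in bytes; real values depend on the target machine.
WIDTHS = {"char": 1, "int": 4, "float": 8}


def type_width(t):
    """Width of a basic type name or of an ("array", n, element_type) tuple."""
    if isinstance(t, tuple):
        _, count, element = t
        return count * type_width(element)
    return WIDTHS[t]


def lay_out(declarations):
    """Assign each declared name a relative address, in declaration order."""
    offset = 0                       # next available relative address
    symbol_table = {}
    for name, t in declarations:
        symbol_table[name] = (t, offset)   # create a symbol table entry
        offset += type_width(t)
    return symbol_table, offset


table, total = lay_out([
    ("i", "int"),
    ("x", "float"),
    ("a", ("array", 10, "int")),
])
```

The array occupies one contiguous block, and its entry records the address of its first byte.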
Translations of Statements and Expressions • Syntax-Directed Definition (SDD) • Syntax-Directed Translation (SDT)
• Address holding the value of E (e.g. temporary variable, name, constant) • Three-address code of E • Build an instruction • Get a temporary variable • Current symbol table
a = b + -c • t1 = minus c • t2 = b + t1 • a = t2
Generating three-address code incrementally to avoid long string manipulations • gen() does two things: • generates a three-address instruction • appends it to the sequence of instructions generated so far
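A minimal sketch of incremental generation, assuming expressions arrive as nested tuples such as ("+", "b", ("minus", "c")). The class `Translator` and its method names are hypothetical; only the role of gen() is taken from the slide.

```python
class Translator:
    def __init__(self):
        self.code = []          # sequence of instructions generated so far
        self.temp_count = 0

    def new_temp(self):
        """Get a fresh compiler-generated temporary."""
        self.temp_count += 1
        return f"t{self.temp_count}"

    def gen(self, instruction):
        """Append one three-address instruction to the sequence."""
        self.code.append(instruction)

    def expr(self, e):
        """Emit code for e and return the address holding its value."""
        if isinstance(e, str):              # a name or constant is its own address
            return e
        if e[0] == "minus":                 # unary minus: ("minus", operand)
            operand = self.expr(e[1])
            temp = self.new_temp()
            self.gen(f"{temp} = minus {operand}")
            return temp
        op, left, right = e                 # binary operator
        left_addr = self.expr(left)
        right_addr = self.expr(right)
        temp = self.new_temp()
        self.gen(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    def assign(self, name, e):
        self.gen(f"{name} = {self.expr(e)}")


translator = Translator()
translator.assign("a", ("+", "b", ("minus", "c")))   # a = b + -c
```

Because each instruction is appended as soon as it is built, no long code strings are concatenated while walking the expression.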
Arrays • Elements of the same type • Stored consecutively in memory • In languages like C or Java, elements are numbered 0, 1, …, n-1 • In some other languages: low, low+1, …, high
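Because the elements are stored consecutively, the address of element i follows from the base address, the element width, and the lower bound. The formula base + (i - low) * width is the standard one for one-dimensional arrays and is not stated on the slide; the function names below are hypothetical.

```python
def element_address(base, i, width, low=0):
    """Address of element i of an array whose indices start at `low`."""
    return base + (i - low) * width


def element_address_precomputed(base, i, width, low=0):
    """Same address, with the constant part folded into one term.

    The term base - low * width does not depend on i, so a compiler
    can evaluate it once at compile time.
    """
    constant_part = base - low * width
    return i * width + constant_part


zero_based = element_address(1000, 3, 4)          # indices 0, 1, ..., n-1
one_based = element_address(1000, 3, 4, low=1)    # indices low, low+1, ..., high
```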