
Overview




  1. Overview
     • Functionality of LANCE
     • Software structure
     • C frontend
     • Intermediate representation (IR)
     • IR optimizations
     • Control and data flow analysis
     • Backend interface

  2. The LANCE V2.0 compiler system
     • Purpose of LANCE:
       • Facilitate C compiler development for new target processors
       • Give insight into compiler structure
     • Tasks covered by LANCE:
       • Source code analysis
       • Generation of IR
       • Machine-independent optimizations
       • Data flow graph generation
     • Tasks not covered by LANCE:
       • Assembly code generation (backend)
       • Machine-specific optimizations
       • Code assembly and linking

  3. Key features
     • Full ANSI C coverage (C89)
     • Modular tool and library structure
     • Simple three-address code IR (C subset)
     • Plug & play IR optimizations
     • Backend interface compatible with OLIVE
     • Proven in numerous compiler projects

  4. LANCE software structure
     • LANCE library: header file lance2.h, C++ library liblance2.a
     • LANCE tools: C frontend, IR optimization 1, ..., IR optimization n
     • A common IR is used by all tools and by the machine-specific backend

  5. ANSI C frontend
     • Functionality:
       • Lexical, syntactic, and semantic analysis of C source
       • Generation of three-address code IR for a C file
       • Emission of error messages if required (gcc style)
       • Machine-specific constants (type bitwidth, alignment) stored in a configuration file
     • Implementation:
       • Based on a context-free C grammar, according to the K&R spec
       • C source automatically generated with an attribute grammar compiling system (OX, an extension of lex & yacc)
       • In total approx. 26,000 lines of C source code
       • Validated with a comprehensive test suite

  6. Setup and IR generation
     • Environment variables:
       • setenv LANCE2_CPP "gcc -E"
       • setenv LANCE2_CONFIG "config.sparc"
     • Call the C frontend with the "compile" command:
       > compile test.c
     • [Figure: file test.c and config.sparc are the inputs of compile, which produces file test.ir.c]

  7. General IR format
     • One IR file (*.ir.c) generated for each C source file (*.c)
     • External IR format: C subset (compilable!)
     • Internal IR format: accessible via the LANCE library
     • IR contains a symbol table + three-address code (3AC) for each C function defined in the source code
     • 3AC is a sequence of IR statements
     • 3AC = at most two operands and one result per statement
     • IR statements (mostly) consist of IR expressions
     • Blocks of 3AC are augmented with source information (C code, source line no.) for debugging purposes

  8. Classes of IR statements
     • Assignment: a = b + c; *p = !a; x = f(y,z); cond = *x;
     • Jump: goto lab;
     • Conditional jump: if (cond) goto lab;
     • Label: lab:
     • Return void: return;
     • Return value: return x;
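To make the statement classes concrete, the following hand-written sketch (not actual LANCE output) puts an ordinary C-style function next to a version restricted to assignments, labels, jumps, conditional jumps and a return. The function names, the temporaries t1 to t3 and the labels L1 and L2 are made up for illustration.

```cpp
// Ordinary C-style source: sum of i*i for i in [0, n).
int sum_squares(int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s = s + i * i;
    return s;
}

// Hand-written three-address-code style equivalent: every statement has
// at most two operands and one result; control flow uses only labels,
// "goto" and "if (cond) goto".
int sum_squares_3ac(int n) {
    int s, i, t1, t2, t3;
    s = 0;
    i = 0;
L1:
    t1 = i < n;          // loop condition
    t2 = !t1;
    if (t2) goto L2;     // leave the loop when the condition fails
    t3 = i * i;
    s = s + t3;
    i = i + 1;
    goto L1;
L2:
    return s;
}
```

Because the second function is still compilable C/C++, both versions can be run on the same input and their results compared, which is the validation idea behind using a C subset as the IR.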

  9. Classes of IR expressions
     • Symbol: "a", "b", "main", "count", ...
     • Binary expression: a * b, x / 2, 3 ^ v, f & 4, q % r, ...
     • Unary expression: !a, *p, ~x, -z, ...
     • Function call: f1(), f2(a,b), f3(*x, 1, y), ...
     • Type cast: (char)z, (int)a, (float*)b, ...
     • String constant: "compiler", "design", "is", "fun", ...
     • Integer constant: 1000, 3456, -234, -112, ...
     • Float constant: "3.1415926536", "2.718281828459", ...

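  10. Why is the LANCE IR a C subset?
     • Validation of the frontend (or any IR optimization): compile both the original C source and the IR-C source produced by the frontend with a C compiler (CC), run both executables (exe 1, exe 2) on the same test input, and check whether output 1 = output 2
     • C-to-C optimization: the IR optimization tools produce optimized C source, which can again be compiled with CC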

  11. IR data structure overview
     • Global symbol table: int x1,x2,x3; double y1,y2,y3; ...
     • Function list: fun 1 "name1", ..., fun n "name n"
     • Each function has a local symbol table (int a,b,c; ...) and an IR statement list: stm 1, stm 2, ..., stm m
     • Statement info (stm info), examples:
       • Class: assignment, ID: 4123, Left hand side: *p, Right hand side: a + b
       • Class: cond. jump, ID: 4124, Target: "L1", Condition: c
     • Expression info (exp info), example:
       • IR expression, Class: binary, ID: 10034, Left arg: a, Right arg: b, Oper: +, Type: int

  12. The IR type class
     • C++ class IRType stores type info for all symbols and expressions
     • Primary type: void, char, short, int, array, pointer, struct, function, ...
     • Secondary type: subtype of arrays and pointers
     • Storage class: extern, static, register, ...
     • Qualifiers: const, volatile
     • Example: const int* A[100];
         Type->Class() = IRTYPE_ARRAY                          // primary type
         Type->IsConst() = true
         Type->Subtype()->Class() = IRTYPE_POINTER
         Type->Subtype()->Subtype()->Class() = IRTYPE_INT
         Type->ArrayDim() = 100
         Type->SizeOf() = 400                                  // in bytes, for 32-bit pointers
         Type->MemoryWords() = 200                             // for a 16-bit word memory
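The size figures in the example follow from simple arithmetic: 100 pointers of 4 bytes each give 400 bytes, which occupy 200 words of 16 bits. The sketch below reproduces that calculation with a simplified, hypothetical type descriptor. It is not the real IRType class; the names Type, TypeKind, sizeOf, memoryWords and makeExample, and the assumed 4-byte int, are illustrative only.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical, simplified type descriptor (NOT the LANCE IRType class).
enum class TypeKind { Int, Pointer, Array };

struct Type {
    TypeKind kind;
    std::size_t dim;             // number of elements (arrays only)
    std::shared_ptr<Type> sub;   // element type or pointee type
};

// Size in bytes, assuming a target with 4-byte int and 32-bit pointers.
std::size_t sizeOf(const Type& t) {
    switch (t.kind) {
        case TypeKind::Int:     return 4;
        case TypeKind::Pointer: return 4;
        case TypeKind::Array:   return t.dim * sizeOf(*t.sub);
    }
    return 0;
}

// Number of memory words needed, rounded up to whole words.
std::size_t memoryWords(const Type& t, std::size_t bytesPerWord) {
    return (sizeOf(t) + bytesPerWord - 1) / bytesPerWord;
}

// Descriptor for "const int* A[100];" (qualifiers omitted in this sketch).
Type makeExample() {
    auto intT = std::make_shared<Type>(Type{TypeKind::Int, 0, nullptr});
    auto ptrT = std::make_shared<Type>(Type{TypeKind::Pointer, 0, intT});
    return Type{TypeKind::Array, 100, ptrT};
}
```

Calling sizeOf(makeExample()) yields 400 and memoryWords(makeExample(), 2) yields 200, matching the values on the slide.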

  13. The symbol table class
     • Symbol table stores all relevant information for symbols/identifiers
     • Two hierarchy levels:
       • Global symbol table: IR->GlobalSymbolTable()
       • One local symbol table per function: fun->LocalSymbolTable()
     • All local symbols get a unique numerical suffix, e.g. int f(int x) { int a,b; } becomes int f(int x_1) { int a_2, b_3; }
     • Important access methods:
       • ST->LookupSymbol(char* name)
       • IRSymbol* ST->CreateSymbol(IRType* tp)
       • Iterators: ST->FirstObject(), ST->NextObject()
     • Information stored in a table entry (class IRSymbol):
       • Symbol type: IRType* sym->Type()
       • Symbol name: char* sym->Name()
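The suffix scheme for local symbols can be sketched in a few lines. The helper below is hypothetical and not part of the LANCE library; it only assumes that a running counter is appended to each local name in declaration order, as the x_1, a_2, b_3 example suggests.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a LANCE API): appends a running counter to each
// local name so that every local symbol becomes unique.
std::vector<std::string> addUniqueSuffixes(const std::vector<std::string>& locals,
                                           int& counter) {
    std::vector<std::string> renamed;
    for (const std::string& name : locals) {
        ++counter;  // the counter is shared, so suffixes never repeat
        renamed.push_back(name + "_" + std::to_string(counter));
    }
    return renamed;
}
```

Passing the names x, a, b with a counter starting at 0 gives x_1, a_2, b_3. Keeping the counter outside the function lets later declarations continue the numbering.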

  14. IR generation example [Figure: source file next to the generated IR file, annotated with: forward declaration, automatic conversion, suffix 3 for parameter i, auxiliary vars, debug info]

  15. IR optimization tools
     • Purpose: perform machine-independent optimizations on IR
     • Identical IR format for all tools, "plug & play" concept
     • Currently available tools:
       • Constant folding: cfold tool
       • Constant propagation: constprop tool
       • Copy propagation: copyprop tool
       • Common subexpression elimination: cse tool
       • Dead code elimination: dce tool
       • Jump optimization: jmpopt tool
       • Loop invariant code motion: licm tool
       • Induction variable elimination: ive tool
     • Automatic iteration of IR optimizations via the "iropt" shell script

  16. IR optimization example [Figure: C source code is passed through compile, giving unoptimized IR]

  17. Constant folding cfold
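As a rough illustration of what constant folding does, the sketch below evaluates a binary expression whose operands are both integer constants. It is a minimal stand-in under assumed names (BinExpr, foldConstant), not the implementation of the cfold tool, and it ignores overflow.

```cpp
// Hypothetical miniature of a binary IR expression with constant operands.
struct BinExpr {
    char op;   // '+', '-', '*' or '/'
    int lhs;
    int rhs;
};

// Tries to evaluate e at compile time. Returns true and stores the value in
// result on success; returns false if the expression must be left alone
// (unknown operator or division by zero).
bool foldConstant(const BinExpr& e, int& result) {
    switch (e.op) {
        case '+': result = e.lhs + e.rhs; return true;
        case '-': result = e.lhs - e.rhs; return true;
        case '*': result = e.lhs * e.rhs; return true;
        case '/':
            if (e.rhs == 0) return false;  // keep the division for run time
            result = e.lhs / e.rhs;
            return true;
        default:
            return false;
    }
}
```

With this, an IR statement such as t1 = 3 * 4; could be rewritten to t1 = 12;, while 1 / 0 is deliberately left unfolded.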

  18. Constant propagation constprop
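The effect of constant propagation can be shown with a hand-written before/after pair in the C-subset style of the IR. This is an illustration only, not output of the constprop tool; the function names are made up.

```cpp
// Before: x is assigned a constant that is used in later statements.
int before_constprop(int a) {
    int x, y, z;
    x = 4;
    y = x + a;
    z = y * x;
    return z;
}

// After constant propagation: each use of x is replaced by the constant 4.
// The assignment to x is now unused and could be removed by a later
// dead code elimination pass.
int after_constprop(int a) {
    int x, y, z;
    x = 4;
    y = 4 + a;
    z = y * 4;
    (void)x;   // silence "unused" warnings in this sketch
    return z;
}
```

Both functions return the same value for every input, which is exactly the property the output comparison in the validation flow checks.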

  19. Copy propagation copyprop

  20. Common subexpression elimination cse
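A hand-written before/after pair illustrates common subexpression elimination in the C-subset style of the IR. It is a sketch, not output of the cse tool, and the names are hypothetical.

```cpp
// Before: a * b is computed twice.
int before_cse(int a, int b, int c) {
    int t1, t2, x, y, r;
    t1 = a * b;
    x = t1 + c;
    t2 = a * b;    // same operands, unchanged since t1 was computed
    y = t2 + 2;
    r = x - y;
    return r;
}

// After: the second computation is replaced by a reuse of t1.
int after_cse(int a, int b, int c) {
    int t1, x, y, r;
    t1 = a * b;
    x = t1 + c;
    y = t1 + 2;
    r = x - y;
    return r;
}
```

The rewrite is only valid because neither a nor b is modified between the two computations.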

  21. Dead code elimination dce
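Dead code elimination on straight-line code can be sketched as a backward scan that keeps only statements whose result is still needed. The miniature below assumes side-effect-free statements of the form dst = a op b and a known set of variables live at the end; the names Stmt and eliminateDead are hypothetical, and this is not the dce tool's implementation.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical straight-line statement "dst = a op b" (operator omitted,
// since it does not matter for liveness). Operands may also be constants.
struct Stmt {
    std::string dst, a, b;
};

// Keeps only statements whose result is live, scanning backwards.
// liveOut: variables still needed after the last statement.
std::vector<Stmt> eliminateDead(const std::vector<Stmt>& code,
                                const std::set<std::string>& liveOut) {
    std::set<std::string> live = liveOut;
    std::vector<Stmt> kept;
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
        if (live.count(it->dst) == 0) continue;  // result never used: drop
        live.erase(it->dst);                     // definition kills liveness
        live.insert(it->a);                      // operands become live
        live.insert(it->b);
        kept.push_back(*it);
    }
    std::reverse(kept.begin(), kept.end());      // restore program order
    return kept;
}
```

For x = a + b; y = a * 2; z = x + 1; with only z live at the end, the statement defining y is removed and the other two are kept.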

  22. Jump optimization jmpopt

  23. Loop invariant code motion licm
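Loop invariant code motion moves a computation whose operands do not change inside a loop to a point before the loop. The before/after pair below is hand-written in the C-subset style of the IR and is not output of the licm tool; names and labels are made up.

```cpp
// Before: a * b is recomputed in every iteration.
int before_licm(int n, int a, int b) {
    int i, s, t1, t2;
    s = 0;
    i = 0;
L1:
    t1 = i >= n;
    if (t1) goto L2;
    t2 = a * b;      // loop-invariant: a and b never change in the loop
    s = s + t2;
    i = i + 1;
    goto L1;
L2:
    return s;
}

// After: the invariant computation is hoisted in front of the loop.
int after_licm(int n, int a, int b) {
    int i, s, t1, t2;
    s = 0;
    i = 0;
    t2 = a * b;      // computed once
L1:
    t1 = i >= n;
    if (t1) goto L2;
    s = s + t2;
    i = i + 1;
    goto L1;
L2:
    return s;
}
```

Hoisting is safe here because the multiplication has no side effects; for operations that may trap, a real tool has to be more careful.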

  24. Induction variable elimination ive
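The slides do not spell out what exactly the ive tool rewrites, so the following pair shows induction variable elimination in the usual textbook sense: a value derived from the loop counter (i * 4) is updated by addition, after which the counter itself is no longer needed. All names are hypothetical.

```cpp
// Before: t2 = i * 4 is derived from the loop counter i.
int before_ive(int n) {
    int i, s, t1, t2;
    s = 0;
    i = 0;
L1:
    t1 = i >= n;
    if (t1) goto L2;
    t2 = i * 4;
    s = s + t2;
    i = i + 1;
    goto L1;
L2:
    return s;
}

// After: t2 is stepped by 4 directly and the loop test is expressed in
// terms of t2, so the original counter i disappears.
int after_ive(int n) {
    int s, t1, t2, t3;
    s = 0;
    t2 = 0;
    t3 = n * 4;      // loop bound scaled once, outside the loop
L1:
    t1 = t2 >= t3;
    if (t1) goto L2;
    s = s + t2;
    t2 = t2 + 4;
    goto L1;
L2:
    return s;
}
```

The multiplication inside the loop is replaced by an addition, and one variable is saved.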

  25. Control flow analysis
     • Purpose: identify the basic block structure of a C function
     • Basic block (BB): IR statement sequence with unique entry and exit points
     • Control flow graph (CFG): one node per BB, edge (BB1, BB2) iff BB2 may be an immediate successor of BB1 during execution
     • Assembly code generation is usually done BB after BB
     • Example: while (x) { BB1; if (x) BB2; else BB3; BB4; }
     • [Figure: the corresponding CFG with nodes BB1, BB2, BB3, BB4]
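Basic block boundaries in three-address code can be found with the classic "leader" rule: the first statement, every label, and every statement following a jump or return starts a new block. The sketch below applies that rule to a hypothetical miniature statement list; StmtKind, Stmt and findLeaders are illustrative names and not the interface of the LANCE ControlFlowGraph class.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical miniature IR statement: only its class matters here.
enum class StmtKind { Assign, Label, Goto, CondGoto, Return };

struct Stmt {
    StmtKind kind;
    std::string label;   // defined label or jump target, may be empty
};

// Returns the indices of all statements that start a basic block.
// Every label is treated as a potential jump target (conservative).
std::vector<std::size_t> findLeaders(const std::vector<Stmt>& code) {
    std::set<std::size_t> leaders;
    if (!code.empty()) leaders.insert(0);
    for (std::size_t i = 0; i < code.size(); ++i) {
        const StmtKind k = code[i].kind;
        if (k == StmtKind::Label) leaders.insert(i);
        const bool endsBlock = k == StmtKind::Goto ||
                               k == StmtKind::CondGoto ||
                               k == StmtKind::Return;
        if (endsBlock && i + 1 < code.size()) leaders.insert(i + 1);
    }
    return std::vector<std::size_t>(leaders.begin(), leaders.end());
}
```

Each block then runs from one leader up to, but not including, the next. CFG edges follow from the last statement of each block: a jump adds an edge to the block of its target label, a conditional jump adds that edge plus the fall-through edge.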

  26. CFG generation by LANCE
     • Class ControlFlowGraph is contained in the LANCE library
     • Constructor ControlFlowGraph(Function* fun) generates the CFG for any function fun
     • LANCE tool showcfg exports CFGs in the VCG text format
     • VCG can be used to visualize generated CFGs
     • [Figure: IR file is passed to showcfg, which writes a VCG file; xvcg displays it as a CFG]

  27. CFG visualization example (showcfg + VCG tool)

  28. Data flow analysis
     • Goal: convert IR into data flow graph (DFG) representation for assembly code generation by tree pattern matching
     • Performed by def/use analysis between IR statements/expressions
     • LANCE lib class DataFlowAnalysis provides the required methods
     • Constructor DataFlowAnalysis(Function* fun) constructs data flow information for any function fun
     • Example: x = 5; goto lab; ... x = 6; lab: y = x + 1; ... z = 1 - y; u = y / 5;
       • x has two definitions: x = 5 and x = 6
       • y has two uses: in z = 1 - y and in u = y / 5
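The def/use bookkeeping in the example can be sketched by recording, for each variable, the statements that define it and the statements that use it. The version below is deliberately flow-insensitive and uses hypothetical names (Stmt, DefUse, collectDefUse); a real analysis such as the one behind DataFlowAnalysis would also consult the control flow to decide which definitions reach which uses.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature statement: one defined variable, a list of used ones.
struct Stmt {
    std::string def;                 // empty if the statement defines nothing
    std::vector<std::string> uses;
};

// Statement indices at which each variable is defined or used.
struct DefUse {
    std::map<std::string, std::vector<int> > defs;
    std::map<std::string, std::vector<int> > uses;
};

DefUse collectDefUse(const std::vector<Stmt>& code) {
    DefUse du;
    for (int i = 0; i < static_cast<int>(code.size()); ++i) {
        if (!code[i].def.empty()) du.defs[code[i].def].push_back(i);
        for (const std::string& v : code[i].uses) du.uses[v].push_back(i);
    }
    return du;
}
```

Applied to the five assignments of the example (x = 5, x = 6, y = x + 1, z = 1 - y, u = y / 5), it reports two definitions of x and two uses of y.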

  29. DFG visualization example (showdfg + VCG tool)

  30. Backend interface
     • LANCE lib classes LANCEDataFlowTree and DFTManager provide the link between LANCE IR and tree pattern matching
     • OLIVE/IBURG accept only trees instead of general DFGs
     • Hence: split DFGs at the common subexpressions (CSEs)
     • [Figure: a DFG in which the CSE a * b feeds two additions (with c and with 2, producing x and y) is split into trees by storing a * b in an auxiliary variable t]
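Splitting a DFG at its common subexpressions can be sketched as follows: every operation node with more than one parent is computed once into an auxiliary variable, and all its uses become plain references to that variable, so only trees remain. The code is a minimal sketch with hypothetical names (Node, splitIntoTrees, auxiliary names t1, t2, ...); it assumes binary operations only and nodes stored children-first, and it is not the DFTManager implementation.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical DFG node: a leaf (variable or constant) or a binary operation.
struct Node {
    std::string text;   // leaf name, or operator symbol for inner nodes
    int left;           // index of left child, -1 for leaves
    int right;          // index of right child, -1 for leaves
};

// Splits a DAG into trees and returns one assignment per tree as text.
// roots: pairs of (result variable, index of the node holding its value).
std::vector<std::string> splitIntoTrees(
        const std::vector<Node>& g,
        const std::vector<std::pair<std::string, int> >& roots) {
    std::vector<int> parents(g.size(), 0);
    for (const Node& n : g) {
        if (n.left >= 0) ++parents[n.left];
        if (n.right >= 0) ++parents[n.right];
    }
    std::vector<std::string> aux(g.size());   // auxiliary name per shared node
    std::vector<std::string> trees;
    int counter = 0;

    // Prints node i as an expression; a shared inner node is replaced by its
    // auxiliary variable unless it is the root of the tree being printed.
    std::function<std::string(int, bool)> emit = [&](int i, bool isRoot) {
        const Node& n = g[i];
        if (n.left < 0) return n.text;
        if (!isRoot && parents[i] > 1) return aux[i];
        return "(" + emit(n.left, false) + " " + n.text + " " +
               emit(n.right, false) + ")";
    };

    // Shared inner nodes first; children precede parents by assumption.
    for (int i = 0; i < static_cast<int>(g.size()); ++i) {
        if (g[i].left >= 0 && parents[i] > 1) {
            aux[i] = "t" + std::to_string(++counter);
            trees.push_back(aux[i] + " = " + emit(i, true));
        }
    }
    for (const auto& r : roots)
        trees.push_back(r.first + " = " + emit(r.second, false));
    return trees;
}
```

For a graph in which a * b feeds both x = (a * b) + c and y = (a * b) + 2, the result is three trees: t1 = (a * b), x = (t1 + c) and y = (t1 + 2).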

  31. Data structure overview
     • Constructor DFTManager(Function* fun) generates the data flow tree (DFT) representation for an entire function fun
     • DFTManager contains an internal list of basic blocks
     • Each BB in turn is a list of DFTs
     • [Figure: list BB 1, BB 2, ..., BB n; each BB points to a list DFT 1, DFT 2, ..., DFT m]

  32. DFT covering with OLIVE
     • DFTs are directly in the format required by code generators produced by OLIVE
     • All DFTs consist of a fixed set of terminal symbols (e.g. cs_STORE), specified in file INCL/termlist.c
     • Example (only a single DFT): [Figure: C file, IR file, and DFT representation]

  33. Example (cont.) [Figure: DFT in OLIVE format, simplified OLIVE spec, and assembly code for a hypothetical machine]

  34. Summary
     • LANCE provides you with ...
       • C frontend
       • IR optimizations
       • C++ library for IR access (+ important basic classes)
       • Interface to OLIVE data flow trees
     • A full C compiler additionally requires ...
       • OLIVE-based backend for the concrete target machine
       • Target-specific optimizations (e.g. scheduling, address generation)
