
TMS470 Compiler Overview

TMS470 Compiler Overview. Tom Suchyta, t-suchyta1@ti.com. Compiler Group Background: 30+ software engineers; 280+ years of experience in the development of optimizing compilers; 6 PhDs; 1 TI Fellow, 3 Distinguished MTS, 3 Senior MTS; close contact with key academic compiler research institutions.


Presentation Transcript


  1. TMS470 Compiler Overview Tom Suchyta t-suchyta1@ti.com

  2. Compiler Group Background • 30+ software engineers • 280+ years of experience in the development of optimizing compilers • 6 PhDs • 1 TI Fellow, 3 Distinguished MTS, 3 Senior MTS • Close contact with key academic compiler research institutions

  3. Compiler Group Background • World-class software development toolset used on over 14 DSP and microcontroller architectures, with an installed base of over 30,000 toolsets • World-class proprietary compiler technology, including state-of-the-art whole-program optimization • Focused on embedded systems applications: real-time control, telecom (terminal and infrastructure), automotive (safety-critical applications) • Intensive, comprehensive validation and benchmarking processes • Extensive experience in compiling for high-performance architectures (VLIW, DSP, SIMD, µC, RISC, CISC)

  4. TI Compiler Portfolio • Support all major TI processor families • C54x, C55x, C6xxx, C2xxx, 470, 430 • Legacy product support model • Best-in-class optimization technology • Scalar optimization • VLIW optimization • SIMD (vectorizing) optimization • Conformance to industry standards • C/C++ language standards • Support standard tooling features, usability

  5. Compiler Use Outline • Using the Compiler • Optimization • Programming Hints • New Features • Linking

  6. Compiler Use Outline • Using the Compiler • Optimization • Programming Hints • New Features • Linking

  7. Compiler Executables
     The shell (cl470.exe) passes options to each executable:
     • Parser, acp470.exe: produces file.if (intermediate file of abstract syntax trees)
     • Optimizer, opt470.exe: produces file.opt (intermediate file of abstract syntax trees)
     • Code Generator, cg470.exe: produces file.asm (assembly code)
     • Assembler, asm470.exe: produces file.obj (COFF2 object file)
     • Linker, lnk470.exe: produces a.out (COFF2 executable)

  8. Compiler Execution • Invoked through CCS • Can also be executed from a command line • Tools available for PC, Sun, and Linux platforms • Can be invoked through Makefiles

  9. Data Types and Ranges

  10. Debug Options
      • No debug switch: > cl470 file.c
      • Some DWARF2 debug info, “skeletal DWARF”, is created:
        • global symbols
        • function call stack
        • line and column number
        • type information
        • register map
      • “Skeletal DWARF” has no impact on code generation
      • Debug info is placed into COPY (no load) sections
      Generated debug directives (excerpts):
          DWCU     .dwtag  DW_TAG_compile_unit
                   .dwattr DWCU, DW_AT_name("file.c")
          DW$1     .dwtag  DW_TAG_subprogram, DW_AT_name("func"), DW_AT_symbol_name("$func")
                   .dwattr DW$1, DW_AT_low_pc($func)
                   .dwattr DW$1, DW_AT_high_pc(0x00)
                   .dwattr DW$1, DW_AT_frame_base[DW_OP_breg13 4]
                   .dwpsn  "file.c",2,1
          $func:
                   ADD     SP, #-4
                   STR     A1, [SP, #0]
                   .dwpsn  "file.c",3,5
                   ADD     A1, A1, #1

          DWT$11   .dwtag  DW_TAG_base_type, DW_AT_name("int")
                   .dwattr DWT$11, DW_AT_encoding(DW_ATE_signed)
                   .dwattr DWT$11, DW_AT_byte_size(0x04)
          DWT$1018 .dwtag  DW_TAG_subroutine_type, DW_AT_type(*DWT$11)
                   .dwattr DWT$1018, DW_AT_language(DW_LANG_C)
          DW$3     .dwtag  DW_TAG_formal_parameter, DW_AT_type(*DWT$11)
                   .dwendtag DWT$1018

                   .dwattr DW$1, DW_AT_external(0x01)
                   .dwattr DW$1, DW_AT_type(*DWT$11)
                   .dwattr DWCU, DW_AT_language(DW_LANG_C)

          DW$4     .dwtag  DW_TAG_assign_register, DW_AT_name("A1")
                   .dwattr DW$4, DW_AT_location[DW_OP_reg0]

  11. Debug Options • “-g”: > cl470 -g file.c • Full DWARF2 debug info • Verbose debug info, complete program description • Required for C++ debug • “-gt”, or “--symdebug:coff” (4.1.x): > cl470 -gt file.c • STABS debug info • Switch is deprecated

  12. Debug Options • “-gn” (deprecated), or “--symdebug:none” (4.1.x): > cl470 -gn file.c • No debug info is generated • “-mn”: > cl470 -o2 -g -mn file.c • Compiling with debug info can disable certain optimizations • This option prevents those optimizations from being disabled • The code will be equivalent to optimized code built without debug

  13. File Options
          -fr=<dir>    Object file directory
          -fs=<dir>    Assembly file directory
          -ft=<dir>    Temporary file directory
          -ff=<dir>    Listing file directory
          -fp=<file>   File is a C++ file
          -fc=<file>   File is a C file
          -fo=<file>   File is an object file
          -fa=<file>   File is an assembly file
          -ea=<.ext>   Assembly file extension
          -ec=<.ext>   C file extension
          -ep=<.ext>   C++ file extension
          -eo=<.ext>   Object file extension
          -es=<.ext>   Listing file extension
          -fg          Treat files with C extensions as C++
      Example:
          cl470 -fr=c:\tmp -eo=.o -fg file.c
      file.c is compiled as C++ and file.o is placed in the c:\tmp directory

  14. Diagnostic Options
          -pden        Emit diagnostic id number
          -pdr         Issue remarks
          -pdse=<n>    Treat id as error
          -pdsr=<n>    Treat id as remark
          -pdsw=<n>    Treat id as warning
          -pdel=<n>    Set error limit
          -pdw         Suppress warnings
          -pds=<n>     Suppress diagnostic <n>
      Example, file.c:
          extern char a;

          char func()
          {
              a = foo();
              return a;
          }
      • This is legal C: a function called without a prototype is assumed by the compiler to have an “int” return type
      • Find the remark id:
          > cl470 -pdr -pden file.c
          "file.c", line 5: remark #225-D: function declared implicitly
      • Treat the id as an error:
          > cl470 -pdse=225 file.c
          "file.c", line 5: error: function declared implicitly
          1 error detected in the compilation of "file.c".
          >> Compilation failure

  15. Diagnostic Pragmas
      Example, file.c:
          #pragma diag_error 225
          extern char a;
          char func()
          {
              a = foo();
              return a;
          }
      Result:
          > cl470 file.c
          "file.c", line 5: error: function declared implicitly
          1 error detected in the compilation of "file.c".
          >> Compilation failure
      • New with the 4.1.x compiler:
          Pragma                      Equivalent option   Effect
          #pragma diag_suppress id    -pds=id             Suppress diagnostic <id>
          #pragma diag_remark id      -pdsr=id            Treat diagnostic <id> as a remark
          #pragma diag_warning id     -pdsw=id            Treat diagnostic <id> as a warning
          #pragma diag_error id       -pdse=id            Treat diagnostic <id> as an error
          #pragma diag_default id     N/A                 Use default severity of the diagnostic
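The "function declared implicitly" diagnostic on slides 14 and 15 disappears once a prototype is visible at the call site. The following is a minimal sketch in portable C; the body of foo and the definition of a are assumptions added so the example links, since the slides show neither.

```c
/* Sketch of a fix for remark #225-D: declare foo before it is called.
   The definitions of a and foo are hypothetical, added for illustration. */
char foo(void);          /* prototype: the return type is now known */

char a;                  /* declared extern on the slides; defined here */

char func(void)
{
    a = foo();           /* no implicit declaration at this call */
    return a;
}

char foo(void)
{
    return 'x';          /* placeholder value, not from the slides */
}
```

With the prototype in place, promoting the diagnostic to an error (as slide 15 does) no longer stops the build for this file.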

  16. Parser Options • -pe Embedded C++ http://www.caravan.net/ec2plus/ • -ps Strict ANSI mode • -rtti C++ run time type information support • --static_template_instantiation All template instantiations given local linkage • --align_structs=<bytecount> Force alignment of structures to <bytecount> bytes. The bytecount must be a power of 2.

  17. Preprocessing Options • -ppc Preprocess with comments • -ppo Preprocess only • -ppa Preprocess and compile • Example: > cl470 -ppo file.c will create a file.pp

  18. Code Generation Options • -mt Generate 16-bit (Thumb) code • -md Do not generate dual-mode code • -me Generate little endian code

  19. ARM Architecture Switches • Available with 4.1.x compilers • -mv4: ARM7 (architecture version 4) • -mv5e: ARM9 (architecture version 5E) • -mv6: ARM11 (architecture version 6)

  20. Misc. Shell Options • -version Print version of each executable only (no compilation) • --verbose Print version and function name • -n Do not assemble • -k Keep assembly file • -c Do not link • -D, -d Predefine symbol • -U, -u Undefine symbol • -I, -i Include search path • -s, -ss Interlist source with assembly

  21. Linker Options • -z Tells shell to invoke linker. All following options are linker options. • -l <file> Include library or linker command file • --default_order Do not use size based output section allocation algorithm - Use if relying on old linker’s default placement of output sections • -b Turn off debug type merging - Can significantly improve link times • -c Run time global initialization • -cr Load time global initialization • -w Warn if unspecified output section is created • -x Reread libraries to resolve back references • --large_model Patch calls to far memory

  22. Pragmas • DATA_SECTION: place symbol in user defined section: #pragma DATA_SECTION(symbol, "section name"); • CODE_STATE: change function compilation mode: #pragma CODE_STATE(func, 16 or 32); • DUAL_STATE: optimized veneers, will not be deleted: #pragma DUAL_STATE(func);

  23. Pragmas • INTERRUPT: Mark a function as an interrupt routine, and set its type #pragma INTERRUPT(func, DABT | FIQ | IRQ | PABT | RESET | SWI | UDEF); • TASK: Mark a function that never returns #pragma TASK(func); • SWI_ALIAS: Refer to software interrupt as a function call • Function must have a prototype #pragma SWI_ALIAS(func,swi_number);

  24. Compiler Input Sections • Input sections created by the compiler • Placed in output sections by the linker as defined in the linker command file

  25. Register Conventions
          Register   Other name   Preserved by call
          R0         A1           no
          R1         A2           no
          R2         A3           no
          R3         A4           no
          R4         V1           yes
          R5         V2           yes
          R6         V3           yes
          R7         V4, AP       yes
          R8         V5           yes
          R9         V6           yes
          R10        V7           yes
          R11        V8           yes
          R12        V9, IP       no
          R13        SP           yes
          R14        LR           no
          R15        PC           n/a

  26. Calling Convention Overview
      To make a call (caller):
      • Save-on-call registers: R0 to R3, R12 (alias registers: A1 to A4, V9)
      • First 4 arguments (or 2 long long arguments) are placed in R0 to R3
      • Rest of arguments are pushed on the stack
      • Not extended to 32-bit
      • Structures passed by value are passed by reference
      • If the call returns a structure, allocate space for the return value and pass its address in R0
      Callee function:
      • Save-on-entry registers: R4 to R11, R14 (alias registers: V1 to V8, LR)
      • Create a local copy of structures passed by value
      • A structure return value is copied to the block pointed to by R0
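The structure rules on slide 26 can be pictured as a source-level rewrite. The sketch below is an illustration in portable C, not compiler output: the Block type and both function names are hypothetical, and scale_lowered only mimics what the convention describes (result address passed first, as R0 would carry it; argument passed by reference; callee makes its own copy).

```c
#include <stdint.h>

/* Hypothetical 20-byte structure, too large to travel in R0 to R3. */
typedef struct { int32_t v[5]; } Block;

/* What the programmer writes: pass and return a structure by value. */
Block scale(Block in, int32_t k)
{
    for (int i = 0; i < 5; i++) in.v[i] *= k;
    return in;
}

/* Hand-written equivalent of the convention on the slide:
   - result: caller-allocated space for the return value (address in R0)
   - in_ref: the by-value argument, actually passed by reference */
void scale_lowered(Block *result, const Block *in_ref, int32_t k)
{
    Block local = *in_ref;            /* callee creates a local copy */
    for (int i = 0; i < 5; i++) local.v[i] *= k;
    *result = local;                  /* copy to the block pointed to by result */
}
```

Because the callee works on its own copy, the caller's structure is left unmodified in both versions, which is what by-value semantics require.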

  27. Dual-Mode Code v2.5x
      • Thumb mode prefix: ‘$’
      • ARM mode prefix: ‘_’
      C source:
          #pragma CODE_STATE(foo, 16);
          #pragma CODE_STATE(func, 32);
          extern int a, b;
          int func(int);
          void foo()
          {
              a = func(b);
          }
          int func(int c)
          {
              return c + 1;
          }
      Generated assembly:
          $foo:
                  PUSH {LR}
                  LDR  A1, CON2
                  LDR  A1, [A1, #0]
                  BL   $func
                  LDR  A2, CON1
                  STR  A1, [A2, #0]
                  POP  {PC}

                  .sect ".text"
                  .clink
                  .global _func
          ;************************
          ;* FUNCTION NAME: func
          ;************************
          _func:
                  SUB  SP, SP, #4
                  STR  A1, [SP, #0]
                  LDR  V9, [SP, #0]
                  ADD  A1, V9, #1
                  ADD  SP, SP, #4
                  BX   LR

                  .sect ".text:v$1"
                  .clink
                  .global $func
                  .align
                  .state16
          ;************************
          ;* FUNCTION VENEER: $func
          ;************************
          $func:
                  BX   pc
                  NOP
                  .state32
                  B    _func

  28. Dual Mode Code 4.1.x
      • By default, object files older than 4.1.x will not link
      • Use --abi=tiabi to use the older dual mode scheme
      • No unique prefix for ARM or Thumb routines
      • New assembler directives pass the routine state to the linker:
        • .thumbfunc
        • .armfunc
      • The linker will replace the call with a BLX instruction (BLX _func) if ARM9/ARM11
      • Or, it will replace it with a call to a veneer created by the linker
      C source:
          #pragma CODE_STATE(foo, 16);
          #pragma CODE_STATE(func, 32);
          extern int a, b;
          int func(int);
          void foo()
          {
              a = func(b);
          }
          int func(int c)
          {
              return c + 1;
          }
      Generated assembly:
                  .sect ".text"
                  .align 4
                  .clink
                  .thumbfunc _foo
                  .state16
                  .global _foo
          ;*********************
          ;* FUNCTION NAME: foo
          ;*********************
          _foo:
                  PUSH {LR}
                  LDR  A1, CON2
                  LDR  A1, [A1, #0]
                  BL   _func
                  LDR  A2, CON1
                  STR  A1, [A2, #0]
                  POP  {A3}
                  BX   A3

  29. Binary Interface Switches • Available with 4.1.x compilers • --abi=tiabi • Compatible with 2.x version object files • Must use correct runtime library: • Example, if linking with rts16.lib, use rts16tiabi.lib • --abi=ti_arm9_abi • Default 4.1.x switch

  30. Function Subsections
      • Controlled by the -ms switch
      • Each function is placed in its own section
      • Each function will be conditionally linked
      • Typically used in libraries
      • Only referenced functions are linked, not the entire .obj file
      • Overuse can negatively impact code size
      • Linker switch -j will disable
      C source:
          unsigned int udiv(unsigned int src1, unsigned int src2) …
          int div(int src1, int src2) …
      Generated assembly:
                  .sect ".text:$udiv"
                  .clink
                  .state16
                  .global $udiv
          $udiv:  ...

                  .sect ".text:$div"
                  .clink
                  .state16
                  .global $div
          $div:   ...

  31. Far Memory Calls & Trampolines
      • Enable with the --large_model switch
      • All calls are generated as near calls
      • The linker will "fix" each call site which is linked out of range of its callee destination
      • The "fix" is a trampoline near the call site
      • If "bar" calls a function "foo":
          bar:  ...
                call foo
      • If "foo" is out of range:
          bar:  ...
                call foo_trampoline
      • The linker will generate a trampoline containing code which executes a long branch to the original callee:
          foo_trampoline:
                branch_long foo
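The decision the linker makes on slide 31 comes down to a signed distance check between the call site and the callee. The sketch below is a simplified model, not the linker's algorithm: the function name is hypothetical, and the range constants are assumptions based on the classic branch encodings (about ±32 MB for an ARM-state BL, about ±4 MB for a Thumb BL), with pipeline offsets ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed reach of a near call, simplified to symmetric limits. */
#define ARM_BL_RANGE   ((int64_t)32 * 1024 * 1024)
#define THUMB_BL_RANGE ((int64_t)4 * 1024 * 1024)

/* Returns true when the callee lies outside the reach of a near call
   from call_site, so the call must be redirected through a trampoline. */
bool needs_trampoline(uint32_t call_site, uint32_t callee, int64_t range)
{
    int64_t delta = (int64_t)callee - (int64_t)call_site;
    return delta >= range || delta < -range;
}
```

When the check is true, the call site keeps its near-call instruction but targets the trampoline, which then performs the long branch to the original callee.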

  32. Using C Exception Handlers • C exception handlers can use the #pragma INTERRUPT statement, or the interrupt keyword: interrupt void func() { } • C exception handlers can call C functions and assembly language functions. • If a C exception handler does not call any other function, only those registers that are actually used in the exception handler will be saved and restored automatically (banked registers are saved by hardware). • If a C exception handler does call other functions, it should set its own stack pointer, and the exception handler saves all of the registers not preserved by the call: R0..R3, R12, LR (R8-R12 saved by hardware for FIQ). • Exception handlers should never be called directly. • Reentrant exception handlers must save SPSR and LR.

  33. Using Assembly Exception Handlers • Assembly language exception handlers can use the stack, access global C variables, and call C functions. • No special naming convention needs to be used. • All registers used by the exception handler have to be preserved by the handler (banked registers are saved by hardware). • When calling a C function, be sure that this exception's stack pointer is set up, and that all registers not preserved by the call are saved prior to the call: R0..R3, R12, LR. • When calling an assembly language function, be sure that this exception's stack pointer is set up. All registers used during the call have to be preserved by the caller or the callee function.

  34. Enable/Disable Interrupts from C
      • New intrinsics available from C in the 4.1.x compiler
      • Set the CPSR register:
          void _set_CPSR(uint src);
          void _restore_interrupts(uint src);
        Inlines an MSR CPSR,src instruction at that point in the routine
      • Set the CPSR flag bits:
          void _set_CPSR_flg(uint src);
        Inlines an MSR CPSR_FLG,src instruction
      • Return the CPSR register (as a uint):
          dst = _get_CPSR();
        Inlines an MRS dst,CPSR instruction
      • Call a software interrupt in 32-bit mode:
          void _call_swi(uint src)
        Inlines a SWI #src instruction

  35. Enable/Disable Interrupts in C
      Available in version 4.1.x compilers
      • Enable IRQ status bit and return old CPSR state (as a uint):
          dst = _enable_IRQ();
        Inline:
          MRS dst,CPSR
          BIC tmp,dst,#0x80
          MSR CPSR,tmp
      • Enable FIQ status bit and return old CPSR state (as a uint):
          dst = _enable_FIQ();
        Inline:
          MRS dst,CPSR
          BIC tmp,dst,#0x40
          MSR CPSR,tmp

  36. Enable/Disable Interrupts in C
      Available in version 4.1.x compilers
      • Enable IRQ and FIQ status bits and return old CPSR state (as a uint):
          dst = _enable_interrupts();
        Inline:
          MRS dst,CPSR
          BIC tmp,dst,#0xc0
          MSR CPSR,tmp
      • Disable IRQ status bit and return old CPSR state (as a uint):
          dst = _disable_IRQ();
        Inline:
          MRS dst,CPSR
          ORR tmp,dst,#0x80
          MSR CPSR,tmp

  37. Enable/Disable Interrupts in C
      Available in version 4.1.x compilers
      • Disable FIQ status bit and return old CPSR state (as a uint):
          dst = _disable_FIQ();
        Inline:
          MRS dst,CPSR
          ORR tmp,dst,#0x40
          MSR CPSR,tmp
      • Disable IRQ and FIQ status bits and return old CPSR state (as a uint):
          dst = _disable_interrupts();
        Inline:
          MRS dst,CPSR
          ORR tmp,dst,#0xc0
          MSR CPSR,tmp
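The inline sequences on slides 34 to 37 are plain bit operations on the CPSR: setting bit 0x80 masks IRQ, setting bit 0x40 masks FIQ, and each intrinsic returns the previous CPSR value so it can be restored later. The sketch below models that behavior in portable C so it can run on a host. Everything prefixed sim_ is hypothetical, the CPSR is an ordinary variable, and the functions are stand-ins for the intrinsics rather than their implementation.

```c
#include <stdint.h>

#define CPSR_IRQ_MASK 0x80u   /* bit set = IRQ disabled */
#define CPSR_FIQ_MASK 0x40u   /* bit set = FIQ disabled */

/* Simulated CPSR; 0xD3 is an assumed starting value
   (supervisor mode with IRQ and FIQ both masked). */
uint32_t sim_cpsr = 0xD3u;

/* Models MRS dst,CPSR / ORR tmp,dst,#0x80 / MSR CPSR,tmp */
uint32_t sim_disable_IRQ(void)
{
    uint32_t old = sim_cpsr;
    sim_cpsr = old | CPSR_IRQ_MASK;
    return old;
}

/* Models MRS dst,CPSR / BIC tmp,dst,#0x80 / MSR CPSR,tmp */
uint32_t sim_enable_IRQ(void)
{
    uint32_t old = sim_cpsr;
    sim_cpsr = old & ~CPSR_IRQ_MASK;
    return old;
}

/* Models MSR CPSR,src */
void sim_restore_interrupts(uint32_t src)
{
    sim_cpsr = src;
}

/* Typical use: a short critical section that leaves the
   interrupt state exactly as it found it. */
uint32_t counter;

void increment_atomically(void)
{
    uint32_t saved = sim_disable_IRQ();
    counter++;
    sim_restore_interrupts(saved);
}
```

Restoring the saved value, rather than unconditionally re-enabling IRQ, keeps the pattern safe when it is entered with interrupts already disabled.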

  38. Compiler Use Outline • Using the Compiler • Optimization • Programming Hints • New Features • Linking

  39. Optimization Overview • Significant reduction in code-size or cycles can be obtained • Optimization can either be set for smallest size or fast code • The amount of optimization performed can be controlled • To control the accuracy of debug info, for example

  40. Optimization Overview: ARM7 Code Size Comparison [Bar chart: code size as % comparison (axis 50 to 100) for benchmark apps app1 to app3; series "-gt", "-o0", "-o1", "-o2", "-o3", "-o3 -oi0"]

  41. Optimization Overview: ARM7 Cycle Count Comparison [Bar chart: cycle count as % comparison (axis 0 to 100) for benchmarks app1 to app6; series "-gt", "-o0", "-o1", "-o2", "-o3", "-o3 -mf"]

  42. Optimization Levels Less reliable debug info

  43. Optimizer Overview
      • Level 0
        • Simplify the control flow graph to remove unnecessary branches
        • Register allocation, move variables to virtual registers
        • Dead code elimination
        • Simplify expressions: (b + 4) - (c - 1) becomes (b - c + 5)
      • Level 1
        • All level 0 optimizations
        • Basic block constant and copy propagation
        • Remove dead assignments
        • Remove common subexpressions
        • Loop rotation

  44. Optimizer Overview
      • Level 2
        • All level 1 optimizations
        • Loop optimizations (hoisting, for example)
        • Remove global (function-level) common subexpressions
        • Replace integer division by a constant divisor with multiplication by the reciprocal
      • Level 3
        • All level 2 optimizations
        • Automatic inlining
        • Remove functions that are never called
        • Simplify functions with return values that are not used
        • File level constant propagation (to replace arguments, for example)
      • Inlining can be controlled, or turned off with -oi0.
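The level 2 item about replacing division by a constant with a reciprocal multiply is easiest to see with a concrete divisor. The sketch below shows the textbook form of the technique for unsigned 32-bit division by 10; the slides do not say which constants or instruction sequence the TMS470 optimizer actually emits, so the function name and the magic number are illustrative only.

```c
#include <stdint.h>

/* Unsigned divide by 10 without a divide instruction.
   0xCCCCCCCD is ceil(2^35 / 10), so multiplying by it and shifting
   right by 35 gives the same result as n / 10 for every 32-bit n. */
uint32_t div10_reciprocal(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```

The payoff on a core without a hardware divider is that one multiply and one shift replace a call to a runtime division routine.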

  45. Code Generator Optimizations • Enabled at level 0 • Simplify control flow graph • Register allocation • Peephole optimization • Optimize use of close constant values • Delete unnecessary sign and zero extensions • Register tracking • Tail merging • Constant hoisting • Remove unnecessary compares to zero • Remove unnecessary shifts • Switch table compression • Branch chaining

  46. Recommended Switches

  47. Example - No Optimization
      • Compiled in 16-bit mode, no optimization
      • 7 loads, 3 stores
      • .text size 44 bytes
      C source:
          extern int a;
          int func(int b)
          {
              if (b == 3)
              {
                  a = 1;
                  return a;
              }
              else if (b == 4)
              {
                  a = 2;
                  return a;
              }
          }
      Generated assembly:
          $func:
                  ADD  SP, #-4
                  STR  A1, [SP, #0]
                  CMP  A1, #3
                  BNE  L1
                  MOV  A2, #1
                  LDR  A1, CON1
                  STR  A2, [A1, #0]
                  LDR  A1, CON1
                  LDR  A1, [A1, #0]
                  B    L2
          L1:
                  LDR  A1, [SP, #0]
                  CMP  A1, #4
                  BNE  L2
                  MOV  A1, #2
                  LDR  A2, CON1
                  STR  A1, [A2, #0]
                  LDR  A1, CON1
                  LDR  A1, [A1, #0]
          L2:
                  ADD  SP, #4
                  MOV  PC, LR

  48. Example - No Optimization
      • 16-bit mode, no optimization
      • Control Flow Graph represents code as written
      • Argument “b” stored on stack
      C source:
          extern int a;
          int func(int b)
          {
              if (b == 3)
              {
                  a = 1;
                  return a;
              }
              else if (b == 4)
              {
                  a = 2;
                  return a;
              }
          }
      Control flow graph blocks:
          $func:  ADD  SP,#-4
                  STR  A1,[SP,#0]
                  CMP  A1,#3
                  BNE  L1

                  ….
                  STR  A2,[A1,#0]
                  ….
                  LDR  A1,[A1,#0]
                  B    L2

          L1:     ….
                  BNE  L2

                  ….
                  STR  A1,[A2,#0]
                  ….
                  LDR  A1,[A1,#0]

          L2:

  49. Example - Optimization Level 0
      • Compiled in 16-bit mode, -o0
      • 2 loads, 1 store
      • .text size 28 bytes
      C source:
          extern int a;
          int func(int b)
          {
              if (b == 3)
              {
                  a = 1;
                  return a;
              }
              else if (b == 4)
              {
                  a = 2;
                  return a;
              }
          }
      Generated assembly:
          $func:
                  CMP  A1, #3
                  BQNE L1
                  MOV  A1, #1
                  B    L2
          L1:
                  CMP  A1, #4
                  BQNE L3
                  MOV  A1, #2
          L2:
                  LDR  A2, CON1
                  STR  A1, [A2, #0]
                  LDR  A1, [A2, #0]
          L3:
                  MOV  PC, LR

  50. Example - Optimization Level 0
      • 16-bit mode, optimization -o0
      • Control Flow Graph rewritten
      • Update of “a” moved out of conditional blocks
      • Register allocation on “b”
      • Extra load is removed at level 1
      C source:
          extern int a;
          int func(int b)
          {
              if (b == 3)
              {
                  a = 1;
                  return a;
              }
              else if (b == 4)
              {
                  a = 2;
                  return a;
              }
          }
      Control flow graph blocks:
          $func:  CMP  A1,#3
                  BQNE L1

                  ….
                  MOV  A1,#1
                  B    L2

          L1:     ….
                  BQNE L3

                  MOV  A1,#2

          L2:     ….
                  STR  A1,[A2,#0]
                  LDR  A1,[A2,#0]

          L3:
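The control flow rewrite on slides 49 and 50 can be expressed back in C: both branches only choose a value, and a single shared store to a follows. The sketch below puts the two shapes side by side. The function names are hypothetical, a is defined locally so the example links, and the final return b is an assumption, because the slides leave the path where b is neither 3 nor 4 without a return value.

```c
int a;   /* declared extern on the slides; defined here */

/* Shape of the code as written on the slides. */
int func_as_written(int b)
{
    if (b == 3) {
        a = 1;
        return a;
    } else if (b == 4) {
        a = 2;
        return a;
    }
    return b;   /* assumption: this path is undefined on the slides */
}

/* Shape after the rewrite: the update of a is moved out of the
   conditional blocks, so there is one store (label L2 on the slide). */
int func_merged(int b)
{
    int value;
    if (b == 3)
        value = 1;
    else if (b == 4)
        value = 2;
    else
        return b;   /* same assumption as above (label L3) */
    a = value;      /* single shared store */
    return a;
}
```

Both versions behave identically for every input; the merged form simply needs one store and one address load instead of two of each.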
