1 / 20

Compilers for embedded systems: Why are compilers an issue?

Compilers for embedded systems: Why are compilers an issue?. Many reports about low efficiency of standard compilers Special features of embedded processors have to be exploited. High levels of optimization more important than compilation speed.

Download Presentation

Compilers for embedded systems: Why are compilers an issue?

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compilers for embedded systems:Why are compilers an issue? • Many reports about low efficiency of standard compilers • Special features of embedded processors have to be exploited. • High levels of optimization more important than compilation speed. • Compilers can help to reduce the energy consumption (energy optimization). • Compilers could help to meet real-time constraints. • Less legacy problems than for PCs. • There is a large variety of instruction sets. • Design space exploration for optimized processors makes sense

  2. Use of assembly languages in embedded systems C-Code [Paulin, 1995] (DSP) C-Code (µController) Assembler (DSP) Assembler (µController) Similar situation more recently

  3. Optimizations considered • Energy-aware compilation • Compilation for digital signal processors • Compilation for multimedia processors • Compilation for VLIW (very long instruction word) processors • Compilation for network processors • Compiler generation, retargetable compilers and design space exploration

  4. Efforts for Reducing Energy • Device Level • Development of Low Power Devices • Reducing Power Supply Voltage • Reducing Threshold Voltage • Circuit Level • Gated Clock • Path Transistor Logic • Asynchronous Circuits • System Level • Fabrication level

  5. Optimization for low-energy the same as optimization for high performance? • No ! • High-performance if available memory bandwidth fully used; low-energy consumption if memories are at stand-by mode • Reduced energy if more values are kept in registers ADD r3,r0,r2 MOV r0,#28 MOV r2,r12 MOV r12,r11 MOV r11,rr10 MOV r0,r9 MOV r9,r8 MOV r8,r1 LDR r1, [r4, r0] ADD r0,r3,r1 ADD r4,r4,#4 ADD r5,r5,#1 CMP r5,#100 BLT LL3 LDR r3, [r2, #0] ADD r3,r0,r3 MOV r0,#28 LDR r0, [r2, r0] ADD r0,r3,r0 ADD r2,r2,#4 ADD r1,r1,#1 CMP r1,#100 BLT LL3 int a[1000]; c = a; for (i = 1; i < 100; i++) { b += *c; b += *(c+7); c += 1; } 2231 cycles 16.47 µJ 2096 cycles 19.92 µJ

  6. Energy models • Commercial tools frequently very imprecise • Model of Tiwari (Dissertation, Princeton 1996):Cost of instructions and of transitions between instructions;Does not separate out the cost of memory access • Model of Simunic, de Micheli (DAC 99):Model based on data sheets; does not require measurements.Does not take transitions into account. • Russell, Jacome (ICCD, 1998): based on precise measurement for two fixed configurations;cannot predict effect of changes to memory architecture. • Lee (LCTES 2001): detailed analysis of the effect pipeline stages; does not include multi-cycle operations and stalls • Dedicated energy models.

  7. Energy Model by Knauer and Steinke VDD ARM7 DataMemory DAddr ALU Register File Data RegValue Reg# BarrelShifter IAddr InstructionMemory Imm Instr Instr Opcode Multi-plier Instr. Decoder& Control Logic Etotal = Ecpu_instr + Ecpu_data + Emem_instr + Emem_data

  8. Instruction dependent costs in the CPU • Ecpu_instr = • MinCostCPU(Opcodei) +1 *  w(Immi,j) + ß1 *  h(Immi-1,j, Immi,j) +2 *  w(Regi,k) + ß2 *  h(Regi-1,k, Regi,k) + • 3 *  w(RegVali,k) + ß3 *  h(RegVali-1,k, RegVali,k) + • 4 *  w(IAddri) + ß4 *  h(IAddri-1, IAddri) + • FUCost(Instri-1,Instri) Cost for a sequence of m instructions w: number of ones;h: Hamming distance;FUCost: cost of switching functional units, ß: determined through experiments

  9. Other costs • Ecpu_data =  5 * w(DAddri) + ß5 * h(DAddri-1, DAddri) +6 * w(Datai) + ß6 * h(Datai-1, Datai) Emem_instr = MinCostMem(InstrMem,Word_widthi) +7 * w(IAddri) + ß7 * h(IAddri-1, IAddri) +8 * w(IDatai) + ß8 * h(IDatai-1, IDatai) Emem_data =  MinCostMem (DataMem, Direction, Word_widthi) + 9 * w(DAddri) + ß9 * h(DAddri-1, DAddri) +10 * w(Datai) + ß10 * h(Datai-1, Datai)

  10. Results • It is not important, which address bit is set to ‘1’ • The number of ‚1‘s in the address bus is irrelevant • The cost of flipping a bit on the address bus is independent of the bit position. • It is not important, which data bit is set to ‘1’ • The number of ‚1‘s on the data bus has a minor effect (3%) • The cost of flipping a bit on the data bus is independent of the bit position.

  11. Compiler optimizations for improving energy efficiency • Energy-aware scheduling • Energy-aware instruction selection • Operator strength reduction: e.g. replace * by + and << • Minimize the bitwidth of loads and stores • Standard compiler optimizations with energy as a cost function R2:=a[0];for i:= 1 to 10 do begin R1:= a[i]; C:= 2 * R1 + R2; R2 := R1; end; E.g.: Register pipelining: for i:= 0 to 10 do C:= 2 * a[i] + a[i-1]; • Exploitation of the memory hierarchy

  12. 3 key problems for future memory systems • (Average) Speed • Energy/Power • Predictability/Worst Case Execution Time Energy Access times smaller, faster, less energy

  13. 1. (Average) Speed • Speed gap between processor and main DRAM increases Speed • early 60ties (Atlas):page fault ~ 2500 instructions • 2002 (2 GHz µP):access to DRAM ~ 500 instructions •  penalty for cache miss soon same as for page fault in Atlas 8 4 CPU (1.5-2 p.a.)  2x every 2 years 2 DRAM (1.07 p.a.) [P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane] 1 years 0 1 2 3 4 5

  14. 2. Power/Energy Example (CACTI Model): 2.5 2 1.5 Energy per access [nJ] 1 0.5 0 64 128 256 512 1024 2048 4096 8192 Memory size [bytes] [Steinke et al., Inf 12, UniDo, 2002]

  15. 3. Predictability/WCET • Predictability: For satisfying timing constraints in hard real-time systems, predictability is the most important concern;pre run-time scheduling is often the only practical means of providing predictability in a complex system [Xu, Parnas] • Time-triggered, statically scheduled operating systems • What about memory accesses? • Currently available caches don‘t solve the problem: • Improve the average case behavior • Use “non-deterministic“ cache replacement algorithms • Scratch-pad/tightly coupled memory based predictability

  16. main SPM processor 0 scratch pad memory ARM7TDMI cores, well-known for low power consumption FFF.. Hierarchical memoriesusing scratch pad memories (SPM) • Address space Hierarchy Example no tag memory

  17. For i .{ } for j ..{ } while ... Repeat call ... Example: Which segment (array, loop, etc.) to be stored in SPM? Gain gi and size si for each segment i. Maximise gain G = gi, respecting constraint K  si. Static memory allocation: Solution: knapsack algorithm. Dynamic reloading: Finding optimal reloading points. board Main memory (On-board) Array ... ? SPMcapacity K Array Processor Int ... Exploitation of SPM

  18. Reduction in energy and average run-time Cycles Multi_sort (mix of sort algorithms)

  19. Energy consumption per functional unit,as a function of the SPM size • Parameters different from previous slide

  20. Hardware-support for block-copying • The DMA unit was modeled in VHDL, simulated, synthesized. Unit only makes up 4% of the processor chip. • The unit can be put to sleep when it is unused. DMA MEMORY CPU • Code size reductions of up to 23% for a 256 byte SPM were determined using the DMA unit instead of the dynamic approach that uses processor instructions for copying.

More Related