Speculative Software Management of Datapath-width for Energy Optimization

Presentation Transcript

  1. Speculative Software Management of Datapath-width for Energy Optimization • G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin • IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France

  2. Context • Embedded applications often operate on 8-/16-bit data: more than 50% of program instructions in some cases • New opportunities for energy reduction: clock-gating at a finer granularity, i.e. at the operand level

  3. Exploiting narrow-width operands • Dynamic approach (Brooks et al., HPCA-99): 1. cycle-by-cycle operand gating 2. complex hardware mechanisms required • Compiler approach (Stephenson et al., PLDI 2000): 1. based on static data flow analysis 2. must be overly conservative to preserve program correctness

  4. Our approach • Don’t want to pay the cost of a hardware scheme to detect when to clock-gate • Don’t want to rely on static data flow analysis to discover bit-width ranges • Use the compiler approach to switch from normal to narrow-width mode and vice versa (via a reconfiguration instruction) • Take advantage of the dynamic approach to expose dynamic narrow-width operands to the compiler (via profiling) • Narrow-width execution mode is speculative: exception management allows recovery to the correct mode

  5. Bit-width distribution analysis • Cumulative distribution of narrow-width operand occurrence [Powerstone benchmarks] (figure: curves for one operand and two operands)

  6. Bit-width distribution analysis • Dynamic distribution of narrow-width operands at basic block level (adpcm)
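The kind of per-basic-block measurement described in slides 5 and 6 can be sketched as follows. This is only an illustration, not the authors' profiler: the trace format, the function names `operand_width` and `narrow_fraction`, and the choice of a signed two's-complement width test are all assumptions made for the example.

```python
from collections import Counter

def operand_width(value: int) -> int:
    """Smallest of 8/16/32 bits that holds `value` as a signed integer (assumed test)."""
    for width in (8, 16):
        if -(1 << (width - 1)) <= value < (1 << (width - 1)):
            return width
    return 32

def narrow_fraction(trace, threshold=16):
    """Fraction of dynamic instructions per basic block whose operands all fit in `threshold` bits.

    `trace` is a hypothetical list of (basic_block_id, operand_values) records.
    """
    total, narrow = Counter(), Counter()
    for block, operands in trace:
        total[block] += 1
        if all(operand_width(v) <= threshold for v in operands):
            narrow[block] += 1
    return {block: narrow[block] / total[block] for block in total}

# Made-up trace: two dynamic instructions in each of two basic blocks.
trace = [
    ("bb0", (3, 100)),
    ("bb0", (70000, 1)),
    ("bb1", (-5, 12)),
    ("bb1", (200, -300)),
]
profile = narrow_fraction(trace)
```

With this made-up trace, half of the instructions in `bb0` and all of those in `bb1` count as narrow-width, which is the sort of figure a compiler could use to pick candidate regions.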

  7. Outline • Motivation • Micro-architectural support • Narrow-width regions formation • Simulation platform • Evaluation • Conclusions

  8. Register file model • We address a new dimension: • reduce register file activity by reducing register file width • Prior work to reduce the energy consumption in the register file • limited port connectivity • limited number of registers • We propose the byte-slice register file approach: 1. logically split 2. low-power mode via the drowsy technique, which preserves register cell contents (Flautner et al., ISCA-29) (figure: 32-bit register rows behind a row decoder, divided into 8-bit, 16-bit and 8-bit slices, each row with tag bits and a slice-enable signal)
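A toy behavioural model may help picture the byte-slice idea. It is a sketch under stated assumptions: the class name, the slice-to-mode mapping in `SLICES_FOR_MODE`, and the way the tag is derived from the written value are invented for illustration, and the slide does not specify the tag encoding.

```python
# Assumed mapping: how many of the three slices are enabled in each execution mode.
SLICES_FOR_MODE = {8: 1, 16: 2, 32: 3}

def signed_width(value: int) -> int:
    """Smallest of 8/16/32 bits that holds `value` as a signed integer (assumed rule)."""
    for width in (8, 16):
        if -(1 << (width - 1)) <= value < (1 << (width - 1)):
            return width
    return 32

class ByteSliceRegisterFile:
    """Hypothetical model: each register stores a value plus a tag giving its true width."""

    def __init__(self, num_regs: int = 64):
        self.values = [0] * num_regs
        self.tags = [8] * num_regs
        self.active_slices = 0  # running count of slices enabled by reads

    def write(self, reg: int, value: int) -> None:
        self.values[reg] = value
        self.tags[reg] = signed_width(value)

    def read(self, reg: int, mode: int):
        """Read in the given mode; only the slices for that mode are counted as active."""
        self.active_slices += SLICES_FOR_MODE[mode]
        return self.values[reg], self.tags[reg]

rf = ByteSliceRegisterFile()
rf.write(1, 42)
rf.write(2, 70000)
narrow_read = rf.read(1, mode=8)
wide_read = rf.read(2, mode=32)
```

In this example the two reads enable four slices in total, where two full-width reads would enable six; the returned tag is what a recovery mechanism could compare against the current mode.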

  9. Reconfigurable data-path • data-path resizable to accommodate the bit-width execution mode (via clock-gating) • pipeline latches • ALU • clock-gating at coarser granularity (figure: slice-enable signal and the write-back, bypass, ALU and LSU paths, each labeled 8/16/32 mode)

  10. Exception management • Data-path width misprediction may occur due to a dynamic event • Simple recovery scheme • the tag bits indicate the true data-width • upon a misprediction: • trigger an exception • recover to the correct execution mode
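The recovery scheme on slide 10 can be pictured with a small software simulation. This is a sketch, not the hardware mechanism itself: the exception class, the `penalty_cycles` value, and the decision to check the result width as well as the operand widths are assumptions made for the example.

```python
def width_tag(value: int) -> int:
    """Tag giving the true width of a value: 8, 16 or 32 bits (signed, assumed rule)."""
    for width in (8, 16):
        if -(1 << (width - 1)) <= value < (1 << (width - 1)):
            return width
    return 32

class WidthMisprediction(Exception):
    """Raised when a value needs more bits than the current execution mode provides."""

    def __init__(self, required_width: int):
        super().__init__(f"operation needs {required_width} bits")
        self.required_width = required_width

def execute_add(a: int, b: int, mode: int) -> int:
    """Add in the current mode; the tags reveal whether the mode was mispredicted."""
    result = a + b
    needed = max(width_tag(a), width_tag(b), width_tag(result))
    if needed > mode:
        raise WidthMisprediction(needed)
    return result

def run_region(pairs, mode: int, penalty_cycles: int = 10):
    """Run a region speculatively in `mode`, recovering to a wider mode on an exception."""
    cycles, results = 0, []
    for a, b in pairs:
        try:
            results.append(execute_add(a, b, mode))
        except WidthMisprediction as exc:
            mode = exc.required_width                # recover to the correct mode
            cycles += penalty_cycles                 # hypothetical recovery cost
            results.append(execute_add(a, b, mode))  # re-execute the instruction
        cycles += 1
    return results, mode, cycles

results, final_mode, cycles = run_region([(1, 2), (100, 100), (3, 4)], mode=8)
```

Here the second addition produces 200, which does not fit in 8 signed bits, so the region pays the penalty once and continues in 16-bit mode.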

  11. Address instructions • Special care must be taken with address instructions • separate address calculation from memory access • Use of dedicated registers for address computation • accumulator registers with additional ISA support (see paper for details)

  12. Outline • Motivation • Micro-architectural support • Narrow-width regions formation • Simulation platform • Evaluation • Conclusions

  13. A two-step process (figure: tool flow with the labels input data sets, annotated .s file, machine, Step 1, modified .s file, Step 2, annotated .s file, address transformation)

  14. Profiling • Bit-width characteristics of selected regions (figure: weight of regions in program, 0% to 100%, with legend entries 32 bits other, LD/ST with 32 bits, and 8/16 bits narrow-width operands)

  15. Address instructions transformation • A graph partitioning formulation: • G, the data dependence graph (DDG) of a basic block (BB) • an edge (n, m) exists iff there is a def-use relation between n and m • Problem: transform memory instructions into equivalent accumulator-based instructions • Select (n, m) such that n has a 32-bit width operand and m is a LD/ST instruction • Replace m with accumulator-based instructions • Minimize the cut-size, i.e. the number of instructions needed to move data from the register file to the accumulators (figure: example code with the labels add1, add -> Rx, load, mov Rx -> ACC, LDACC Ry, add2)
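To make the cut-size notion concrete, the following sketch selects the candidate edges and counts the moves for one given partition. It does not perform the minimization, and it is not the authors' algorithm: the graph encoding, the node names, and the rule of one move per distinct producer are assumptions for illustration.

```python
def candidate_edges(edges, width, is_mem):
    """Edges (n, m) where n carries a 32-bit value and m is a load/store instruction."""
    return [(n, m) for n, m in edges if width[n] == 32 and is_mem[m]]

def cut_size(edges, acc_side):
    """Count values crossing from the register-file side into the accumulator side.

    Assumed cost model: one move instruction per distinct producer n.
    """
    return len({n for n, m in edges if n not in acc_side and m in acc_side})

# Hypothetical DDG of one basic block: add1 -> add -> load -> add2.
edges = [("add1", "add"), ("add", "load"), ("load", "add2")]
width = {"add1": 16, "add": 32, "load": 32, "add2": 16}
is_mem = {"add1": False, "add": False, "load": True, "add2": False}

selected = candidate_edges(edges, width, is_mem)
moves = cut_size(edges, acc_side={"load"})
```

For this small graph the only selected edge is the one from `add` to `load`, and placing `load` alone on the accumulator side costs a single move, which corresponds to the single `mov Rx -> ACC` label on the slide.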

  16. Instruction reordering • Problem: • reorder instructions in a basic block such that operations with 32-bit operands are moved around 8/16-bit operations
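One simple way to approach this reordering problem is a greedy, dependence-respecting schedule that prefers instructions matching the current width mode. The sketch below is a stand-in heuristic, not the method from the paper; the instruction names, the "narrow"/"wide" labels, and the dependence sets are made up, and the dependences are assumed to be acyclic.

```python
def reorder(instrs, deps):
    """Greedy reordering of one basic block.

    instrs: dict mapping instruction name to "narrow" or "wide", in original order.
    deps:   dict mapping instruction name to the set of names it depends on.
    """
    remaining = list(instrs)
    done, order, mode = set(), [], None
    while remaining:
        # Instructions whose dependences have all been scheduled.
        ready = [i for i in remaining if deps.get(i, set()) <= done]
        # Prefer staying in the current mode to avoid a reconfiguration.
        same_mode = [i for i in ready if instrs[i] == mode]
        pick = (same_mode or ready)[0]
        mode = instrs[pick]
        order.append(pick)
        done.add(pick)
        remaining.remove(pick)
    return order

def mode_switches(order, instrs):
    """Number of adjacent pairs that would need a mode change."""
    return sum(1 for a, b in zip(order, order[1:]) if instrs[a] != instrs[b])

instrs = {"i1": "narrow", "i2": "wide", "i3": "narrow", "i4": "wide", "i5": "narrow"}
deps = {"i3": {"i1"}, "i4": {"i2"}, "i5": {"i3", "i4"}}

new_order = reorder(instrs, deps)
before = mode_switches(list(instrs), instrs)
after = mode_switches(new_order, instrs)
```

On this example the original order alternates modes four times, while the reordered block groups the narrow and wide operations and switches only twice.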

  17. Outline • Motivation • Micro-architectural support • Narrow-width regions formation • Evaluation • Conclusions

  18. Simulation platform • Tools • CACTI: register file access energy • HotLeakage: leakage energy • Lx processor platform • in-order • 4-issue width • 64 32-bit GPRs • 8 1-bit CBRs • 6-stage pipeline • 4 ALUs, 1 LSU • 2 MULs

  19. Summary of results • IPC degradation with varying misprediction penalty and varying bit-width convergence

  20. Summary of results • Dynamic energy reduction

  21. Summary of results • Register file static energy savings

  22. Outline • Motivation • Micro-architectural support • Narrow-width regions formation • Evaluation • Conclusions

  23. Conclusions • Contribution to power-aware compilation • speculative management of processor data-path in software • simple exception management scheme to repair a software misprediction • Evaluation results • 17% data-path dynamic energy savings • 22% register file static energy savings • performance impact varies with implementation cost of the recovery scheme • Future work • evaluation with larger granularity (e.g. trace) • can reduce the number of mispredictions • can reduce the number of reconfiguration instructions

  24. Thanks ! Questions …