
Addressing Instruction Fetch Bottlenecks by Using an Instruction Register File

Addressing Instruction Fetch Bottlenecks by Using an Instruction Register File. Stephen Hines, Gary Tyson, and David Whalley. Computer Science Dept., Florida State University. June 8-16, 2007.


Presentation Transcript


  1. Addressing Instruction Fetch Bottlenecks by Using an Instruction Register File. Stephen Hines, Gary Tyson, and David Whalley. Computer Science Dept., Florida State University. June 8-16, 2007

  2. Instruction Packing
     • Store frequently occurring instructions, as specified by the compiler, in a small, low-power Instruction Register File (IRF)
     • Allow multiple instruction fetches from the IRF by packing instruction references together
       • Tightly packed – multiple IRF references
       • Loosely packed – piggybacks an IRF reference onto an existing instruction
     • Facilitate parameterization of some instructions using an Immediate Table (IMM)
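The two packing styles on this slide can be illustrated with a small greedy packer. This is only a sketch under stated assumptions, not the compiler pass used by the authors: the limit of five IRF references per tightly packed word, the string representation of instructions, and the names `pack` and `MAX_TIGHT` are all hypothetical, and parameterization through the IMM is left out.

```python
from typing import List, Tuple

MAX_TIGHT = 5  # assumption: at most five IRF references per tightly packed word


def pack(program: List[str], irf: List[str]) -> List[Tuple]:
    """Greedily pack a straight-line instruction sequence into fetch words.

    ("tight", [i1, i2, ...]) -> word holding only IRF indices
    ("loose", insn, idx)     -> ordinary instruction plus one IRF index
    ("plain", insn)          -> ordinary instruction, nothing packed
    """
    index = {insn: i for i, insn in enumerate(irf)}
    words: List[Tuple] = []
    pos = 0
    while pos < len(program):
        # Collect a run of consecutive instructions that are all in the IRF.
        run = []
        while (pos + len(run) < len(program) and len(run) < MAX_TIGHT
               and program[pos + len(run)] in index):
            run.append(index[program[pos + len(run)]])
        if len(run) >= 2:
            words.append(("tight", run))
            pos += len(run)
        elif (pos + 1 < len(program) and program[pos] not in index
              and program[pos + 1] in index):
            # Piggyback the following IRF instruction onto this one.
            words.append(("loose", program[pos], index[program[pos + 1]]))
            pos += 2
        else:
            words.append(("plain", program[pos]))
            pos += 1
    return words


# Hypothetical example: three instructions live in the IRF.
irf = ["addu", "lw", "sw"]
program = ["lw", "addu", "addu", "sw", "jal", "addu", "nop"]
words = pack(program, irf)
print(len(program), "instructions fetched as", len(words), "words")
```

In this example the seven instructions collapse into three fetch words: one tightly packed word, one loosely packed word, and one ordinary instruction.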

  3. Execution of IRF Instructions
     [Figure: Executing a Tightly Packed Param4c Instruction. Labels: Instruction Fetch Stage (PC, Instruction Cache, packed instruction); IF/ID; First Half of Instruction Decode Stage (packed instruction, IRF, IRWP, IMM; insn1 to insn4, imm3); To Instruction Decoder.]

  4. Outline
     • Introduction
     • IRF and Instruction Packing Overview
     • Integrating an IRF with an L0 I-Cache
     • Decoupling Instruction Fetch
     • Experimental Evaluation
     • Related Work
     • Conclusions & Future Work

  5. MIPS+IRF Instruction Formats
     • T-type: opcode (6 bits) | inst1 (5 bits) | inst2 (5 bits) | inst3 (5 bits) | inst4/param (5 bits) | s (1 bit) | inst5/param (5 bits)
     • R-type: opcode (6 bits) | rs/shamt (5 bits) | rt (5 bits) | rd (5 bits) | function (6 bits) | inst (5 bits)
     • I-type: opcode (6 bits) | rs (5 bits) | rt (5 bits) | immediate (11 bits) | inst (5 bits)
     • J-type: opcode (6 bits) | win (2 bits) | immediate (24 bits)
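The slide lists field widths of 6, 5, 5, 5, 5, 1, and 5 bits for the T-type format, which sum to a 32-bit word. The sketch below packs and unpacks such a word with shifts and masks. The left-to-right field order (opcode, inst1, inst2, inst3, inst4/param, s, inst5/param), the example field values, and the helper names `encode` and `decode` are assumptions made for illustration.

```python
# Field widths, most significant field first. The order is an assumption
# reconstructed from the slide; the widths are the ones the slide lists.
T_FIELDS = [("opcode", 6), ("inst1", 5), ("inst2", 5), ("inst3", 5),
            ("inst4_param", 5), ("s", 1), ("inst5_param", 5)]


def encode(fields, values):
    """Pack named field values into one integer, first field in the top bits."""
    word = 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(fields, word):
    """Split an integer back into its named fields."""
    out = {}
    shift = sum(width for _, width in fields)
    for name, width in fields:
        shift -= width
        out[name] = (word >> shift) & ((1 << width) - 1)
    return out


# Hypothetical values: five IRF indices and a made-up opcode number.
example = {"opcode": 1, "inst1": 3, "inst2": 7, "inst3": 0,
           "inst4_param": 12, "s": 0, "inst5_param": 31}
word = encode(T_FIELDS, example)
assert decode(T_FIELDS, word) == example
print(f"{word:#010x}")
```

Because every IRF reference field is 5 bits wide, each one can name one of 32 entries.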

  6. Previous Work in IRF
     • Register Windowing + Loop Cache (MICRO 2005)
     • Compiler Optimizations (CASES 2006)
       • Instruction Selection
       • Register Renaming
       • Instruction Scheduling

  7. Integrating an IRF with an L0 I-Cache
     • L0 or Filter Caches
       • Small and direct-mapped
       • Fast hit time
       • Low energy per access
       • Higher miss rate than L1
     • 256B L0 I-cache, 8B line size [Kin97]
       • Fetch energy reduced 68%
       • Cycle time increased 46%!!!
     • IRF reduces code size, while L0 only focuses on energy reduction at the cost of performance
     • IRF can alleviate the performance penalty associated with L0 cache misses, due to overlapping fetch

  8. L0 Cache Miss Penalty
     [Pipeline diagram over cycles 1-9 with an L0 cache miss marked: Insn1, Insn2, Insn3, and Insn4 each pass through IF, ID, EX, M, WB.]

  9. Overlapping Fetch with an IRF
     [Pipeline diagram over cycles 1-9 with an L0 cache miss marked: Insn1 (IF, ID, EX, M, WB); Pack2a (IFab, IDa, EXa, Ma, WBa); Pack2b (IDb, EXb, Mb, WBb); Insn3 (IF, ID, EX, M, WB).]
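The effect in the two pipeline slides (a packed word keeps the decoder busy while the next fetch misses in the L0 cache) can be mimicked with a toy cycle count. This is a rough sketch, not the simulator used in the study: it assumes one fetch word per cycle, single-instruction decode, a one-cycle L0 miss penalty, and a hypothetical function name `last_decode_cycle`.

```python
def last_decode_cycle(words, miss_penalty=1):
    """Return the cycle in which the final instruction is decoded.

    words: list of (instructions_in_word, l0_miss) tuples in fetch order.
    Fetch delivers one word per cycle, plus miss_penalty cycles on a miss.
    Decode handles one instruction per cycle and can start on a word only
    in the cycle after its fetch completes.
    """
    fetch_done = 0
    last_decode = 0
    for n_insns, l0_miss in words:
        fetch_done += 1 + (miss_penalty if l0_miss else 0)
        for _ in range(n_insns):
            last_decode = max(last_decode + 1, fetch_done + 1)
    return last_decode


# Four instructions, the last fetch misses in the L0 cache.
unpacked = [(1, False), (1, False), (1, False), (1, True)]
# Same four instructions, but the middle two share one packed word.
packed = [(1, False), (2, False), (1, True)]

print(last_decode_cycle(unpacked))  # 6: the miss stalls decode for a cycle
print(last_decode_cycle(packed))    # 5: the miss overlaps decode of the pack
```

Under these assumptions the packed sequence finishes decoding in the same cycle as an unpacked sequence with no miss at all, because the second packed instruction is decoded while the missed line is still being fetched.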

  10. Decoupling Instruction Fetch
      • Instruction bandwidth in a pipeline is usually uniform (fetch, decode, issue, commit, …)
        • Artificially limits the effective design space
      • Front-end throttling improves energy utilization by reducing the fetch bandwidth in areas of low ILP
      • IRF can provide virtual front-end throttling
        • Fetch fewer instructions every cycle, but allow multiple issue of packed instructions
        • Areas of high ILP are often densely packed
        • Lower ILP for infrequently executed sections of code
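The idea of fetching fewer words per cycle while still issuing several instructions can be sketched with a small queue model. All of it is illustrative: the function name `cycles_to_issue`, the configurations, and the assumption that every fetched instruction is immediately ready to issue (no dependences, no misses) are hypothetical simplifications.

```python
from collections import deque


def cycles_to_issue(words, fetch_width, issue_width):
    """Count cycles until every instruction has issued.

    words: number of instructions carried by each fetched word, in order.
    Each cycle issues up to issue_width instructions fetched in earlier
    cycles, then fetches up to fetch_width new words.
    """
    pending = deque(words)
    queued = 0  # instructions fetched but not yet issued
    cycles = 0
    while pending or queued:
        cycles += 1
        queued -= min(queued, issue_width)
        for _ in range(min(fetch_width, len(pending))):
            queued += pending.popleft()
    return cycles


eight_plain = [1] * 8   # eight ordinary instructions
four_packed = [2] * 4   # the same eight instructions, two per packed word

print(cycles_to_issue(eight_plain, fetch_width=2, issue_width=2))  # 5
print(cycles_to_issue(four_packed, fetch_width=1, issue_width=2))  # 5
print(cycles_to_issue(eight_plain, fetch_width=1, issue_width=2))  # 9
```

In this toy setting a fetch width of one keeps a two-wide issue stage fed only when the code is densely packed, which is the sense in which packing acts as a form of front-end throttling.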

  11. Out-of-order Pipeline Configurations

  12. Experimental Evaluation
      • MiBench embedded benchmark suite – 6 categories representing common tasks for various domains
      • SimpleScalar MIPS/PISA architectural simulator
      • Wattch/Cacti extensions for modeling energy consumption (inactive portions of the pipeline dissipate only 10% of normal energy when using cc3 clock gating)
      • VPO – Very Portable Optimizer targeted for SimpleScalar MIPS/PISA

  13. L0 Study Configuration Data

  14. Execution Efficiency for L0 I-Caches

  15. Energy Efficiency for L0 I-Caches

  16. Decoupled Fetch Configurations

  17. Execution Efficiency for Asymmetric Pipeline Bandwidth

  18. Energy Efficiency for Asymmetric Pipeline Bandwidth

  19. Energy-Delay² for Asymmetric Pipeline Bandwidth

  20. Related Work
      • L-caches – subdivide instruction cache, such that one portion contains the most frequently accessed code
      • Loop Caches – capture simple loop behaviors and replay instructions
      • Zero Overhead Loop Buffers (ZOLB)
      • Pipeline gating / Front-end throttling – stall fetch when in areas of low IPC

  21. Conclusions and Future Work
      • Future Topics
        • Can we pack areas where L0 is likely to miss?
        • IRF + encrypted or compressed I-Caches
        • IRF + asymmetric frequency clustering (of pipeline backend functional units)
      • IRF can alleviate fetch bottlenecks from L0 I-Cache misses or branch mispredictions
        • Increased IPC of L0 system by 6.75%
        • Further decreased energy of L0 system by 5.78%
      • Decoupling fetch provides a wider spectrum of design points to be evaluated (energy/performance)

  22. The End. Questions?


  24. Energy Consumption

  25. Static Code Size

  26. Conclusions & Future Work
      • Compiler optimizations targeted specifically for IRF can further reduce energy (12.2% → 15.8%), code size (16.8% → 28.8%), and execution time
      • Unique transformation opportunities exist due to IRF, such as code duplication for code size reduction and predication
      • As processor designs become more idiosyncratic, it is increasingly important to explore the possibility of evolving existing compiler optimizations
      • Register targeting and loop unrolling should also be explored with instruction packing
      • Enhanced parameterization techniques


  28. Instruction Redundancy
      • Profiled largest benchmark in each of six MiBench categories
      • Most frequent 32 instructions comprise 66.5% of total dynamic and 31% of total static instructions
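A coverage figure like the one on this slide comes from counting how often each distinct instruction appears and summing the share of the most frequent ones. The sketch below shows that calculation on a made-up trace. The function name `top_n_coverage` and the trace contents are hypothetical, and instructions are compared as plain strings, which is a simplification of whatever matching the authors' profiler performs.

```python
from collections import Counter


def top_n_coverage(trace, n):
    """Return the n most frequent instructions and the fraction of the
    trace they account for."""
    counts = Counter(trace)
    top = counts.most_common(n)
    covered = sum(count for _, count in top)
    return [insn for insn, _ in top], covered / len(trace)


# Hypothetical dynamic trace of ten executed instructions.
trace = ["addu"] * 5 + ["lw"] * 3 + ["sw"] + ["jal"]
candidates, coverage = top_n_coverage(trace, 2)
print(candidates, coverage)  # ['addu', 'lw'] 0.8
```

Running the same function over the static instruction list of a binary instead of an execution trace gives the static share, and `n=32` would correspond to the number of instructions considered on the slide.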

  29. Compilation Framework

