
Hardware Architectures for Power and Energy Adaptation


  1. Hardware Architectures for Power and Energy Adaptation Phillip Stanley-Marbell

  2. Outline • Motivation • Related Research • Architecture • Experimental Evaluation • Extensions • Summary and Future Work

  3. Motivation • Power consumption is becoming a limiting factor as technology scales to smaller feature sizes • Mobile/battery-powered computing applications • Thermal issues in high-end servers • Low-power design is not enough: • Power- and energy-aware design • Adapt to non-uniform application behavior • Use only as many resources as the application requires • This talk: exploit the processor-memory performance gap to save power, with limited performance degradation

  4. Related Research • Reducing power dissipation in on-chip caches • Reducing instruction cache leakage power dissipation [Powell et al, TVLSI '01] • Reducing dynamic power in set-associative caches and on-chip buffer structures [Dropsho et al, PACT '02] • Reducing power dissipation of the CPU core • Compiler-directed dynamic voltage scaling of the CPU core [Hsu, Kremer, Hsiao. ISLPED '01]

  5. Target Application Class: Memory-Bound Applications • Single-issue in-order processors • Limited overlap of main-memory access and computation • Memory-bound applications • Limited by memory-system performance [Figure: execution timelines for CPU @ Vdd vs. CPU @ Vdd/2]

  6. Power-Performance Tradeoff • Detect memory-bound execution phases • Maintain sufficient information to determine the compute / stall time ratio • Pros • Scaling down the CPU core voltage yields significant energy savings (Energy ∝ Vdd²) • Cons • Performance hit (Delay ∝ 1/Vdd)
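The quadratic relationship above can be made concrete with a small calculation: since dynamic energy scales as Vdd², halving the supply voltage cuts switching energy to roughly a quarter. A minimal sketch (the function name and voltages are illustrative, not from the talk):

```c
/* Dynamic (switching) energy scales as Vdd^2, so the energy at a
   scaled supply voltage, relative to the nominal voltage, is
   (v_new / v_old)^2. */
double dynamic_energy_ratio(double v_new, double v_old)
{
    return (v_new * v_new) / (v_old * v_old);
}
```

For example, scaling a hypothetical 1.8 V core down to 0.9 V gives a ratio of 0.25, i.e. a 4x reduction in switching energy, which is why the performance hit can be worth paying in memory-bound phases.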

  7. Power Adaptation Unit (PAU) • Maintains information to determine the ratio of compute to stall time • Entries allocated for instructions which cause CPU stalls • Intuitively, one table entry required per program loop • Fields: • State (I, A, T, V) • # instrs. executed (NINSTR) • Distance between stalls (STRIDE) • Saturating 'Quality' counter (Q) [From S-M et al, PACS 2002]
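The fields listed above can be sketched as a C struct. The field widths and the expansions of the I/A/T/V state names below are assumptions for illustration; the slide only names the fields:

```c
#include <stdint.h>

/* One PAU table entry (sketch). State-name expansions are guesses. */
enum pau_state {
    PAU_I,  /* invalid   */
    PAU_A,  /* active    */
    PAU_T,  /* transient */
    PAU_V   /* valid     */
};

struct pau_entry {
    uint32_t       pc;      /* address of the stalling instruction     */
    enum pau_state state;   /* I, A, T or V                            */
    uint32_t       ninstr;  /* # instructions executed between stalls  */
    uint32_t       stride;  /* distance between stalls                 */
    uint8_t        q;       /* saturating 'quality' counter            */
};
```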

  8. PAU Table Entry State Machine • Slowdown factor, ∂, for a target 1% performance degradation: ∂ = (0.01 · STRIDE + NINSTR) / NINSTR • If the CPU is running at speed, slow it down [Figure: PAU table entry state machine]
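As a sanity check, the formula can be transcribed directly into C. For instance, 10 compute instructions between stalls that are 1000 cycles apart give ∂ = (0.01 · 1000 + 10) / 10 = 2, i.e. the core could run at half speed within the 1% degradation budget:

```c
/* Slowdown factor for a target 1% performance degradation,
   transcribing the slide's formula:
       d = (0.01 * STRIDE + NINSTR) / NINSTR               */
double pau_slowdown(double stride, double ninstr)
{
    return (0.01 * stride + ninstr) / ninstr;
}
```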

  9. Example

  for (x = 100;;) {
      if (x-- > 0)
          a = i;
      b = *n;
      c = *p++;
  }

  • PAU table entries are created for each assignment • After 100 iterations, the assignment to a stops • Entries for b or c can take over immediately

  10. Experimental Methodology • Simulated the PAU as part of a single-issue embedded processor • Used the Myrmigki simulator [S-M et al, ISLPED 2001] • Models a Hitachi SH RISC embedded processor • 5-stage in-order pipeline • 8K unified L1, 100-cycle latency to main memory • Empirical instruction power model, from an SH7708 device • Voltage scaling penalty of 1024 cycles, 14 µJ • Investigated the effect of PAU table size on performance and power • Intuitively, PAU table entries track program loops with repeated stalls

  11. Effect of Table Size on Energy Savings • Single-entry PAU table provides 27% reduction in energy, on average • Scaling up to a 64-entry PAU table only provides additional 4%

  12. Effect of Table Size on Performance • Single-entry PAU table incurs 0.75% performance degradation, on avg. • A large PAU table leads to more aggressive behavior and increased penalty

  13. Overall Effect of Table Size : Energy-Delay product • Considering both performance and power, there is little benefit from larger PAU table sizes

  14. Extending the PAU structure • Multiprogramming environments • Superscalar architectures • Slowdown factor computation

  15. PAU in Multiprogramming Environments • Only a single entry necessary per application • Amortize memory-bound phase detection across context switches • Would be wasteful to flush the PAU at each context switch (~10 ms) • Extend PAU entries with an ID field: • CURID and IDMASK fields written to by the OS
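One way the ID field might be matched against CURID under IDMASK on each lookup is sketched below; the function name and masking convention are assumptions, since the slide only names the fields:

```c
#include <stdint.h>

/* A PAU entry belongs to the currently running process when its ID
   equals CURID under the bits selected by IDMASK, so entries survive
   context switches instead of being flushed. */
int pau_entry_matches(uint32_t entry_id, uint32_t curid, uint32_t idmask)
{
    return (entry_id & idmask) == (curid & idmask);
}
```

Masking (rather than exact comparison) lets the OS widen or narrow how many processes share PAU state simply by writing IDMASK.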

  16. PAU in Superscalar Architectures • Dependent computations are 'stretched out' • FUs with no dependent instructions are unduly slowed down • Maintain separate instruction counters per FU • Drawback: requires the ability to run FUs in the core at different voltages [Figure: FU execution timelines for CPU @ Vdd vs. CPU @ Vdd/2]

  17. Slowdown Factor Computation • Computation is performed only on an application phase change • A hardware solution would be wasteful • Solution: computation by a software ISR • Compute ∂, then look up a discrete Vdd/frequency setting by indexing into a lookup table • A similar software-handler solution was proposed in [Dropsho et al, 2002]
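A sketch of what such a software handler could look like: compute ∂ from the PAU counters, then index a small table of discrete operating points. The table values, names, and selection policy below are illustrative assumptions, not figures from the talk:

```c
#include <stddef.h>

struct op_point { double vdd_volts; double freq_mhz; };

/* Hypothetical discrete Vdd/frequency operating points. */
static const struct op_point points[] = {
    { 1.0,  50.0 }, { 1.3, 100.0 }, { 1.6, 150.0 }, { 1.9, 200.0 },
};
#define NPOINTS (sizeof points / sizeof points[0])

/* ISR body: pick the lowest operating point whose frequency still
   meets full speed divided by the slowdown factor. */
struct op_point pick_point(double stride, double ninstr)
{
    double d      = (0.01 * stride + ninstr) / ninstr;  /* slowdown */
    double needed = points[NPOINTS - 1].freq_mhz / d;
    size_t i      = 0;

    while (i < NPOINTS - 1 && points[i].freq_mhz < needed)
        i++;
    return points[i];
}
```

With STRIDE = 1000 and NINSTR = 10, ∂ = 2, so the handler selects the 100 MHz / 1.3 V point, half the top speed. Doing this in an ISR rather than hardware matches the slide's point that phase changes are rare enough to tolerate software latency.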

  18. Summary & Future Work • PAU: hardware identifies program regions (loops) with a compute / memory-stall mismatch • Due to the nature of most programs, even a single-entry PAU is effective: it can achieve 27% energy savings with only 0.75% performance degradation • Proposed extensions to the PAU architecture • Future work • Evaluations with smaller miss penalties • Implementation of proposed extensions • More extensive evaluation across applications

  19. Questions
