
CS 7810 Lecture 13

Pipeline Gating: Speculation Control for Energy Reduction. S. Manne, A. Klauser, D. Grunwald. Proceedings of ISCA-25, June 1998.


Presentation Transcript


  1. CS 7810 Lecture 13
     Pipeline Gating: Speculation Control for Energy Reduction
     S. Manne, A. Klauser, D. Grunwald
     Proceedings of ISCA-25, June 1998

  2. Cost of Speculation
     [Figure] Mispredict rates: 9.9, 12.2, 23.9, 10.4, 6.9, 4.6, 11.3, 1.7

  3. Pipeline Gating
     • Low-confidence branches throttle instruction fetch until they are resolved
     • Pipeline gating usually lasts for fewer than five cycles
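The gating rule on this slide can be sketched as a counter of unresolved low-confidence branches that stalls fetch once it reaches a threshold. This is an illustrative software model, not the paper's hardware; the class name `FetchGate` and the `threshold` parameter are assumptions introduced here.

```python
class FetchGate:
    """Toy model of pipeline gating (hypothetical names, not from the paper)."""

    def __init__(self, threshold=1):
        # Assumed parameter: how many unresolved low-confidence
        # branches are needed before instruction fetch is stalled.
        self.threshold = threshold
        self.unresolved_low_conf = 0

    def on_branch_fetched(self, low_confidence):
        # A newly fetched low-confidence branch counts against the gate.
        if low_confidence:
            self.unresolved_low_conf += 1

    def on_branch_resolved(self, was_low_confidence):
        # Resolving a low-confidence branch releases its share of the gate.
        if was_low_confidence and self.unresolved_low_conf > 0:
            self.unresolved_low_conf -= 1

    def fetch_enabled(self):
        # Fetch proceeds only while the count is below the threshold.
        return self.unresolved_low_conf < self.threshold
```

With `threshold=1`, a single unresolved low-confidence branch stalls fetch; a larger threshold gates less often.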

  4. Metrics
     • SPEC (specificity): fraction of all mispredicted branches detected as low-confidence by the confidence estimator (coverage)
     • PVN (predictive value of a negative test): probability of a low-confidence branch being incorrectly branch-predicted (accuracy)
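Both metrics follow directly from branch counts. A minimal sketch, assuming the three counts below are available from a simulation; the function and argument names are hypothetical.

```python
def spec_and_pvn(low_conf_mispredicted, low_conf_correct, high_conf_mispredicted):
    """Return (SPEC, PVN) from branch counts (argument names are illustrative)."""
    mispredicted = low_conf_mispredicted + high_conf_mispredicted
    low_conf = low_conf_mispredicted + low_conf_correct
    # SPEC: share of all mispredicted branches that were flagged low-confidence.
    spec = low_conf_mispredicted / mispredicted if mispredicted else 0.0
    # PVN: share of low-confidence branches that were in fact mispredicted.
    pvn = low_conf_mispredicted / low_conf if low_conf else 0.0
    return spec, pvn


# Made-up counts: 30 of 40 mispredictions flagged, 100 branches flagged in total.
spec, pvn = spec_and_pvn(30, 70, 10)
```

In this made-up example SPEC is 30/40 and PVN is 30/100, which shows how coverage can be high while accuracy stays low.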

  5. Confidence Estimators
     • Perfect: to gauge potential benefits
     • Static: branches that have low prediction rates are low-confidence
     • JRS: if a branch has yielded N successive correct predictions, it has high confidence
     • Saturating counters: an unbiased counter value or disagreement between two predictors → low confidence
     • Distance: mispredictions are clustered, hence the first 4 branches after a mispredict have low confidence
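The JRS rule (N successive correct predictions imply high confidence) can be sketched with a resetting counter per branch. This is a simplified model: a dictionary keyed by branch PC stands in for a fixed-size hardware table, and the class name and the default `n=4` are assumptions.

```python
class ResettingCounterEstimator:
    """Simplified JRS-style estimator (dictionary instead of a hardware table)."""

    def __init__(self, n=4):
        self.n = n          # successive correct predictions required
        self.counts = {}    # branch PC -> current run of correct predictions

    def is_high_confidence(self, pc):
        # Unseen branches start at zero and are therefore low-confidence.
        return self.counts.get(pc, 0) >= self.n

    def update(self, pc, prediction_correct):
        if prediction_correct:
            # Count up, saturating at n.
            self.counts[pc] = min(self.counts.get(pc, 0) + 1, self.n)
        else:
            # Any misprediction resets the run.
            self.counts[pc] = 0
```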

  6. SPEC and PVN
     • SPEC (coverage): fraction of mispredicted branches detected as low-confidence by the estimator
     • PVN (accuracy): percentage of low-confidence branches that are mispredicted
     • It is easier to achieve a high SPEC value than a high PVN
     • A higher PVN can be achieved by requiring N low-confidence branches to invoke gating: if PVN is 30%, redefining low confidence as two low-confidence branches increases PVN to 51%, since the chance that at least one of the two is mispredicted is 1 − 0.7 × 0.7 = 0.51
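The 30% to 51% figure is consistent with treating each low-confidence branch as an independent event. A small sketch of that arithmetic, where independence is an assumption and `combined_pvn` is a hypothetical helper name:

```python
def combined_pvn(pvn, n):
    """Probability that at least one of n low-confidence branches is mispredicted.

    Assumes each branch is mispredicted independently with probability pvn.
    """
    return 1.0 - (1.0 - pvn) ** n


two_branch_pvn = combined_pvn(0.30, 2)  # 1 - 0.7 * 0.7
```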

  7. Perfect

  8. Gating Results

  9. Results
     • Can gating improve performance? Only if cache pollution is significant
     • Less than 1% performance loss and up to 38% reduction in extra work
     • Energy consumption could go up: some work is independent of the number of executed instructions (clock distribution), so increased execution time can increase energy
     • Pipeline gating should reduce power consumption

  10. Results

  11. CS 7810 Lecture 13
      Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power
      S. Kaxiras, Z. Hu, M. Martonosi
      Proceedings of ISCA-28, July 2001

  12. Leakage Power Trends
      • Circuit delay ∝ 1/(V − Vth)
      • Leakage ∝ number of transistors (increasing), supply voltage (decreasing), and exponentially on a low threshold voltage (increasing)
      • L1 and L2 caches are the biggest contributors (high transistor budgets)

  13. Vdd-Gating
      • Leakage can be reduced by gating off the supply voltage to the circuit
      • When applied to a cache, the contents of the SRAM cell are lost
      • Cache decay: apply Vdd-gating when you do not care about cache contents

  14. Lifetime of a Cache Line

  15. Overheads
      • Hardware to determine when to decay
      • Introduces additional cache misses
      • Normalized cache leakage power =
          active ratio (fraction of cache that is powered on)
          + (counter overhead : leak) × activity
          + (L2 access energy : leak) × num-misses
      • Increased execution time (< 0.7%)
      • L2 access/leakage ratio is ~9
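The three-term expression above can be evaluated directly. A minimal sketch, assuming the activity and miss terms are expressed as rates on the same time base as the leakage; every argument name is hypothetical, and only the L2 ratio of about 9 comes from the slide.

```python
def normalized_leakage(active_ratio, counter_to_leak, activity,
                       l2_to_leak, extra_miss_rate):
    """Sum the three terms of the normalized leakage expression.

    All arguments are illustrative; the units of `activity` and
    `extra_miss_rate` are an assumption (events per cycle).
    """
    return (active_ratio
            + counter_to_leak * activity
            + l2_to_leak * extra_miss_rate)


# Made-up inputs, except the L2 access/leakage ratio of 9 from the slide.
example = normalized_leakage(
    active_ratio=0.5, counter_to_leak=0.001, activity=0.1,
    l2_to_leak=9, extra_miss_rate=0.001)
```

A result below 1 means decay saves leakage overall; the extra-miss term shows why an aggressive decay period can erase the savings.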

  16. Skier’s Dilemma
      New skis: $400
      Ski rentals: $20
      Heuristic: buy skis after rental cost = purchase price

      Ski trips:    5     10    15    20    25    50
      Optimal:    $100  $200  $300  $400  $400  $400
      Heuristic:  $100  $200  $300  $800  $800  $800

      Likewise, decay a cache line when the cost of an additional miss equals the leakage dissipated so far
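The table can be reproduced with a few lines of code. This sketch uses the slide's prices; the function names are made up, and the heuristic is modeled as renting until the rental total reaches the purchase price and then buying.

```python
RENTAL = 20      # cost per rented trip, from the slide
PURCHASE = 400   # cost of new skis, from the slide


def optimal_cost(trips):
    # With hindsight: rent every time or buy up front, whichever is cheaper.
    return min(trips * RENTAL, PURCHASE)


def heuristic_cost(trips):
    # Rent until total rental cost equals the purchase price, then buy.
    rentals_before_buying = PURCHASE // RENTAL
    if trips < rentals_before_buying:
        return trips * RENTAL
    return rentals_before_buying * RENTAL + PURCHASE


trip_counts = [5, 10, 15, 20, 25, 50]
optimal = [optimal_cost(t) for t in trip_counts]
heuristic = [heuristic_cost(t) for t in trip_counts]
```

The heuristic never pays more than twice the optimal cost, which is the property that motivates the analogous decay rule for cache lines.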

  17. Tracking Dead Time
      • Each line has a 2-bit counter that gets reset on every access and gets incremented every 2500 cycles through a global signal (negligible overhead)
      • After 10,000 clock cycles, the counter reaches the max value and triggers a decay
      • Adaptive decay: start with a short decay period; if you have a quick miss, double the period; if there is no miss, halve the period
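The per-line counter and the adaptive rule can be modeled in a few lines. This is a behavioral sketch, not the paper's circuit: it assumes a line decays on the fourth global tick after its last access (4 × 2500 = 10,000 cycles), and `DecayLine` and `adapt_period` are hypothetical names.

```python
TICK_CYCLES = 2500    # global tick interval, from the slide
TICKS_TO_DECAY = 4    # assumed: 4 ticks * 2500 cycles = 10,000 cycles


class DecayLine:
    """Behavioral model of one cache line with a decay counter."""

    def __init__(self):
        self.ticks = 0
        self.powered = True

    def access(self):
        # Any access resets the counter; a decayed line is powered back on
        # (in hardware this access would be a miss that refetches the data).
        self.ticks = 0
        self.powered = True

    def global_tick(self):
        # Called once every TICK_CYCLES cycles for every line.
        if self.powered:
            self.ticks += 1
            if self.ticks >= TICKS_TO_DECAY:
                self.powered = False  # Vdd-gate the line


def adapt_period(period, quick_miss):
    # Adaptive decay as stated on the slide: double after a quick miss,
    # otherwise halve (never below one cycle).
    return period * 2 if quick_miss else max(1, period // 2)
```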

  18. Results

  19. Overheads

  20. Other Results
      • The L2 cache is equally suitable for decay techniques: lifetimes are scaled by a factor of 10, and an extra miss also costs a lot more
      • For their experiments, there is little interference from multiprogramming
      • Some instructions can easily be identified as last touches to a cache block, giving potential for early cache decay
      • Can this apply to the branch predictor or the register file?
