
Osman Sabri Ünsal, Barcelona Supercomputing Center. Euro-TM Final Workshop, January 2015

Energy-efficient Transactional Memory.


Presentation Transcript


  1. Energy-efficient Transactional Memory • Osman Sabri Ünsal, Barcelona Supercomputing Center • Euro-TM Final Workshop, January 2015

  2. Terminology • Power • Dynamic: $P = E_{trans}\,\alpha f_{ck} = \alpha f_{ck} C V^2 / 2$; addressed by clock gating and DVFS • Static (leakage): difficult to model, increases with T and area; addressed by power gating • Energy: power over time • Metrics: Energy, Energy-delay, Energy-delay$^2$
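
As a rough illustration of the terminology above, the following Python sketch evaluates the dynamic-power formula and the three metrics. The function names and the sample values (activity factor 0.5, 1 GHz clock, 1 nF switched capacitance, 1 V supply) are assumptions chosen for illustration, not figures from the talk.

```python
def dynamic_power(alpha, f_ck, capacitance, vdd):
    """Dynamic power in watts: P = alpha * f_ck * C * V^2 / 2."""
    return alpha * f_ck * capacitance * vdd ** 2 / 2


def energy(power, delay):
    """Energy in joules for a constant power draw over `delay` seconds."""
    return power * delay


def energy_delay(power, delay, exponent=1):
    """Energy-delay (exponent=1) or energy-delay^2 (exponent=2) metric."""
    return energy(power, delay) * delay ** exponent


# Illustrative values only (assumed, not taken from the slides).
p = dynamic_power(alpha=0.5, f_ck=1e9, capacitance=1e-9, vdd=1.0)
e = energy(p, delay=2.0)
edp = energy_delay(p, delay=2.0)
ed2p = energy_delay(p, delay=2.0, exponent=2)
print(p, e, edp, ed2p)
```

Because dynamic power depends on the square of the supply voltage, halving `vdd` in this sketch cuts the result to a quarter, which is why voltage scaling features so prominently later in the talk.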

  3. History • In the beginning there was Maurice (and Tali and Iris): “Energy Reduction in Multiprocessor Systems Using Transactional Memory”, Tali Moreshet, Iris Bahar, Maurice Herlihy, ISLPED 2005 • Others, many from Euro-TM

  4. This talk • In the beginning, there was Maurice: “Energy Reduction in Multiprocessor Systems Using Transactional Memory”, Tali Moreshet, Iris Bahar, Maurice Herlihy, ISLPED 2005 • Clock-gate on abort: “Clock Gate on Abort: Towards Energy-Efficient Hardware Transactional Memory”, Sutirtha Sanyal, Sourav Roy, Adrián Cristal, Osman Unsal, Mateo Valero, IPDPS 2009 • Below safe Vdd with TM: “Combining Error Detection and Transactional Memory for Energy-efficient Computing below Safe Operation Margins”, Gulay Yalcin, Anita Sobe, Alexey Voronin, Jons-Tobias Wamhoff, Derin Harmanci, Adrián Cristal, Osman Unsal, Pascal Felber, Christof Fetzer, PDP 2014

  5. ISLPED2005 at a glance • Using a fully-associative transactional cache • Running a microbenchmark and comparing to locks • Simple energy-efficient heuristic; serialize on conflict

  6. ISLPED2005 at a glance (cont.) • Power calculated with CACTI and the Micron SDRAM power calculator

  7. This talk • In the beginning, there was Maurice: “Energy Reduction in Multiprocessor Systems Using Transactional Memory”, Tali Moreshet, Iris Bahar, Maurice Herlihy, ISLPED 2005 • Clock-gate on abort: “Clock Gate on Abort: Towards Energy-Efficient Hardware Transactional Memory”, Sutirtha Sanyal, Sourav Roy, Adrián Cristal, Osman Unsal, Mateo Valero, IPDPS 2009 • Below safe Vdd with TM: “Combining Error Detection and Transactional Memory for Energy-efficient Computing below Safe Operation Margins”, Gulay Yalcin, Anita Sobe, Alexey Voronin, Jons-Tobias Wamhoff, Derin Harmanci, Adrián Cristal, Osman Unsal, Pascal Felber, Christof Fetzer, PDP 2014

  8. Futile Aborts • Aborting more than “once” is a waste of energy. • n aborts -> n wasted executions. These are futile aborts. • Clock-gate the processor that receives the abort in order to halt it.

  9. Clock Gating • Well-known energy-saving technique.

  10. Scalable-TCC • Assumes directory instead of shared bus. • Multiple Directories -> Parallel Commit • Sharers Abort

  11. Contributions • A novel protocol on top of Scalable-TCC (SC-TCC) that dynamically gates and ungates processors to save energy. • A new contention management model that supports frequent gating -> maximizes energy savings.

  12. Proposal • Keep the aborted processor clock-gated if: • a) the aborter thread is still present in that directory, • AND • b) the aborter thread is executing the same transaction that earlier killed the abortee transaction.

  13. Changes in Directory • Aborter Proc Id = the processor doing the commit. • Aborter Tx Id = PC of the Tx causing this abort. • Abort Counter = the number of aborts suffered so far (in this directory). • Renew Counter = number of renewals of clock-gating sessions. • Gate Timer = duration of clock gating.
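
The per-directory state and the gating condition from the two slides above can be sketched as follows. This is a minimal Python model, not the hardware design: the class name `GateEntry`, the function `keep_gated`, and the representation of directory presence as a set and of running transactions as a dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GateEntry:
    """Hypothetical model of the extra directory fields for one gated processor."""
    aborter_proc_id: int      # processor doing the commit
    aborter_tx_id: int        # PC of the transaction that caused the abort
    abort_counter: int = 0    # aborts suffered so far in this directory
    renew_counter: int = 0    # renewals of clock-gating sessions
    gate_timer: int = 0       # duration of clock gating (cycles, assumed unit)


def keep_gated(entry, present_procs, running_tx):
    """Return True when both conditions a) and b) of the proposal hold.

    present_procs: set of processor ids currently present in the directory.
    running_tx:    dict mapping processor id -> PC of the transaction it runs.
    """
    still_present = entry.aborter_proc_id in present_procs
    same_tx = running_tx.get(entry.aborter_proc_id) == entry.aborter_tx_id
    return still_present and same_tx


entry = GateEntry(aborter_proc_id=0, aborter_tx_id=0x4000)
still_gated = keep_gated(entry, present_procs={0, 2},
                         running_tx={0: 0x4000, 2: 0x5000})
moved_on = keep_gated(entry, present_procs={0, 2},
                      running_tx={0: 0x4800, 2: 0x5000})
print(still_gated, moved_on)
```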

  14. Protocol Operations • Assume a NUMA configuration such as the one shown in the slide's figure. • P2 and P3 can commit in parallel. • Assume P0 is committing. The others will be invalidated.

  15. Protocol Operations (cont..) • P1 Gated. • Timer Value Loaded. TxP0

  16. Gating Period • Proposes a new exponential back-off that obtains energy savings along with a performance gain. • The idea is to gate frequently at low abort counts, and to increase the gating period exponentially as the abort (and/or renewal) count goes up -> Exponential Double Stair-Case.
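
The slide does not give the exact back-off formula, so the following Python sketch only illustrates the general shape of a "double staircase": the gating period stays flat for a few aborts (or renewals) and then doubles. The function name `gating_period` and the parameters `base_cycles`, `step`, and `cap_cycles` are hypothetical and are not taken from the paper.

```python
def gating_period(abort_count, renew_count, base_cycles=8, step=2,
                  cap_cycles=1 << 16):
    """Gating period in cycles under an assumed staircase back-off.

    One staircase is driven by the abort counter and one by the renew
    counter; each adds one doubling per `step` events. The result is
    capped so that a processor is never gated indefinitely.
    """
    exponent = abort_count // step + renew_count // step
    return min(base_cycles << exponent, cap_cycles)


# Short periods while contention is low, exponentially longer as it grows.
periods = [gating_period(aborts, 0) for aborts in range(8)]
print(periods)
```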

  17. Reminder • Keep the aborted processor clock-gated if: • a) the aborter thread is still present in that directory, • AND • b) the aborter thread is executing the same transaction that earlier killed the abortee transaction.

  18. Protocol Operations (cont.) • Expiration of the timer -> the processor may still remain gated.

  19. Renewal of Gating • In this case: • i) the processor remains turned off, • ii) the Renew Counter is incremented by 1, • iii) a new value for the gating period is loaded.
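
The renewal steps can be sketched as a timer-expiry handler. This Python model is an assumption-laden illustration: the dictionary keys mirror the directory fields named in the talk, while `on_timer_expiry`, the `period_fn` callback, and the doubling rule used in the usage lines are hypothetical.

```python
def on_timer_expiry(entry, aborter_still_conflicting, period_fn):
    """Decide what happens to a gated processor when its gate timer expires.

    entry: dict with 'abort_counter', 'renew_counter' and 'gate_timer'.
    aborter_still_conflicting: True when the aborter is still present in
        the directory and still runs the transaction that caused the abort.
    period_fn: callable (abort_counter, renew_counter) -> gating period.
    """
    if aborter_still_conflicting:
        # i) the processor stays off, ii) the renew counter goes up by one,
        # iii) a new gating period is loaded.
        entry["renew_counter"] += 1
        entry["gate_timer"] = period_fn(entry["abort_counter"],
                                        entry["renew_counter"])
        return "gated"
    entry["gate_timer"] = 0
    return "ungated"


# Assumed back-off: the period doubles with every renewal.
entry = {"abort_counter": 1, "renew_counter": 0, "gate_timer": 8}
first = on_timer_expiry(entry, True, lambda aborts, renews: 8 << renews)
second = on_timer_expiry(entry, False, lambda aborts, renews: 8 << renews)
print(first, second, entry)
```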

  20. Simulation and Results • Simulator: M5. • Benchmarks: 3 applications from STAMP (Genome, Yada and Intruder). • Average speed-up: 4%

  21. Simulation and Results • Average energy savings: 19% • Average power savings: 13%

  22. This talk • In the beginning, there was Maurice: “Energy Reduction in Multiprocessor Systems Using Transactional Memory”, Tali Moreshet, Iris Bahar, Maurice Herlihy, ISLPED 2005 • Clock-gate on abort: “Clock Gate on Abort: Towards Energy-Efficient Hardware Transactional Memory”, Sutirtha Sanyal, Sourav Roy, Adrián Cristal, Osman Unsal, Mateo Valero, IPDPS 2009 • Below safe Vdd with TM: “Combining Error Detection and Transactional Memory for Energy-efficient Computing below Safe Operation Margins”, Gulay Yalcin, Anita Sobe, Alexey Voronin, Jons-Tobias Wamhoff, Derin Harmanci, Adrián Cristal, Osman Unsal, Pascal Felber, Christof Fetzer, PDP 2014

  23. Dark Silicon Phenomenon • Number of transistors can be increased. • In order to stay within a chip’s power budget, some must remain “dark”. • One solution: Downscale the voltage. • Go below safe voltage limit

  24. How about Reliability? • When Vdd is reduced, the error rate increases exponentially [1]. • Our goal: investigate how far the voltage can be reduced while error recovery still leads to reduced energy consumption. • [1] Dan Ernst et al. “Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation.” In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, pages 7–18, 2003

  25. Agenda / Overview • Motivation • Experiment: Scaling Vdd in a Real System • Basics of Reliability • Error Recovery with TM • Error Detection Schemes • Analysis • Conclusion

  26. Reducing Vdd in a Real System • AMD FX-6100 • 6-core CPU • CPU-heavy execution • Every 10 seconds, reduce Vdd by 12.5 mV • Monitor: incorrect result, system crash, Machine Check Architecture (MCA) • Errors are in the instruction cache (37%), the execution unit (61%) and others (less than 2%). • The system encounters errors that cannot be corrected by MCA after only a 10% reduction in Vdd.

  27. Basics of Reliability • Transactional Memory can provide lightweight Coordinated Local Checkpointing [2] • [2] Gulay Yalcin et al. “FaulTM: Fault Tolerance Using Hardware Transactional Memory”, DATE 2013

  28. TM provides checkpointing/rollback • [Figure: Processor 1, P2, P3, P4, ..., Pn, each with its own Checkpoint (Log Area); label: Synchronize checkpoints] • Data versioning provides a synchronization mechanism between checkpoints. • TM write-sets log the tentative memory updates.
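
To make the checkpoint/rollback idea concrete, here is a minimal Python sketch in which a transaction buffers its tentative updates in a write-set. Committing publishes the write-set, and aborting simply discards it, so memory falls back to the state at the start of the transaction. The class `Transaction` and its methods are hypothetical and assume lazy (buffered) versioning; the slide does not say which versioning scheme is used.

```python
class Transaction:
    """Toy transaction whose write-set acts as a log of tentative updates."""

    def __init__(self, memory):
        self.memory = memory      # shared state, modelled as a dict
        self.write_set = {}       # tentative updates, invisible to others

    def read(self, addr):
        # A transaction sees its own tentative writes first.
        return self.write_set.get(addr, self.memory.get(addr, 0))

    def write(self, addr, value):
        self.write_set[addr] = value

    def commit(self):
        # Publish all tentative updates at once.
        self.memory.update(self.write_set)
        self.write_set.clear()

    def abort(self):
        # Rollback: drop the log, memory is still at the checkpoint.
        self.write_set.clear()


memory = {"x": 1}
tx = Transaction(memory)
tx.write("x", tx.read("x") + 41)
tx.abort()                        # e.g. an error was detected
after_abort = dict(memory)

tx.write("x", tx.read("x") + 41)
tx.commit()
print(after_abort, memory)
```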

  29. Error Detection Schemes - Replication • Execute instruction streams multiple times • Compare the results of the executions • Fewer comparisons with TM • Dual/Triple Modular Redundancy • + High error detection rate • - High energy overhead
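
A sketch of dual modular redundancy at transaction granularity, under the assumption that "fewer comparisons with TM" means comparing only the two write-sets at commit time instead of every instruction result. The function `run_replicated`, the retry limit, and the deliberately faulty `flaky_increment` body are hypothetical and exist only to exercise the mechanism.

```python
def run_replicated(tx_body, memory, max_retries=3):
    """Run a transaction body twice and commit only if both runs agree.

    tx_body takes a private snapshot of memory and returns its write-set.
    Returns the number of retries that were needed before the commit.
    """
    for attempt in range(max_retries):
        write_set_a = tx_body(dict(memory))
        write_set_b = tx_body(dict(memory))
        if write_set_a == write_set_b:   # single comparison per transaction
            memory.update(write_set_a)
            return attempt
        # Mismatch: treat as a detected error, discard both runs and retry.
    raise RuntimeError("replicas kept disagreeing")


calls = {"n": 0}


def flaky_increment(snapshot):
    """Increment x; the very first call suffers an injected bit flip."""
    calls["n"] += 1
    value = snapshot["x"] + 1
    if calls["n"] == 1:
        value ^= 4
    return {"x": value}


memory = {"x": 0}
retries = run_replicated(flaky_increment, memory)
print(retries, memory)
```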

  30. Error Detection Schemes - Assertions/Invariants • Assertions: conditions referring to the current and previous state of the program. • Check the state • Added manually or automatically • TM facilitates inserting invariants • Ex:

  31. Error Detection Schemes - Symptoms • Monitor program executions to inspect if there is a symptom of hardware faults. • Symptoms: • Mispredictions in high confidence branches, • high OS activity, • fatal traps (e.g. undefined instruction code) • Reliability at a low cost

  32. Error Detection Schemes - Encoded Processing • Apply software coding (ECC-like) techniques • The redundancy is added by applying arithmetic codes to the values. • With TM, the validation of a code word can be deferred until a TX commits. • Ex:
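
The example from the slide was not captured in the transcript. As a stand-in, the following Python sketch assumes an AN code, one common arithmetic code, in which every value is multiplied by a constant `A` and a code word is valid only if it is divisible by `A`. The constant and all function names are hypothetical. Additions work directly on code words, and validation is deferred to a single check at commit.

```python
A = 58659  # hypothetical odd code constant; any single bit flip breaks divisibility


def encode(value):
    """Turn a plain integer into an AN code word."""
    return value * A


def is_valid(code_word):
    """A code word is valid only if it is a multiple of A."""
    return code_word % A == 0


def commit(write_set):
    """Validate every code word once, at commit time.

    Returns the decoded values, or None to signal that the transaction
    must abort because a corrupted code word was found.
    """
    if not all(is_valid(word) for word in write_set.values()):
        return None
    return {addr: word // A for addr, word in write_set.items()}


# Arithmetic on encoded values: A*3 + A*4 == A*7, no decoding in between.
total = encode(3) + encode(4)
clean = commit({"sum": total})
corrupted = commit({"sum": total ^ 1})   # simulated single bit flip
print(clean, corrupted)
```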

  33. Comparing Error Detection Schemes

  34. Analysis • gem5 full-system simulator • 1 GHz in-order cores • 4 cores • x86 ISA • 64 KB L1 data and instruction caches • Unified 2 MB L2 cache • SPLASH2 benchmark suite.

  35. Energy Analysis • $E \approx C \times V_{dd}^2$ • [Figure labels: Vdd, Fault Injection, Error Detection Rate, TX size, Recovery Overhead, Error-free Overhead]
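
How these factors might combine can be sketched with a deliberately simple model. The Python function below is an assumption, not the model used in the study: it scales energy by the square of the relative Vdd, adds a fixed error-free detection overhead, and assumes each transaction execution is aborted independently with a fixed probability, so that the expected number of executions is 1 / (1 - p).

```python
def relative_energy(vdd_scale, error_free_overhead, abort_probability):
    """Energy relative to an unprotected run at nominal voltage.

    vdd_scale:           reduced Vdd divided by nominal Vdd (e.g. 0.9)
    error_free_overhead: fractional cost of error detection (e.g. 0.1)
    abort_probability:   chance that one transaction execution is rolled back
    """
    if not 0 <= abort_probability < 1:
        raise ValueError("abort_probability must be in [0, 1)")
    expected_executions = 1 / (1 - abort_probability)
    return vdd_scale ** 2 * (1 + error_free_overhead) * expected_executions


# Illustrative inputs only (assumed, not results from the talk).
ideal = relative_energy(0.9, 0.0, 0.0)       # voltage scaling alone
break_even = relative_energy(0.9, 0.1, 0.1)  # overheads nearly cancel the gain
worse = relative_energy(0.9, 0.2, 0.2)       # overheads outweigh the gain
print(ideal, break_even, worse)
```

Under these assumptions a 10% voltage reduction yields at most a 19% energy reduction, and the saving disappears once detection and recovery overheads grow large enough, which is the trade-off the analysis explores.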

  36. Energy Reduction

  37. Conclusion • The energy consumption of CPUs can be reduced if we have efficient hardware support for Transactional Memory and for Error Detection.

  38. History (revisited) • “Energy Reduction in Multiprocessor Systems Using Transactional Memory”, Moreshet et al., ISLPED 2005 • “Clock Gate on Abort: Towards Energy-Efficient Hardware Transactional Memory”, Sanyal et al., IPDPS 2009 • “Energy-Performance Tradeoffs in Software Transactional Memory”, Baldassin et al., SBAC-PAD 2012 • “Dynamic Serialization: Improving Energy Consumption in Eager-Eager Hardware Transactional Memory Systems”, Gaona et al., PDP 2012 • “Energy Efficient GPU Transactional Memory via Space-Time Optimizations”, Fung et al., MICRO 2013 • “Combining Error Detection and Transactional Memory for Energy-efficient Computing below Safe Operation Margins”, Yalcin et al., PDP 2014 • “Performance and Energy Analysis of the Restricted Transactional Memory Implementation on Haswell”, Goel et al., IPDPS 2014 • “On the Energy and Performance of Commodity Hardware Transactional Memory”, Diegues et al., SIGMETRICS 2014

  39. Thanks to • Gulay Yalcin • Sutirtha Sanyal • Anita Sobe • Alexey Voronin • Jons-Tobias Wamhoff • Derin Harmanci • Adrián Cristal • Christof Fetzer • Pascal Felber

  40. Switching Between Modes
