Evaluating and Designing Energy-Aware Transactional Memory for Embedded Multicore Systems

Cesare Ferri, R. Iris Bahar, Maurice Herlihy. Brown University, Providence, RI. Cesare_Ferri@brown.edu. Task ID: 1983.001.


Presentation Transcript


  1. Evaluating and Designing Energy-Aware Transactional Memory for Embedded Multicore Systems • Cesare Ferri, R. Iris Bahar, Maurice Herlihy • Brown University, Providence, RI • Cesare_Ferri@brown.edu • Task ID: 1983.001

  2. MOTIVATION • Average laptop: 2 GHz CPU, 2 GB DRAM • Evo 4G specs: 1 GHz CPU, 512 MB DRAM • High-end embedded systems are becoming ubiquitous • These specialized embedded systems may eventually displace many general-purpose (GP) systems • Like GP systems, embedded systems are turning increasingly to multicore architectures (see ITRS/system drivers) for higher overall performance and better power efficiency • How can these embedded systems efficiently manage concurrent activities?

  3. MULTIPROCESSOR SYNCHRONIZATION • Old school: lock-based synchronization, with performance and reliability problems (no concurrency inside critical sections; deadlock, livelock, etc.) • New approach: Transactional Memory (TM), offering a higher level of abstraction, better performance (fine-grained concurrency), and reliability (automatic conflict resolution) • Basic principle: transactions run in isolation and are rolled back in case of data conflicts
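
The basic principle on this slide (run in isolation, roll back on a data conflict) can be illustrated with a small software model. This is only a sketch under stated assumptions: it is a single-threaded Python toy that buffers writes and validates reads at commit time, not the hardware mechanism evaluated in the talk. The names `Memory`, `Transaction`, `Conflict`, and `atomically` are all hypothetical.

```python
class Conflict(Exception):
    """Raised at commit when a location read by the transaction has changed."""


class Memory:
    """Shared memory model (hypothetical): values plus a per-address version."""

    def __init__(self):
        self.data = {}     # address -> committed value
        self.version = {}  # address -> number of committed writes


class Transaction:
    """Runs in isolation: writes stay private until commit succeeds."""

    def __init__(self, mem):
        self.mem = mem
        self.read_versions = {}  # address -> version seen at first read
        self.writes = {}         # address -> buffered (uncommitted) value

    def read(self, addr):
        if addr in self.writes:  # read our own buffered write
            return self.writes[addr]
        self.read_versions.setdefault(addr, self.mem.version.get(addr, 0))
        return self.mem.data.get(addr, 0)

    def write(self, addr, value):
        self.writes[addr] = value  # invisible to other transactions

    def commit(self):
        # Validate: every location we read must be unchanged since we read it.
        for addr, seen in self.read_versions.items():
            if self.mem.version.get(addr, 0) != seen:
                raise Conflict(addr)  # buffered writes are simply discarded
        # Publish the buffered writes (not atomic across real threads).
        for addr, value in self.writes.items():
            self.mem.data[addr] = value
            self.mem.version[addr] = self.mem.version.get(addr, 0) + 1


def atomically(mem, body):
    """Retry `body` until it commits; returns (result, number_of_aborts)."""
    aborts = 0
    while True:
        tx = Transaction(mem)
        try:
            result = body(tx)
            tx.commit()
            return result, aborts
        except Conflict:
            aborts += 1  # rollback is implicit: start over with a fresh buffer
```

For example, if two transactions both read `x` and the first one commits an update, the second one's `commit()` raises `Conflict` and its buffered write never reaches memory. A real implementation would also need the commit step itself to be atomic, which this toy does not attempt.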

  4. Transactional memory • Current research on TM: focused on general-purpose processors; performance and programmability are the main concerns; embedded platforms have received little or no attention • Does it make sense for embedded systems? • Many embedded applications show a high degree of parallelism • Potential advantages from improved performance • Energy consumption and simplicity are overriding concerns

  5. Embedded TM Design Exploration • Evaluate Embedded TM under 3 criteria: energy, performance, complexity (the criteria may or may not reinforce one another) • Investigate a sequence of HTM designs • Baseline: simple, dedicated Transactional Cache (TC) • Sequence of redesigns to address various limitations: transaction size limitations, abort rates

  6. Virtual platform • Cycle-accurate simulation • Realistic power models (STMicro) • [Figure: platform block diagram. Multiple cores, each with an L1 cache, a TC/VC, a bus master, and private memory, connected by a bus to shared memory, HW locks, and a serializer (overflows, serial resolution)]

  7. TC implementation details • Configurable Trans. Cache • Vanilla TC (baseline) • [Figure: block diagram with an ARM7 core, L1 cache, TC, snoop device, coherency signals (MESI), and an abort signal]

  8. Design tradeoffs for the TC • Goal: expose as much parallelism as possible • Ideal for performance: a large, highly associative TC (eliminates capacity-miss and conflict-miss evictions) • But the TC has a big impact on overall energy consumption (e.g., a 512 B TC consumes ~30% of the total energy) • [Figure: energy distribution for the matrix microbenchmark running on 4 cores]

  9. TC implementation details • Configurable Trans. Cache • Vanilla TC • Victim cache (VC), which may be dynamically disabled (low synch.) • [Figure: block diagram with an ARM7 core, L1 cache, TC, a VC that can be switched off, snoop device, coherency signals (MESI), and an abort signal]
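
To make the role of the victim cache concrete, here is a toy model of a small TC backed by an optional VC. It is a sketch under assumptions that are not taken from the slides: the TC is modelled as direct-mapped, the VC as fully associative, and running out of room is reported as an `Overflow`. The class and method names are hypothetical.

```python
class Overflow(Exception):
    """The transaction no longer fits in the TC (plus the VC, if enabled)."""


class TransactionalCache:
    """Toy TC: direct-mapped (assumption) with an optional victim cache."""

    def __init__(self, num_sets, vc_entries, vc_enabled=True):
        self.num_sets = num_sets
        self.sets = [None] * num_sets  # one line address per set
        self.vc = set()                # victim cache contents
        self.vc_entries = vc_entries
        self.vc_enabled = vc_enabled   # False models a switched-off VC

    def holds(self, line):
        """True if the line is tracked in either the TC or the VC."""
        return self.sets[line % self.num_sets] == line or line in self.vc

    def insert(self, line):
        """Track a transactional line, spilling a conflicting line to the VC."""
        if self.holds(line):
            return
        index = line % self.num_sets
        victim = self.sets[index]
        if victim is not None:
            # Conflict eviction: the old line must be kept somewhere, or the
            # transaction can no longer be tracked in hardware.
            if not self.vc_enabled or len(self.vc) >= self.vc_entries:
                raise Overflow(victim)
            self.vc.add(victim)
        self.sets[index] = line
```

With 4 sets and a 1-entry VC, lines 0 and 4 map to the same set, yet both stay tracked because line 0 spills into the VC. With the VC disabled, the second insert raises `Overflow` instead, which is the cost of saving the VC's energy when a workload does not need it.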

  10. TC OPTIMIZATIONS: Results • 40% improvement • [Figure: normalized EDP, 4-core system]
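
EDP is the energy-delay product: energy consumed multiplied by execution time, usually normalized to a baseline so that values below 1.0 are better. The short sketch below shows the arithmetic. The energy and delay numbers are made up purely for illustration and are not measurements from the talk; they are chosen so that the result matches a 40% improvement, assuming that figure refers to a reduction in normalized EDP.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


def normalized_edp(energy, delay, base_energy, base_delay):
    """EDP of a design relative to a baseline design (baseline = 1.0)."""
    return edp(energy, delay) / edp(base_energy, base_delay)


# Hypothetical numbers, NOT from the slides:
# baseline uses 10 J over 2 s; optimized design uses 8 J over 1.5 s.
ratio = normalized_edp(8.0, 1.5, 10.0, 2.0)
improvement_percent = (1.0 - ratio) * 100.0
```

Because EDP multiplies the two quantities, a design that saves energy but runs proportionally slower gains nothing, which is why it suits a study that weighs energy against performance.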

  11. Reducing the abort rate • Performance is affected by the contention handler: eager (baseline), forced-serial, lazy • [Figure: block diagram with an ARM7 core, L1 cache, TC, counter, synch. signal, abort signal, snoop logic, and coherency signals (MESI) plus bitmaps]
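
One way to picture a forced-serial contention handler is a retry loop with an abort counter: after a fixed number of aborts the work stops retrying optimistically and runs serially instead. The sketch below is an assumption about the general idea, not the hardware design from the talk. The global lock stands in for the serializer, and `run_with_forced_serial`, `Abort`, and `max_allowed_aborts` are hypothetical names.

```python
import threading

# Hypothetical stand-in for the platform's serializer.
serial_lock = threading.Lock()


class Abort(Exception):
    """Signals that a transactional attempt was aborted by a conflict."""


def run_with_forced_serial(try_transaction, run_serially, max_allowed_aborts):
    """Retry transactionally, then fall back to serial execution.

    Returns (result, aborts, mode), where mode is "transactional" or "serial".
    """
    aborts = 0
    while aborts < max_allowed_aborts:
        try:
            return try_transaction(), aborts, "transactional"
        except Abort:
            aborts += 1  # count the abort and try again
    # Too many aborts: stop wasting energy on doomed retries.
    with serial_lock:
        return run_serially(), aborts, "serial"
```

A low threshold serializes quickly and limits wasted work under heavy contention, while a high threshold keeps more parallelism when conflicts are rare. In the usual TM terminology, an eager scheme resolves a conflict as soon as it is detected, while a lazy scheme defers resolution until commit.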

  12. Experimental Results • ~3x reduction • >10% improvement • [Figure: STAMP average, 8-core system; results plotted against max allowed aborts]

  13. CONCLUSIONS • TM for embedded systems is different from TM for general-purpose systems • A straightforward implementation of Embedded-TM (i.e., TM-vanilla with a dedicated FA TC) is too power hungry • More complex conflict resolution schemes (i.e., lazy) are best for high-conflict workloads

  14. Tech Transfer • Industry interactions: • 3 industrial liaisons: Sourav Roy (Freescale), Maged Michael (IBM), Konrad Lai (Intel) • Freescale has a particular interest in HTM on embedded platforms and has helped us focus on applications best suited to these platforms (e.g., we are currently adapting the EEMBC MultiBench benchmarks to run with Transactional Memory support) • Publications: • C. Ferri, S. Wood, T. Moreshet, R. I. Bahar, and M. Herlihy, “Embedded-TM: Energy and complexity-effective hardware transactional memory for embedded multicore systems,” Journal of Parallel and Distributed Computing (Elsevier), vol. 70, no. 10, October 2010. DOI: 10.1016/j.jpdc.2010.02.003 • C. Ferri, S. Wood, T. Moreshet, R. I. Bahar, and M. Herlihy, “Energy and throughput efficient transactional memory for embedded multicore systems,” International Conference on High-Performance Embedded Architectures and Compilers, January 2010.
