Hybrid Transactional Memory

Sanjeev Kumar (Intel Labs), Michael Chu (University of Michigan), Christopher Hughes (Intel Labs), Partha Kundu (Intel Labs), Anthony Nguyen (Intel Labs)

Presentation Transcript


  1. Hybrid Transactional Memory
     Sanjeev Kumar (Intel Labs), Michael Chu (University of Michigan), Christopher Hughes (Intel Labs), Partha Kundu (Intel Labs), Anthony Nguyen (Intel Labs)

  2. Promise of Transactional Memory (TM)
     • Maintain consistency in the presence of errors
     • Easier to program
     • Compose naturally
     • Easier to get parallel performance
     • No deadlocks
     • Avoid priority inversion and convoying
     • Supports fault tolerance
     With locks:
         lock(l1); lock(l2);
         A = A - 10; B = B + 10;
         unlock(l1); unlock(l2);
         ...
         if ( error ) recovery_code();
     With a transaction:
         transaction { A = A - 10; B = B + 10; }
         ...
         if ( error ) abort_transaction;
     Simplify Parallel Programming
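The "abort on error" idea on this slide can be illustrated with a small sketch. This is not the scheme from the talk: it is a toy, single-threaded model that buffers writes and only publishes them on commit. The names `Transaction`, `AbortTransaction`, and `transfer` are hypothetical and chosen only for this illustration.

```python
class AbortTransaction(Exception):
    """Raised by user code to abandon the current transaction (hypothetical name)."""


class Transaction:
    """Toy single-threaded transaction: buffers writes, publishes them on commit."""

    def __init__(self, store):
        self.store = store   # shared dict of balances
        self.writes = {}     # tentative values, invisible until commit

    def read(self, key):
        # Prefer our own tentative write, otherwise the committed value.
        return self.writes.get(key, self.store[key])

    def write(self, key, value):
        self.writes[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.store.update(self.writes)  # commit: all writes become visible
            return False
        self.writes.clear()                 # abort: none of the writes survive
        return exc_type is AbortTransaction  # swallow only the explicit abort


def transfer(store, amount):
    with Transaction(store) as tx:
        tx.write("A", tx.read("A") - amount)
        tx.write("B", tx.read("B") + amount)
        if tx.read("A") < 0:        # stands in for the slide's "error" condition
            raise AbortTransaction()


accounts = {"A": 15, "B": 0}
transfer(accounts, 10)  # commits
transfer(accounts, 10)  # would overdraw A, so it aborts and changes nothing
```

Because the writes are buffered, the aborted transfer needs no hand-written recovery code, which is the contrast the slide draws with the lock-based version.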

  3. Flavors of Transactional Memory
     Features:
     • Easier to program
     • Compose naturally
     • Easier to get parallel performance
     • No deadlocks
     • Maintain consistency in the presence of errors
     • Avoid priority inversion and convoying
     • Supports fault tolerance
     Flavors:
     • Basic
     • Support programmer abort
     • Support nonblocking
     Our Work: Efficient support for a TM that supports all these features

  4. TM Implementations
     Requires versioning support and conflict detection
     • Hardware approach [Herlihy’93]
       • Bounded number of locations
       • Maintain versions in cache → Low overhead
     • Pure-software approach [Herlihy’03, Harris’03]
       • Unbounded number of locations can be accessed within a transaction
       • Slow due to overhead of maintaining multiple copies
         • Potentially orders of magnitude
     • Unbounded hardware approach [Hammond’04, Ananian’05, Rajwar’05, Moore’06]
       • Require significant hardware support
     • Discussed in more detail in the paper

  5. Hardware vs. Software TM
     Hardware Approach
     • Low overhead: Buffers transactional state in cache
     • More concurrency: Cache-line granularity
     • Bounded resource
     • Assembly
     • Within a module
     Software Approach
     • High overhead: Uses object copying to keep transactional state
     • Less concurrency: Object granularity
     • No resource limits
     • High-level languages
     • Across modules
     Useful BUT limited to library writers
     Useful BUT limited to special data structures
     Neither is satisfactory for broader use

  6. This Work
     A Hybrid Transactional Memory Scheme
     • Requires modest hardware support
       • Changes are localized
     • Supports unbounded number of locations
     • Performance of hardware when within hardware resource limits (low overhead of pure hardware TM)
     • Gracefully fall back to software if the hardware resource limits are exceeded (unbounded resources of pure software TM)
     Experimentally demonstrate effectiveness of our approach

  7. Outline
     • Motivation
     • Proposed Architectural Support
     • Hybrid Transactional Memory
     • Performance Evaluation
     • Conclusions

  8. ISA Extensions
     • Start of a Transaction
       • Begin Transaction All (XBA) or Select (XBS)
       • Save Register State (SSTATE)
       • Specify handler on abort due to conflict (XHAND)
     • During a Transaction
       • Perform memory loads and stores
       • Override defaults (LDX, STX, LDR, STR)
     • On Transaction Abort
       • Explicit Abort Transaction (XA)
       • Restore Register State (RSTATE)
     • On Transaction Commit
       • Commit Transaction (XC)

  9. Baseline CMP Architecture
     Our proposed changes: Modest and Localized
     • Modifications to: Core, L1 $
     • No changes to: Interconnect, Coherence Protocol, L2 $, Memory
     [Diagram: several cores, each with its own L1 $, connected through the Interconnect to the L2 $]

  10. Hardware Support for TM
      Three requirements:
      • Maintain two versions
      • Detect conflict
        • Same core: Tag
        • Another core: Cache coherence
      • Atomic commit and abort
      Bounded:
      • Capacity of TM $
      • Associativity of TM $ and L2
      [Diagram: the Core sends Regular Accesses to the L1 $ (Tag, Data) and Transactional Accesses to the Transactional $ (Tag, Addl. Tag, Old Data, New Data); both connect To Interconnect]
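The three requirements and the capacity bound can be made concrete with a rough software model. This is only a sketch under stated assumptions, not the proposed hardware: it leaves the old version in a backing `memory` dict instead of storing it alongside the new data, tracks written lines only, and ignores associativity. `TransactionalBuffer`, `CapacityExceeded`, and `remote_write` are hypothetical names.

```python
class CapacityExceeded(Exception):
    """The transaction wrote more lines than the buffer can hold (hypothetical)."""


class TransactionalBuffer:
    """Toy model of a bounded transactional cache for one core."""

    def __init__(self, memory, capacity):
        self.memory = memory      # addr -> committed (old) value
        self.capacity = capacity  # max number of distinct buffered lines
        self.new_data = {}        # addr -> tentative (new) value
        self.aborted = False

    def load(self, addr):
        # Transactional load: see our own new data, else the old version.
        return self.new_data.get(addr, self.memory[addr])

    def store(self, addr, value):
        if addr not in self.new_data and len(self.new_data) >= self.capacity:
            self.abort()                  # bounded resource: cannot buffer more
            raise CapacityExceeded(addr)
        self.new_data[addr] = value

    def remote_write(self, addr, value):
        # Stand-in for a coherence message from another core: a write to a
        # buffered line is a conflict and aborts this transaction.
        if addr in self.new_data:
            self.abort()
        self.memory[addr] = value

    def commit(self):
        if self.aborted:
            return False
        self.memory.update(self.new_data)  # publish all new versions together
        self.new_data.clear()
        return True

    def abort(self):
        self.new_data.clear()              # discard all new versions together
        self.aborted = True


memory = {0x10: 1, 0x20: 2, 0x30: 3}
tx = TransactionalBuffer(memory, capacity=2)
tx.store(0x10, 11)
tx.store(0x20, 22)
old_still_visible = memory[0x10]  # other cores still see the old version
committed = tx.commit()
```

A transaction that writes a third distinct line with `capacity=2` raises `CapacityExceeded` in this model, which is the situation where a hybrid scheme would hand the work to software.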

  11. Outline
      • Motivation
      • Proposed Architectural Support
      • Hybrid Transactional Memory
        • Existing pure software scheme
        • Our hybrid scheme
      • Performance Evaluation
      • Conclusions

  12. Pure Software TM [Herlihy’03]
      • We use this Pure Software TM as a starting point
      • Implemented without any special architectural support using two techniques
        • Use copies of objects to keep transactional state
          • Make modifications on the copy during a transaction
        • Add a level of indirection
          • Switch the versions when a transaction is committed
      [Diagram: the Object Pointer refers to a record holding a State Pointer and Old and New pointers; the State Pointer refers to the State, and Old and New refer to copies of the Object Contents]

  13. Pure Software TM Scheme, Cont’d
      Before accessing an object within a transaction
      [Diagram labels: X, Modify, Valid Copy. The Object Pointer, two records of (State Pointer, Old, New) each with its State, and the Object Contents they refer to]
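The copy-plus-indirection idea from the two slides above can be sketched in a few lines. This is a simplified, single-threaded illustration in the spirit of the object-based software TM the slides cite, not the authors' code: a real implementation would swing the object pointer and the transaction state with atomic compare-and-swap operations, and the "abort the other transaction" policy here is just the simplest possible choice. All class and method names are hypothetical.

```python
import copy

ACTIVE, COMMITTED, ABORTED = "active", "committed", "aborted"


class Txn:
    """Transaction descriptor; its state decides which object copy is valid."""

    def __init__(self):
        self.state = ACTIVE


class Record:
    """The record behind the object pointer: (state pointer, old, new)."""

    def __init__(self, owner, old, new):
        self.owner = owner  # "State Pointer" on the slide
        self.old = old
        self.new = new


class TMObject:
    def __init__(self, contents):
        done = Txn()
        done.state = COMMITTED
        self.record = Record(done, None, contents)  # "Object Pointer"

    def valid_copy(self):
        # New is valid only if its owner committed; otherwise Old is valid.
        rec = self.record
        return rec.new if rec.owner.state == COMMITTED else rec.old

    def open_for_write(self, txn):
        rec = self.record
        if rec.owner is txn:
            return rec.new                 # already opened by this transaction
        if rec.owner.state == ACTIVE:
            rec.owner.state = ABORTED      # simplistic conflict resolution
        valid = self.valid_copy()
        # Copy the valid version and install a fresh record owned by txn.
        self.record = Record(txn, valid, copy.deepcopy(valid))
        return self.record.new


obj = TMObject({"balance": 100})
t1 = Txn()
working = obj.open_for_write(t1)
working["balance"] -= 10                      # modify the copy, not the original
before_commit = obj.valid_copy()["balance"]   # old version is still the valid one
t1.state = COMMITTED                          # commit switches the versions
after_commit = obj.valid_copy()["balance"]
```

Committing is a single state change on the transaction descriptor, which is why every object opened by that transaction switches versions at the same instant.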

  14. Our Hybrid Transactional Memory
      • Two modes: Hardware and Software mode
      • The two modes need to coexist
        • Non-solution: Make all threads transition modes in lockstep
      • Avoid versioning overheads (allocation and copying) in the hardware mode
        • Still incur the indirection overheads
      • Tricky because it needs to bridge the hardware and software schemes
        • Hardware mode needs to modify data in-place
          • Pure Software TM assumes data is never modified in-place
        • Different sharing granularity
          • Cache-line (Hardware) vs. Object (Software)
        • Different conflict detection scheme
          • Data (Hardware) vs. State (Software)

  15. Hybrid Scheme Example
      • In the Hardware Mode: Modify in place (Thread 1: HW mode, Thread 2: HW mode)
      • In the Software Mode: Copy and Modify (Thread 3: SW mode)
      • Conflict detected by the threads in the hardware mode
      [Diagram: the Object Pointer, records of (State Pointer, Old, New) with their States, and the Object Contents they refer to]
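The two modes can be contrasted with a deliberately simplified sketch: a "hardware" path that modifies the object in place under a bounded undo log, and a "software" path that copies, modifies, and then switches a pointer. It assumes a single thread and does not model conflict detection between the modes, which is the hard part the slides describe. The function names and the `box` indirection are hypothetical.

```python
class HardwareLimitExceeded(Exception):
    """Stand-in for exceeding the bounded hardware resources (hypothetical)."""


def run_hardware(contents, updates, capacity):
    """Hardware mode: modify in place, tracking at most `capacity` locations."""
    undo = {}
    for key, value in updates:
        if key not in undo:
            if len(undo) >= capacity:
                contents.update(undo)   # abort: restore every touched location
                raise HardwareLimitExceeded(key)
            undo[key] = contents[key]
        contents[key] = value


def run_software(box, updates):
    """Software mode: copy the object, modify the copy, then switch versions."""
    new_contents = dict(box["contents"])
    for key, value in updates:
        new_contents[key] = value
    box["contents"] = new_contents      # the pointer switch is the commit


def run_hybrid(box, updates, capacity):
    """Try hardware mode first and fall back to software mode on overflow."""
    try:
        run_hardware(box["contents"], updates, capacity)
        return "hardware"
    except HardwareLimitExceeded:
        run_software(box, updates)
        return "software"


box = {"contents": {"a": 1, "b": 2, "c": 3}}
original = box["contents"]
small = run_hybrid(box, [("a", 10)], capacity=2)
large = run_hybrid(box, [("a", 0), ("b", 0), ("c", 0)], capacity=2)
```

The small transaction completes without allocating or copying anything, while the large one pays the copying cost only because it did not fit.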

  16. Hybrid Scheme Summary
      [Diagram: the Object Pointer refers to a record of (State Pointer, Old, New); the State Pointer refers to the State, and Old and New refer to Object Contents]

  17. Outline
      • Motivation
      • Proposed Architectural Support
      • Hybrid Transactional Memory
      • Performance Evaluation
      • Conclusions

  18. Experimental Framework
      • Infrastructure
        • Cycle-accurate execution-driven multi-core simulator
        • Modified GCC
      • Three microbenchmarks
        • Two scenarios: Low and High Contention
      • Compare four synchronization implementations
        • Lock
        • Pure Hardware Transactional Memory
        • Pure Software Transactional Memory
        • Hybrid Transactional Memory

  19. Performance
      [Figure: Normalized Execution Time vs. Number of Cores. Benchmark: Vector-Reduce. Contention: Low]

  20. Outline
      • Motivation
      • Proposed Architectural Support
      • Hybrid Transactional Memory
      • Performance Evaluation
      • Conclusions

  21. Conclusions
      • Transactional Memory is a promising approach
        • Makes parallel programming an easier task
        • Easier to achieve parallel speedup
      • Hybrid Transactional Memory approach works
        • Requires only modest hardware support
        • Common case: Good performance for most transactions
        • Uncommon case: Graceful fallback to software mode when a transaction cannot complete within the hardware bounds

  22. Questions?

  23. Transactions
      A synchronization mechanism to coordinate accesses to shared data by concurrent threads (an alternative to locks)
      Transaction: A group of operations on shared data
          Transaction {
              A = A - 10;
              B = B + 10;
              ...
              if (error) abort_transaction;
          }
      An API Enhancement:
      1. Abort in the middle of a transaction
         o On encountering an error

  24. Transactional Memory (TM)
      • A transaction satisfies the following properties
        • Atomicity: All-or-nothing
          • On Commit: all operations become visible
          • On Abort: none of the operations are performed
        • Isolation (Serializable)
          • The transactions committed appear to have been performed in some serial order
      • Additional Properties
        • Optimistic concurrency control
          • Necessary for achieving good parallel speedup
        • Non-blocking (Optional)
          • Avoid Priority Inversion
          • Avoid Convoying

  25. Advantage 1: Performance
      Locks
      • Serialized on locks
      • Finer-granularity locks help
        • Burden on programmer
      Transactions
      • Optimistically execute concurrently
      • Abort and restart on data conflict
        • Automatically done by runtime
      [Diagram: cores with L1 caches accessing data items A, B, C, D under locks and under transactions, with a Data Conflict marked]
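"Optimistically execute, abort and restart on data conflict" can be shown with a minimal retry loop. This sketch uses a version counter for conflict detection, which is an assumption made for illustration rather than the mechanism from the talk, and it simulates the competing thread with an `interfere` callback so that it runs deterministically in a single thread. `VersionedCell` and `atomic_update` are hypothetical names.

```python
class VersionedCell:
    """A shared value plus a version number bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


def atomic_update(cell, fn, interfere=None):
    """Apply fn optimistically; restart if the cell changed in the meantime.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        seen_version, seen_value = cell.version, cell.value
        new_value = fn(seen_value)       # speculative work, no lock held
        if interfere is not None and attempts == 1:
            interfere(cell)              # simulated conflicting writer
        if cell.version == seen_version:
            cell.value = new_value       # commit
            cell.version += 1
            return attempts
        # Data conflict: discard new_value and restart from fresh state.


def other_thread(cell):
    cell.value += 100
    cell.version += 1


cell = VersionedCell(5)
attempts = atomic_update(cell, lambda v: v + 1, interfere=other_thread)
```

The caller never sees the failed first attempt; the restart is handled entirely inside `atomic_update`, which mirrors the slide's point that the runtime does this work rather than the programmer.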

  26. Advantage 2: Reduces Bugs
      • With locks, programmers need to
        • Remember mapping between shared data and locks that guard them
        • Make sure the appropriate locks are held while accessing shared data
        • Make lock granularity as small as possible
        • Avoid deadlocks due to locks
        • All of these can cause subtle bugs
      • With TM, programmer does not have to deal with these problems

  27. Other Advantages
      • Allows new programming paradigms
        • Simplifies error handling
        • A new style of programming: Speculate and Verify
        Programmer can abort offending transactions
      • Avoids other problems that locks suffer from
        • Priority Inversion: A low-priority thread can grab a lock and block a higher-priority thread
        • Convoying: If a thread holding a lock blocks on a high-latency event (like a context switch or I/O), it can cause other threads to wait for long periods
        • Fault Tolerance: If a process holding a lock dies, other processes will hang forever
        Runtime system can abort offending transactions

  28. [Figure: Normalized Execution Time vs. Number of Cores. Benchmark: Vector-Reduce. Contention: Low]
