Transactional Memory

Transactional Memory. James Larus and Christos Kozyrakis. MOTIVATION. Transition from sequential computing to parallel computing. Achieving optimal performance from multicore computers depends on improving parallelism in programs.


Presentation Transcript


  1. Transactional Memory • James Larus and Christos Kozyrakis

  2. MOTIVATION • Transition from sequential computing to parallel computing • Achieving optimal performance from multicore computers depends on improving parallelism in programs. • Find better abstractions for expressing parallel computation and for writing parallel programs • Current programming constructs: threads, locks, semaphores, etc.

  3. TRANSACTIONAL MEMORY • A transaction is a form of program execution. • In parallel programming, TM offers a mechanism that allows portions of a program to execute in isolation, without regard to other, concurrently executing tasks. • TM provides lightweight transactions for threads running in a shared address space. • TM ensures the atomicity and isolation of concurrently executing tasks. • TM provides a basis on which to build parallel abstractions.

  4. TRANSACTIONAL MEMORY • Atomicity • Atomicity ensures that program state changes effected by code executing in a transaction are indivisible from the perspective of other, concurrently executing transactions. • Isolation • Isolation ensures that concurrently executing tasks cannot affect the result of a transaction, so a transaction produces the same answer as when no other task was executing.

  5. PROGRAM MODEL • General TM systems • Provide simple atomic statements that execute a block of code (and the routines it invokes) as a transaction. • Not a replacement for general synchronization such as semaphores or condition variables. • AME (Automatic Mutual Exclusion) • Executes most of a program in transactions • Supports asynchronous programming

  6. ADVANTAGES • TM offers a simpler alternative to mutual exclusion by shifting the burden of correct synchronization from the programmer to the TM system. • A program's author only needs to identify a sequence of operations on shared data that should appear to execute atomically to other, concurrent threads. • Transactions make synchronization composable, which enables the construction of concurrent programming abstractions.

  7. LIMITATIONS • Transactions by themselves cannot replace all synchronization in a parallel program • Synchronization is often used to coordinate independent tasks • Consider a producer-consumer relationship. • Transactions can ensure the tasks' shared accesses do not interfere • If the consumer transaction finds the value is not available, it can only abort and check for the value later. • Some TM systems provide a guard that prevents a transaction from starting execution until a predicate becomes true. • The retry and orElse constructs of Haskell TM • The trade-offs and programming pragmatics of the TM programming model are still not well understood. • The performance of TM is not yet good enough for widespread use. • Software TM (STM) systems impose considerable overhead costs on code running in a transaction • Hardware TM (HTM) systems fall back on software for large transactions
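
The guard idea on the slide above can be illustrated with a small sketch. It is not Haskell's STM and not any real TM system: a single global lock stands in for the TM runtime, and the names `Retry`, `atomically`, `produce`, and `consume` are hypothetical. It only shows a consumer that blocks until its predicate holds instead of aborting and polling.

```python
import threading


class Retry(Exception):
    """Raised by a transaction body to say 'block until shared state changes'."""


# Assumption: one global condition variable plays the role of the TM system.
_cond = threading.Condition()


def atomically(body):
    """Run body() in isolation; if it raises Retry, wait for a commit and rerun.

    Limitation of this toy: nothing is rolled back, so the body must raise
    Retry before it writes anything.
    """
    with _cond:
        while True:
            try:
                result = body()
            except Retry:
                _cond.wait()        # releases the lock while sleeping
                continue
            _cond.notify_all()      # a commit may unblock waiting transactions
            return result


box = []  # shared one-slot buffer between producer and consumer


def produce(value):
    atomically(lambda: box.append(value))


def consume():
    def body():
        if not box:
            raise Retry()           # guard: the value is not available yet
        return box.pop()
    return atomically(body)
```

A consumer thread calling `consume()` before any `produce()` sleeps inside `atomically` and wakes only after a producer commits.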

  8. TRANSACTIONAL MEMORY IMPLEMENTATION • STM (Software Transactional Memory) • HTM (Hardware Transactional Memory) • Most TM systems of both types implement optimistic concurrency control. • The alternative, pessimistic concurrency control, requires a transaction to establish exclusive access to a location.

  9. STM • The original STM (Shavit and Touitou) • Implemented lock-free, atomic, multi-location operations entirely in software • Required a program to declare in advance the memory locations to be accessed by a transaction

  10. STM • DSTM • Object-granularity, deferred-update STM system • Conflict detection • Early detection • Late detection • Read-write conflicts • Only clones objects that are modified. • Read-object list • Conditions for transaction T to commit • No concurrently executing transaction modified an object read by T • T is not modifying an object that another transaction is also modifying. • Performance of DSTM depends on the workload
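
A minimal sketch of deferred update with late conflict detection, under stated assumptions: it is not DSTM's actual design (DSTM is non-blocking and clones whole objects), and `TVar`, `Transaction`, and the single commit lock are illustrative names chosen here. Writes go to a private buffer, and commit succeeds only if nothing the transaction read or wrote has changed version since it was first accessed.

```python
import threading


class TVar:
    """A shared cell with a version number bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# Assumption: one global lock serializes reads of (value, version) and commits.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.seen = {}     # TVar -> version observed at first access
        self.writes = {}   # TVar -> private (deferred) new value

    def read(self, tvar):
        if tvar in self.writes:            # read our own pending write
            return self.writes[tvar]
        with _commit_lock:
            self.seen.setdefault(tvar, tvar.version)
            return tvar.value

    def write(self, tvar, value):
        with _commit_lock:
            self.seen.setdefault(tvar, tvar.version)
        self.writes[tvar] = value          # shared state is not touched yet

    def commit(self):
        with _commit_lock:
            # Late detection: validate every accessed cell at commit time.
            for tvar, version in self.seen.items():
                if tvar.version != version:
                    return False           # conflict: caller should re-execute
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True
```

Two transactions that both increment the same `TVar` illustrate the commit conditions: the first to commit wins, and the second fails validation and must rerun.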

  11. STM • Deferred Update Systems • The WSTM system detects conflicts at word, not object, granularity • Avoids unnecessary conflicts if two transactions access different fields in an object • Extended Java with an atomic statement that executes its block in a transaction • A policy selects which transaction to abort in case of conflict. • "Polka" policy: tracks the number of objects a transaction has open and uses that count as its priority.

  12. STM • Direct Update Systems • Transactions directly modify an object, rather than a copy. • Must record the original value of each modified memory location. • Must prevent a transaction from reading the locations modified by other, uncommitted transactions, thereby reducing the potential for concurrent execution • Require a lock to prevent multiple transactions from updating an object concurrently. • Direct-update STM systems provide forward progress guarantees to an application by detecting and aborting failed or blocked threads.
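
The direct-update requirements above (an undo record per modified location, plus a lock per object) can be sketched as follows. This is an illustration under assumptions, not any particular STM: `Obj`, `DirectTx`, and `Conflict` are hypothetical names, conflicts are resolved by simply failing the second writer, and read tracking is omitted.

```python
import threading


class Conflict(Exception):
    """Another transaction already holds the object's write lock."""


class Obj:
    def __init__(self, **fields):
        self.fields = dict(fields)
        self.lock = threading.Lock()   # per-object write lock


class DirectTx:
    def __init__(self):
        self.undo = []     # (obj, field, old value), oldest first
        self.locked = []   # objects whose write lock this transaction holds

    def write(self, obj, field, value):
        if obj not in self.locked:
            if not obj.lock.acquire(blocking=False):
                raise Conflict(field)
            self.locked.append(obj)
        self.undo.append((obj, field, obj.fields[field]))  # record old value
        obj.fields[field] = value                          # update in place

    def _release(self):
        for obj in self.locked:
            obj.lock.release()
        self.locked.clear()

    def commit(self):
        self.undo.clear()              # updates are already in place
        self._release()

    def abort(self):
        # Restore old values newest-first so the earliest value wins.
        for obj, field, old in reversed(self.undo):
            obj.fields[field] = old
        self.undo.clear()
        self._release()
```

Commit is cheap because the new values are already in memory; abort pays the cost of replaying the undo log.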

  13. HTM • Hardware Acceleration for STM • The primary source of overhead for an STM is the maintenance and validation of read sets • To track a read set, the STM invokes an instrumentation routine on shared-memory reads • Hardware-accelerated STM (HASTM) was first proposed by Saha et al. • Provides the STM with two capabilities through per-thread mark bits at the granularity of cache blocks • Software can check whether a mark bit was previously set for a given block of memory and that no other thread wrote to the block since it was marked. • Software can query whether other threads may have written to any of the memory blocks that the thread marked.

  14. HTM • HASTM • Implements mark bits using additional metadata for each block in the per-processor cache of a multicore chip • The read instrumentation call checks and sets the mark bit for the memory block that contains an object's header • If the mark bit was set, indicating that the transaction previously accessed the object, it is not added to the read set again • Validation • Falls back on software-based validation if the hardware query indicates that a marked block may have been written by another thread. • In HASTM, the mark bits may be lost if a processor is used to run other tasks
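
A software stand-in for the mark-bit mechanism may make the two capabilities concrete. Everything here is assumed for illustration: the 64-byte block size, the `MarkBits` class, and the use of raw addresses rather than object headers. Real mark bits live in cache metadata, not in a Python set.

```python
class MarkBits:
    """Toy model of one thread's mark bits, kept per cache block."""

    BLOCK = 64  # assumed cache-block size in bytes

    def __init__(self):
        self.marked = set()   # block numbers whose mark bit is set
        self.lost = False     # sticky flag: some marked block was written remotely

    def test_and_set(self, addr):
        """Return True if the block was already marked, then mark it."""
        block = addr // self.BLOCK
        was_marked = block in self.marked
        self.marked.add(block)
        return was_marked

    def remote_write(self, addr):
        """Model another thread writing: the mark on that block is lost."""
        block = addr // self.BLOCK
        if block in self.marked:
            self.marked.discard(block)
            self.lost = True


def read_barrier(marks, read_set, addr):
    """Read instrumentation: log the address only on first access to its block."""
    if not marks.test_and_set(addr):
        read_set.append(addr)     # slow path, taken once per block


def needs_software_validation(marks):
    """Second capability: could any marked block have been written by others?"""
    return marks.lost
```

Repeated reads of the same block skip the read-set append, which is where the saving over a plain STM read barrier would come from.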

  15. HTM • SigTM • Uses hardware signatures to encode the read set and write set for software transactions • A hardware Bloom filter outside of the caches computes the signatures • Software instrumentation provides the filters with the addresses of the objects • Hardware monitors coherence traffic for requests for exclusive access to a cache block, which indicate a memory update • The hardware tests whether the address in a request is potentially in a transaction's read or write set by examining the transaction's signatures. • On a hit, the transaction either aborts or falls back on software validation. • Capacity and conflict misses do not cause software validation • May produce false conflicts due to address aliasing in a Bloom filter • SigTM signatures track physical addresses
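
The signature test and its false conflicts can be shown with a small Bloom filter. The sizes and the two hash functions below are arbitrary choices made for this sketch, not SigTM's actual hardware hashes, and `Signature` is a hypothetical name.

```python
class Signature:
    """Toy Bloom-filter signature over cache-block numbers."""

    BLOCK = 64  # assumed cache-block size in bytes

    def __init__(self, bits=16):
        self.bits = bits
        self.word = 0            # the signature itself, as a bit vector

    def _positions(self, addr):
        block = addr // self.BLOCK
        # Two simple illustrative hashes: low bits and next-higher bits.
        return (block % self.bits, (block // self.bits) % self.bits)

    def add(self, addr):
        """Insert an address (called by the software instrumentation)."""
        for p in self._positions(addr):
            self.word |= 1 << p

    def may_contain(self, addr):
        """Test a snooped address; True can be a false positive, False is exact."""
        return all((self.word >> p) & 1 for p in self._positions(addr))
```

With 16 bits, block 5 and block 261 hash to the same positions, so a remote write to block 261 looks like a conflict with a transaction that only read block 5. A signature never misses a real member, so correctness is preserved and only performance suffers.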

  16. HTM • HTM systems require no software instrumentation of memory references within transaction code. • Manage data versions and track conflicts transparently as software performs ordinary read and write accesses • Rely on a computer's cache hierarchy and the cache coherence protocol to implement versioning and conflict detection

  17. HTM • Transactional Coherence and Consistency (TCC) • Deferred update HTM that performs conflict detection when a transaction attempts to commit. • Each cache block is annotated with R and W tracking bits • Cache blocks in the write set act as a write buffer and do not propagate the memory updates until the transaction commits. • Two-phase protocol.

  18. HTM • Hardware acquires exclusive access to all cache blocks in the write set using coherence messages • The hardware instantaneously resets all W bits in the cache, which atomically commits the updates by this transaction • If validation fails, hardware reverts to a software handler • Conflict Detection
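
The TCC-style flow on the two slides above (R/W tracking, buffered writes, conflict detection at commit) can be sketched in software. This is a simplification under assumptions: `TccCache` is a hypothetical name, coherence messages are modeled as direct method calls, tracking is per address rather than per cache block, and the committing transaction always wins.

```python
class TccCache:
    """Toy model of one processor's transactional cache state."""

    def __init__(self, memory):
        self.memory = memory     # shared dict: address -> value
        self.r_bits = set()      # addresses read by the running transaction
        self.buffer = {}         # buffered writes; its keys act as the W bits
        self.violated = False    # set when a committer wrote something we read

    def load(self, addr):
        if addr in self.buffer:              # forward from our own write buffer
            return self.buffer[addr]
        self.r_bits.add(addr)
        return self.memory[addr]

    def store(self, addr, value):
        self.buffer[addr] = value            # not visible to others until commit

    def snoop_commit(self, write_set):
        """Called when another processor commits the given write set."""
        if self.r_bits & write_set:
            self.violated = True

    def reset(self):
        self.r_bits.clear()
        self.buffer.clear()
        self.violated = False

    def commit(self, others):
        if self.violated:                    # validation failed: discard state
            self.reset()
            return False
        write_set = set(self.buffer)
        for cache in others:                 # announce the write set
            cache.snoop_commit(write_set)
        self.memory.update(self.buffer)      # make all updates visible together
        self.reset()
        return True
```

A transaction that read an address another transaction later commits is marked violated and fails at its own commit, leaving memory untouched by its buffered writes.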

  19. HTM • Advantages & Limitations • An HTM system can outperform a lock-based STM by a factor of four and the corresponding hardware-accelerated STM by a factor of two • The caches used to track the read set, write set, and data versions have finite capacity and may overflow on a long transaction • The transactional state in caches is large and is difficult to save and restore • Placing implementation-dependent limits on transaction sizes is unacceptable from a programmer's perspective.

  20. SOLUTIONS • Let the offending transaction execute to completion • The HTM system can update memory directly without tracking the read set, write set, or old data. • However, no other transactions can execute • Virtualized TM (VTM) • Maps the key bookkeeping data structures for transactional execution (read set, write set, write buffer or undo log) to virtual memory • Hardware caches hold the working set of these data structures • Hybrid HTM–STM system • A transaction starts in HTM mode • It is restarted in STM mode with additional instrumentation if hardware resources are exceeded • Provides good performance for short transactions.
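
The hybrid restart policy can be sketched as a small driver. All names and the capacity of four locations are assumptions for illustration; a real system would also discard the failed attempt's speculative state, which this toy sidesteps by requiring the body to have no side effects of its own.

```python
class CapacityExceeded(Exception):
    """The modeled hardware ran out of room to track the transaction."""


def run_hybrid(body, htm_capacity=4):
    """Run body(access) in bounded 'htm' mode, restarting in 'stm' mode on overflow.

    body must report each location it touches by calling access(addr).
    Returns (mode_that_completed, result_of_body).
    """
    for mode, limit in (("htm", htm_capacity), ("stm", None)):
        touched = set()

        def access(addr):
            touched.add(addr)
            # Only the hardware mode has a bound on tracked locations.
            if limit is not None and len(touched) > limit:
                raise CapacityExceeded(addr)

        try:
            return mode, body(access)
        except CapacityExceeded:
            continue    # restart the whole transaction in the next mode
```

Short transactions finish in the first mode and never pay for software instrumentation; only transactions that overflow the modeled capacity are rerun in software mode.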

  21. HARDWARE/SOFTWARE INTERFACE FOR TRANSACTIONAL MEMORY • Four interface mechanisms for HTM systems • The first mechanism is a two-phase commit protocol that architecturally separates transaction validation from committing its updates to memory • The second mechanism is transactional handlers that allow software to interface on significant events • The third mechanism is support for closed- and open-nested transactions • The fourth mechanism is multiple types of load and store instructions that allow compilers to distinguish accesses to thread-private, immutable, or idempotent data from accesses to truly shared data

  22. OPEN ISSUES • A transaction that executed an I/O operation may roll back on a conflict. • Strong and weak atomicity. • STM systems generally implement weak atomicity, in which non-transactional code is not isolated from code in transactions • HTM systems, on the other hand, implement strong atomicity • TM must coexist and interoperate with existing programs and libraries

  23. CONCLUSION • TM provides a time-tested model for isolating concurrent computations from each other • Raises the level of abstraction for reasoning about concurrent tasks
