Dynamic Software Transactional Memory

Idan Igra, Topics in Reliable Distributed Computing (048961), Technion, Nov 2008

Agenda
• Motivation
• Software Transactional Memory
• Dynamic Software Transactional Memory
• Fraser’s STM
• Dynamic STM vs. Fraser’s STM
• A blocking STM implementation
• Another obstruction-free STM implementation by Fraser
• DSTM contention management

William N. Scherer III, Department of Computer Science, University of Rochester, Rochester, NY 14620, USA, scherer@cs.rochester.edu
Mark Moir, Sun Microsystems Laboratories, 1 Network Drive, Burlington, MA 01803, USA, mark.moir@sun.com
Victor Luchangco, Sun Microsystems Laboratories, 1 Network Drive, Burlington, MA 01803, USA, victor.luchangco@sun.com
Maurice Herlihy, Department of Computer Science, Brown University, Providence, RI 02912, USA, mph@cs.brown.edu

Multicore history
• Parallel computing was used for HPC and networking.
• PRAM and other shared-memory models are not realistic.
• BSP and LogP (message-passing models) were used instead.
• Only for HPC specialists.
• They demand a complicated system analysis per application.
• Hardware constraints now force multicore architectures.
• Today’s parallel programming is based on locks.
• Coarse-grained locking prevents parallelism; fine-grained locking is hard to use.
• Code reuse demands exposing internal locks.
• There is no conventional way to associate a mutex with the data it protects.

Nonblocking liveness properties
• Wait freedom: every process that attempts an operation completes it in a finite number of steps.
• Lock freedom: if any process attempts an operation, then some process completes an operation in a finite number of steps.
• Obstruction freedom: a process that runs in isolation, without interference from other processes, completes its operation in a finite number of steps.

Atomic hardware primitives
• Load_Linked / Store_Conditional (LL/SC): LL(addr) returns the value stored at addr. A subsequent SC(addr, val) writes val to addr only if addr has not been written since that LL.
• Compare And Swap (CAS): CAS(addr, e, v) atomically writes v to addr if the value stored at addr equals e, and reports whether the write took place.
• MCAS: m CAS operations performed as a single atomic step (special case: DCAS, for two locations).
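The difference between the two primitives can be illustrated with a small simulation. This is only a sketch: Python has no hardware CAS or LL/SC, so a lock stands in for atomicity, and the class `AtomicCell` and its method names are hypothetical, not taken from the presentation.

```python
import threading

class AtomicCell:
    """One memory word with CAS and LL/SC semantics (simulated with a lock)."""

    def __init__(self, value):
        self._value = value
        self._writes = 0                  # bumped on every successful write
        self._lock = threading.Lock()     # ASSUMPTION: stands in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        """Write `new` only if the cell currently holds `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            self._writes += 1
            return True

    def load_linked(self):
        """Return (value, link); the link is checked by store_conditional."""
        with self._lock:
            return self._value, self._writes

    def store_conditional(self, link, new):
        """Write `new` only if the cell was not written since the matching LL."""
        with self._lock:
            if self._writes != link:
                return False
            self._value = new
            self._writes += 1
            return True

cell = AtomicCell(5)
ok_first = cell.cas(5, 7)     # succeeds: the cell held 5
ok_second = cell.cas(5, 9)    # fails: the cell now holds 7
value, link = cell.load_linked()
cell.cas(7, 7)                # rewriting the same value still counts as a write
sc_ok = cell.store_conditional(link, 8)   # fails: the cell was written after the LL
```

CAS compares values, so it cannot notice a write that restores the old value; SC fails on any intervening write.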

Helping methodology
• A methodology for non-blocking algorithms.
• A process that needs data held by another process helps that process complete its operation.
• The help is usually recursive.
• Widely used in Transactional Memory, in particular for software implementations of MCAS (known as k-RMW).

Software Transactional Memory
• A transaction first tries to acquire all the data it needs.
• If it succeeds, it computes the transaction and releases the data.
• If it fails, it releases everything and retries.
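The acquire-all, compute, release cycle above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Word` and `run_static_transaction` and the `max_attempts` bound are hypothetical, and a lock stands in for the atomic primitive used to claim ownership.

```python
import threading

class Word:
    """A shared word with an owner field."""

    def __init__(self, value):
        self.value = value
        self.owner = None
        self._lock = threading.Lock()   # ASSUMPTION: stands in for a hardware CAS

    def try_acquire(self, tx_id):
        with self._lock:
            if self.owner is None:
                self.owner = tx_id
                return True
            return False

    def release(self, tx_id):
        with self._lock:
            if self.owner == tx_id:
                self.owner = None

def run_static_transaction(tx_id, words, compute, max_attempts=1000):
    """Acquire the whole (fixed) data set, apply `compute`, then release."""
    for _ in range(max_attempts):
        acquired = []
        for w in words:
            if not w.try_acquire(tx_id):
                break
            acquired.append(w)
        if len(acquired) == len(words):
            new_values = compute([w.value for w in words])
            for w, v in zip(words, new_values):
                w.value = v
            for w in acquired:
                w.release(tx_id)
            return True
        for w in acquired:              # failed: release everything and retry
            w.release(tx_id)
    return False

a, b = Word(10), Word(0)
moved = run_static_transaction("T1", [a, b], lambda v: [v[0] - 3, v[1] + 3])

b.try_acquire("T2")                     # another transaction now holds b
blocked = run_static_transaction("T1", [a, b], lambda v: v, max_attempts=3)
```

Note that the data set must be known before the transaction starts, which is the limitation the dynamic variant removes.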

Software Transactional Memory
Why Software Transactional Memory?
• Unexpected delays degrade the performance of lock-based code, on top of its inherent programming difficulties.
• Memory allocation and deallocation cause synchronization conflicts.
• Hardware Transactional Memory lacks platform support and portability, and suffers from delay anomalies.
• Methods such as translating the code into k-RMW operations are non-trivial.
• Working on a copy of the object is impractical for large data structures.
• A programmable, flexible, non-blocking parallel programming method is needed.

Software Transactional Memory
Data set pre-acquiring
• Unintuitive programming.
• Reduces parallelism.
• Common data structures must be acquired in their entirety.
• Dynamic data structures are impossible.

Software Transactional Memory
Hardware support
• LL/SC is not commonly supported by hardware.
• The operating system can support it.
• That is much slower.
• It reduces parallelism (it forces some scheduling).
• A more useful primitive can be defined.

Software Transactional Memory
Wait freedom cost:
• Complicated acquiring code.
• Not flexible.
• Uncommon primitives.
• Long locking time.

Dynamic STM
• Also supports dynamic transactions, whose data set changes while they run.
• Satisfies obstruction freedom.
• A modular contention manager enforces progress, supports priorities and adapts to the application.

Dynamic STM
Implementation principles:
• A TM object points to a Locator, which holds an old version, a new version and the last transaction that opened the object for writing.
• The current version is determined by that transaction’s status (active / aborted / committed): the new version if it committed, the old version otherwise.
• All objects are committed at once by changing the status.
• Obstruction freedom is obtained by aborting a conflicting transaction (subject to the contention manager’s agreement).
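The Locator mechanism can be sketched as below. This is an illustrative simplification, not the DSTM code: the class and method names are hypothetical, locks stand in for CAS, the `may_abort` callback is an assumed placeholder for the contention manager, and read tracking and validation are left out.

```python
import copy
import threading
from dataclasses import dataclass

ACTIVE, COMMITTED, ABORTED = "active", "committed", "aborted"

class Transaction:
    def __init__(self, status=ACTIVE):
        self.status = status
        self._lock = threading.Lock()    # ASSUMPTION: stands in for a CAS on status

    def cas_status(self, expected, new):
        with self._lock:
            if self.status != expected:
                return False
            self.status = new
            return True

@dataclass(frozen=True)
class Locator:
    writer: Transaction   # last transaction that opened the object for writing
    old: object           # version valid if the writer is active or aborted
    new: object           # version valid if the writer committed

class TMObject:
    def __init__(self, value):
        self._locator = Locator(Transaction(COMMITTED), None, value)
        self._lock = threading.Lock()    # ASSUMPTION: stands in for a CAS on the pointer

    def _cas_locator(self, expected, new):
        with self._lock:
            if self._locator is not expected:
                return False
            self._locator = new
            return True

    def current_version(self):
        loc = self._locator
        return loc.new if loc.writer.status == COMMITTED else loc.old

    def open_write(self, tx, may_abort=lambda me, enemy: True):
        """Install a new Locator owned by tx and return its private copy."""
        while True:
            if tx.status != ACTIVE:
                raise RuntimeError("transaction was aborted")
            loc = self._locator
            if loc.writer is tx:
                return loc.new                       # already opened by tx
            if loc.writer.status == ACTIVE:
                if may_abort(tx, loc.writer):        # ask the contention manager
                    loc.writer.cas_status(ACTIVE, ABORTED)
                continue                             # re-read and try again
            current = loc.new if loc.writer.status == COMMITTED else loc.old
            fresh = Locator(tx, current, copy.copy(current))
            if self._cas_locator(loc, fresh):
                return fresh.new

obj = TMObject([0])
t1 = Transaction()
obj.open_write(t1)[0] = 1
before_commit = obj.current_version()[0]    # t1 still active: old version visible
t1.cas_status(ACTIVE, COMMITTED)            # the single step that commits t1
after_commit = obj.current_version()[0]

t2 = Transaction()
obj.open_write(t2)[0] = 5
t3 = Transaction()
t3_view = obj.open_write(t3)                # conflicts with t2 and aborts it
t2_commit = t2.cas_status(ACTIVE, COMMITTED)
```

Because a transaction's writes become visible only when its status flips to committed, aborting a conflicting writer needs no cleanup of its objects: its Locators simply resolve to their old versions.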

Dynamic STM
DSTM properties and results:
• Writing DSTM code, and converting sequential code into DSTM code, is much more natural.
• Releases can significantly improve performance.
• Reusing simpler algorithms to build a bigger one is easier with DSTM.
• Disadvantage: there is no way to know that an object was opened for reading.

Dynamic STM
• Obstruction freedom:
  • is simple,
  • is good enough for some applications,
  • enables the implementation of priorities,
  • enables separating correctness from progress,
  • and, most important, removes the need for a helping mechanism.
• However, one can argue that it is not a real progress property.

Dynamic STM
Discussion, DSTM vs. STM:
• DSTM relates to STM as coarse-grained relates to fine-grained.
• But STM meets a real progress requirement, not a weakened one (obstruction freedom).
• Releases, as an integral part of the mechanism, reduce conflicts (compared to locks).
• Non-blocking, and obstruction freedom in particular, is better because delayed or failed processes do not stop the whole system (a very strong point for DSTM).
• DSTM’s implementation might lose that gain on truly parallel systems.
• Letting the contention manager do the work is exactly like assuming the scheduler will do it.

Fraser’s STM
An STM should satisfy:
• Small, fixed storage overhead per object.
• Small shared-memory operations.
• Short contention time.
• Reduced time during which transactions can meet.
Nice to have:
• Support for varying object sizes.
• Nested transactions.

Fraser’s STM
• Every object is represented by a pointer to an object handler, which consists of a version number and a pointer to the data block.
• Open for read returns the data block pointer.
• Open for write returns a pointer to a shadow copy.
• Commit is done by acquiring all the opened objects, using MCAS and helping.
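The handler and shadow-copy scheme can be sketched as follows. This is a heavily simplified illustration: a single global lock is assumed in place of MCAS and helping, and the names `Handle` and `FTransaction` are hypothetical.

```python
import copy
import threading

class Handle:
    """Object handler: a version number plus a pointer to the data block."""

    def __init__(self, data):
        self.version = 0
        self.data = data

_commit_lock = threading.Lock()   # ASSUMPTION: stands in for MCAS plus helping

class FTransaction:
    def __init__(self):
        self.reads = {}     # handle -> version seen when opened for reading
        self.writes = {}    # handle -> (version seen, shadow copy)

    def open_read(self, h):
        if h in self.writes:
            return self.writes[h][1]
        self.reads.setdefault(h, h.version)
        return h.data                       # the shared data block itself

    def open_write(self, h):
        if h not in self.writes:
            self.writes[h] = (h.version, copy.deepcopy(h.data))
        return self.writes[h][1]            # private shadow copy

    def commit(self):
        with _commit_lock:
            # Fail if any opened object changed since it was opened.
            for h, seen in self.reads.items():
                if h.version != seen:
                    return False
            for h, (seen, _) in self.writes.items():
                if h.version != seen:
                    return False
            # Install every shadow copy and bump the versions.
            for h, (_, shadow) in self.writes.items():
                h.data = shadow
                h.version += 1
            return True

h = Handle({"n": 1})
t1, t2 = FTransaction(), FTransaction()
t1.open_write(h)["n"] = 2
t2.open_write(h)["n"] = 3
first = t1.commit()     # installs t1's shadow copy
second = t2.commit()    # fails: h changed since t2 opened it
```

Until commit, a writer touches only its shadow copy, so other transactions cannot see or conflict with it. This is the lazy acquire that the comparison with DSTM refers to.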

Fraser’s STM
• Problem: acquiring and releasing read-only objects blocks non-conflicting transactions.
• This is critical for data structures with a single entry point (such as the head of a linked list).
• Solution: do not acquire read-only objects.
• Add a read-checking state, in which the transaction checks all the objects it opened read-only, to make sure that other transactions do not update them during this time.

Fraser’s STM
• Deadlock prevention: T1 can abort T2 only if:
  • both transactions are in the read-checking status,
  • T2 holds a location that T1 is trying to read,
  • T1 < T2 according to a given total order on transactions.
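The three conditions translate directly into a predicate. The sketch below assumes a hypothetical `Tx` record whose `order` field encodes the total order; how that order is chosen is not specified in the slides.

```python
from dataclasses import dataclass, field

READ_CHECKING = "read-checking"

@dataclass
class Tx:
    order: int                               # ASSUMPTION: position in the total order
    status: str = "undecided"
    held: set = field(default_factory=set)   # locations this transaction holds

def may_abort(t1, t2, location):
    """True only if all three deadlock-prevention conditions hold."""
    return (t1.status == READ_CHECKING
            and t2.status == READ_CHECKING
            and location in t2.held
            and t1.order < t2.order)

low = Tx(order=1, status=READ_CHECKING, held={"y"})
high = Tx(order=2, status=READ_CHECKING, held={"x"})

low_aborts_high = may_abort(low, high, "x")
high_aborts_low = may_abort(high, low, "y")
```

Since only the lower-ordered transaction may abort the other, two transactions that each wait on a location held by the other cannot both stay blocked.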

DSTM vs. FSTM
FSTM is much better:
• Lazy acquire exposes a transaction to others for a very short time, which reduces the number of conflicts.
• Extra indirection levels degrade performance (mainly for read-only transactions).
• The contention manager required by obstruction freedom adds a 5-10% overhead and is hard to design.

DSTM vs. FSTM
DSTM is much better:
• Eager acquire helps detect conflicts earlier.
• This is possible thanks to the weakness of obstruction freedom.
• Fewer CASs (N+1 for DSTM vs. 2N+2 for FSTM).
• The implementation is simpler and more efficient.
• MCAS causes a lot of cache-block thrashing.

DSTM vs. FSTM
DSTM is better for workloads in which:
• Many locations are opened.
• Most accesses are writes to the same location (IntSet).
• Transactions must be serialized (stack).
FSTM is better for workloads in which:
• Livelocks are common (RBTree).
• Transactions are small.
• The conflict probability is small (IntSetRelease).

DSTM vs. FSTM
General remarks:
• Not validating repeatedly improves performance.
• How can inconsistent (aborted) transactions be avoided?

Contention Management
Recall: a DSTM contention manager should:
• ensure progress,
• eventually return from every call,
• eventually abort the conflicting transaction.
Management approaches are tested for:
• Various data sets.
• Visible / invisible reads (non-optimistic / optimistic).
• Eliminating unnecessary aborts.

Contention Management
• Aggressive: always abort the enemy transaction. A good baseline for comparison.
• Polite: back off before aborting. Sensitive to preemption, page faults, etc.
• Randomized: flip a (balanced) coin to choose between aborting and waiting (64ns).
• Eruption: a blocked transaction helps the transaction that blocks it by passing on its momentum (momentum = successful opens + momentum of the transactions it blocks).
• The reasoning is to let transactions that hold critical data finish.

Contention Management
• Karma: the older transaction (measured in open attempts) wins. Attempts made in previous aborted runs are also counted.
• Kindergarten: at first, backoff is used before aborting. Later, aborts are done in turns.
• KillBlocked: a transaction aborts the transaction blocking it if that transaction is itself blocked (or after a fixed time).
• Timestamp: the older transaction wins. A failure detector is used.
• QueueOnBlock: blocked transactions are released in queue order when the blocking transaction has finished (or after a fixed time).
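Three of the policies above can be sketched behind one small interface. Everything here is an assumption made for illustration: the `should_abort(me, enemy, attempts)` signature, the `max_rounds` bound for Polite, and the reading of Karma as "abort once my priority plus my waiting rounds exceeds the enemy's priority". No real waiting is performed; the helper only counts rounds.

```python
from dataclasses import dataclass

@dataclass
class TxInfo:
    name: str
    karma: int = 0    # opens performed so far, kept across aborted runs

class Aggressive:
    def should_abort(self, me, enemy, attempts):
        return True                          # always abort the enemy

class Polite:
    def __init__(self, max_rounds=8):        # ASSUMPTION: bound chosen for the sketch
        self.max_rounds = max_rounds

    def should_abort(self, me, enemy, attempts):
        return attempts >= self.max_rounds   # back off first, abort afterwards

class Karma:
    def should_abort(self, me, enemy, attempts):
        # Each round spent waiting counts toward the waiting transaction.
        return me.karma + attempts > enemy.karma

def rounds_until_abort(manager, me, enemy, limit=100):
    """Count how many times `me` waits before the manager lets it abort `enemy`."""
    for attempts in range(limit):
        if manager.should_abort(me, enemy, attempts):
            return attempts
    return None

young = TxInfo("young", karma=1)
old = TxInfo("old", karma=4)

aggressive_rounds = rounds_until_abort(Aggressive(), young, old)
polite_rounds = rounds_until_abort(Polite(max_rounds=3), young, old)
karma_young_vs_old = rounds_until_abort(Karma(), young, old)
karma_old_vs_young = rounds_until_abort(Karma(), old, young)
```

Every policy eventually returns an abort decision, which matches the requirement that a contention manager eventually aborts the conflicting transaction.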

Contention Management
Results:
• Most managers, except Timestamp, do well on IntSetRelease with invisible reads.
• Aggressive, Randomized, Eruption and Polite perform badly.
• QueueOnBlock and KillBlocked perform well only on RBTree with invisible reads.
• Timestamp is good only for Counter.
• Kindergarten is excellent, except for IntSetRelease with visible reads and for RBTree.
• Karma is not good for IntSet or for LFUCache with visible reads.

Contention Management
Visible reads vs. invisible reads:
• In IntSet and Counter there is no difference, since all accesses are writes.
• In IntSetRelease visible reads are better (except for Kindergarten, which is bad for both).
• Visible reads offer a way to avoid conflicts on short accesses.
• In LFUCache for all managers, and in RBTree for all but Karma, invisible reads are much better.
• Most conflicts are between a reader that scans its path and a writer that updates the path to the root.

Blocking STM implementation
Why not worry about blocking (mainly compared to obstruction freedom)?
• Long transactions must be aborted. Obstruction freedom is guaranteed only for a single transaction.
• A context switch is not a problem:
  • It is temporary.
  • The OS adapts automatically.
  • There is platform support (priorities, etc.).
• Independent failure:
  • Not common in multicore.
  • Sequential programs also fail due to a single failure.

Blocking STM implementation
Non-blocking is bad because:
• Metadata and the object must be stored separately in order to be non-blocking.
• This doubles the cache misses.
• Assume N active transactions on N processors: a new transaction must not be blocked, so the number of conflicts increases.

Blocking STM implementation
• Every transaction keeps, in its private data, a descriptor per opened object (consisting of the version, a pointer and possibly a copy).
• Every object has a lock (with deadlock prevention) that is used when trying to commit.
• Accesses wait for the object to be unlocked. Read accesses are optimistic.
• Priority mechanism.

Blocking STM implementation
[Figure: CPU time for various numbers of processors]

Blocking STM implementation
[Figure: CPU time for various contention instances]

Blocking STM implementation
Discussion:
• A context switch IS a problem because of long delays.
• Failures are more common in parallel programs than in sequential ones.
• Is delay more interesting than throughput?

Another STM
• As in DSTM, committing is done by changing a state, and the current version is determined by the owner transaction’s state.
• But as in FSTM, before committing the transaction tries to acquire all of its owned records.
• A wait method is provided for waiting on acquired data before retrying.

Another STM
• An ownership record (orec) contains either the version number of one (or more) objects or a pointer to the owner transaction’s descriptor.
• Before committing, every transaction tries to acquire its owned data.
• If the data is already acquired, the transaction can abort the other transaction, wait for it to finish, or wake it (if it is sleeping).
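The two states of an orec can be sketched as a field that holds either an integer or a descriptor. The names `Orec` and `TxDescriptor` and the acquire/release methods are hypothetical, a lock stands in for CAS, and the abort / wait / wake choice on conflict is left to the caller.

```python
import threading

class TxDescriptor:
    def __init__(self, name):
        self.name = name
        self.status = "active"

class Orec:
    """Holds either a version number (int) or the owning TxDescriptor."""

    def __init__(self):
        self._content = 0
        self._lock = threading.Lock()    # ASSUMPTION: stands in for a hardware CAS

    def owner(self):
        c = self._content
        return c if isinstance(c, TxDescriptor) else None

    def try_acquire(self, tx, expected_version):
        """Swap the expected version number for a pointer to tx's descriptor."""
        with self._lock:
            if isinstance(self._content, TxDescriptor):
                return False             # already owned: abort, wait or wake the owner
            if self._content != expected_version:
                return False             # the object changed since tx read it
            self._content = tx
            return True

    def release(self, tx, new_version):
        """Replace the descriptor pointer with a version number again."""
        with self._lock:
            if self._content is tx:
                self._content = new_version

orec = Orec()
t1, t2 = TxDescriptor("T1"), TxDescriptor("T2")

got1 = orec.try_acquire(t1, 0)       # T1 acquires at version 0
got2 = orec.try_acquire(t2, 0)       # T2 finds the orec owned
owner_is_t1 = orec.owner() is t1
orec.release(t1, 1)                  # T1 finishes and publishes version 1
stale = orec.try_acquire(t2, 0)      # T2's version is now out of date
fresh = orec.try_acquire(t2, 1)      # succeeds with the current version
```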

References
• Robert Ennals (Jan 2006). Software Transactional Memory Should Not Be Obstruction-Free. Technical Report Nr. IRC-TR-06-052. Intel Research Cambridge Tech Report.
• K. Fraser. Practical Lock-Freedom. Technical Report UCAM-CL-TR-579, Cambridge University Computer Laboratory, February 2004.
• Tim Harris, Keir Fraser. Language support for lightweight transactions. Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, October 26-30, 2003, Anaheim, California, USA.
• Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer III. Software Transactional Memory for Dynamic-Sized Data Structures. ACM Symposium on Principles of Distributed Computing (PODC): 92-101, 2003.
• Maurice Herlihy, Victor Luchangco. Distributed computing and the multicore revolution. ACM SIGACT News, v.39 n.1, March 2008.
• Virendra J. Marathe, William N. Scherer III and Michael L. Scott (Oct 2004). Design Tradeoffs in Modern Software Transactional Memory Systems. In: Proceedings of the 7th Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers. Houston, TX.
• N. Shavit and D. Touitou. Software transactional memory. Distributed Computing, Special Issue(10): 99-116, 1997.
• William N. Scherer III and Michael L. Scott (Jul 2004). Contention Management in Dynamic Software Transactional Memory. In: Proceedings of the ACM PODC Workshop on Concurrency and Synchronization in Java Programs. St. John's, NL, Canada. In conjunction with PODC'04.

More reading
Ennals’ blocking STM:
• Robert Ennals. Efficient Software Transactional Memory. Intel Research Cambridge Technical Report: IRC-TR-05-051, 2005.
PRAM:
• S. Fortune and J. Wyllie. Parallelism in Random Access Machines. In Proceedings of the 10th Annual Symposium on Theory of Computing, pages 114-118, 1978.
• Phillip B. Gibbons, Yossi Matias, Vijaya Ramachandran. Can shared-memory model serve as a bridging model for parallel computation? Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.72-83, June 23-25, 1997, Newport, Rhode Island, United States.
• P. B. Gibbons. A more practical PRAM model. Proceedings of the first annual ACM symposium on Parallel algorithms and architectures, p.158-168, June 18-21, 1989, Santa Fe, New Mexico, United States.
Popular older message-passing models:
• David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, Thorsten von Eicken. LogP: towards a realistic model of parallel computation. ACM SIGPLAN Notices, v.28 n.7, p.1-12, July 1993.
• Leslie G. Valiant. A bridging model for parallel computation. Communications of the ACM, v.33 n.8, p.103-111, Aug. 1990.
Memory allocation in multi-core:
• Andrei Gorine, Konstantin Knizhnik. Tackling memory allocation in multicore and multithreaded applications. MCObject LLC, May 29 2006. Available on the internet from http://www.embedded.com/columns/showArticle.jhtml?articleID=188101359
• Voon-Yee Vee, Wen-Jing Hsu. A Scalable and Efficient Storage Allocator on Shared Memory Multiprocessors. Proceedings of the 1999 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN '99), p.230, June 23-25, 1999.
• P.R. Wilson, M.S. Johnstone, M. Neely, and D. Boles. Dynamic storage allocation: A survey and critical review. In H.G. Baker, editor, Proceedings of International Workshop on Memory Management (IWMM'95), volume 986 of Lecture Notes in Computer Science, pages 1-116, Kinross, Scotland, Sept. 1995.
