Hardware Transactional Memory

Presentation Transcript


  1. Hardware Transactional Memory Shimin Chen (LBA Reading Group)

  2. Outline • Transaction Concept • A simple HTM • Common Case Transaction Behaviors • HTM Research Directions • Description of Papers • Summary

  3. Transaction • A finite sequence of instructions • Atomicity: all or nothing • Serializability (Isolation): steps of one transaction never appear to be interleaved with the steps of another. • A and B cannot be concurrent if • ReadSet(A) ∩ WriteSet(B) ≠ ∅, or • WriteSet(A) ∩ ReadSet(B) ≠ ∅, or • WriteSet(A) ∩ WriteSet(B) ≠ ∅
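The three conflict conditions on this slide amount to set-intersection tests between the read and write sets of the two transactions. A minimal sketch, assuming each set is modeled as a Python `set` of addresses (the function name `conflicts` is illustrative, not from the slides):

```python
def conflicts(read_a, write_a, read_b, write_b):
    """Return True if transactions A and B cannot run concurrently."""
    return (
        bool(read_a & write_b)      # A read something B wrote
        or bool(write_a & read_b)   # A wrote something B read
        or bool(write_a & write_b)  # both wrote the same location
    )


# A reads x and writes y; B reads y, so A's write set meets B's read set.
print(conflicts({"x"}, {"y"}, {"y"}, set()))   # True
# Two read-only transactions on the same data never conflict.
print(conflicts({"x"}, set(), {"x"}, set()))   # False
```

Note that overlapping read sets alone never produce a conflict; at least one side must write.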

  4. A simple HTM New hardware mechanisms to • checkpoint register state • Checkpoint register renaming table • buffer transactional writes • in private cache • record transactional read-set and write-set • R bit and W bit per cache line • Or dedicated state buffer on the side • detect conflict • leverage cache coherence protocol • resolve conflict • e.g. requester wins

  5. Simple HTM Operations • TxBegin • Checkpoint register state • Load/Store • Set state bits in cache; abort upon cache eviction • Incoming coherence message • Check conflicts with state bits; abort if conflicted • TxCommit • Flash clear state bits • Abort • Flash invalidate write sets and read sets • Restore register checkpoint
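The operations on this slide can be mimicked in software to make the control flow concrete. The following is a toy model under loud assumptions: the class `ToyHTM`, the `capacity` limit standing in for cache eviction, and copying the write buffer into a dictionary at commit are all illustrative simplifications, not the hardware design from the slides.

```python
class TxAborted(Exception):
    """Raised when a load/store finds the toy transaction aborted."""


class ToyHTM:
    def __init__(self, memory, capacity=4):
        self.memory = memory      # shared memory: address -> value
        self.capacity = capacity  # max distinct lines tracked (stands in for cache size)
        self.active = False

    def tx_begin(self, registers):
        self.registers = registers
        self.checkpoint = dict(registers)  # checkpoint register state
        self.r_bits = set()                # lines with the R bit set
        self.w_buffer = {}                 # buffered writes (lines with the W bit set)
        self.active = True

    def _touch(self, addr):
        if not self.active:
            raise TxAborted("transaction was aborted earlier")
        lines = self.r_bits | set(self.w_buffer) | {addr}
        if len(lines) > self.capacity:     # would evict a transactional line
            self.abort()
            raise TxAborted("overflow")

    def load(self, addr):
        self._touch(addr)
        self.r_bits.add(addr)
        if addr in self.w_buffer:
            return self.w_buffer[addr]
        return self.memory[addr]

    def store(self, addr, value):
        self._touch(addr)
        self.w_buffer[addr] = value

    def incoming(self, addr, is_write):
        """Coherence message from another core; the requester wins."""
        hit_write = addr in self.w_buffer
        hit_read = is_write and addr in self.r_bits
        if self.active and (hit_write or hit_read):
            self.abort()

    def tx_commit(self):
        if not self.active:
            return False
        self.memory.update(self.w_buffer)  # buffered writes become visible
        self.active = False                # "flash clear" of the state bits
        return True

    def abort(self):
        self.registers.clear()
        self.registers.update(self.checkpoint)  # restore register checkpoint
        self.w_buffer = {}                      # discard buffered writes
        self.r_bits = set()
        self.active = False


mem = {"a": 1, "b": 2}
regs = {"r0": 0}
core = ToyHTM(mem)
core.tx_begin(regs)
regs["r0"] = core.load("a") + 1
core.store("b", regs["r0"])
core.incoming("a", is_write=True)  # another core writes a line we read
committed = core.tx_commit()       # False: the transaction was aborted
```

After the conflicting message, the registers are back at their checkpoint and memory is unchanged, so software can simply retry the transaction.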

  6. Outline • Transaction Concept • A simple HTM • Common Case Transaction Behaviors • HTM Research Directions • Description of Papers • Summary

  7. “The Common Case Transactional Memory Behavior of Multithreaded Programs”. Stanford Team (Kozyrakis, Olukotun, and their students: Chung, Chafi, Minh, McDonald, Carlstrom). HPCA 2006. • Studied 35 applications • Java, C+Pthread, C+OpenMP, Parallel Processing Macros • Assume high level parallelism structure remains the same: convert lock/unlock into begin/end etc. • Trace-based analysis

  8. Non-blocking synchronization

  9. ReadSet and WriteSet Size • For 95% of transactions, RS < 4KB, WS<1KB • Weighted by time: 52KB RS, 30KB WS needed for covering 80% time • (assuming 32B cache lines)
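To relate these byte counts to the per-line R and W bits of the simple HTM, they can be converted to cache-line counts. A small worked computation, assuming 1 KB means 1024 bytes (the slide does not say) and the 32-byte lines it states:

```python
LINE_BYTES = 32   # cache line size stated on the slide
KB = 1024         # assumption: binary kilobytes


def lines_needed(nbytes):
    """Number of cache lines needed to hold nbytes (rounded up)."""
    return -(-nbytes // LINE_BYTES)


print(lines_needed(4 * KB))    # 128 lines: read-set bound for 95% of transactions
print(lines_needed(1 * KB))    # 32 lines: write-set bound for 95% of transactions
print(lines_needed(52 * KB))   # 1664 lines: read set covering 80% of time
print(lines_needed(30 * KB))   # 960 lines: write set covering 80% of time
```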

  10. Nesting • Nesting distance could be high • Partial rollback may be needed • Two levels of nesting are common

  11. Speculative Parallelization

  12. Outline • Transaction Concept • A simple HTM • Common Case Transaction Behaviors • HTM Research Directions • Description of Papers • Summary

  13. Directions • Dealing with overflows • Virtualizing HTM • Mixing HTM with STM • Two code paths • Use hardware mechanisms to speed up STM

  14. Terminology • Conflict Detection • Eager: at coherence message • Lazy: at commit time • Version Management • Eager: save old version, update in place • Lazy: buffer updates • Conflict Resolution
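The two version-management policies differ mainly in where the old and new values live until commit. A minimal sketch, assuming memory is a plain dictionary; the class names `EagerVersioning` and `LazyVersioning` are illustrative only:

```python
class EagerVersioning:
    """Update in place; keep old values in an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))  # save old version
        self.memory[addr] = value                        # update in place

    def commit(self):
        self.undo_log.clear()          # commit is cheap: just drop the log

    def abort(self):
        while self.undo_log:           # abort is expensive: undo in reverse order
            addr, old = self.undo_log.pop()
            self.memory[addr] = old


class LazyVersioning:
    """Buffer updates; memory changes only at commit."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = {}

    def write(self, addr, value):
        self.buffer[addr] = value      # memory is untouched until commit

    def read(self, addr):
        if addr in self.buffer:
            return self.buffer[addr]
        return self.memory[addr]

    def commit(self):
        self.memory.update(self.buffer)  # commit does the work
        self.buffer.clear()

    def abort(self):
        self.buffer.clear()            # abort is cheap: discard the buffer
```

The sketch shows why eager versioning favors commits and lazy versioning favors aborts: each policy makes one of the two outcomes a simple discard.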

  15. Outline • Transaction Concept • A simple HTM • Common Case Transaction Behaviors • HTM Research Directions • Description of Papers • Summary

  16. “Transactional Memory: Architectural Support for Lock-Free Data Structures.” Herlihy (DEC) & Moss (UMass). ISCA 1993. • “Multiple Reservations and the Oklahoma Update.” Stone, Stone, Heidelberger, Turek (IBM). IEEE Parallel & Distributed Technology. 1993. • “Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution.” Rajwar & Goodman. (Wisconsin). ISCA 2001. • “Transactional Lock-Free Execution of Lock-Based Programs.” Rajwar & Goodman. (Wisconsin). ASPLOS 2002. • “Transactional Memory Coherence and Consistency.” Stanford team. ISCA 2004. • “Unbounded Transactional Memory.” Ananian, Asanovic, Kuszmaul, Leiserson, Lie (MIT). HPCA 2005. • “Virtualizing Transactional Memory.” Rajwar, Herlihy, Lai. (Intel & Brown). ISCA 2005. • “LogTM: Log-based Transactional Memory.” Moore, Bobba, Moravan, Hill, Wood. (Wisconsin team). HPCA 2006. • “Hybrid Transactional Memory.” Kumar, Chu, Hughes, Kundu, Nguyen. PPoPP 2006. • “Architectural Semantics for Practical Transactional Memory.” Stanford team. ISCA 2006.

  17. “Bulk Disambiguation of Speculative Threads in Multiprocessors.” Ceze, Tuck, Cascaval, Torrellas. (UIUC). ISCA 2006. • “Supporting Nested Transactional Memory in LogTM.” Wisconsin team. ASPLOS 2006. • “Unbounded Page-Based Transactional Memory.” Chuang, Narayanasamy, Venkatesh, Sampson, Biesbrouck, Pokam, Colavin, Calder. (UCSD, ST Microelectronics, Microsoft). ASPLOS 2006. • “Tradeoffs in Transactional Memory Virtualization.” Stanford team. ASPLOS 2006. • “Hybrid Transactional Memory.” Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum. (Sun). ASPLOS 2006. • “Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory.” Blundell, Devietti, Lewis, Martin. (UPenn, VMware). ISCA 2007. • “An Effective Hybrid Transactional Memory System with Strong Isolation guarantees.” Stanford team. ISCA 2007. • “An Integrated Hardware-Software Approach to Flexible Transactional Memory.” Shriraman, Spear, Hossain, Marathe, Dwarkadas, Scott. (U Rochester). ISCA 2007. • “Performance Pathologies in Hardware Transactional Memory.” Wisconsin team. ISCA 2007.

  18. Non-overflowed HTM

  19. “Transactional Memory: Architectural Support for Lock-Free Data Structures.” Herlihy (DEC) & Moss (UMass). ISCA 1993. • First HTM paper • Similar to the simple HTM • Transactional cache alongside the L1 data cache • Abort, roll-back: not fully automatic • HW discards transactional updates • SW jumps back and retries the transaction (with exponential back-off) • Conflict detection: eager (coherence) • Conflict resolution: requester aborts

  20. “Multiple Reservations and the Oklahoma Update.” Stone, Stone, Heidelberger, Turek (IBM). IEEE Parallel & Distributed Technology. 1993. • Single reservation: LL-SC • Multiple reservations: all or nothing, transactions with read-modify-writes • Oklahoma update (in the musical “Oklahoma!”, there is a song titled “All er Nothin”) • Similar to the simple HTM • Batch updates and detection at commit time

  21. “Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution.” Rajwar & Goodman. (Wisconsin). ISCA 2001. (SLE) • Idea: • speculate lock-unlock critical section while eliding locks using simple HTM • fall back to locking upon conflicts & overflows • Novelty: recognizing lock and unlock • Lock: LL-SC with predictors • Unlock: a store to restore value changed by LL-SC
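The control flow of the SLE idea (speculate first, fall back to the real lock on conflict or overflow) can be sketched in software. This is only an analogy under stated assumptions: `try_speculative` is a hypothetical hook standing in for the hardware's speculative execution of the critical section, and `max_attempts` is an illustrative parameter that does not come from the paper.

```python
import threading


def run_critical_section(body, try_speculative, lock, max_attempts=1):
    """Run body with the lock elided if speculation succeeds, else under the lock.

    try_speculative(body) is a hypothetical hook: it returns True if the
    speculative execution committed, False on a conflict or overflow.
    """
    for _ in range(max_attempts):
        if try_speculative(body):
            return "elided"        # lock was never acquired
    with lock:                     # fall back to ordinary locking
        body()
    return "locked"
```

A usage example with stand-in hooks:

```python
lock = threading.Lock()
log = []


def always_fails(body):
    return False                   # pretend every speculation hits a conflict


print(run_critical_section(lambda: log.append("work"), always_fails, lock))  # locked
```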

  22. “Transactional Lock-Free Execution of Lock-Based Programs.” Rajwar & Goodman. (Wisconsin). ASPLOS 2002. (TLR) • SLE + resolve conflicts • Timestamp: <# of committed TLR transactions on the local CPU, CPU ID> • Stall or abort the younger transaction upon conflicts • Non-trivial addition to the cache coherence protocol for avoiding deadlocks
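The timestamp on this slide is a pair, so ordering two conflicting transactions is a lexicographic comparison with the CPU ID as the tie-breaker. A minimal sketch, assuming that the smaller timestamp denotes the older transaction (the slide does not spell out the direction) and using the illustrative name `younger`:

```python
def younger(ts_a, ts_b):
    """Return the timestamp of the transaction that must stall or abort.

    Each timestamp is (commit_count, cpu_id). Python compares tuples
    lexicographically, so cpu_id only matters when the counts are equal.
    Assumption: the smaller timestamp is the older transaction.
    """
    return max(ts_a, ts_b)


print(younger((3, 0), (2, 1)))   # (3, 0): higher commit count, so younger
print(younger((2, 0), (2, 1)))   # (2, 1): tie on count, CPU ID breaks it
```

Because CPU IDs are unique, two distinct processors never hold equal timestamps, so one side of every conflict always yields.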

  23. “Transactional Memory Coherence and Consistency.” Stanford team. ISCA 2004. (TCC) • Conflict detection: lazy • Novelty: propose to use transactional memory to replace cache coherence • Illusion of shared memory • Batch communication like message passing

  24. “Bulk Disambiguation of Speculative Threads in Multiprocessors.” Ceze, Tuck, Cascaval, Torrellas. (UIUC). ISCA 2006. • Conflict Detection: lazy • Use bloom filter signature to do batch detection • 2000 bit bloom filter, avg 70 read lines and 20 write lines per transaction
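A Bloom-filter signature summarizes a whole address set in a fixed number of bits, so two sets can be compared with one bitwise AND. The sketch below is a generic Bloom filter using the 2000-bit size from the slide; the class name `Signature`, the use of SHA-256, and the choice of two hash functions are assumptions for illustration and do not reproduce the paper's hardware encoding.

```python
import hashlib


class Signature:
    def __init__(self, bits=2000, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.field = 0                 # the bit vector, stored as a Python int

    def _positions(self, line_addr):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{line_addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, line_addr):
        for pos in self._positions(line_addr):
            self.field |= 1 << pos

    def may_contain(self, line_addr):
        """False means definitely absent; True may be a false positive."""
        return all((self.field >> pos) & 1 for pos in self._positions(line_addr))

    def intersects(self, other):
        """Conservative batch test: never misses a shared address."""
        return (self.field & other.field) != 0


write_sig = Signature()
for line in range(20):             # about 20 written lines, as on the slide
    write_sig.add(line)

read_sig = Signature()
for line in range(100, 170):       # about 70 read lines, as on the slide
    read_sig.add(line)
read_sig.add(5)                    # one line that the writer also touched

print(write_sig.intersects(read_sig))   # True: the shared line is always caught
```

The test can report a conflict that does not exist (a false positive costs an unnecessary abort), but it cannot miss a real one, which is what makes it safe for lazy conflict detection.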

  25. Virtualizing HTM

  26. How? • Generally: save transaction states in virtual memory • Read set, write set • Or readers, writers per block in memory • Conflict detection needs to check this structure • Question: how to make it efficient?

  27. “Unbounded Transactional Memory.” Ananian, Asanovic, Kuszmaul, Leiserson, Lie (MIT). HPCA 2005. • First paper on overflowed transactions • UTM (“Unbounded TM”): • Idealized (very complicated) • LTM (“Large TM”): • Lazy versioning • Limitations: less than a time slice, no migration, smaller than physical memory

  28. “Virtualizing Transactional Memory.” Rajwar, Herlihy, Lai. (Intel & Brown). ISCA 2005. (VTM) • A fairly complete description • Novelty: • XSW: transaction status word • load/store entries point to XSW; • can change transaction state with a single atomic update • Filter for conflict detection • Lazy versioning (buffer updates) • Eager conflict detection

  29. “LogTM: Log-based Transactional Memory.” Moore, Bobba, Moravan, Hill, Wood. (Wisconsin team). HPCA 2006. • Overflow handling • Eager versioning: per-thread undo log • Update in place, save old values in log • Favors commits • Eager conflict detection • Cache has a single overflow bit • Use directory to remember the transactional access to a line even if the line is evicted from cache

  30. “Architectural Semantics for Practical Transactional Memory.” Stanford team. ISCA 2006. • Provide support to call software callbacks • Commit, abort, violation • Nested transactions • Flattening: a violation rolls back to the beginning of the top-most transaction • Closed nesting: allow partial roll-backs • Open nesting: allow partial commits

  31. “Supporting Nested Transactional Memory in LogTM.” Wisconsin team. ASPLOS 2006. • Undo log is organized as transaction log frames • (just like stack frames) LIFO
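Organizing the undo log as a stack of frames makes partial rollback a matter of popping one frame. A minimal sketch of closed nesting under that layout, assuming memory is a dictionary; the class `NestedUndoLog` and its method names are illustrative, not taken from the paper:

```python
class NestedUndoLog:
    def __init__(self, memory):
        self.memory = memory
        self.frames = []               # one undo-log frame per nesting level (LIFO)

    def begin(self):
        self.frames.append([])         # push a new frame, like a stack frame

    def write(self, addr, value):
        self.frames[-1].append((addr, self.memory[addr]))  # log old value
        self.memory[addr] = value                          # update in place

    def commit(self):
        frame = self.frames.pop()
        if self.frames:
            # Closed nesting: the parent inherits the child's undo records,
            # so a later parent abort also undoes the child's writes.
            self.frames[-1].extend(frame)

    def abort(self):
        # Partial rollback: undo only the innermost frame, newest record first.
        for addr, old in reversed(self.frames.pop()):
            self.memory[addr] = old


mem = {"x": 0, "y": 0}
log = NestedUndoLog(mem)
log.begin()
log.write("x", 1)
log.begin()
log.write("y", 2)
log.abort()                            # rolls back only the inner transaction
print(mem)                             # {'x': 1, 'y': 0}
```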

  32. “Unbounded Page-Based Transactional Memory.” Chuang, Narayanasamy, Venkatesh, Sampson, Biesbrouck, Pokam, Colavin, Calder. (UCSD, ST Microelectronics, Microsoft). ASPLOS 2006. • Shadow page + home page • Conflict detection: special cache for overflow info before traversing memory structure

  33. “Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory.” Blundell, Devietti, Lewis, Martin. (UPenn, VMware). ISCA 2007. • Making the fast case common: • Permission-only cache • Cache RW bits for overflowed cache lines • Making the uncommon case simple: • Allow only a single overflowed transaction • OneTM-serialized: stall all other transactions • OneTM-concurrent: allow other non-overflowed transactions • Each block in memory requires RW bits + transaction ID

  34. “Performance Pathologies in Hardware Transactional Memory.” Wisconsin team. ISCA 2007. • Seven pathological scenarios in which different HTMs may perform poorly • Livelock cases, starvation, convoys, futile stalling for a transaction that eventually aborts • Enhancements: • Conflict resolution: back-offs, priorities • Predicting writes in a transaction, so that ownership can be acquired at read time

  35. Combining HTM and STM

  36. “Hybrid Transactional Memory.” Kumar, Chu, Hughes, Kundu, Nguyen. PPoPP 2006. • Enhance the Dynamic STM (Herlihy et al: wrap objects with indirection/replication) • HTM mode • STM mode • Tries HTM first • A trick for conflict detection between HTM and STM: • STM also starts a hardware transaction • But accesses only a single state word transactionally • Performs all other actions nontransactionally

  37. “Tradeoffs in Transactional Memory Virtualization.” Stanford team. ASPLOS 2006. (XTM) • Two modes: all in hardware, all in software • If an HTM transaction overflows, abort it and rerun it in software mode • Software mode: • Per-transaction page table • Copy-on-first-access: check at commit that read data has not changed • Copy-on-write: buffer transactional writes

  38. “Hybrid Transactional Memory.” Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum. (Sun). ASPLOS 2006. • Compiler generates two code paths, choose at runtime: • STM • HTM • Word-based • Metadata access per memory operation required even for HTM (to detect conflict with STM)

  39. “An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees.” Stanford team. ISCA 2007. • SigTM: • Enhance an STM system with hardware signatures

  40. “An Integrated Hardware-Software Approach to Flexible Transactional Memory.” Shriraman, Spear, Hossain, Marathe, Dwarkadas, Scott. (U Rochester). ISCA 2007. (RTM) • Two hardware mechanisms to improve the performance of an STM (RSTM): • Alert-on-update: allow software callbacks for invalidation and eviction of selected cache lines • Programmable data isolation: control cache to hold transactional blocks

  41. Summary • Simple HTM is nice • Major complexity comes in because of space and time limitations • Logs, shadow pages, filters, caches, etc. • Combine HTM and STM
