1 / 63

Log-based Transactional Memory

Log-based Transactional Memory. Mark D. Hill Multifacet Project, Univ. of Wisconsin—Madison April 13, 2007 @ Michigan (Go Blue!). Multicore here: “Intel has 10 projects in the works that contain four or more computing cores per chip” —Intel CEO, Fall ’05

bailey
Download Presentation

Log-based Transactional Memory

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Log-based Transactional Memory Mark D. Hill Multifacet Project, Univ. of Wisconsin—Madison April 13, 2007 @ Michigan (Go Blue!) • Multicore here: “Intel has 10 projects in the works that contain four or more computing cores per chip” —Intel CEO, Fall ’05 • How program? “Blocking on a mutex is a surprisingly delicate dance” —OpenSolaris, mutex.c

  2. LogTM Contributors • Faculty • Mark Hill, Ben Liblit, Mike Swift, David Wood • Students • Jayaram Bobba, Derek Hower, Kevin Moore,Haris Volos, Luke Yen • Alumna • Michelle Moravan • Funding • Grants from U.S. National Science Foundation • Donations from Intel and Sun Wisconsin Multifacet Project

  3. Summary • Our Transactional Memory (TM) goals • Unlimited TM model: even large/long transactions • Facilitate SW composition: unlimited nesting • Accelerate with some HW support • Log-based TM (Signature Edition) • Supports unlimited TM w/ nesting • Accelerates commit by writing new values in place(after saving old values in a per-thread log) • Signatures summarize read/write sets • HW mechanisms: simple, policy-free, SW accessible Wisconsin Multifacet Project

  4. Outline • TM Motivation & Background • Why TM?, Terminlogy, & Taxonomy • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • LogTM Evaluation • LogTM Operating System Interactions (optional) • Summary & Future Directions Wisconsin Multifacet Project

  5. Thread 0 move(a, b, key1); Thread 1 move(b, a, key2); Locks are Hard // WITH LOCKS void move(T s, T d, Obj key){ LOCK(s); LOCK(d); tmp = s.remove(key); d.insert(key, tmp); UNLOCK(d); UNLOCK(s); } Moreover Coarse-grain locking limits concurrency Fine-grain locking difficult DEADLOCK! Wisconsin Multifacet Project

  6. Transactional Memory (TM) void move(T s, T d, Obj key){ atomic { tmp = s.remove(key); d.insert(key, tmp); } } • Programmer says • “I want this atomic” • TM system • “Makes it so” • Software TM (STM) Implementations • Currently slower than locks • Always slower than hardware? • Hardware TM (HTM) Implementations • Leverage cache coherence & speculation • Fast • But hardware finite & should be policy-free Wisconsin Multifacet Project

  7. Some Transaction Terminology Transaction: State transformation that is: Atomic (all or nothing) Consistent Isolated (serializable) Durable (permanent) Commit: Transaction successfully completes Abort: Transaction fails & must restore initial state Read (Write) Set: Items read (written) by a transaction Conflict: Two concurrent transactions conflict if either’s write set overlaps with the other’s read or write set Wisconsin Multifacet Project

  8. Modules expose interfaces, NOT implementations Example Insert() calls getID() from within a transaction The getID() transaction is nested inside the insert() transaction int getID() { // child TX begin_transaction(); id = global_id++; commit_transaction(); return id; } Nested Transactions for Software Composition void insert(object o){ // parent TX begin_transaction(); t.insert(getID(), o); commit_transaction(); } Wisconsin Multifacet Project

  9. Closed Nesting • On Commit child transaction is merged with its parent • Flat • Nested transactions “flattened” into a single transaction • Only outermost begins/commits are meaningful • Any conflict aborts to outermost transaction • Partial rollback • Child transaction can be aborted independently • Can avoid costly re-execution of parent transaction Child transactions remain isolated until parent commits Wisconsin Multifacet Project

  10. Implementing TM • Version Management • new values for commit • old values for abort • Must keep both • Conflict Detection • Find read-write, write-read or write-write conflictsamong concurrent transactions • Allows multiple readers OR one writer Large state (must be precise) Checked often (must be fast) Wisconsin Multifacet Project

  11. How Do Hardware TM Systems Differ? Conflict Detection Lazy: checkon commit Eager: checkbefore read/write Like Databases withOptimistic Conc. Ctrl. No HTMs (yet) Stanford TCC Illinois Bulk Like Databases withConservative C. Ctrl. Herlihy/Moss TM MIT LTM Intel/Brown VTM MIT UTM Wisconsin LogTM Wisconsin Multifacet Project

  12. Transactional Memory Goals/Challenges • Unlimited TM Model • Large transactions: cache victimization & even paging • Long transactions: thread switching/mitgration • OS traps/calls? • Facilitate SW composition • Unlimited closed nesting (open nesting?) • Accelerate with at most modest HW support • Make the common case fast • Make HW simple, policy-free, & SW exposed Wisconsin Multifacet Project

  13. Outline • TM Motivation & Background • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • LogTM Evaluation • LogTM Operating System Interactions (optional) • Summary & Future Directions Wisconsin Multifacet Project

  14. Single-CMP/Multicore System Core0 Core2 Core13 Core14 Core15 … L1 $ L1$ L1$ L1$ L1$ Interconnect L2 $ DRAM Wisconsin Multifacet Project

  15. LogTM Per-Core Hardware Registers Register Checkpoint Conflict Detection:  Signatures Version Mgmt:Pointers toSegmented Log  Read TMCount Write LogFrame SummaryRead LogPtr SummaryWrite Processor (SMT Context) Tag Data No ExplicitTM State  Data Caches Wisconsin Multifacet Project

  16. Outline • Motivation & Background • LogTM Hardware Preview • LogTM Version Management • Basic Logging & Segmented Logs for Nesting • LogTM Conflict Detection • LogTM Evaluation • LogTM Operating System Interactions (optional) • Summary & Future Directions Wisconsin Multifacet Project

  17. 1 1 1 1 34------------ -- ------------ --23 LogTM’s Eager Version Management • New values stored in place • Old values stored in transaction log • Allocated per-thread in virtual memory(like per-thread stacks) • Filled by hardware(during transactions) • Read by software (on abort) VA Memory Block R W Sets 00 12-------------- 0 0 40 --------------24 --------------23 0 0 C0 56-------------- 34-------------- 0 0 1000 c0 Log Base 1000 Transaction Log 1040 40 Log Ptr 1090 1080 TM count 1 Wisconsin Multifacet Project

  18. Segmented Transaction Log for Nesting • LogTM’s log is a stack of frames (like activation records) • A frame contains: • Header (including saved registers and pointer to parent’s frame) • Undo records (block address, old value pairs) • Garbage headers (headers of committed closed transactions) • Commit action records • Compensating action records Header LogFrame Undo record LogPtr Undo record 2 0 TM count Header 1 Undo record Undo record Wisconsin Multifacet Project

  19. Closed Nested Commit • Merge child’s log frame with parent’s • Mark child’s header as “dummy header” • Copy pointer from child’s header to LogFrame Header LogFrame Undo record LogPtr Undo record TM count Header 2 1 Undo record Undo record Wisconsin Multifacet Project

  20. LogTM Version Management Discussion • Eager Version Management via Segment Log • Advantages: • Transaction read new values normally (w/o bypassing) • No data movement at commit • Both old & new data in (virtual) memory • Both old & new data can be cached or victimized • Supports unbounded nesting • No extra indirection (unlike STM) • Disadvantages • Aborts slower & handled by software • Adds HW to write log • Requires eager conflict detection? Wisconsin Multifacet Project

  21. Outline • TM Motivation & Background • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • Signatures, Nesting, & Detection via Coherence • LogTM Evaluation • LogTM Operating System Interactions (optional) • Summary & Future Directions Wisconsin Multifacet Project

  22. LogTM-SE Read/Write Set Summary • Use Per-Thread Signatures (adapted from Bulk) • (Original LogTM used in-cache read/write bits) External ST E External ST F A C D B Program: xbegin LD A ST B LD C LD D ST C … FALSE POSITIVE: CONFLICT! ALIAS Hash Function(s) NO CONFLICT 00100100 00000100 00100100 00000000 00100100 00100100 R W 00100010 00000000 00100010 00000010 00100010 Wisconsin Multifacet Project

  23. Conflict Detection for Unbounded Nesting Nesting Affects Signatures (not coherence next) Nested Begin: Save R/W Signatures on Log Partial Abort: Restore R/W Signatures (Closed) Nested Commit: Discard Saved Signatures Open Nesting also handled Recall LogTM’s Segmented Log already supportsversion management for unbounded nesting Add saved signature space to frame header <Skip Nested Signature Example> Wisconsin Multifacet Project

  24. Nested Begin Transaction Log Program Processor State xbegin LD … ST … xbegin 01001000 01001000 00000000 R 01010010 00000000 Xact header 01010010 W Undo entry Undo entry 1 TMCount Undo entry Log Frame Xact header Log Ptr Wisconsin Multifacet Project

  25. Nested Begin Transaction Log Program Processor State xbegin LD … ST … xbegin 01001000 R 01010010 Xact header W Undo entry Undo entry 2 TMCount Undo entry Log Frame Xact header 01001000 01010010 Log Ptr Wisconsin Multifacet Project

  26. Partial Abort Transaction Log Program Processor State xbegin LD … ST … xbegin LD … ST … ABORT! 01001001 01001000 R 01010010 01110110 Xact header W Undo entry Undo entry 1 2 TMCount Undo entry Log Frame Xact header 01001000 01010010 Log Ptr Undo entry Undo entry Wisconsin Multifacet Project

  27. Nested Commit Transaction Log Program Processor State xbegin LD … ST … xbegin LD … ST … xend 01001000 01001001 R 01010010 Xact header 01110110 W Undo entry Undo entry 1 2 TMCount Undo entry Log Frame Xact header 01001000 Garbage Hdr 01010010 Log Ptr Undo entry Undo entry Wisconsin Multifacet Project

  28. Unbounded Nesting Support Summary Closed nesting: Begin: save signatures Abort: restore signatures Commit: No signature action Open nesting: Begin: save signatures Abort: restore signatures Commit: restore signatures Wisconsin Multifacet Project

  29. LogTM’s Eager Conflict Detection (before access) LogTM detects conflicts using coherence • Requesting core issues coherence request • L2 directory forwards to other core(s) • Responding core • Detects conflict using local signatures • Informs requesting processor of conflict (4) Requesting core resolves conflict Wisconsin Multifacet Project

  30. GETX DATA Protocol Animation: Transactional Write • Core C0 store • C0 sends get exclusive (GETX) request • L2 Directory respondswith data (old) • C0 executes store L2 Directory I [old] M@C0 [old] C0 C1 TM mode TM mode 1 0 0 (W-) (--) (--) Signature (--) Signature M [new] M [old] I [none] I [none] Wisconsin Multifacet Project

  31. Conflict! Protocol Animation: Transactional Conflict • In-cache transaction conflict • C1 sends get shared (GETS) request • L2 Directory forwards to P0 • C1 detects conflict and sends NACK L2 Directory M@C0 [old] GETS Fwd_GETS C0 C1 TM mode TM mode 1 0 0 (W-) Signature (--) Signature I [none] M [new] M [new] NACK Wisconsin Multifacet Project

  32. Cache Victimization Gracefully Handled! • Consider eviction of transactional data from Core C0 • No Effect on R/W Set Summary via Signatures • For Conflict Detection,Forward Coherence Requests After Victimization • Trivial with broadcast coherence • Silent S replacements w/ directory: S @ C0  S @ C0 • Writeback to directory sticky: M @ C0  Sticky-M @ C0 • Recall Eager Version Management via Log • On commit: no need to re-fetch victimized block • On abort: SW log walk naturally re-fetches victimized block Wisconsin Multifacet Project

  33. Sticky States: No New Bits in L1 Cache or L2 Directory Shared (L2) Directory State Private (L1) Cache State Wisconsin Multifacet Project

  34. Conflict Resolution • Conflict Resolution • Can wait risking deadlock or abort risking livelock • Wait/abort transaction at requesting or responding proc? • LogTM resolves conflicts at requesting processor • Original LogTM included HW timestamps • Requesting processor can waits (using nacks/retries) • or aborts if other processor is waiting (deadlock possible)& it is logically younger • Current LogTM has requesting processor traps to software contention manager that decides who waits/aborts Wisconsin Multifacet Project

  35. LogTM Conflict Detection Discussion • Eager Conflict Detection via Signatures & Coherence • Advantages: • Supports unbounded nesting • Signatures are compact HW • Signatures software-accessible: save/restore for nesting • Coherence provide efficient conflict detection • Disadvantages • Signatures have false positives • Requires modest coherence protocol changes • Does not (yet) handle thread migration & paging(but coming later) Wisconsin Multifacet Project

  36. Outline • TM Motivation & Background • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • LogTM Evaluation • Methods, vs. Lock, & vs. Perfect Signatures • LogTM Operating System Interactions (optional) • Summary & Future Directions Wisconsin Multifacet Project

  37. Single-CMP LogTM System 2-way 2-way 2-way 2-way 2-way … Interconnect Registers Register Checkpoint L2 $ Read TMCount Write LogFrame SummaryRead LogPtr SummaryWrite Core 15 (SMT Context 0) DRAM Core1 Core0 Core13 Core15 Core14 (SMT Context 1) L1$ L1 $ L1$ L1$ L1$ Wisconsin Multifacet Project

  38. Experimental Methodology Infrastructure Virtutech Simics full-system simulation Wisconsin GEMS timing modules System 32 transactional threads (16 cores x 2 SMT threads/core) 32kB 4-way L1 I and D, 64-byte blocks, 1cycle latency 8MB 8-way unified L2, 34 cycle latency L2 directory for coherence, maintains full sharer bit vector Workloads Radiosity, Raytrace, Mp3d, Cholesky Berkeley DB Wisconsin Multifacet Project

  39. Lock Results Wisconsin Multifacet Project

  40. Perfect Signature Results Perfect signatures similar or better than Locks Wisconsin Multifacet Project

  41. Realistic Signature Results Realistic Signatures similar to Perfect Signatures and Locks For our workloads, false positives are not a problem Wisconsin Multifacet Project

  42. What about scalability? • Bigger system • Bigger transactions • False positives are a function of: • Transaction size • Transactional duty cycle • Number of concurrent transactional threads • Filtering due to on-chip directory protocol • Signatures gracefully degrade to serialization Wisconsin Multifacet Project

  43. LogTM Evaluation Discussion • LogTM Running Splash & BerkeleyDB • Good News: • Works! • Performs similar to locks • Signature false postive not (yet) an issues • Bad News • Baby workloads • Baby workloads • Baby workloads Wisconsin Multifacet Project

  44. Outline • TM Motivation & Background • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • LogTM Evaluation • LogTM Operating System Interactions (optional) • Escape Actions, Thread Switching, (& Paging) • Summary & Future Directions Wisconsin Multifacet Project

  45. Escape Actions • Allow non-transactional escapes from a transaction • (e.g., system calls, I/O) • Similar to Zilles’s pause/unpause • Escape actions never: • Abort • Stall • Cause other transactions to abort • Cause other transactions to stall • Commit and compensating actions • similar to open nests Not recommended for the average programmer! Wisconsin Multifacet Project

  46. Thread Switching Support Why? Support long-running transactions What? Conflict Detection for descheduled transactions How? Summary Read / Write Signatures w/ Invariant: If thread t of process P is scheduled to use an active signature,the corresponding summary signature holds the union of the saved signatures from all descheduled threads from process P. Updated using TLB-shootdown-like mechanism<skip example> Wisconsin Multifacet Project

  47. Handling Thread Switching W W W W 00000000 00000000 00000000 00000000 Summary Summary Summary R R R Summary 00000000 00000000 00000000 R 00000000 OS T2 T3 T1 W 00000000 Summary R 00000000 W 01001000 W 0100000 W 0100000 W 00000000 R 01010010 R 01010010 R 01000010 R 00000000 P1 P4 P2 P3 Wisconsin Multifacet Project

  48. Handling Thread Switching W W W 00000000 00000000 00000000 Summary Summary Summary R R R 00000000 00000000 00000000 W 01001000 00000000 Summary OS R 01010010 00000000 Deschedule T2 T3 T1 W 00000000 Summary R 00000000 W 01001000 W 0100000 W 0100000 W 00000000 01001000 R 01010010 R 01010010 R 01000010 R 00000000 01010010 P1 P4 P2 P3 Wisconsin Multifacet Project

  49. Handling Thread Switching W W W 00000000 00000000 00000000 Summary Summary Summary R R R 00000000 00000000 00000000 W W 01001000 01001000 Summary Summary R R 01010010 01010010 W 01001000 Summary OS R 01010010 Deschedule T2 T3 T1 W 00000000 Summary R 00000000 W 01001000 W 0100000 W 0100000 W 00000000 R 01010010 R 01010010 R 01000010 R 00000000 P1 P4 P2 P3 Wisconsin Multifacet Project

  50. Handling Thread Switching W W 01001000 01001000 Summary Summary R R 01010010 01010010 W 01001000 Summary OS R 01010010 T1 T2 T3 W W 00000000 00000000 Summary Summary R R 00000000 00000000 W 00000000 W 0100000 W 0100000 W 00000000 R 00000000 R 01010010 R 01000010 R 00000000 P1 P4 P2 P3 Wisconsin Multifacet Project

More Related