
LogTM-SE: Decoupling Hardware Transactional Memory from Caches

Luke Yen, Jayaram Bobba, Michael R. Marty, Kevin E. Moore, Haris Volos, Mark D. Hill, Michael M. Swift, and David A. Wood. Multifacet Project (www.cs.wisc.edu/multifacet), Computer Sciences Dept., Univ. of Wisconsin-Madison.


Presentation Transcript


  1. LogTM-SE: Decoupling Hardware Transactional Memory from Caches
     Luke Yen, Jayaram Bobba, Michael R. Marty, Kevin E. Moore, Haris Volos, Mark D. Hill, Michael M. Swift, and David A. Wood
     Multifacet Project (www.cs.wisc.edu/multifacet), Computer Sciences Dept., Univ. of Wisconsin-Madison

  2. Executive Summary
     • Hardware Transactional Memory (HTM) is fast
       • HW handles old/new versions (e.g., write buffer)
       • HW handles conflict detection (R/W bits & coherence)
     • But closely coupled to the L1 cache
       • On critical paths & hard for SW to save/restore
     • Our Approach: Decoupled, Simple HW, SW control
     • LogTM Signature Edition (LogTM-SE)
       • HW: LogTM’s Log + Signatures (from Illinois Bulk)
       • SW: Unbounded nesting, thread switching, & paging
     Wisconsin Multifacet Project

  3. Outline
     • Motivation
       • How HTMs accelerate TM
       • HTM Issues
       • Our Approach
         • Logs for version management
         • Signatures for conflict detection
     • LogTM Signature Edition (LogTM-SE) Hardware
     • LogTM-SE in Operation
     • Conclusions and future work

  4. How HTMs Accelerate TM
     • Version Management
       • Save old values (for abort) & new values (for commit)
       • Write buffer, cache incoherence, overflow structures
     • Conflict Detection
       • Record read/write sets & detect overlaps
       • Read/Write (R/W) bits on cache blocks & coherence

  5. Some HTM Issues
     • R/W bits in precious L1 cache design
     • Replicate R/W bits for SMT?
     • Replicate again for (bounded) nesting?
     • Save/restore R/W for thread switch?
     • Modify for paging (virtual page moves)?

  6. Our Approach
     • Decoupled: Decouple HTM state from L1 caches
     • Simple HW: Keep HW state simple & SW accessible
     • SW Control: Have SW manage rare, complex events
     • (Apply classic “systems” principles to HTMs)

  7. Decoupling R/W Bits from Cache
     [Figure: BEFORE, R and W bits sit on every block of the data caches (Tag, Data). AFTER, Read and Write registers sit in the SMT thread context, next to the Registers, and the data caches hold only Tag and Data.]

  8. Version Management
     • Save old values (for abort) & new values (for commit)
     • Use LogTM’s Log
       • Before writing new values into memory, HW writes old values (and their virtual addresses) into the log
       • Allocated per-thread in virtual memory (like pthread’s stacks)
       • Log exposed to SW & not tied to a processor
     • Why?
       • Decoupled, Simple HW, SW Control
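
The logging rule on this slide can be sketched in a few lines of Python. This is an illustrative software model only, not the LogTM-SE hardware: the `UndoLog` class, its method names, and the dictionary that stands in for virtual memory are all assumptions made for the example.

```python
_ABSENT = object()  # marks an address that had no value before the transaction


class UndoLog:
    """Toy per-thread undo log (hypothetical model of eager versioning)."""

    def __init__(self, memory):
        self.memory = memory   # dict: virtual address -> value
        self.entries = []      # (address, old value) pairs, oldest first

    def tx_store(self, addr, new_value):
        # Log the old value and its address BEFORE updating memory in place.
        self.entries.append((addr, self.memory.get(addr, _ABSENT)))
        self.memory[addr] = new_value

    def commit(self):
        # New values are already in place, so commit only discards the log.
        self.entries.clear()

    def abort(self):
        # Undo newest-first so the oldest logged value wins for each address.
        while self.entries:
            addr, old = self.entries.pop()
            if old is _ABSENT:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old


memory = {0x10: 1, 0x20: 2}
log = UndoLog(memory)
log.tx_store(0x10, 100)
log.tx_store(0x10, 200)   # second write to the same address
log.tx_store(0x30, 7)     # address with no previous value
log.abort()
print(memory)             # back to the pre-transaction contents
```

Because the log is an ordinary per-thread structure rather than cache state, software can walk it (as `abort` does here), which is the point of the "SW Control" bullet.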

  9. Version Management: Processor Hardware
     [Figure: the SMT thread context holds the Registers, a Register Checkpoint, LogFrame, TMcount, and LogPtr; the data caches hold Tag and Data.]

  10. Conflict Detection
      • Record read/write sets & detect overlaps
      • Adapt Signatures from Bulk [Ceze et al., ISCA 2006]
        • Over-approximate read/write sets in per-processor Bloom filters
        • Check signatures on coherence events (unlike Bulk)
        • Replicate for SMT
        • Exposed to SW for nesting, thread switching, etc.
      • Why?
        • Decoupled, Simple HW, SW Control

  11. Signature Operation
      Program: xbegin; LD A; ST B; LD C; LD D; ST C; …
      External requests: ST E, ST F
      [Figure: hash function(s) map addresses A, B, C, and D to bits of the 8-bit R and W signatures. One external store hits no set bit (no conflict); the other aliases with a set bit and is reported as a conflict (false positive).]
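
A minimal Python sketch of the signature check described on the two slides above. The `Signature` class, the single modulo hash, and every address below are hypothetical choices for illustration; the slides do not specify the hash function(s), and the 8-bit width only mirrors the bit vectors drawn in the figure.

```python
class Signature:
    """Toy Bloom-filter signature over cache-block addresses."""

    def __init__(self, bits=8, block_bytes=64):
        self.bits = bits
        self.block_bytes = block_bytes
        self.vector = 0

    def _index(self, addr):
        # Assumed hash: block number modulo signature width.
        return (addr // self.block_bytes) % self.bits

    def insert(self, addr):
        self.vector |= 1 << self._index(addr)

    def may_contain(self, addr):
        # True for every inserted address, and for any alias of one.
        return bool(self.vector & (1 << self._index(addr)))


def conflicts(read_sig, write_sig, addr, is_store):
    """An external store conflicts with local reads or writes;
    an external load conflicts with local writes only."""
    if is_store:
        return read_sig.may_contain(addr) or write_sig.may_contain(addr)
    return write_sig.may_contain(addr)


A, B, C, D = 0x000, 0x040, 0x080, 0x0C0   # hypothetical block addresses
read_sig, write_sig = Signature(), Signature()
for addr in (A, C, D):                     # LD A, LD C, LD D
    read_sig.insert(addr)
for addr in (B, C):                        # ST B, ST C
    write_sig.insert(addr)

no_alias = 0x100   # maps to a bit that no access has set
alias = 0x200      # maps to the same bit as A, although never accessed
print(conflicts(read_sig, write_sig, no_alias, is_store=True))
print(conflicts(read_sig, write_sig, alias, is_store=True))
```

The second check reports a conflict even though the transaction never touched that block. Signatures may over-approximate the read and write sets but never miss a true member, so false positives cost performance, not correctness.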

  12. Outline
      • Motivation
      • LogTM Signature Edition (LogTM-SE) Hardware
        • Single-CMP system
        • Processor hardware
        • Experimental Results
      • LogTM-SE in Operation
      • Conclusions and future work

  13. Single-CMP System
      [Figure: 16 cores (Core1 … Core16), each with its own L1 cache, connected through an interconnect to the L2 cache and DRAM.]

  14. LogTM-SE Processor Hardware
      • Segmented log, like LogTM
      • Track R/W sets with R/W signatures
        • Over-approximate R/W sets
        • Tracks physical addresses
        • Summary signature used for virtualization
      • Conflict detection by coherence protocol
        • Check signatures on every memory access for SMT
      [Figure: the SMT thread context holds the Registers, Register Checkpoint, LogFrame, TMcount, LogPtr, and the Read, Write, SummaryRead, and SummaryWrite signatures; the data caches (Tag, Data) hold no TM state.]

  15. Experimental Methodology
      • Infrastructure
        • Virtutech Simics full-system simulation
        • Wisconsin GEMS timing modules
      • System
        • 32 transactional threads (16 cores x 2 SMT threads/core)
        • 32 kB 4-way L1 I and D, 64-byte blocks, 1-cycle latency
        • 8 MB 8-way unified L2, 34-cycle latency
        • L2 directory for coherence, maintains full sharer bit vector
      • Workloads
        • Radiosity, Raytrace, Mp3d, Cholesky
        • Berkeley DB

  16. Lock Results

  17. Perfect Signature Results
      • Perfect signatures perform similarly to or better than locks

  18. Realistic Signature Results
      • Realistic signatures perform similarly to perfect signatures and locks
      • For our workloads, false positives are not a problem

  19. What about scalability?
      • Bigger system
      • Bigger transactions
      • False positives are a function of:
        • Transaction size
        • Transactional duty cycle
        • Number of concurrent transactional threads
        • Filtering due to on-chip directory protocol
      • Signatures gracefully degrade to serialization
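
To give a feel for the first factor, transaction size, here is the textbook Bloom-filter false-positive estimate in Python. It is a generic approximation that assumes independent, uniformly distributed hash functions; it is not a model of the signature designs evaluated in the talk. The 2048-bit width is taken from the backup slide on signature implementations; the function name and the transaction sizes are illustrative.

```python
def false_positive_rate(m_bits, n_blocks, k_hashes=1):
    """Probability that an address NOT in the set still hits only set bits,
    for an m-bit Bloom filter holding n distinct blocks with k hashes."""
    # Fraction of bits expected to be set after n*k bit insertions.
    fill = 1.0 - (1.0 - 1.0 / m_bits) ** (k_hashes * n_blocks)
    return fill ** k_hashes


# Larger read/write sets fill the signature and raise the alias rate.
for n in (16, 64, 256, 1024):
    print(n, round(false_positive_rate(2048, n), 3))
```

As the set grows, the estimate approaches 1, at which point nearly every external request looks like a conflict. That limit corresponds to the slide's last bullet: signatures degrade gracefully to serialization.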

  20. HTM Issues (Solutions)
      • R/W bits in precious L1 cache design
        • R/W signatures: out of L1 cache & software-accessible
      • Replicate R/W bits for SMT?
        • Easier to replicate signatures for SMT, but false positives
      • Replicate again for (bounded) nesting?
      • Save/restore R/W for thread switch?
      • Modify for paging (virtual page moves)?

  21. Outline
      • Motivation
      • LogTM Signature Edition (LogTM-SE) Hardware
      • LogTM-SE in Operation
        • Unbounded Nested Transactions
        • Thread Switching
        • (Paging)
      • Conclusions and future work

  22. Unbounded Nesting Support
      • Why?
        • Composability: libraries
        • Software constructs: Retry, OrElse [Harris, PPoPP ’05]
      • What?
        • Signatures for each nesting level
      • How?
        • One R/W signature set per SMT thread
        • Save/restore signatures using the Transaction Log

  23. Nested Begin
      Program: xbegin; LD …; ST …; xbegin
      [Figure: processor state (R and W signatures, TMCount = 1, Log Frame, Log Ptr) and the transaction log, which holds an Xact header followed by undo entries.]

  24. Nested Begin
      Program: xbegin; LD …; ST …; xbegin
      [Figure: TMCount becomes 2 and a new Xact header is appended to the transaction log; the current signature values (01001000 and 01010010) are saved in that header.]

  25. Partial Abort
      Program: xbegin; LD …; ST …; xbegin; LD …; ST …; ABORT!
      [Figure: the inner transaction's undo entries are processed, the signatures return to the values saved in the inner Xact header (01001000 and 01010010), and TMCount goes from 2 to 1.]

  26. Nested Commit
      Program: xbegin; LD …; ST …; xbegin; LD …; ST …; xend
      [Figure: TMCount goes from 2 to 1; the signatures keep the bits added by the inner transaction (01001001 and 01110110), and the inner undo entries remain in the log.]

  27. Unbounded Nesting Support Summary
      • Closed nesting:
        • Begin: save signatures
        • Abort: restore signatures
        • Commit: no signature action
      • Open nesting:
        • Begin: save signatures
        • Abort: restore signatures
        • Commit: restore signatures
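
The begin/abort/commit rules in this summary can be modeled with a small stack of saved signatures. The sketch below is a hypothetical software model: the `NestedTx` class and its method names are invented, the `saved` list stands in for the signature copies kept in the log's Xact headers, and clearing the signatures when the outermost transaction commits is an assumption the slide does not state.

```python
class NestedTx:
    """Toy model of signature handling for nested transactions."""

    def __init__(self):
        self.read_sig = 0
        self.write_sig = 0
        self.saved = []   # one (read_sig, write_sig) pair per open level

    def begin(self):
        # Begin (closed or open): save the current signatures.
        self.saved.append((self.read_sig, self.write_sig))

    def load(self, bit):
        self.read_sig |= 1 << bit

    def store(self, bit):
        self.write_sig |= 1 << bit

    def abort_innermost(self):
        # Abort: restore the signatures saved at the matching begin.
        self.read_sig, self.write_sig = self.saved.pop()

    def commit_closed(self):
        # Closed commit: no signature action, so the child's read and
        # write sets stay merged into the parent's.
        self.saved.pop()
        if not self.saved:
            # Assumption: the outermost commit releases all isolation.
            self.read_sig = self.write_sig = 0

    def commit_open(self):
        # Open commit: restore, releasing isolation on the child's blocks.
        self.read_sig, self.write_sig = self.saved.pop()


tx = NestedTx()
tx.begin(); tx.load(3); tx.store(1)        # outer transaction
tx.begin(); tx.load(0); tx.store(2)        # inner transaction
tx.abort_innermost()                        # partial abort
after_abort = (tx.read_sig, tx.write_sig)

tx.begin(); tx.load(0); tx.commit_closed()  # closed child keeps its bit
after_closed = (tx.read_sig, tx.write_sig)

tx.begin(); tx.store(5); tx.commit_open()   # open child drops its bit
after_open = (tx.read_sig, tx.write_sig)
print(after_abort, after_closed, after_open)
```

Nesting depth is bounded only by the size of the saved-signature stack, which in LogTM-SE lives in the per-thread log in virtual memory rather than in replicated hardware bits.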

  28. Thread Switching Support
      • Why? Support long-running transactions
      • What? Conflict detection for descheduled transactions
      • How? Summary Read/Write signatures:
        • If thread t of process P is scheduled to use an active signature, the corresponding summary signature holds the union of the saved signatures from all descheduled threads from process P.
        • Updated using a TLB-shootdown-like mechanism

  29. Handling Thread Switching
      [Figure: the OS and processors P1 to P4. Each processor has R and W signatures plus summary R and W signatures; threads T1, T2, and T3 are running, and every summary signature, including the OS copy, is 00000000.]

  30. Handling Thread Switching
      [Figure: T1 is descheduled. Its signatures (W 01001000, R 01010010) are saved, and the OS summary signature is updated to those values.]

  31. Handling Thread Switching
      [Figure: the OS summary signature (W 01001000, R 01010010) is installed as the summary signature on processors that run other threads of the process.]

  32. Handling Thread Switching
      [Figure: T1 is scheduled again. The diagram shows summary signatures of W 01001000 / R 01010010 on some processors and 00000000 on others.]

  33. Thread Switching Support Summary
      • Summary Read/Write signatures
        • Summarize descheduled threads with active transactions
        • One OS structure per process
        • Check summary signature on every memory access
        • Updated on transaction deschedule
        • Similar to TLB shootdown
        • Coherence
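
A sketch of the per-process summary structure, under stated assumptions: signatures are plain integers used as bit vectors, the `Process` class and its method names are invented for the example, and the step that pushes a new summary to every running thread context (the TLB-shootdown-like part) is omitted. The T1 bit patterns come from the thread-switching figures; the T2 pattern is made up.

```python
class Process:
    """Toy OS-side bookkeeping for descheduled transactional threads."""

    def __init__(self):
        self.saved = {}   # thread id -> (read_sig, write_sig)

    def deschedule(self, tid, read_sig, write_sig):
        # Save the signatures of a thread descheduled mid-transaction.
        self.saved[tid] = (read_sig, write_sig)

    def reschedule(self, tid):
        # Return the saved signatures so they can be reloaded into the
        # thread context; they no longer contribute to the summary.
        return self.saved.pop(tid)

    def summary(self):
        # Union (bitwise OR) of all saved read and write signatures.
        read_sum = write_sum = 0
        for read_sig, write_sig in self.saved.values():
            read_sum |= read_sig
            write_sum |= write_sig
        return read_sum, write_sum


proc = Process()
proc.deschedule("T1", read_sig=0b01010010, write_sig=0b01001000)
summary_one = proc.summary()
proc.deschedule("T2", read_sig=0b00000001, write_sig=0b00000000)
summary_two = proc.summary()
restored = proc.reschedule("T1")
summary_three = proc.summary()
print(summary_one, summary_two, restored, summary_three)
```

Running threads would then test each memory access against the installed summary, so a descheduled transaction stays isolated even though its own signatures are no longer loaded in any hardware context.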

  34. Paging Support Summary
      • Problem:
        • Changing page frames
        • Need to maintain isolation on transactional blocks
      • Solution:
        • On Page-Out: save the Virtual -> Physical mapping
        • On Page-In: if the page frame is different, update signatures with the physical addresses of transactional blocks in the new page frame
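
The page-in step could look roughly like the Python below. This is a sketch under several assumptions: a signature is an integer bit vector with one modulo hash, the page size is 4096 bytes (not given in the slides), and `remap_signature` is a hypothetical helper name. The 64-byte block size and 2048-bit signature width are taken from the methodology and signature-implementation slides.

```python
PAGE_BYTES = 4096    # assumed page size
BLOCK_BYTES = 64     # block size from the methodology slide
SIG_BITS = 2048      # signature width from the backup slides


def sig_bit(phys_addr):
    # Assumed hash: physical block number modulo signature width.
    return (phys_addr // BLOCK_BYTES) % SIG_BITS


def remap_signature(sig, old_frame, new_frame):
    """For every block of the old page frame that may be in `sig`,
    also insert the corresponding block of the new page frame."""
    updated = sig
    for offset in range(0, PAGE_BYTES, BLOCK_BYTES):
        old_addr = old_frame * PAGE_BYTES + offset
        if sig & (1 << sig_bit(old_addr)):
            new_addr = new_frame * PAGE_BYTES + offset
            updated |= 1 << sig_bit(new_addr)
    return updated


# A transaction touched the block at offset 128 of physical frame 3.
sig = 1 << sig_bit(3 * PAGE_BYTES + 128)
# The page is later paged back in to frame 10.
moved = remap_signature(sig, old_frame=3, new_frame=10)
print(bool(moved & (1 << sig_bit(10 * PAGE_BYTES + 128))))
```

Bits for the old frame stay set because a Bloom filter cannot delete entries. That can only add false positives; it never drops isolation on the relocated blocks.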

  35. HTM Issues (Solutions)
      • R/W signatures: out of L1 & software-accessible
      • Replicate signatures for SMT, but false positives
      • Replicate again for (bounded) nesting?
        • Signatures saved/restored on log
      • Save/restore R/W for thread switch?
        • Signatures saved & distributed in summary signatures
      • Modify for paging (virtual page moves)?
        • (Physical) signatures updated on page-in

  36. Future Work
      • Explore scalability of signatures
        • Bigger system
        • Bigger transactions
      • Modify OS for LogTM-SE
        • Optimize OS support for reduced overheads

  37. Executive Summary
      • Hardware Transactional Memory (HTM) is fast
        • HW handles old/new versions (e.g., write buffer)
        • HW handles conflict detection (R/W bits & coherence)
      • But closely coupled to the L1 cache
        • On critical paths & hard for SW to save/restore
      • LogTM Signature Edition (LogTM-SE)
        • HW: LogTM’s Log + Signatures (from Illinois Bulk)
        • SW: Unbounded nesting, thread switching, & paging
      • Our Approach: Decoupled, Simple HW, SW control

  38. Google “Wisconsin GEMS”
      • Works w/ Simics (free to academics) → commercial OS/apps
      • SPARC out-of-order, x86 in-order, CMPs, SMPs, & LogTM
      • GPL release of GEMS used in four HPCA 2007 papers
      [Figure labels: Opal, Simics, Detailed Processor Model, Random Tester, Deterministic, Contended locks, Trace file, Microbenchmarks]

  39. GEMS News
      • Version 1.4 available Friday:
        • LogTM bugfixes
        • MESI Single-CMP protocol
        • Fixes for GCC 4.X compilation
        • See www.cs.wisc.edu/gems
      • GEMS 2.0: a future major LogTM release will include:
        • Single-CMP TM protocol
        • Signature support
        • Software abort handler

  40. Backup

  41. HTM Virtualization Mechanisms

  42. LogTM Overview [HPCA ’06]
      • Version Management
        • Keep old values for abort AND new values for commit
        • LogTM: Eager: record old values “elsewhere”; update “in place” → fast commit
        • Other HTMs: Lazy: update “elsewhere”; keep old values “in place”
      • Conflict Detection
        • Find read-write, write-read, or write-write conflicts among concurrent transactions
        • LogTM: Eager: detect conflict on every read/write → allows stalling
        • Other HTMs: Lazy: detect conflict at end (commit/abort)

  43. Benchmark Details

  44. Signature Implementations
      • Results assume 2048-bit signatures that operate on physical addresses

  45. Paging Support Summary
      At Page Out, remember VP->PP
      At Page In, if (VP->PP’):
          for each thread t in process:
              if t is in transaction:
                  update signatures of t
