1 / 71

A Scalable Approach to Thread-Level Speculation J. Gregory Steffan, Christopher B. Colohan,

A Scalable Approach to Thread-Level Speculation J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry Computer Science Department Carnegie Mellon University (Appeared in ISCA 2000.). P. P. P. P. C. C. C. C. C. C. C. Shared Memory.

didina
Download Presentation

A Scalable Approach to Thread-Level Speculation J. Gregory Steffan, Christopher B. Colohan,

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Scalable Approach to Thread-Level Speculation J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry Computer Science Department Carnegie Mellon University (Appeared in ISCA 2000.)

  2. P P P P C C C C C C C Shared Memory Multithreaded Machines Are Everywhere Threads P P C C P C C C C Shared Memory SUN MAJC, IBM Power4 ALPHA 21464 Dual Pentium SGI Origin How can we use them? Parallelism!

  3. Automatic Parallelization Proving independence of threads is hard: • complex control flow • complex data structures • pointers, pointers, pointers • run-time inputs How can we make the compiler’s job feasible? Thread-Level Speculation (TLS)

  4. Time Example Processor = hash[3] … hash[10]= … • while (...){ • x = hash[index1]; • … • hash[index2] = y; • ... • } = hash[19] … hash[21]= … = hash[33] … hash[30]= … = hash[10] … hash[25]= …

  5. Time Example of Thread-Level Speculation Processor Processor Processor Processor Epoch 1 = hash[3] … hash[10] = … Epoch 2 = hash[19] … hash[21]= … Epoch 3 = hash[33] … hash[30]= … Epoch 4 = hash[10] … hash[25]= …

  6. Time Example of Thread-Level Speculation Processor Processor Processor Processor Epoch 1 Epoch 2 Epoch 3 Epoch 4 = hash[3] … hash[10] = … = hash[19] … hash[21]= … = hash[33] … hash[30]= … = hash[10] … hash[25]= … Violation!

  7. Time Example of Thread-Level Speculation Processor Processor Processor Processor Epoch 1 Epoch 2 Epoch 3 Epoch 4 = hash[3] … hash[10] = … commit? = hash[19] … hash[21]= … commit? = hash[33] … hash[30]= … commit? = hash[10] … hash[25]= … commit? Violation!    

  8. Time Retry Example of Thread-Level Speculation Processor Processor Processor Processor Epoch 1 Epoch 2 Epoch 3 Epoch 4 = hash[3] … hash[10] = … commit? = hash[19] … hash[21]= … commit? = hash[33] … hash[30]= … commit? = hash[10] … hash[25]= … commit? Violation!     Epoch 4 = hash[10] … hash[25]= … commit? 

  9. Goals of Our Approach 1) Handle arbitrary memory accesses • i.e. not just array references 2) Preserve performance of non-speculative workloads • keep hardware support minimal and simple 3) Apply to any scale of multithreaded architecture • CMPs, SMT processors, more traditional MPs effective, simple, and scalable TLS

  10. Overview of Our Approach System requirements: 1) Detect data dependence violations • extend invalidation-based cache coherence 2) Buffer speculative modifications • use the caches as speculative buffers coherence already works at a variety of scales hence our scheme is also scalable

  11. Related Schemes • Wisconsin (Multiscalar, Trace Processor) • Stanford (Hydra) • U.P. Catalunya (Speculative Multithreading) • Intel/U. Portland (Dynamic Multithreading) • Illinois at U.C. (I-ACOMA) our approach seamlessly scales both up and down

  12. Outline Details of our Approach • life cycle of an epoch • speculative coherence • what happens at commit time • forwarding data between epochs • Performance • Conclusions

  13. Time Slow Commit: Becomes Complete, Speculative Pass Homefree Fast Commit: Life Cycle of an Epoch Spawned Init   Speculative  Work  Commit? Wait to be Homefree?    

  14. Thread Identifier (TID) Sequence Number Epoch Numbers Represent a partial ordering • signed-compare sequence numbers if TIDs match • allows for wrap-around • otherwise the epochs are unordered • from independent programs • from independent chains of speculation within one program

  15. Speculative Thread Model Round-robin schedule of epochs to processors • not a requirement of our scheme, just for convenience Each epoch spawns the next • through a lightweight fork instruction (10 cycles) Violations detected through polling • each epoch runs to completion before detecting failed speculation and restarting Violation chaining • if an epoch suffers a violation, we squash all logically-later epochs many possibilities to be evaluated in future work

  16. Preserving Correctness Speculation must fail whenever speculative state is lost • eg., replacement of a speculative line, ORB overflow Any exceptions are suppressed until epoch is homefree • eg., divide by zero, segfault Polling violation detection must avoid infinite looping • requires a poll inside each loop No system calls while speculative (for now) ensures original sequential semantics are preserved

  17. Time Becomes Complete, Speculative Pass Homefree Mechanisms to Squash or Commit Life Cycle of an Epoch Spawned Speculative Coherence Commit?

  18. Data Data State State Tag Tag Invalid Invalid - - - - Shared Memory (X=2) MESI Coherence Example Thread A: Thread B: Processor Processor Cache Cache

  19. Data Data State State Tag Tag Invalid Invalid - - - - Shared Memory (X=2) MESI Coherence Example Load X Thread A: Thread B: Processor Processor Cache Cache Read

  20. Data Data State State Tag Tag Excl. Invalid - X 2 - Shared Memory (X=2) MESI Coherence Example Load X Thread A: Thread B: Processor Processor Cache Cache Read Fill

  21. Data Data State State Tag Tag Excl. Invalid X - 2 - Shared Memory (X=2) MESI Coherence Example Load X Store X=3 Thread A: Thread B: Processor Processor Cache Cache Read-Exclusive read-exclusive invalidates all other copies

  22. Data Data State State Tag Tag Invalid Invalid - - - - Shared Memory (X=2) MESI Coherence Example Load X Store X=3 Thread A: Thread B: Processor Processor Cache Cache Read-Exclusive Invalidation read-exclusive invalidates all other copies

  23. Data Data State State Tag Tag Dirty Invalid - X - 3 Shared Memory (X ) MESI Coherence Example Load X Store X=3 Thread A: Thread B: Processor Processor Cache Cache Fill Read-Exclusive Invalidation the state ‘dirty’ implies exclusiveness

  24. Speculative Coherence Example Highlights of our scheme: • detection of a data dependence violation • speculatively modifiedandshared cache lines Epoch6: Epoch4: Epoch5: Load X Store X=3 Load X

  25. Data Data State State Tag Tag Invalid Invalid - - - - Shared Memory (X=2) Speculative Coherence Example Load X Epoch5: Epoch6: Processor Processor Cache Cache Read

  26. Data Data State State Tag Tag Excl. Invalid X - - 2 Shared Memory (X=2) Speculative Coherence Example Load X Epoch5: Epoch6: Processor Processor Cache Cache Spec. Loaded Read Fill track which lines are speculatively loaded

  27. Data Data State State Tag Tag Invalid Excl. - X - 2 Shared Memory (X=2) Speculative Coherence Example Load X Epoch5: Epoch6: Store X=3 Processor Processor Cache Cache Spec. Loaded Sp Read-Ex (epoch5) speculative msgs piggyback epoch number

  28. Data Data State State Tag Tag Invalid Excl. - X - 2 Shared Memory (X=2) Speculative Coherence Example Load X Epoch5: Epoch6: Store X=3 Processor Processor Cache Cache Spec. Loaded Sp Read-Ex (epoch5) Sp Inv (epoch5) epoch5 < epoch6, and speculatively loaded

  29. Data Data State State Tag Tag Invalid Invalid - - - - Shared Memory (X=2) Speculative Coherence Example Load X  speculation failed! Epoch5: Epoch6: Store X=3 Processor Processor Cache Cache Sp Read-Ex (epoch5) Sp Inv (epoch5) speculation fails for epoch 6

  30. Data Data State State Tag Tag Excl. Invalid X - - 3 Shared Memory (X=2) Speculative Coherence Example Load X  speculation failed! Epoch5: Epoch6: Store X=3 Processor Processor Cache Cache Spec. Modified Fill Sp Read-Ex (epoch5) Sp Inv (epoch5) track which lines are speculatively modified

  31. Speculative Coherence Example Highlights of our scheme: • detection of a data dependence violation • speculatively modifiedandshared cache lines Epoch6: Epoch4: Epoch5: Load X Store X=3 Load X

  32. Epoch4: Processor Cache Data Data State State Tag Tag Excl. Invalid - X - 3 Shared Memory (X=2) Speculative Coherence Example Epoch5: Store X=3 Processor Cache Spec. Modified

  33. Data Data State State Tag Tag Invalid Excl. - X - 3 Shared Memory (X=2) Speculative Coherence Example Epoch4: Epoch5: Store X=3 Load X Processor Processor Cache Cache Spec. Modified Read

  34. Data Data State State Tag Tag Invalid X - - 3 Shared Memory (X=2) Speculative Coherence Example Epoch4: Epoch5: Store X=3 Load X Processor Processor Cache Cache Spec. Modified Shared Read notify shared both speculatively modified and shared!

  35. Data Data State State Tag Tag X X 3 2 Shared Memory (X=2) Speculative Coherence Example Epoch4: Epoch5: Store X=3 Load X Processor Processor Cache Cache Spec. Loaded Spec. Modified Shared Shared Fill Read notify shared multiple versions of the same cache line

  36. Summary of New Speculative Line State New cache line state: • has it been speculatively loaded? • detect dependence violations • has it been speculatively modified? • buffer speculative modifications • is it in a speculative shared or exclusive state? • important performance optimizations What if a speculative cache line is replaced? • speculation fails for that epoch

  37. - - - - - - - - - - - - Implementation of Speculative State Processor Cache Data State Tag

  38. Tag SL SM - - - - - - - - - - - - - - - - - - - - Implementation of Speculative State Processor Cache Speculatively Loaded Data State Speculatively Modified modest amount of extra space

  39. Time Becomes Complete, Speculative Pass Homefree Squash Life Cycle of an Epoch Spawned Speculative Coherence Commit? Mechanisms to Squash or Commit

  40. Flash Reset When Speculation Fails Processor Cache Data State Tag SM SL Sp Ex * * 0 1 Sp Sh * * 0 1 Sp Ex * * 1 0 Sp Sh * * 1 1

  41. If Set then Invalidate; Flash Reset When Speculation Fails Processor Cache Data State Tag SM SL Excl * * 0 0 * * 0 0 Shared Sp Ex * * 1 0 * * 1 0 Sp Sh

  42. When Speculation Fails Processor Cache Data State Tag SM SL Excl * * 0 0 * * 0 0 Shared Invalid * * 0 0 Invalid * * 0 0 quick bit operation

  43. Time Becomes Complete, Speculative Pass Homefree Commit Life Cycle of an Epoch Spawned Speculative Coherence Commit? Mechanisms to Squash or Commit

  44. Flash Reset When Speculation Succeeds Processor Cache Data State Tag SM SL Sp Ex * * 0 1 Sp Sh * * 0 1 Sp Ex * * 1 0 Sp Sh * * 1 1

  45. SM & Exclusive: Become Dirty When Speculation Succeeds Processor Cache Data State Tag SM SL Excl * * 0 0 * * 0 0 Shared Sp Ex * * 1 0 Sp Sh * * 1 0

  46. SM & Shared: Need Exclusive Access When Speculation Succeeds Processor Cache Data State Tag SM SL Excl * * 0 0 * * 0 0 Shared Sp Ex * * 1 0 Sp Sh * * 1 0 want to avoid searching entire cache

  47. When Speculation Succeeds Processor Cache Data State Tag SM SL ORB Excl * * 0 0 - * * 0 0 Shared - X Sp Ex * * 1 0 Sp Sh X * 1 0 ownership required buffer (ORB)

  48. - - X When Speculation Succeeds Processor Cache Data State Tag SM SL ORB Excl * * 0 0 * * 0 0 Shared Sp Ex * * 1 0 Sp Sh X * 1 0 Upgrade-Request (X)

  49. - If SM, Become Dirty; Flash Reset - - When Speculation Succeeds Processor Cache Data State Tag SM SL ORB Excl * * 0 0 * * 0 0 Shared Sp Ex * * 1 0 Sp Sh X * 1 0 Ack (X) Upgrade-Request (X)

  50. - - - When Speculation Succeeds Processor Cache Data State Tag SM SL ORB Excl * * 0 0 * * 0 0 Shared Dirty * * 0 0 Dirty X * 0 0 flush the ORB, then quick bit operations

More Related