Shared Memory Consistency Models: A Tutorial

By Sarita Adve & Kourosh Gharachorloo. Review by Jim Larson.

Presentation Transcript


  1. Shared Memory Consistency Models: A Tutorial. By Sarita Adve & Kourosh Gharachorloo. Review by Jim Larson.

  2. Outline • Shared Memory on a Uniprocessor • Optimizations on a Uniprocessor • Extending to a Multiprocessor – Sequential Consistency • Extending to a Multiprocessor – Does Sequential Consistency Matter? • Restoring Sequential Consistency • Conclusion

  3. Outline • Shared Memory on a Uniprocessor • Optimizations on a Uniprocessor • Extending to a Multiprocessor – Sequential Consistency • Extending to a Multiprocessor – Does Sequential Consistency Matter? • Restoring Sequential Consistency • Conclusion

  4. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0

  5. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0

  6. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0

  7. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0

  8. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1

  9. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1

  10. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Critical Zone is Protected. Works the same if Process 2 runs first: then Process 2 enters its Critical Section.

  11. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0 Arbitrary interleaving of Processes

  12. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Arbitrary interleaving of Processes

  13. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Arbitrary interleaving of Processes - Both Processes Blocked, But no harm!
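
The uniprocessor walk-through above can be checked exhaustively. The following is a minimal sketch, not part of the original slides: it assumes each process is just one write followed by one read, that both run against a single memory, and that a process "enters" when its read returns 0. The names `interleavings` and `run` are illustrative.

```python
from itertools import combinations

# Each process is a list of steps in program order.
P1 = [("write", "Flag1"), ("read", "Flag2")]
P2 = [("write", "Flag2"), ("read", "Flag1")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [(1, next(ia)) if i in slots else (2, next(ib)) for i in range(n)]


def run(schedule):
    """Execute one schedule against a single memory; return who enters."""
    memory = {"Flag1": 0, "Flag2": 0}
    entered = set()
    for pid, (op, flag) in schedule:
        if op == "write":
            memory[flag] = 1
        elif memory[flag] == 0:      # the "If (FlagN == 0)" test succeeds
            entered.add(pid)
    return entered


results = [run(s) for s in interleavings(P1, P2)]
both_entered = [r for r in results if r == {1, 2}]
both_blocked = [r for r in results if not r]
print(len(results), len(both_entered), len(both_blocked))  # prints: 6 0 4
```

Of the 6 interleavings, none lets both processes in, and 4 leave both blocked, which matches the "Both Processes Blocked, But no harm!" case.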

  14. Outline • Shared Memory on a Uniprocessor • Optimizations on a Uniprocessor • Extending to a Multiprocessor – Sequential Consistency • Extending to a Multiprocessor – Does Sequential Consistency Matter? • Restoring Sequential Consistency • Conclusion

  15. Optimization: Write Buffer with Bypass. SpeedUp: a Write takes 100 cycles, buffering takes 1 cycle, so buffer the Write and keep going. Problem: what happens on a Read from a location with a buffered Write pending? (Single Processor Case)

  16. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 0 Write Buffering

  17. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 1 Flag1 = 1 Flag2 = 0 Write Buffering

  18. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 1 Flag1 = 1 Flag2 = 0 Write Buffering Uh-Oh!

  19. Optimization: Write Buffer with Bypass. SpeedUp: a Write takes 100 cycles, buffering takes 1 cycle. Rule: if a WRITE is issued, buffer it and keep executing. Unless: there is a READ from the same location (subsequent WRITEs don't matter); then wait for the WRITE to complete.

  20. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section STALL! Flag1 = 0 Flag2 = 1 Flag1 = 1 Flag2 = 0 Write Buffering Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  21. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Flag2 = 0 Write Buffering Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
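
The buffering rule can be sketched as a small single-processor model. This is an illustration under stated assumptions, not the hardware described by the authors: the class name `BufferedProcessor` is hypothetical, the 1-cycle read cost is assumed (the slides give only the 100-cycle and 1-cycle write figures), and write latency is charged only when a read has to stall.

```python
from collections import deque

WRITE_CYCLES = 100   # cost of completing a write to memory (from the slide)
BUFFER_CYCLES = 1    # cost of placing a write in the buffer (from the slide)
READ_CYCLES = 1      # assumed cost of a read; not given in the slides


class BufferedProcessor:
    """One processor with a FIFO write buffer in front of memory."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = deque()   # buffered (address, value) writes, oldest first
        self.cycles = 0
        self.stalls = 0

    def write(self, addr, value):
        # Rule: buffer the write and keep executing.
        self.pending.append((addr, value))
        self.cycles += BUFFER_CYCLES

    def read(self, addr):
        # Unless: the read targets a location with a pending write. Then
        # stall until that write (and any older ones ahead of it) completes.
        if any(a == addr for a, _ in self.pending):
            self.stalls += 1
            while any(a == addr for a, _ in self.pending):
                self._complete_oldest()
        self.cycles += READ_CYCLES
        return self.memory[addr]

    def _complete_oldest(self):
        addr, value = self.pending.popleft()
        self.memory[addr] = value
        self.cycles += WRITE_CYCLES


memory = {"Flag1": 0, "Flag2": 0}
cpu = BufferedProcessor(memory)
cpu.write("Flag1", 1)
other = cpu.read("Flag2")   # different location: bypasses the buffered write
mine = cpu.read("Flag1")    # same location as a pending write: stalls
print(other, mine, cpu.stalls, cpu.cycles)  # prints: 0 1 1 103
```

On one processor the stall guarantees that a read never misses the processor's own earlier write, so the program still sees its writes in order.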

  22. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0 Does this work for Multiprocessors??

  23. Outline • Shared Memory on a Uniprocessor • Optimizations on a Uniprocessor • Extending to a Multiprocessor – Sequential Consistency • Extending to a Multiprocessor – Does Sequential Consistency Matter? • Restoring Sequential Consistency • Conclusion

  24. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0 Does this work for Multiprocessors?? We assume it does! What does that mean?

  25. Sequential Consistency for Multiprocessors. Sequential Consistency requires that the result of any execution be the same as if the memory accesses executed by each processor were kept in order and the accesses among different processors were interleaved arbitrarily. ...appears as if a memory operation executes atomically or instantaneously with respect to other memory operations (Hennessy and Patterson, 4th ed.)
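
One way to make this definition concrete is to enumerate every interleaving that keeps each processor's accesses in program order and collect the values the reads can return. The sketch below assumes a tiny operation format, `("W", location, value)` and `("R", location)`; the function name `sc_outcomes` is hypothetical.

```python
def sc_outcomes(programs, initial):
    """Return every combination of read results reachable under
    sequential consistency.

    programs: one list of operations per processor, in program order.
    The result is a set of tuples, one inner tuple of read values
    per processor.
    """
    outcomes = set()

    def step(pcs, memory, reads):
        if all(pc == len(prog) for pc, prog in zip(pcs, programs)):
            outcomes.add(reads)
            return
        for p, prog in enumerate(programs):
            if pcs[p] == len(prog):
                continue                      # processor p has finished
            op = prog[pcs[p]]
            next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if op[0] == "W":
                # A write takes effect atomically for all processors.
                step(next_pcs, {**memory, op[1]: op[2]}, reads)
            else:
                seen = reads[p] + (memory[op[1]],)
                step(next_pcs, memory, reads[:p] + (seen,) + reads[p + 1:])

    step((0,) * len(programs), dict(initial), ((),) * len(programs))
    return outcomes


dekker = [
    [("W", "Flag1", 1), ("R", "Flag2")],   # Process 1
    [("W", "Flag2", 1), ("R", "Flag1")],   # Process 2
]
allowed = sc_outcomes(dekker, {"Flag1": 0, "Flag2": 0})
print(sorted(allowed))
```

For the Dekker fragment, the outcome in which both reads return 0 is not in `allowed`, so a sequentially consistent machine never lets both processes into the critical section.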

  26. Understanding Ordering • Program Order • Compiled Order • Interleaving Order • Execution Order

  27. Reordering • Writes reach memory, and Reads see memory, in an order different from the order in the Program. • Caused by the Processor • Caused by Multiprocessors (and Caches) • Caused by Compilers

  28. Outline • Shared Memory on a Uniprocessor • Optimizations on a Uniprocessor • Extending to a Multiprocessor – Sequential Consistency • Extending to a Multiprocessor – Does Sequential Consistency Matter? • Restoring Sequential Consistency • Conclusion

  29. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  30. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  31. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  32. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  33. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 What Now?? Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  34. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  35. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 How Did That Happen?? Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  36. What happens on a Processor stays on that Processor

  37. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Processor 2 knows nothing about the write to Flag1, so has no reason to stall! Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
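
The failure can be reproduced with a toy model in which each processor has its own write buffer. This is a sketch under assumptions: the class `Processor` and its `drain` method are hypothetical, and the schedule below is one hand-picked interleaving, not the only one that fails.

```python
class Processor:
    """A processor whose write buffer is private to it."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory      # shared by all processors
        self.buffer = {}          # pending writes, invisible to other processors

    def write(self, location, value):
        self.buffer[location] = value       # buffer it and keep executing

    def read(self, location):
        if location in self.buffer:         # same-location rule: wait for own write
            self.drain()
        return self.memory[location]

    def drain(self):
        self.memory.update(self.buffer)     # pending writes finally reach memory
        self.buffer.clear()


memory = {"Flag1": 0, "Flag2": 0}
p1, p2 = Processor("P1", memory), Processor("P2", memory)

p1.write("Flag1", 1)                 # sits in P1's buffer
p2.write("Flag2", 1)                 # sits in P2's buffer
p1_enters = p1.read("Flag2") == 0    # P1 has no pending write to Flag2: no stall
p2_enters = p2.read("Flag1") == 0    # P2 has no pending write to Flag1: no stall
p1.drain()
p2.drain()

print(p1_enters, p2_enters, memory)  # prints: True True {'Flag1': 1, 'Flag2': 1}
```

Each processor obeys the uniprocessor rule, yet both enter the critical section, because the rule only compares a read against the reading processor's own buffer.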

  38. Another way to look at the Problem: Reordering of Reads and Writes (Loads and Stores).

  39. Consider the Instructions in these processes. Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Simplify as (W = Write, R = Read, X = Flag1, Y = Flag2): Process 1:: WX RY Process 2:: WY RX

  40. [Table: the 24 possible orderings of WX, RY, WY and RX, numbered 1 to 24, one ordering per column.] There are 4! = 24 possible orderings. If either WX < RX or WY < RY, then the Critical Section is protected (Correct Behavior).

  41. [Table: the same 24 orderings, numbered 1 to 24.] There are 4! = 24 possible orderings. If either WX < RX or WY < RY, then the Critical Section is protected (Correct Behavior). 18 of the 24 orderings are OK, but the other 6 are trouble!
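
The 18-versus-6 count can be reproduced by brute force. The sketch below assumes that X stands for Flag1 and Y for Flag2, so that Process 1 is WX then RY and Process 2 is WY then RX; that reading is an assumption, chosen because it makes the stated condition the exact complement of "both reads return 0".

```python
from itertools import permutations

OPS = ("WX", "RY", "WY", "RX")   # assumed reading: X = Flag1, Y = Flag2


def protected(order):
    """True if WX < RX or WY < RY, the condition stated on the slide."""
    pos = {op: i for i, op in enumerate(order)}
    return pos["WX"] < pos["RX"] or pos["WY"] < pos["RY"]


orderings = list(permutations(OPS))
ok = [o for o in orderings if protected(o)]
trouble = [o for o in orderings if not protected(o)]

# Orderings that respect program order in both processes (WX before RY in
# Process 1, WY before RX in Process 2), under the assumption above.
in_program_order = [
    o for o in orderings
    if o.index("WX") < o.index("RY") and o.index("WY") < o.index("RX")
]

print(len(orderings), len(ok), len(trouble))   # prints: 24 18 6
for o in trouble:
    print(" ".join(o))
```

Under that reading, all 6 orderings that keep program order are protected, so every one of the 6 troublesome orderings requires some reordering of a process's own write and read.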

  42. Consider another example...

  43. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  44. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Head = 0 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  45. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 1 Data = 2000 Head = 0 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  46. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Head = 1 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  47. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Head = 1 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  48. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Wrong Data! Head = 1 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  49. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 1 Data = 2000 Write By-Pass: General Interconnect to multiple memory modules means the order in which writes arrive in memory is indeterminate. Fix: each Write must be acknowledged before the same processor issues another Write (or Read).
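
The effect of write arrival order, and of the acknowledgement fix, can be sketched as follows. The model is an assumption made for illustration: Process 2 is taken to read Data at the first moment memory shows Head == 1 (the worst case), and the function name `consumer_value` is hypothetical.

```python
from itertools import permutations


def consumer_value(arrival_order):
    """Apply Process 1's writes in the order they reach memory and return
    what Process 2 reads from Data as soon as it observes Head == 1."""
    memory = {"Data": 0, "Head": 0}
    for location, value in arrival_order:
        memory[location] = value
        if memory["Head"] == 1:          # Process 2 leaves its spin loop here
            return memory["Data"]
    return None


writes = [("Data", 2000), ("Head", 1)]   # program order in Process 1

# General interconnect: either write may reach its memory module first.
unordered = {consumer_value(order) for order in permutations(writes)}

# Fix from the slide: wait for each acknowledgement before issuing the
# next write, so arrival order equals program order.
acknowledged = {consumer_value(writes)}

print(sorted(unordered), sorted(acknowledged))  # prints: [0, 2000] [2000]
```

Without acknowledgements Process 2 can read the stale value 0; with them it can only read 2000.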

  50. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
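
The risk this slide sets up can be sketched with a fixed timeline of memory states. The model is an assumption for illustration only: writes are acknowledged in program order, and a non-blocking read of Data is allowed to be performed at any point, including before the spin loop on Head has finished. The names `timeline` and `local_value` are hypothetical.

```python
# Memory states over time when Process 1's writes are acknowledged in order.
timeline = [
    {"Data": 0, "Head": 0},       # t0: before any write
    {"Data": 2000, "Head": 0},    # t1: Data written and acknowledged
    {"Data": 2000, "Head": 1},    # t2: Head written and acknowledged
]

# Earliest time at which Process 2's spin loop can see Head == 1.
loop_exit = next(t for t, mem in enumerate(timeline) if mem["Head"] == 1)


def local_value(read_time):
    """Value LocalValue gets if the read of Data is performed at read_time."""
    return timeline[read_time]["Data"]


# Blocking reads: the read of Data cannot be performed before the loop exits.
blocking = {local_value(t) for t in range(loop_exit, len(timeline))}

# Non-blocking reads: the read of Data may also be performed earlier.
non_blocking = {local_value(t) for t in range(len(timeline))}

print(sorted(blocking), sorted(non_blocking))  # prints: [2000] [0, 2000]
```

Even with acknowledged writes, letting the read of Data run ahead of the loop admits the stale value 0.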
