
“Shared Memory Consistency Models: A Tutorial” – Adve & Gharachorloo


Presentation Transcript


  1. “Shared Memory Consistency Models: A Tutorial” – Adve & Gharachorloo Robert T. Bauer

  2. Shared Memory • Shared memory – single address space abstraction in a multiprocessor environment.

  3. Memory Model • Specifies how reads and writes appear to execute • May (usually) vary by level • A programming language can provide a memory model, for example Java has its own (JMM, JSR 133) • Processor • Memory subsystem

  4. Definitions • Sequential (Processor) • Result of an execution is the same as if the operations had been executed in the order specified by the program. • Sequentially Consistent (Multiprocessor) • Result of any execution is the same as if the operations of all the processors were executed in some sequential order and the operations of each individual processor appear in this sequence in the order specified by its program.
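The SC definition above can be checked mechanically for the two-flag example used on the later slides: under SC every execution is some interleaving that respects each processor's program order, so it suffices to enumerate the interleavings. A minimal C sketch (the function name and encoding of operations are ours, not from the slides):

```c
#include <stdbool.h>

/* P1 runs: flag1 = 1; r1 = flag2;   P2 runs: flag2 = 1; r2 = flag1;
   Ops: 0 = flag1=1, 1 = r1=flag2, 2 = flag2=1, 3 = r2=flag1.
   Enumerate every interleaving that keeps each processor's program order
   and ask whether both reads can return 0. Under SC they cannot. */
bool both_reads_zero_possible_sc(void) {
    for (int a = 0; a < 3; a++) {             /* slot of P1's write */
        for (int b = a + 1; b < 4; b++) {     /* slot of P1's read  */
            int order[4];
            int p2ops[2] = {2, 3};
            int p2 = 0;
            for (int slot = 0; slot < 4; slot++) {
                if (slot == a)      order[slot] = 0;
                else if (slot == b) order[slot] = 1;
                else                order[slot] = p2ops[p2++];
            }
            int flag1 = 0, flag2 = 0, r1 = -1, r2 = -1;
            for (int slot = 0; slot < 4; slot++) {
                switch (order[slot]) {
                case 0: flag1 = 1;  break;
                case 1: r1 = flag2; break;
                case 2: flag2 = 1;  break;
                case 3: r2 = flag1; break;
                }
            }
            if (r1 == 0 && r2 == 0)
                return true;   /* both processors would enter the CS */
        }
    }
    return false;              /* SC forbids the outcome */
}
```

All six legal interleavings place at least one write before the other processor's read, so the "both read 0" outcome never appears; that is exactly why the slide-8 algorithm is correct on a sequentially consistent machine.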

  5. Uniprocessor • [Diagram: a single processor issues memory operations in program order to a sequential memory]

  6. Multiprocessor • [Diagram: multiple processors share one memory; sequential consistency interleaves their operations into a single order]

  7. Relaxing Sequential Consistency • Program Order • Write followed by a read to a different location can be reordered • Write followed by a write to a different location can be reordered • Read followed by a write to (or read from) a different location can be reordered • Write Atomicity • Another processor’s write can be read before that write becomes visible to all other processors • A processor can read its own write before the write is visible to other processors

  8. Uniprocessor with Write Buffer • P1: flag1 = 1; if (flag2 == 0) { critical section } • P2: flag2 = 1; if (flag1 == 0) { critical section } • [Diagram: processor, write buffer, memory]

  9. Multiprocessor with Write Buffer • P1: flag1 = 1; if (flag2 == 0) { critical section } • P2: flag2 = 1; if (flag1 == 0) { critical section } • [Diagram: each processor has its own write buffer in front of the shared memory]
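The failure on this slide can be made concrete with a toy model: each write lands in its processor's private buffer while the read of the other flag goes straight to memory, so both reads can return 0 and both processors enter the critical section. A hedged sketch (the buffer variables are our own modeling device, not hardware state):

```c
/* Toy model only: buf1_flag1 / buf2_flag2 stand in for the write buffers. */
int write_buffer_litmus(void) {
    int mem_flag1 = 0, mem_flag2 = 0;  /* memory before either buffer drains */
    int buf1_flag1 = 1;   /* P1's "flag1 = 1" sits in P1's write buffer */
    int buf2_flag2 = 1;   /* P2's "flag2 = 1" sits in P2's write buffer */

    int r1 = mem_flag2;   /* P1 reads flag2 from memory: sees 0 */
    int r2 = mem_flag1;   /* P2 reads flag1 from memory: sees 0 */

    mem_flag1 = buf1_flag1;   /* the buffers drain only afterwards */
    mem_flag2 = buf2_flag2;
    (void)mem_flag1; (void)mem_flag2;

    return r1 == 0 && r2 == 0;   /* 1: both enter the critical section */
}
```

The model returns 1, the outcome the interleaving enumeration above showed to be impossible under SC: buffering the write past the read is exactly the write-to-read program-order relaxation.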

  10. Memory Barrier • P1: flag1 = 1; mb(); if (flag2 == 0) { critical section } • P2: flag2 = 1; mb(); if (flag1 == 0) { critical section }
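The slide's mb() can be approximated in portable code with C11's atomic_thread_fence(memory_order_seq_cst), which forbids moving the read of the other flag above the buffered write. A sketch of the pattern (function and variable names are ours); calling the two routines sequentially just demonstrates the protocol, since real mutual exclusion needs two threads:

```c
#include <stdatomic.h>

atomic_int flag1, flag2;   /* zero-initialized at file scope */
int in_cs;                 /* counts critical-section entries */

void p1(void) {
    atomic_store_explicit(&flag1, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* the slide's mb() */
    if (atomic_load_explicit(&flag2, memory_order_relaxed) == 0)
        in_cs++;                                 /* critical section */
}

void p2(void) {
    atomic_store_explicit(&flag2, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* the slide's mb() */
    if (atomic_load_explicit(&flag1, memory_order_relaxed) == 0)
        in_cs++;                                 /* critical section */
}
```

With the fences in place, at most one processor can observe the other's flag as 0; it is possible that neither enters, which is why this flag protocol alone is not a complete lock.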

  11. Effect of Memory Barrier • P1: flag1 = 1; mb(); if (flag2 == 0) { critical section } • P2: flag2 = 1; mb(); if (flag1 == 0) { critical section } • [Diagram: the barrier drains each processor’s write buffer before the following read]

  12. Write Through &amp; Memory Bus • P1: data = 2000; head = 1 • P2: while (head == 0) ; … = data • [Diagram: P1 and P2 each have a write-through cache, connected to memory over an interconnect] • P2 sees the write to “head” before seeing the write to data • Program order has been relaxed

  13. Late Cache-Invalidate Signal • P1’s writes arrive in order at memory • The read from data occurs before the cache-invalidate signal arrives at P2 • P2 reads the “new” value of head but the “old” value of data from its cache • ISSUE: memory operations need to “complete”; the cache-invalidate signal needs to propagate • Write atomicity has been relaxed • [Diagram: P1’s writes to data and head update memory, but the invalidate for data reaches P2’s cache late]
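The late-invalidate scenario can also be captured as a toy model: memory already holds P1's writes in order, yet P2's cache still has the stale data line. The variables below model memory and P2's cache line, not real hardware state:

```c
/* Toy model of slide 13: the invalidate for "data" is still in flight
   when P2 performs its reads. */
int late_invalidate_litmus(void) {
    int mem_data = 2000;     /* P1's writes reached memory, in order */
    int mem_head = 1;
    int p2_cache_data = 0;   /* stale cached copy of data */

    int r_head = mem_head;        /* P2: while (head == 0) sees the new head */
    int r_data = p2_cache_data;   /* P2: ... = data reads the old cached value */

    (void)mem_data;               /* the new value P2 never observed */
    return r_head == 1 && r_data == 0;   /* 1: the anomalous outcome */
}
```

The anomaly is a write-atomicity violation rather than a reordering at P1: P1's writes completed in program order, but the second write became visible to P2 before the first one did.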

  14. Fences

  15. Relaxing Write to Read • A read may be reordered with earlier writes • IBM 370 prohibits a read from returning the value of a write before the write is visible to all processors • TSO allows a processor to read its own write early • Neither allows reading another processor’s write early (it must be visible to all processors) • Our write-buffer example has a similar effect • IBM 370 provides a serialization instruction (so the writes propagate and the reads aren’t reordered) • TSO: the read isn’t reordered if the instruction is an RMW, so you can “enforce” order using a read-modify-write instruction
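One way to read the TSO bullet: perform the flag write as a read-modify-write (an xchg-style exchange, which on TSO hardware acts as a full barrier), so the later read cannot be moved above it and no separate fence is needed. A sketch with C11 atomics, whose default seq_cst exchange gives the same guarantee portably (all names are ours):

```c
#include <stdatomic.h>

atomic_int f1, f2;   /* the two flags, zero-initialized at file scope */
int entries;         /* counts critical-section entries */

void proc1(void) {
    /* write the flag as an RMW: on TSO the RMW is not reordered
       with the following read */
    atomic_exchange(&f1, 1);
    if (atomic_load(&f2) == 0)
        entries++;   /* critical section */
}

void proc2(void) {
    atomic_exchange(&f2, 1);
    if (atomic_load(&f1) == 0)
        entries++;   /* critical section */
}
```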

  16. Relaxing Write to Read/Write • SPARC PSO • Writes to different locations can be pipelined or overlapped, reaching memory or the caches out of order • Otherwise identical to TSO: a processor may read its own writes early • Processors cannot read another processor’s writes before they are globally visible • STBAR (store barrier) keeps writes from being reordered across it

  17. Weak Ordering • Data operations (read/writes) • Synchronization operations (fences/barriers) • Model allows • Reordering of operations between synchronization operations • Each processor ensures that synchronization instructions are not issued until all previous operations (data and sync) are complete. • Ensures that writes always appear atomic, so no fence is required to ensure write atomicity

  18. Release Consistency • Acquire: a read operation that gains access to a set of shared locations • Release: a write operation that grants permission for accessing a set of shared locations • Two flavors • Maintain sequential consistency among “special” operations (RC-SC) • Maintain processor consistency among “special” operations (RC-PC)
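Acquire and release map directly onto C11's memory_order_acquire and memory_order_release. A minimal message-passing sketch (the producer/consumer names are ours): the release store to ready grants access, making the ordinary write to payload visible to any processor that acquires ready and sees 1:

```c
#include <stdatomic.h>

int payload;        /* "ordinary" data: no atomics needed on it */
atomic_int ready;   /* the "special" synchronization variable */

void producer(void) {
    payload = 2000;   /* ordinary write, completes before the release */
    /* release: grants access; all earlier writes become visible
       to whoever acquires ready and observes this store */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

int consumer(void) {
    /* acquire: gains access; spins until the release is observed */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    return payload;   /* guaranteed to read 2000 */
}
```

Note how the model charges the programmer only at the boundaries: the writes between an acquire and a release may still be freely reordered among themselves.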

  19. Release Consistency • RC-SC • acquire → all, all → release, special → special • If an acquire appears before any operation, program order is enforced so that the acquire completes before the operations that follow. • RC-PC • acquire → all, all → release, special → special, except for a special write followed by a special read

  20. RC-PC • Enforcing program order for a read following a write requires an RMW operation; if the write being ordered is “ordinary,” the write in the RMW must be a release

  21. Just to make it more complicated • Alpha • mb: enforces program order between any statements • wmb: enforces program order only among writes • RMO • MEMBAR of the form (LD | ST) # (LD | ST) • LDST#LD means that load and store operations before the barrier must complete before any load after the barrier; stores after the barrier may be reordered ahead of it • PowerPC • SYNC: like Alpha’s mb, except that when placed between two reads of the same location, the second read may still go first • PowerPC allows writes to be seen early • RMW sequences are used to make writes appear atomic

  22. Discussion/Conclusion • System-centric: directly expose ordering and write atomicity relaxations. Complicated, difficult to port. • Programmer-centric: Programmer provides information to determine what optimizations can be performed (when reading/writing particular variables). Compiler complexity increased. Debugging more difficult • Relaxed memory models have proven to be effective in increasing performance; the cost of this higher performance is greater complexity.
