
Computer Architecture Memory Coherency & Consistency



Presentation Transcript


  1. Computer Architecture: Memory Coherency & Consistency By Dan Tsafrir, 11/4/2011. Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, and Adi Yoaz

  2. Coherency - intro • (Diagram: Processor 1 and Processor 2, each with a private L1 cache, above a shared L2 cache and memory) • When there's only one core, caching doesn't affect correctness • But what happens when ≥ 2 cores work simultaneously on the same memory location? • If both are reading, there is no problem • Otherwise, one might use a stale, out-of-date copy of the data • Such inconsistencies might lead to incorrect execution • Terminology: memory coherency <=> cache coherency

  3. The cache coherency problem for a single memory location • Stale value, different from the corresponding memory location and CPU-1's cache. (The next read by CPU-2 will yield “1”.)

  4. A memory system is coherent if… • Informally, we could say (or we would like to say) that... • A memory system is coherent if… • Any read of a data item returns the most recently written value of that data item • (This definition is intuitive, but overly simplistic) • More formally…

  5. A memory system is coherent if… • Processor P writes to location X, and later P reads from X, and no other processor writes to X between the above write & read => the read must return the value previously written by P • P1 writes to X, some time T elapses, then P2 reads from X => for big enough T, P2 will read the value written by P1 • Two writes to the same location by any two processors are serialized => they are seen in the same order by all processors (if “1” and then “2” are written, no processor would read “2” and then “1”)

  6. A memory system is coherent if… • Processor P writes to location X, and later P reads from X, and no other processor writes to X between the above write & read => the read must return the value previously written by P. This simply preserves program order (needed even on a uniprocessor). • P1 writes to X, some time T elapses, then P2 reads from X => for big enough T, P2 will read the value written by P1. This defines the notion of what it means to have a coherent view of memory; if X is never updated regardless of the duration of T, then the memory is not coherent. • Two writes to the same location X by any two processors are serialized => they are seen in the same order by all processors (if “1” and then “2” are written, no processor would read “2” and then “1”). If P1 writes to X and then P2 writes to X, serialization of writes ensures that every processor will see P2’s write eventually; otherwise P1’s value might be maintained indefinitely.

  7. Memory Consistency • The coherency definition is not enough • So as to be able to write correct programs • It must be supplemented by a consistency model • Critical for program correctness • Coherency & consistency are 2 different, complementary aspects of memory systems • Coherency • What values can be returned by a read • Relates to behavior of reads & writes to the same memory location • Consistency • When will a written value be returned by a subsequent read • Relates to behavior of reads & writes to different memory locations

  8. Memory Consistency (cont.) • “How consistent is the memory system?” • A nontrivial question • Assume: locations A & B are originally cached by P1 & P2 • With initial value = 0 • If writes are immediately seen by other processors • It is impossible for both “if” conditions to be true • Reaching an “if” means the processor has already written 1 to its own location (A or B) • But suppose: • (1) the “write invalidate” can be delayed, and • (2) the processor is allowed to compute during this delay • => It is possible that P1 & P2 haven’t seen the invalidations of B & A until after the reads, and thus both “if” conditions are true • Should this be allowed? • That is determined by the consistency model

  9. Consistency models • From most strict to most relaxed • Strict consistency • Sequential consistency • Weak consistency • Release consistency • […many…] • Stricter models are • Easier to understand • Harder to implement • Slower • Involve more communication • Waste more energy

  10. Strict consistency (“linearizability”) • All memory operations are ordered in time • Any read to location X returns the most recent write op to X • This is the intuitive notion of memory consistency • But too restrictive and thus unused

  11. Sequential consistency • A relaxation of strict consistency (defined by Lamport) • Requires that the result of any execution be the same as if all memory accesses were executed in some sequential order, with the accesses of each processor appearing in that order in program order • The order can differ from run to run • The execution on the left is sequentially consistent (it can be ordered as on the right) • Q. What if we flip the order of P2’s reads (on the left)?

  12. Weak consistency • Access to “synchronization variables” are sequentially consistent • No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere • No data access (read or write) is allowed to be performed until all previous accesses to synchronization variables have been performed • In other words, the processor doesn’t need to broadcast values at all, until a synchronization access happens • But then it broadcasts all values to all cores

  13. Release consistency • Before accessing a shared variable, an acquire op must be completed • Before a release is allowed, all previous accesses must be completed • Acquire/release calls are sequentially consistent • The acquire/release pair serves as a “lock”

  14. MESI Protocol • Each cache line can be in one of 4 states • Invalid – the line’s data is not valid (as in a simple cache) • Shared – the line is valid & not dirty; copies may exist in other caches • Exclusive – the line is valid & not dirty; other processors do not have the line in their local caches • Modified – the line is valid & dirty; other processors do not have the line in their local caches • (MESI = Modified, Exclusive, Shared, Invalid) • Together with cores that complete their memory accesses in program order, achieves sequential consistency

  15. Two classes of protocols to track sharing • Directory based • The status of each memory block is kept in just one location (= the directory) • Directory-based coherence has bigger overhead • But can scale to larger core counts • Snooping • Every cache holding a copy of the data also holds a copy of its state • No centralized state • All caches are accessible via broadcast (bus or switch) • All cache controllers monitor (or “snoop”) the broadcasts • To determine if they have a copy of what’s requested

  16. Multi-processor System: Example • P1 reads 1000 • P1 writes 1000 • (Diagram: two processors with private L1 caches over a shared L2 cache and memory; line [1000] holds 5 in memory, P1’s read misses and installs the line in state E, and the write changes P1’s copy to 6 in state M)

  17. Multi-processor System: Example • P1 reads 1000 • P1 writes 1000 • P2 reads 1000 • L2 snoops 1000 • P1 writes back 1000 • P2 gets 1000 • (Diagram: P2’s read of [1000] misses; L2 snoops, P1 writes the modified value 6 back, and both L1 copies end up holding [1000]: 6 in state S)

  18. Multi-processor System: Example • P1 reads 1000 • P1 writes 1000 • P2 reads 1000 • L2 snoops 1000 • P1 writes back 1000 • P2 gets 1000 • P2 requests ownership with write intent • (Diagram: on P2’s request for ownership, P1’s copy of [1000] is invalidated (state I) and P2 holds the line with [1000]: 6)

  19. The alternative: incoherent memory • As core counts grow, many argue that maintaining coherence • Will slow down the machines • Will waste a lot of energy • Will not scale • Intel SCC • Single-chip cloud computer – for research purposes • 48 cores • Shared, incoherent memory • Software is responsible for correctness • The Barrelfish operating system • By Microsoft & ETH (Zurich) • Assumes no coherency as the baseline

  20. Intel SCC • Shared (incoherent) memory
