
Cache Coherence Schemes for Multiprocessors



Presentation Transcript


  1. Cache Coherence Schemes for Multiprocessors — Sivakumar M, Osman Unsal

  2. Outline
  • Consistency
  • Different Directory Schemes
  • Comparison of Directory Schemes
  • Hierarchical Directory Scheme (in detail)
  • Referred papers:
  • “Directory-Based Cache Coherence in Large-Scale Multiprocessors”, David Chaiken, Craig Fields, Kiyoshi Kurihara and Anant Agarwal
  • “A Survey of Cache Coherence Schemes for Multiprocessors”, Per Stenstrom
  • “Cache Consistency and Sequential Consistency”, James R. Goodman
  • “LimitLESS Directories: A Scalable Cache Coherence Scheme”, David Chaiken, John Kubiatowicz and Anant Agarwal
  • “A Hierarchical Directory Scheme for Large-Scale Cache-Coherent Multiprocessors”, a dissertation by Yeong-Chang Maa

  3. CONSISTENCY
  • Strict consistency: any read to memory location X returns the value stored by the most recent write operation to X.
  • Valid: P1: W(x)1; P2: R(x)1
  • Not strict: P1: W(x)1; P2: R(x)0, R(x)1 (the first read misses the just-completed write)
  • Sequential consistency: program order + memory coherence. The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
  • Valid: P1: W(x)1; P2: R(x)0, R(x)1
  • Valid: P1: W(x)1; P2: R(x)1, R(x)1
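The valid/invalid executions above can be checked mechanically. A minimal brute-force sketch (function and variable names are mine, not from the slides): an execution is sequentially consistent iff some single interleaving that respects each processor's program order explains every read value, with all locations initially 0 as in the slide's R(x)0.

```python
from itertools import permutations

def respects_program_order(order, program):
    """Each processor's ops must appear in issue order in the interleaving."""
    next_idx = {p: 0 for p in program}
    for p, i in order:
        if i != next_idx[p]:
            return False
        next_idx[p] += 1
    return True

def is_sequentially_consistent(program):
    """program: {processor: [op, ...]}, op = ("W", var, val) or ("R", var, val).
    True iff some interleaving that keeps program order explains every read."""
    ops = [(p, i) for p, seq in program.items() for i in range(len(seq))]
    for order in permutations(ops):
        if not respects_program_order(order, program):
            continue
        memory = {}
        ok = True
        for p, i in order:
            kind, var, val = program[p][i]
            if kind == "W":
                memory[var] = val
            elif memory.get(var, 0) != val:  # locations start at 0
                ok = False
                break
        if ok:
            return True
    return False
```

The second strict-consistency example, where P2 reads 0 and then 1 after P1's write, fails strictness but passes this check; that gap is exactly the difference between the two models.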

  4. CONSISTENCY
  • Causal consistency: writes that are potentially causally related must be seen by all processes in the same order; concurrent writes may be seen in a different order on different machines.
  • P1: W(x)1, W(x)3
  • P2: R(x)1, W(x)2
  • P3: R(x)1, R(x)3, R(x)2
  • P4: R(x)1, R(x)2, R(x)3
  • PRAM consistency: writes done by a single process are received by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
  • Processor consistency: for every memory location X, there should be global agreement about the order of writes to X.

  5. CONSISTENCY
  • Weak consistency: uses synchronization variables, which are sequentially consistent.
  • No access to a synchronization variable is allowed until all previous writes have completed everywhere.
  • No data access is allowed until all previous accesses to synchronization variables have been performed.
  • Release consistency: barrier synchronization via Acquire and Release; Acquire and Release should be processor consistent; lazy-release and eager-release variants exist.
  • Entry consistency: a lock for each shared variable or element.
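The acquire/release discipline above can be sketched with an ordinary mutex (variable and function names are mine, not from the slides); the lock acquire plays the role of the synchronization access that waits until all previous writes are performed.

```python
import threading

# Sketch of the acquire/release idiom behind release consistency.
# A Python Lock gives the needed guarantee: writes made before a
# release are visible to the next acquirer.
shared = {"x": 0}
sync = threading.Lock()

def producer():
    with sync:               # acquire
        shared["x"] = 1      # protected write
    # leaving the block is the release: the write is "performed
    # everywhere" before anyone else can acquire the lock

def consumer(results):
    with sync:               # acquire: sees all writes made
        results.append(shared["x"])  # before the last release
```

Running `producer` in one thread, joining it, and then calling `consumer` always observes the released value 1.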

  6. Directory-Based Cache Coherence
  • Need
  • Limited bus bandwidth and bus cycle times
  • Scalability: disparity between bus and processor speed
  • Bandwidth must grow as the number of processors increases
  • Drawbacks
  • No broadcast capability
  • Complex protocol

  7. Directory Schemes
  • Tang’s scheme (full-map)
  • Each directory entry: N presence bits + status bits for N processors
  • Memory overhead scales as O(N²), assuming M ∝ N
  • Censier and Feautrier scheme (distributed)
  • Stenstrom scheme (distributed)
  • Limited directories
  • Classified as Dir_i X, where X is NB (no broadcast) or B (broadcast) and i < N
  • Eviction: pointer replacement; resembles a set-associative cache and requires an eviction policy
  • Efficient if memory blocks are referenced by only a few processors
  • Memory overhead scales as O(M · i · log N)
  • If X is B, broadcast invalidations allow more than i copies to exist
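The two overhead claims above are easy to make concrete. A small sketch (the M = 1024·N block count is an illustrative assumption of mine) comparing full-map O(N²) growth with a limited directory's M·i·log N:

```python
import math

def full_map_bits(M, N):
    """Full-map (Tang-style) directory: one presence bit per processor
    for each of M memory blocks (status bits omitted for simplicity)."""
    return M * N

def limited_bits(M, N, i):
    """Dir_i limited directory: i pointers of ceil(log2 N) bits per block."""
    return M * i * math.ceil(math.log2(N))

# With M proportional to N (illustratively, M = 1024 * N), full-map
# overhead grows quadratically while Dir_4 grows as N log N:
for N in (16, 64, 256):
    M = 1024 * N
    print(f"N={N:4d}  full-map={full_map_bits(M, N):>11,}  Dir_4={limited_bits(M, N, 4):>10,}")
```

At N = 16 the two schemes happen to cost the same bits here; by N = 256 the full-map directory is 8x larger, which is the scaling argument the slide makes.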

  8. Directory Schemes
  • Chained directories
  • Make use of pointers, like linked lists
  • Cache-block replacement is complex: either splice the intermediate cache out of the chain, or invalidate the location
  • Variation: a doubly linked chain optimizes the replacement process but needs a large average message block size
  • Comparison of full-map, limited, and chained schemes
  • Metric: processor utilization
  • Utilization depends on the frequency of memory references and the latency of the memory system
  • Latency depends on topology, speed, number of processors, memory access latency, and the frequency and size of messages
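The two replacement options above can be sketched on a toy chain (class and function names are mine, not from the papers): a singly linked chain can only invalidate from the victim onward, while back-pointers make an O(1) splice possible.

```python
class Sharer:
    """One cache holding a copy of the block, linked into the chain
    rooted at the directory. 'prev' is only used by the doubly
    linked variant."""
    def __init__(self, cache_id):
        self.cache_id = cache_id
        self.next = None   # next sharer, or None (chain terminator)
        self.prev = None

def chain(ids):
    """Build a doubly linked chain of sharers; return the node list."""
    nodes = [Sharer(i) for i in ids]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return nodes

def splice(node):
    """Doubly linked replacement: remove one intermediate cache from
    the chain in O(1), leaving the other sharers' copies valid."""
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev

def sharers(head):
    """Walk the chain from the directory's head pointer."""
    out = []
    while head:
        out.append(head.cache_id)
        head = head.next
    return out
```

Splicing the middle sharer of a three-cache chain leaves the other two linked, which is exactly the optimization the doubly linked variation buys.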

  9. Directory Schemes
  • Analysis
  • No coherence: treats all addresses in the trace as unshared; gives an upper bound
  • Cache only private data: for comparison with the other schemes
  • P-Thor: minimizes communication and has minimal synchronization points
  • Speech: poor performance of limited directories due to pointer thrashing
  • Performance improvement by system-level optimizations:
  • Tree barrier structure instead of a linear barrier
  • Separating read-only blocks from read/write blocks
  • Reducing the block size

  10. Directory Schemes
  • Coarse vector Dir_i CV_r
  • Initially behaves as a limited directory with i pointers
  • On overflow, switches to a coarse bit vector (fully mapped when r = 1), each bit standing for a region of r processors
  • Dir_0 B
  • 2 status bits encode 4 states: Absent; Present1 (present and clean in only one cache); Present (present and clean in more than one cache); PresentM (present and dirty in only one cache)
  • LimitLESS directory scheme
  • Combination of hardware and software techniques
  • Realizes the performance of a full-map directory with the memory overhead of a limited directory
  • Sectored directory Dir_N/L
  • L sub-blocks share one directory entry
  • Overhead is M·N/L
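The Dir_i CV_r switch-over can be sketched as follows (class and method names are mine, not from the papers): i pointers until overflow, then one bit per region of r processors, so invalidations in coarse mode over-approximate the true sharer set.

```python
class CoarseVectorDirectory:
    """Sketch of a Dir_i CV_r entry for one memory block."""
    def __init__(self, i, r, n_procs):
        self.i, self.r, self.n = i, r, n_procs
        self.pointers = set()   # limited-directory mode
        self.coarse = None      # region bit-set once overflowed

    def add_sharer(self, p):
        if self.coarse is None:
            self.pointers.add(p)
            if len(self.pointers) > self.i:       # overflow: switch modes
                self.coarse = {q // self.r for q in self.pointers}
                self.pointers = None
        else:
            self.coarse.add(p // self.r)          # record the region only

    def invalidation_targets(self):
        """Caches that must receive an invalidate on a write."""
        if self.coarse is None:
            return sorted(self.pointers)          # exact in pointer mode
        # coarse mode over-approximates: invalidate whole regions
        return sorted(x for reg in self.coarse
                      for x in range(reg * self.r,
                                     min((reg + 1) * self.r, self.n)))
```

With i = 2 and r = 4, sharers {1, 2} are tracked exactly; adding sharer 9 overflows the pointers and the entry falls back to regions {0, 2}, so a write now invalidates caches 0-3 and 8-11 even though only three hold copies.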

  11. Directory Schemes
  • Directory cache Dir_a1,a2
  • a1 entries hold short limited-directory pointers
  • a2 entries hold long full-map pointers
  • Hierarchical scheme

  12. Hierarchical Cache Coherence Schemes
  • Wilson’s hierarchical cache/bus network architecture
  • Combination of bus and directory schemes
  • Each cache contains a copy of all blocks cached underneath it
  • Write-invalidate protocol
  • Higher-level caches act as filters
  • Data Diffusion Machine
  • Hierarchy of buses with large processor caches
  • Write-invalidate protocol
  • Only state information in higher-order caches
  • No global memory; cost effective
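The filtering role of the higher-level caches follows from inclusion: if a block misses in the cluster cache, no cache below it can hold a copy, so an invalidation stops there. A toy sketch (function and variable names are mine):

```python
def filter_invalidation(cluster_cache, lower_caches, block):
    """Wilson-style hierarchy sketch: cluster_cache includes every block
    cached underneath it, so a miss there proves no lower cache holds
    the block and the invalidation need not descend the hierarchy.
    Returns the indices of lower caches that must be invalidated."""
    if block not in cluster_cache:
        return []                    # filtered: nothing to invalidate below
    return [i for i, c in enumerate(lower_caches) if block in c]
```

The inclusion invariant (the cluster cache holds a superset of the union of its children's blocks) is what makes the early-out correct; without it, the empty return would silently leave stale copies below.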

  13. Hierarchical Full-Mapped Directory (HFMD) Schemes
  • Directory entry fields: tag bits, descendants presence vector, ackctr, MRU, INV, UP, MRQ, Tr, dirty
  • States of HFMD
  • ABS: no entries in descendants; descendants vector and Tr bit cleared
  • ABT: descendant entries being invalidated; descendants vector and Tr bit cleared
  • RO: read-only entries in the descendants; descendants vector set, dirty and Tr bits cleared
  • RW: a dirty (read-write) entry is in the descendants; descendants vector and dirty bit set, Tr bit cleared
  • RT: descendant entries have outstanding read requests; descendants vector and Tr bit set, dirty bit cleared
  • WT: descendant entries have outstanding write or modify requests; descendants vector, dirty bit and Tr bit set
  • INV: descendant entries being invalidated from the directory entry; descendants vector cleared, Tr bit and INV bit set
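The state list above pins down which entry bits each state sets; transcribing it as a table makes the encoding checkable. A sketch (names are mine; None marks a bit the slide leaves unspecified for that state):

```python
# One row per HFMD state: (descendants vector set, dirty bit, Tr bit, INV bit),
# transcribed from the state list above.
HFMD = {
    #       des.vec dirty  Tr     INV
    "ABS": (False,  None,  False, None),
    "ABT": (False,  None,  False, None),
    "RO":  (True,   False, False, None),
    "RW":  (True,   True,  False, None),
    "RT":  (True,   False, True,  None),
    "WT":  (True,   True,  True,  None),
    "INV": (False,  None,  True,  True),
}

def classify(des_vector, dirty, tr, inv=False):
    """Return the states consistent with the observed entry bits."""
    return [s for s, (d, dt, t, iv) in HFMD.items()
            if d == des_vector
            and (dt is None or dt == dirty)
            and t == tr
            and (iv is None or iv == inv)]
```

Most bit patterns decode to a unique state; ABS and ABT deliberately share an encoding (both clear the descendants vector and Tr bit) and are distinguished by protocol context, which is why `classify` returns a list.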
