Cache Coherence Protocols in Shared Memory Multiprocessors

Mehmet Şenvar

Cache Coherence Protocols

Outline


  • Introduction

  • Background Information

    • The cache coherence problem

    • Cache Enforcement Strategies

    • Consistency models

  • Simple Solutions

  • Hardware Protocols

    • Snooping protocols

    • Directory-based protocols

  • Compiler and Software protocols

  • Future work and conclusions

The Cache Coherence Problem

  • Caches allow greater performance by storing frequently used data in faster memory

  • Since all processors share the same address space, it is possible for more than one processor to cache an address (or data item) at a time

  • If one processor updates the data item without informing the other processor, inconsistencies may result and cause incorrect executions
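
The scenario above can be reproduced with a toy model. This is a minimal sketch, assuming write-through private caches and no coherence protocol at all; the `Cache` class and the address `"x"` are hypothetical names used only for illustration.

```python
# Toy model: two private caches over one shared memory, no coherence protocol.
memory = {"x": 0}

class Cache:
    def __init__(self):
        self.lines = {}

    def read(self, addr):
        # A miss fills the line from memory; a hit returns the cached copy.
        if addr not in self.lines:
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write through to memory, but no other cache is told about it.
        self.lines[addr] = value
        memory[addr] = value

p0, p1 = Cache(), Cache()
p0.read("x")          # P0 caches x = 0
p1.read("x")          # P1 caches x = 0
p0.write("x", 42)     # P0 updates x without informing P1
stale = p1.read("x")  # P1 hits in its own cache and still sees the old value
```

Here `stale` holds the old value even though memory already contains the new one, which is exactly the inconsistency a coherence protocol has to prevent.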

[Figure: the cache coherence problem]

Cache Coherence (cont.)

  • For correct execution, coherence must be enforced between the caches

  • Two major factors are:

    • performance

    • implementation cost

  • Four primary design issues are:

    • coherence detection strategy

    • coherence enforcement strategy

    • precision of block-sharing information

    • cache block size

Cache Enforcement Strategies

  • A cache enforcement strategy is the mechanism which makes caches consistent

    • write-update (WU)

    • write-invalidate (WI)

    • hybrid protocols, competitive-update (CU)

  • The performance of WU and WI varies depending on the application and the number of writes

  • Hybrid protocols switch between WU and WI based on the number of writes to a block
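
The trade-off between the three strategies can be seen by counting coherence traffic for a single block. This is a rough sketch under stated assumptions: the function `simulate`, its `threshold` parameter, and the message categories are hypothetical, and the competitive-update rule used here (invalidate a copy after it has received `threshold` updates without being accessed locally) is one common formulation, not necessarily the one the slides have in mind.

```python
def simulate(trace, policy, n_procs=2, threshold=2):
    """Count coherence traffic for one block under a given policy.

    trace:  list of (op, proc) pairs, where op is "r" or "w".
    policy: "WU" (write-update), "WI" (write-invalidate), or "CU"
            (update a remote copy until it has received `threshold`
            unused updates, then invalidate it).
    """
    valid = [False] * n_procs   # which caches hold a valid copy
    unused = [0] * n_procs      # updates received since the last local access
    stats = {"miss": 0, "update": 0, "invalidate": 0}
    for op, p in trace:
        if not valid[p]:
            stats["miss"] += 1
            valid[p] = True
        unused[p] = 0
        if op == "w":
            for q in range(n_procs):
                if q == p or not valid[q]:
                    continue
                if policy == "WU" or (policy == "CU" and unused[q] < threshold):
                    stats["update"] += 1
                    unused[q] += 1
                else:
                    stats["invalidate"] += 1
                    valid[q] = False
    return stats

# P1 reads once, P0 then writes five times, and P1 reads again.
burst = [("r", 1)] + [("w", 0)] * 5 + [("r", 1)]
```

On the `burst` trace, write-update sends an update for every write, write-invalidate sends a single invalidation but costs P1 an extra miss, and the competitive scheme falls in between.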

Consistency Models

  • A consistency model defines how the consistency of data values is maintained

  • Some consistency models are:

    • sequential consistency

    • weak consistency

    • release consistency

  • Weak consistency models are more efficient to implement and require fewer coherence messages
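
What sequential consistency guarantees can be illustrated with the classic two-thread test: thread 0 stores to x and then loads y, thread 1 stores to y and then loads x. The sketch below enumerates every interleaving that preserves program order; `sc_outcomes` and the register names `r1` and `r2` are hypothetical names chosen for this example.

```python
from itertools import combinations

def sc_outcomes():
    """Return every (r1, r2) result allowed by sequential consistency."""
    t0 = [("store", "x"), ("load", "y", "r1")]
    t1 = [("store", "y"), ("load", "x", "r2")]
    n = len(t0) + len(t1)
    outcomes = set()
    # Choose which global steps belong to thread 0; program order is kept.
    for slots in combinations(range(n), len(t0)):
        mem = {"x": 0, "y": 0}
        regs = {}
        i0 = i1 = 0
        for step in range(n):
            if step in slots:
                op = t0[i0]
                i0 += 1
            else:
                op = t1[i1]
                i1 += 1
            if op[0] == "store":
                mem[op[1]] = 1
            else:
                regs[op[2]] = mem[op[1]]
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes
```

Under sequential consistency the result (0, 0) never appears. Weaker models may permit it unless the program adds synchronization, which is the freedom that lets them be implemented more cheaply.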

Shared Caches (1)

Processors share a single cache, essentially punting on the problem.

  • Useful for very small machines

  • E.g., DPC in the Encore, Alliant FX/8

  • Problems are limited cache bandwidth and cache interference

  • Benefits are fine-grain sharing and prefetch effects

Non-cacheable Items (2)

  • Make shared data non-cacheable

  • One of the simplest software solutions

  • Can also be done in hardware, by making the cache locations for shared data unreachable

Broadcast Writes (3)

  • Every cache write request is sent to all other caches

  • First, it must be determined which caches hold the data

  • Other copies are either updated or invalidated

  • Significant additional memory transactions occur

Hardware Protocols

  • Snoop Bus Mechanism

  • Directory Based Methods

    • Full Directory

    • Limited Directory

    • Chained Directory

Snoop Bus Protocol

  • Snooping protocols rely on a shared bus between the processors for coherence

    • On a processor write, the write is passed through the cache to main memory on the bus

    • Any processor caching the address may update or invalidate its cache entry as appropriate

  • Snooping protocols do not scale well beyond about 32 processors because the shared bus becomes a bottleneck

  • The choice between WU, WI, and CU is especially important to reduce communication
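
The snooping mechanism described above can be sketched as caches that all listen to writes on one shared bus. This is a simplified model, assuming write-through caches and a single `policy` switch between invalidate and update; the `Bus` and `SnoopingCache` classes are hypothetical names, not part of any real system.

```python
class Bus:
    def __init__(self):
        self.memory = {}
        self.caches = []

    def broadcast_write(self, source, addr, value):
        # Write through to memory, then let every other cache snoop the write.
        self.memory[addr] = value
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr, value)

class SnoopingCache:
    def __init__(self, bus, policy="invalidate"):
        self.bus, self.policy, self.lines = bus, policy, {}
        bus.caches.append(self)

    def read(self, addr):
        # A miss fetches the value from main memory (0 if never written).
        if addr not in self.lines:
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr, value):
        # Only caches that actually hold the address need to react.
        if addr not in self.lines:
            return
        if self.policy == "update":
            self.lines[addr] = value
        else:
            del self.lines[addr]

bus = Bus()
a, b = SnoopingCache(bus), SnoopingCache(bus)
a.read(0x10)
b.read(0x10)
a.write(0x10, 7)   # b snoops the write and drops its stale copy
```

Every write occupies the bus, which is why the number of processors such a design can support is limited.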

MESI (4-state) Invalidation Protocol

  • Each line in the cache can be in one of 4 states

    • Modified (exclusive): only in 1 cache, modified

    • Exclusive (unmodified) : only in 1 cache, unmodified

    • Shared (unmodified): may be in more than 1 cache, unmodified

    • Invalid
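
The four states can be written down as a transition table. The sketch below follows the textbook form of MESI and may differ in detail from the diagram in the original slides, which is not reproduced in this transcript. The event names `PrRd`, `PrWr`, `BusRd`, and `BusRdX` (local read, local write, snooped read, snooped read-for-ownership) are conventional labels assumed here.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

# (current state, event) -> next state, for every case except a read miss.
TRANSITIONS = {
    (State.I, "PrWr"): State.M,    # write miss: fetch with ownership
    (State.I, "BusRd"): State.I,
    (State.I, "BusRdX"): State.I,
    (State.S, "PrRd"): State.S,
    (State.S, "PrWr"): State.M,    # other copies are invalidated
    (State.S, "BusRd"): State.S,
    (State.S, "BusRdX"): State.I,
    (State.E, "PrRd"): State.E,
    (State.E, "PrWr"): State.M,    # silent upgrade, no bus traffic needed
    (State.E, "BusRd"): State.S,
    (State.E, "BusRdX"): State.I,
    (State.M, "PrRd"): State.M,
    (State.M, "PrWr"): State.M,
    (State.M, "BusRd"): State.S,   # dirty data is supplied and written back
    (State.M, "BusRdX"): State.I,  # dirty data is supplied, copy is dropped
}

def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line."""
    if state is State.I and event == "PrRd":
        # A read miss loads Shared if another cache holds the line.
        return State.S if others_have_copy else State.E
    return TRANSITIONS[(state, event)]
```

For example, `next_state(State.E, "PrWr")` yields `State.M` without any bus transaction, which is the saving the Exclusive state provides over a three-state protocol.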

[Figure: MESI state transition diagram]

[Figure: MESI example]

Directory-Based Protocols

  • Directory-based protocols do not rely on a shared bus to exchange coherence information (use point-to-point connections)

    • more scalable (can have hundreds of processors)

    • each processor can have its own memory

    • implement weak consistency for efficiency

Directory-Based Protocols (cont.)

  • Each node maintains a directory storing cache information and memory information

  • A processor communicates with the directory to access memory

    • if a processor requests a non-local memory page, the directory uses its information to find the page

    • Then, it uses messages to retrieve the page and ensure all other processors have consistent information

    • Since the directory maintains which processors are caching the page, it only needs to send messages to those processors
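
The key property of the last point, that messages go only to the processors actually caching the data, can be sketched as follows. The `Directory` class, its `messages` log, and the invalidate-on-write policy are assumptions made for illustration.

```python
class Directory:
    def __init__(self):
        self.sharers = {}    # block -> set of node ids caching the block
        self.messages = []   # log of (kind, destination node, block)

    def read(self, node, block):
        # Record the requesting node as a sharer of the block.
        self.sharers.setdefault(block, set()).add(node)

    def write(self, node, block):
        # Invalidate only the nodes recorded as sharers; no broadcast.
        for other in sorted(self.sharers.get(block, set()) - {node}):
            self.messages.append(("invalidate", other, block))
        self.sharers[block] = {node}

d = Directory()
d.read(1, "A")
d.read(5, "A")
d.read(2, "B")
d.write(1, "A")   # only node 5 is contacted; node 2 never hears about it
```

Because the message count depends on the number of sharers rather than the number of processors, the scheme does not need a broadcast medium.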

Directory-Based Protocols (cont.)

  • Designing a directory requires defining:

    • cache block granularity

    • cache controller design

    • directory structure

  • Cache block granularity is the size of the cache and the size of a cache line

    • CC-NUMA machines have a separate, smaller cache from main memory

    • COMA machines use a node’s entire memory as cache for remote pages

    • Block size affects performance (false sharing)
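
False sharing arises when two unrelated variables fall into the same coherence block, so a write to one invalidates or updates the cached copy of the other. A small sketch of the address arithmetic, with the hypothetical helpers `block_of` and `falsely_shared`:

```python
def block_of(address, block_size):
    # Blocks are aligned, so integer division gives the block number.
    return address // block_size

def falsely_shared(addr_a, addr_b, block_size):
    """True when two distinct addresses land in the same cache block."""
    return (addr_a != addr_b
            and block_of(addr_a, block_size) == block_of(addr_b, block_size))
```

With 64-byte blocks, addresses 0x1000 and 0x1008 share a block, while with 8-byte blocks they do not. Larger blocks exploit more spatial locality but make this kind of collision more likely.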

Directory-Based Protocols (cont.)

  • Cache controller is hardware that maintains the directory and processes memory requests

    • custom hardware

    • programmable protocol processor

  • The directory structure is how the cache and memory information is organized

    • (p+1)-bit full directory

    • linked-list directories

    • tagged directories

Directory Models

  • Full Directory

    • Link to all caches for all shared locations

  • Limited Directory

    • Links to only some of the caches holding shared data (n pointers, n < N)

  • Chained (linked) Directory

    • Link to one cache, and from this cache to the others, via single or double links
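
The storage cost of the first two models can be compared per memory block. This is a back-of-the-envelope sketch: the full directory uses one presence bit per cache plus a dirty bit, and the limited directory is assumed to hold n cache pointers of ceil(log2 N) bits each plus a dirty bit. The function names are hypothetical.

```python
def full_directory_bits(num_caches):
    # One presence bit per cache plus one dirty bit.
    return num_caches + 1

def limited_directory_bits(num_caches, num_pointers):
    # Each pointer must be able to name any one of the caches.
    pointer_bits = (num_caches - 1).bit_length()   # ceil(log2(num_caches))
    return num_pointers * pointer_bits + 1
```

For 64 caches, a full directory entry needs 65 bits, while a limited directory with 4 pointers needs 25. The price is that a fifth sharer forces an eviction or a fallback to broadcast.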

[Figure: directory sample (full)]

Lock-Based Protocols

  • New work that promises to be more scalable than directory protocols

  • Implements scope consistency which is similar to lazy release consistency

  • Coherence information is exchanged by reading and writing notices at the lock that protects the shared memory

  • Currently implemented in software, similar to DSM, but may move to hardware if performance gains can be realized

Software Protocols

  • Software protocols enforce consistency with limited hardware support by relying either on the compiler or specialized software handlers

  • Similar to distributed shared memory (DSM) systems but at a lower level

    • sharing usually in blocks not pages

    • needs to be more efficient for better performance

    • architecture support for sharing

Classification of Software Protocols

  • Several criteria distinguish software protocols:

    • dynamism - compile-time or run-time analysis

    • selectivity - level of coherence actions

    • restrictiveness - conservative or as-needed consistency enforcement

    • adaptivity - whether the protocol can adapt to access patterns

    • granularity - size and structure of coherence data

    • blocking - program block on which coherence is enforced

    • positioning - position of coherence instructions

    • updating - how memory is updated after a write

    • checking - how incoherence is detected

Software Coherence with Limited Hardware Support

  • The compiler must generate consistent code because no hardware coherence is provided

  • Hardware maintains time tags, which are updated on every write

  • On a read, the compiler generates coherence reads, which check the time tags to ensure the data is consistent

  • Relies on the compiler to detect reads that may be inconsistent, and on the hardware to maintain the time tags

  • Using tags, it is also possible to perform dynamic self-invalidation of blocks

  • Many techniques are based on these time tags
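
The division of labour between compiler and hardware can be sketched roughly as below. This is a strong simplification and only one of many possible time-tag schemes: it assumes a per-address tag that is incremented on every write, and a special `coherent_read` that the compiler would emit wherever it cannot prove the cached copy is fresh. All class and method names are hypothetical.

```python
class TaggedMemory:
    def __init__(self):
        self.data, self.tags = {}, {}

    def write(self, addr, value):
        self.data[addr] = value
        self.tags[addr] = self.tags.get(addr, 0) + 1   # tag bumped on every write

class TaggedCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}   # addr -> (value, tag recorded at fill time)

    def _fill(self, addr):
        self.lines[addr] = (self.memory.data.get(addr, 0),
                            self.memory.tags.get(addr, 0))

    def read(self, addr):
        # Ordinary read: trust the cached copy if one is present.
        if addr not in self.lines:
            self._fill(addr)
        return self.lines[addr][0]

    def coherent_read(self, addr):
        # Compiler-inserted read: refetch when the recorded tag is out of date.
        line = self.lines.get(addr)
        if line is None or line[1] != self.memory.tags.get(addr, 0):
            self._fill(addr)
        return self.lines[addr][0]

mem = TaggedMemory()
mem.write("x", 1)
cache = TaggedCache(mem)
first = cache.read("x")             # fills the line with value 1
mem.write("x", 2)                   # another processor writes x
stale = cache.read("x")             # ordinary read still returns 1
fresh = cache.coherent_read("x")    # tag mismatch forces a refetch
```

The cost of checking falls only on the reads the compiler marks, so the quality of the compiler's analysis directly determines how many unnecessary checks are made.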

Software Coherence with Limited Hardware Support (cont.)

  • For hardware without time tags, Petersen and Li developed an algorithm that uses only page translation hardware and page status tables

  • Sharing information is maintained by a software handler at the page-level

  • On a page access or fault, the software handler checks the sharing information, updates page tables, and performs coherence actions

  • Slower than hardware as software handlers involve the OS and are on the critical memory access path

Enforcing Coherence by Restricting Parallelism

  • Compilers can also guarantee coherence by structuring the language to limit parallelism

    • easier to enforce coherence

    • limits the programmer and potential parallelism

    • simplifies compiler design

    • good performance can be achieved with no hardware support

  • Parallel language restrictions include:

    • doall parallel loops

    • master/slave processes

Optimizing Compilers

  • Optimizing compilers are designed to maintain coherence with limited hardware support without overly restricting the programmer

    • rely on detecting data dependencies

    • may use synchronization variables (locks, barriers)

    • can provide the hardware with hints

    • can detect when coherence is not needed

    • may have problems with dynamic sharing

    • offer good performance, but are hard to design

Future Work

  • Hardware protocols are well defined, and the directory structure is near optimal

  • Cost improvements can be obtained by mass producing cache controller chips

  • Software protocols are a good area for future research because they are also applicable at higher levels of sharing (DSM, databases, ...)

  • Optimizing compilers need to be improved to detect data dependencies and optimize code for the parallel environment

Conclusions

  • Hardware protocols offer the best performance but incur high hardware costs

  • Software protocols can be used when there is no hardware support, with a slight performance penalty

  • Optimizing compilers can enforce coherence or provide hints to the hardware

  • A combination of hardware and compiler optimizations works best
