Cache Coherence Protocols in Shared Memory Multiprocessors

Mehmet Şenvar

Cache Coherence Protocols

Outline

  • Introduction

  • Background Information

    • The cache coherence problem

    • Cache Enforcement Strategies

    • Consistency models

  • Simple Solutions

  • Hardware Protocols

    • Snooping protocols

    • Directory-based protocols

  • Compiler and Software protocols

  • Future work and conclusions

The Cache Coherence Problem

  • Caches allow greater performance by storing frequently used data in faster memory

  • Since all processors share the same address space, it is possible for more than one processor to cache an address (or data item) at a time

  • If one processor updates the data item without informing the other processor, inconsistencies may result and cause incorrect executions
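
The inconsistency described above can be reproduced with a toy model. The sketch below assumes two private write-back caches over one shared memory and no coherence mechanism at all; the `Cache` class and the address `0x10` are hypothetical names chosen only for illustration.

```python
# Toy model: two private caches over one shared memory, with NO coherence.
# All names here are illustrative assumptions, not part of any real system.

class Cache:
    def __init__(self, memory):
        self.memory = memory      # shared backing store (a dict)
        self.lines = {}           # private copies: address -> value

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, use the private copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back behaviour: only the private copy is updated,
        # and no other cache is informed.
        self.lines[addr] = value


memory = {0x10: 1}
p0, p1 = Cache(memory), Cache(memory)

p0.read(0x10)             # both processors cache address 0x10
p1.read(0x10)
p0.write(0x10, 2)         # P0 updates its copy silently

stale = p1.read(0x10)     # P1 hits in its own cache and sees the old value
print(f"P0 sees {p0.read(0x10)}, P1 sees {stale}")
```

In this model P0 reads 2 while P1 still reads 1, which is exactly the kind of divergence a coherence protocol has to prevent.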

Cache Coherence Problem

Cache Coherence (cont.)

  • For correct execution, coherence must be enforced between the caches

  • Two major factors are:

    • performance

    • implementation cost

  • Four primary design issues are:

    • coherence detection strategy

    • coherence enforcement strategy

    • precision of block-sharing information

    • cache block size

Cache Enforcement Strategies

  • A cache enforcement strategy is the mechanism that keeps the caches consistent

    • write-update (WU)

    • write-invalidate (WI)

    • hybrid protocols, competitive-update (CU)

  • The performance of WU and WI varies with the application and the number of writes

  • Hybrid protocols switch between WU and WI based on the number of writes to a block
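
The trade-off between the three strategies can be sketched by counting coherence messages for one writer and one reader sharing a block. This is a deliberately simplified model under stated assumptions: each update, invalidation, or read miss costs one message, and the competitive-update reader drops its copy after `threshold` updates it never used. The function name, the trace encoding, and the threshold value are all hypothetical.

```python
def count_messages(trace, policy, threshold=2):
    """Count coherence messages for one writer and one reader of a block.

    trace     : string of 'w' (write by the writer) and 'r' (read by the reader)
    policy    : 'WU' (write-update), 'WI' (write-invalidate), 'CU' (competitive-update)
    threshold : CU only; unused updates tolerated before the reader's copy is dropped
    """
    reader_valid = True     # does the reader currently hold a valid copy?
    unused_updates = 0      # CU: updates received since the reader's last access
    messages = 0
    for op in trace:
        if op == "w":
            if not reader_valid:
                continue                    # writer holds the only copy: local write
            messages += 1                   # one update or one invalidation
            if policy == "WI":
                reader_valid = False
            elif policy == "CU":
                unused_updates += 1
                if unused_updates >= threshold:
                    reader_valid = False    # reader gives up its copy
        else:  # 'r'
            if not reader_valid:
                messages += 1               # read miss: refetch the block
                reader_valid = True
            unused_updates = 0
    return messages


long_write_run = "wwwwwwwwr"     # many writes, then a single read
producer_consumer = "wrwrwrwr"   # every write is read before the next one

for name, trace in [("write run", long_write_run),
                    ("producer/consumer", producer_consumer)]:
    counts = {p: count_messages(trace, p) for p in ("WU", "WI", "CU")}
    print(name, counts)
```

Under these assumptions write-invalidate wins on the long write run (2 messages against 8 for write-update), write-update wins on the producer/consumer pattern (4 against 8), and competitive-update stays close to the better of the two in both cases (3 and 4).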

Consistency Models

  • A consistency model defines how the consistency of data values is maintained

  • Some consistency models are:

    • sequential consistency

    • weak consistency

    • release consistency

  • Weak consistency models are more efficient to implement and require fewer coherence messages

Shared Caches (1)

Processors share a single cache, essentially punting on the problem.

• Useful for very small machines.

• E.g., DPC in the Encore, Alliant FX/8.

• Problems are limited cache bandwidth and cache interference

• Benefits are fine-grain sharing and prefetch effects

Non-cacheable Items (2)

  • Make shared data non-cacheable

  • One of the simplest software solutions

  • Can also be done in hardware, by making those locations unreachable through the cache

Broadcast Writes (3)

  • Every cache write request is sent to all other caches

  • Each cache must first check whether it holds the data

  • Other copies are either updated or invalidated

  • Significant additional memory transactions occur

Hardware Protocols

  • Snoop Bus Mechanism

  • Directory Based Methods

    • Full Directory

    • Limited Directory

    • Chained Directory

Snoop Bus Protocol

  • Snooping protocols rely on a shared bus between the processors for coherence

    • On a processor write, the write is passed through the cache to main memory on the bus

    • Any processor caching the address may update or invalidate its cache entry as appropriate

  • Snooping protocols do not scale well beyond 32 processors because of the shared bus

  • The choice between WU, WI, and CU is especially important to reduce communication

MESI (4-state) Invalidation Protocol

  • Each line in the cache can be in one of 4 states

    • Modified (exclusive) : only in 1 cache, modified

    • Exclusive (unmodified) : only in 1 cache, unmodified

    • Shared (unmodified)

    • Invalid
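
The four states can be turned into a small transition function for a single cache line. The sketch below follows the common textbook formulation of MESI and is an assumption on that basis: the event names (`PrRd`, `PrWr`, `BusRd`, `BusRdX`, `BusUpgr`) and the returned bus actions are illustrative labels, and the details may differ from the transition diagram on the original slide.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def next_state(state, event, others_have_copy=False):
    """Return (new_state, bus_action) for one cache line.

    Local events  : 'PrRd' (processor read), 'PrWr' (processor write)
    Snooped events: 'BusRd' (another cache reads),
                    'BusRdX' / 'BusUpgr' (another cache wants to write)
    others_have_copy only matters for a read miss from Invalid.
    """
    if event == "PrRd":
        if state is State.I:
            # Read miss: Shared if someone else holds the line, else Exclusive.
            return (State.S if others_have_copy else State.E, "BusRd")
        return (state, None)                 # read hit in M, E or S
    if event == "PrWr":
        if state is State.I:
            return (State.M, "BusRdX")       # fetch the line with intent to modify
        if state is State.S:
            return (State.M, "BusUpgr")      # other copies must be invalidated
        return (State.M, None)               # E -> M silently, M stays M
    if event == "BusRd":
        if state is State.M:
            return (State.S, "Flush")        # supply the dirty data, keep a shared copy
        if state is State.E:
            return (State.S, None)
        return (state, None)                 # S stays S, I stays I
    if event in ("BusRdX", "BusUpgr"):
        if state is State.M:
            return (State.I, "Flush")        # write back before giving up the line
        return (State.I, None)
    raise ValueError(f"unknown event: {event}")


# One line in one cache: read miss, local write, then a remote read.
state = State.I
for event in ("PrRd", "PrWr", "BusRd"):
    state, action = next_state(state, event)
    print(event, "->", state.name, action)
```

The short trace at the end moves the line from Invalid to Exclusive on the read miss, to Modified on the local write without any bus traffic, and to Shared with a flush when another cache reads it.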

MESI State Transition Diagram

MESI Example

Directory-Based Protocols

  • Directory-based protocols do not rely on a shared bus to exchange coherence information (use point-to-point connections)

    • more scalable (can have hundreds of processors)

    • each processor can have its own memory

    • implement weak consistency for efficiency

Directory-Based Protocols (cont.)

  • Each node maintains a directory storing cache information and memory information

  • A processor communicates with the directory to access memory

    • if a processor requests a non-local memory page, the directory uses its information to find the page

    • Then, it uses messages to retrieve the page and ensure all other processors have consistent information

    • Since the directory maintains which processors are caching the page, it only needs to send messages to those processors
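
The last point, that only the caching processors need to be contacted, can be sketched as follows. This is a minimal model assuming an invalidation-based protocol and a directory that records only the set of sharers per block; the `Directory` class, the message tuples, and the block names are hypothetical, and ownership tracking and data forwarding are left out.

```python
class Directory:
    """Toy home-node directory: block -> set of processors caching it."""

    def __init__(self):
        self.sharers = {}

    def read(self, proc, block):
        # Record the new sharer; a read needs no invalidations.
        self.sharers.setdefault(block, set()).add(proc)
        return []

    def write(self, proc, block):
        # Invalidate only the processors the directory knows are sharing
        # the block, instead of broadcasting to every processor.
        targets = sorted(self.sharers.get(block, set()) - {proc})
        self.sharers[block] = {proc}
        return [("invalidate", target, block) for target in targets]


directory = Directory()
directory.read(0, "A")
directory.read(3, "A")
directory.read(5, "B")

messages = directory.write(0, "A")
print(messages)    # only processor 3 is contacted
```

Processor 5 caches a different block and receives nothing, which is the property that lets directory schemes avoid the broadcast traffic of a snooping bus.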

Directory-Based Protocols (cont.)

  • Designing a directory requires defining:

    • cache block granularity

    • cache controller design

    • directory structure

  • Cache block granularity covers both the size of the cache and the size of a cache line

    • CC-NUMA machines have a separate, smaller cache from main memory

    • COMA machines use a node's entire memory as a cache for remote pages

    • Block size affects performance (false sharing)
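
The effect of block size on false sharing can be illustrated by counting invalidations when two processors write to different addresses. The sketch assumes a write-invalidate protocol in which a block has a single current holder; the function name and the byte addresses are hypothetical.

```python
def count_invalidations(writes, block_size):
    """writes: list of (processor, byte_address) pairs, in program order."""
    holder = {}            # block number -> processor that wrote it last
    invalidations = 0
    for proc, addr in writes:
        block = addr // block_size
        if block in holder and holder[block] != proc:
            invalidations += 1     # the other processor's copy is invalidated
        holder[block] = proc
    return invalidations


# P0 only ever writes byte 0 and P1 only ever writes byte 8:
# the two processors never touch the same data.
writes = [(0, 0), (1, 8)] * 4

print(count_invalidations(writes, block_size=4))    # separate blocks
print(count_invalidations(writes, block_size=16))   # same block: false sharing
```

With 4-byte blocks the two addresses fall in different blocks and no invalidations occur; with 16-byte blocks they share one block, and every write after the first invalidates the other processor's copy even though no data is actually shared.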

Directory-Based Protocols (cont.)

  • Cache controller is hardware that maintains the directory and processes memory requests

    • custom hardware

    • programmable protocol processor

  • The directory structure is how the cache and memory information is organized

    • p+1-bit full directory

    • linked-list directories

    • tagged directories

Directory Models

  • Full Directory

    • Link to all caches for all shared locations

  • Limited Directory

    • Links to a limited number n of the caches holding shared data, n < N

  • Chained (linked) Directory

    • Link to one cache, and from this cache to the others, through a single or double link
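
The three models differ mainly in how much directory state is kept per memory block. The following sketch estimates that cost in bits, assuming one presence bit per cache for the full directory, pointers of about log2(N) bits for the limited and chained forms, and one extra dirty bit in each case. The exact layouts vary between designs, so the functions and their names are illustrative assumptions only.

```python
def pointer_bits(num_caches):
    # Bits needed to name one of num_caches caches (at least 1 bit).
    return max(1, (num_caches - 1).bit_length())


def full_directory_bits(num_caches):
    # One presence bit per cache, plus a dirty bit.
    return num_caches + 1


def limited_directory_bits(num_caches, num_pointers):
    # num_pointers pointers to sharing caches (n < N), plus a dirty bit.
    return num_pointers * pointer_bits(num_caches) + 1


def chained_directory_bits(num_caches):
    # Memory holds only the head pointer, plus a dirty bit; the rest of
    # the chain is stored in the cache lines themselves (not counted here).
    return pointer_bits(num_caches) + 1


for n_caches in (64, 1024):
    print(n_caches,
          full_directory_bits(n_caches),
          limited_directory_bits(n_caches, num_pointers=4),
          chained_directory_bits(n_caches))
```

Under these assumptions a full directory for 64 caches needs 65 bits per block, a limited directory with 4 pointers needs 25, and a chained directory needs 7 at the memory side. The full directory grows linearly with the number of caches while the other two grow logarithmically.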

Directory Sample (full)

Lock-Based Protocols

  • New work that promises to be more scalable than directory protocols

  • Implements scope consistency, which is similar to lazy release consistency

  • Coherence information is exchanged by reading and writing notices from the lock that protects the shared memory

  • Currently implemented in software, similar to DSM, but may move to hardware if performance gains can be realized

Software Protocols

  • Software protocols enforce consistency with limited hardware support by relying either on the compiler or specialized software handlers

  • Similar to distributed shared memory (DSM) systems but at a lower level

    • sharing usually in blocks not pages

    • needs to be more efficient for better performance

    • architecture support for sharing

Classification of Software Protocols

  • Several criteria distinguish software protocols:

    • dynamism - compile-time or run-time analysis

    • selectivity - level of coherence actions

    • restrictiveness - conservative or as-needed consistency enforcement

    • adaptivity - can protocol adapt to access patterns

    • granularity - size and structure of coherence data

    • blocking - program block on which coherence is enforced

    • positioning - position of coherence instructions

    • updating - how memory is updated after a write

    • checking - how incoherence is detected

Software Coherence with Limited Hardware Support

  • The compiler must generate consistent code, since no hardware coherence is provided

  • Hardware maintains time tags which are updated on every write

  • On a read, the compiler generates coherence reads which check time tags to ensure data is consistent

  • Relies on the compiler to detect reads that may be inconsistent, and on the hardware to maintain the time tags

  • Using tags, it is also possible to perform dynamic self-invalidation of blocks

  • Many techniques based on using these time tags
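
The idea of a compiler-inserted coherence read can be sketched with a toy tag check. The model below assumes one time tag per address, incremented on every write, and a cache that remembers the tag seen when a line was fetched. Published time-tag schemes differ in where the tags live and how they are compared, so the class and method names are hypothetical and the tag lookup is simplified.

```python
class TaggedMemory:
    def __init__(self):
        self.values = {}
        self.tags = {}       # address -> time tag, bumped on every write

    def write(self, addr, value):
        self.values[addr] = value
        self.tags[addr] = self.tags.get(addr, 0) + 1


class TaggedCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # address -> (value, tag seen at fetch time)
        self.fetches = 0

    def _fetch(self, addr):
        self.lines[addr] = (self.memory.values[addr], self.memory.tags[addr])
        self.fetches += 1

    def plain_read(self, addr):
        # Ordinary read: trusts whatever is cached, stale or not.
        if addr not in self.lines:
            self._fetch(addr)
        return self.lines[addr][0]

    def coherence_read(self, addr):
        # Read emitted where the compiler suspects staleness: compare the
        # cached tag with the current one and refetch on a mismatch.
        line = self.lines.get(addr)
        if line is None or line[1] != self.memory.tags[addr]:
            self._fetch(addr)
        return self.lines[addr][0]


memory = TaggedMemory()
memory.write(0x20, 1)
cache = TaggedCache(memory)

first = cache.plain_read(0x20)        # miss, fetches value 1
memory.write(0x20, 2)                 # another processor writes
stale = cache.plain_read(0x20)        # hit on the old copy
fresh = cache.coherence_read(0x20)    # tag mismatch, refetch
print(first, stale, fresh)
```

The plain read returns the stale value 1 after the remote write, while the coherence read notices the tag mismatch and returns 2. The compiler's job in such schemes is to emit the more expensive coherence read only where staleness is actually possible.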

Software Coherence with Limited Hardware Support (cont.)

  • For hardware without time tags, Petersen and Li developed an algorithm that uses only page translation hardware and page status tables

  • Sharing information is maintained by a software handler at the page-level

  • On a page access or fault, the software handler checks the sharing information, updates page tables, and performs coherence actions

  • Slower than hardware as software handlers involve the OS and are on the critical memory access path

Enforcing Coherence by Restricting Parallelism

  • Compilers can also guarantee coherence by structuring the language to limit parallelism

    • easier to enforce coherence

    • limits the programmer and potential parallelism

    • simplifies compiler design

    • good performance can be achieved with no hardware support

  • Parallel language restrictions include:

    • doall parallel loops

    • master/slave processes

Optimizing Compilers

  • Optimizing compilers are designed to maintain coherence with limited hardware support without overly restricting the programmer

    • rely on detecting data dependencies

    • may use synchronization variables (locks, barriers)

    • can provide the hardware with hints

    • can detect when coherence is not needed

    • may have problems with dynamic sharing

    • offer good performance, but are hard to design

Future Work

  • Hardware protocols are well defined, and the directory structure is near optimal

  • Cost improvements can be obtained by mass producing cache controller chips

  • Software protocols are a good area for future research because they are also applicable at higher levels of sharing (DSM, databases, ...)

  • Optimizing compilers need to be improved to detect data dependencies and optimize code for the parallel environment

Conclusions

  • Hardware protocols offer the best performance but have high hardware costs

  • Software protocols can be used, with a slight performance penalty, when there is no hardware support

  • Optimizing compilers can enforce coherence or provide hints to the hardware

  • A combination of hardware and compiler optimizations is best
