12.4 Memory Organization in Multiprocessor Systems

By: Melissa Jamili, CS 147, Section 1, December 2, 2003

Overview • Shared Memory • Usage • Organization • Cache Coherence • Cache coherence problem • Solutions • Protocols for marking and manipulating data

Shared Memory • Two purposes • Message passing • Semaphores

Message Passing • Direct message passing without shared memory • One processor sends a message directly to another processor • Requires synchronization between processors or a buffer

Message Passing (cont.) • Message passing with shared memory • First processor writes a message to the shared memory and signals the second processor that it has a waiting message • Second processor reads the message from shared memory, possibly returning an acknowledge signal to the sender. • Location of the message in shared memory is known beforehand or sent with the waiting message signal
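The shared-memory exchange above can be sketched in Python, with two threads standing in for the two processors. This is only an illustration under stated assumptions: the names `exchange`, `shared_memory`, `mailbox_addr`, `message_waiting`, and `acknowledged` are hypothetical, and `threading.Event` stands in for the hardware signals between processors.

```python
import threading

def exchange(value, mailbox_addr=4):
    """Send one message through a shared-memory mailbox (illustrative only)."""
    shared_memory = [0] * 16               # stand-in for the shared memory
    message_waiting = threading.Event()    # "you have a waiting message" signal
    acknowledged = threading.Event()       # optional acknowledge signal
    received = []

    def sender():
        shared_memory[mailbox_addr] = value    # write the message to shared memory
        message_waiting.set()                  # signal the second processor
        acknowledged.wait()                    # wait for the acknowledge

    def receiver():
        message_waiting.wait()                        # wait for the signal
        received.append(shared_memory[mailbox_addr])  # read the message
        acknowledged.set()                            # acknowledge to the sender

    threads = [threading.Thread(target=receiver), threading.Thread(target=sender)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received[0], acknowledged.is_set()
```

Here the mailbox location is known beforehand (`mailbox_addr`); sending the location along with the signal would be the alternative the slide mentions.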

Semaphores • Stores information about current state • Information on protection and availability of different portions of memory • Can be accessed by any processor that needs the information
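One way to picture semaphores that record the availability of portions of memory is a small table with one entry per region. The sketch below is an assumption-laden illustration, not a description of any particular hardware: the class `RegionSemaphores` and its methods are hypothetical, and a `threading.Lock` stands in for the atomic test-and-set a real system would need.

```python
import threading

class RegionSemaphores:
    """Hypothetical table of one-bit semaphores, one per shared-memory region."""

    def __init__(self, num_regions):
        self._in_use = [False] * num_regions
        self._guard = threading.Lock()   # stands in for an atomic test-and-set

    def try_acquire(self, region):
        """Claim a region; return False if another processor already holds it."""
        with self._guard:
            if self._in_use[region]:
                return False
            self._in_use[region] = True
            return True

    def release(self, region):
        """Mark a region as available again."""
        with self._guard:
            self._in_use[region] = False

    def is_available(self, region):
        """Any processor may query the current state of a region."""
        with self._guard:
            return not self._in_use[region]
```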

Organization of Shared Memory • Not organized into a single shared memory module • Partitioned into several memory modules

Four-processor UMA architecture with Benes network

Interleaving • Process used to divide the shared memory address space among the memory modules • Two types of interleaving • High-order • Low-order

High-order Interleaving • Shared address space is divided into contiguous blocks of equal size. • Two high-order bits of an address determine the module in which the location of the address resides. • Hence the name

Example of 64 MB shared memory with four modules (high-order interleaving)
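A minimal sketch of the high-order mapping, assuming the example memory has 2^26 addressable locations (a 26-bit address) split across four modules. The function name `high_order_module` and the constants are hypothetical.

```python
ADDRESS_BITS = 26                           # assumed: 2**26 addressable locations
MODULE_BITS = 2                             # four modules need two select bits
OFFSET_BITS = ADDRESS_BITS - MODULE_BITS    # 24 bits address a location in a module

def high_order_module(address):
    """Return (module, offset): the two high-order bits select the module."""
    module = address >> OFFSET_BITS
    offset = address & ((1 << OFFSET_BITS) - 1)
    return module, offset
```

Under these assumptions each module holds one contiguous quarter of the address space: addresses 0x0000000 to 0x0FFFFFF fall in module 0, 0x1000000 to 0x1FFFFFF in module 1, and so on.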

Low-order Interleaving • Low-order bits of a memory address determine its module

Example of 64 MB shared memory with four modules (low-order interleaving)
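The low-order mapping can be sketched the same way, again assuming four modules so that two bits select the module. The name `low_order_module` is hypothetical.

```python
MODULE_BITS = 2    # four modules need two select bits

def low_order_module(address):
    """Return (module, offset): the two low-order bits select the module."""
    module = address & ((1 << MODULE_BITS) - 1)
    offset = address >> MODULE_BITS
    return module, offset
```

Consecutive addresses 0, 1, 2, 3, 4, 5 map to modules 0, 1, 2, 3, 0, 1, so neighboring locations always sit in different modules.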

Low-order Interleaving (cont.) • Low-order interleaving was originally used to reduce delay in accessing memory • CPU outputs an address and read request to one memory module • While that module decodes the address and accesses its data, the CPU outputs another request to a different memory module • Results in pipelining of memory requests • Low-order interleaving is not commonly used in modern computers, since cache memory now serves to reduce memory access delay

Low-order vs. High-order Interleaving • In a low-order interleaving system, consecutive memory locations reside in different memory modules • A processor executing a program stored in a contiguous block of memory must access every module in turn • Simultaneous access by several processors is possible, but memory conflicts are difficult to avoid

Low-order vs. High-order Interleaving (cont.) • In a high-order interleaving system, memory conflicts are easily avoided • Each processor executes a different program • Programs stored in separate memory modules • Interconnection network is set to connect each processor to its proper memory module
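The difference in conflict behavior can be illustrated with a toy model. The sketch assumes four processors, each stepping through its own contiguous block in lockstep (one access per cycle), a 26-bit address, and four modules; all names are hypothetical, and real access patterns are far less regular.

```python
ADDRESS_BITS = 26
MODULE_BITS = 2
OFFSET_BITS = ADDRESS_BITS - MODULE_BITS

def high_order(address):
    return address >> OFFSET_BITS                  # module from high-order bits

def low_order(address):
    return address & ((1 << MODULE_BITS) - 1)      # module from low-order bits

def count_conflict_cycles(module_of, start_addresses, cycles):
    """Count cycles in which two or more processors target the same module."""
    conflicts = 0
    for step in range(cycles):
        modules = [module_of(start + step) for start in start_addresses]
        if len(set(modules)) < len(modules):
            conflicts += 1
    return conflicts

# Each processor's program starts at the base of a different quarter of memory.
starts = [p << OFFSET_BITS for p in range(4)]
```

With these start addresses, high-order interleaving keeps every processor in its own module (no conflict cycles), while low-order interleaving sends all four processors to the same module on every cycle.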

Cache Coherence • Goal: keep cached copies of shared data consistent • Like cache memory in uniprocessors, cache memory in multiprocessors improves performance by reducing the time needed to access data from memory • Unlike uniprocessors, multiprocessors have an individual cache for each processor

Cache Coherence Problem • Occurs when two or more caches hold the value of the same memory location simultaneously • One processor stores a value to that location in its cache • The other caches now hold a stale (invalid) value for that location • Write-through cache will not resolve this problem • Updates main memory but not other caches

Cache coherence problem with four processors using a write-back cache
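The problem can be reproduced with a toy model of two private caches and no coherence mechanism. This is a sketch only: `SimpleCache` and `stale_read_demo` are hypothetical names, and the model tracks single values rather than cache lines.

```python
class SimpleCache:
    """Private cache with no coherence mechanism (illustrative only)."""

    def __init__(self, memory, write_through=False):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}                      # address -> cached value

    def read(self, address):
        if address not in self.lines:        # miss: load from shared memory
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        if self.write_through:               # write-through also updates memory
            self.memory[address] = value

def stale_read_demo(write_through=False):
    """Return (value seen by A, value seen by B, value in shared memory)."""
    memory = {0x10: 5}
    cache_a = SimpleCache(memory, write_through)
    cache_b = SimpleCache(memory, write_through)
    cache_a.read(0x10)
    cache_b.read(0x10)                       # both caches now hold the value 5
    cache_a.write(0x10, 9)                   # processor A stores a new value
    return cache_a.read(0x10), cache_b.read(0x10), memory[0x10]
```

In both the write-back and the write-through case, the second cache keeps returning the old value 5; write-through only brings main memory up to date.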

Solutions to the Cache Coherence Problem • Mark all shared data as non-cacheable • Use a cache directory • Use cache snooping

Non-Cacheable • Mark all shared data as non-cacheable • Forces accesses of data to be from shared memory • Lowers cache hit ratio and reduces overall system performance

Cache Directory • Use a cache directory • Directory controller is integrated with the main memory controller to maintain the cache directory • Cache directory located in main memory • Contains information on the contents of local caches • Cache writes sent to directory controller to update cache directory • Controller invalidates other caches with same data
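A minimal sketch of the directory idea, assuming a write-through policy and an invalidate-on-write scheme. The class `DirectoryController` and its fields are hypothetical and model single values rather than cache lines.

```python
class DirectoryController:
    """Toy directory: tracks which local caches hold each address."""

    def __init__(self, memory, num_caches):
        self.memory = memory
        self.caches = [dict() for _ in range(num_caches)]   # local caches
        self.directory = {}          # address -> set of cache ids holding it

    def read(self, cache_id, address):
        cache = self.caches[cache_id]
        if address not in cache:                     # miss: load and record it
            cache[address] = self.memory[address]
            self.directory.setdefault(address, set()).add(cache_id)
        return cache[address]

    def write(self, cache_id, address, value):
        # The write is reported to the directory controller, which
        # invalidates every other cache holding the same data.
        for other in self.directory.get(address, set()) - {cache_id}:
            del self.caches[other][address]
        self.directory[address] = {cache_id}
        self.caches[cache_id][address] = value
        self.memory[address] = value                 # assumed write-through
```

After a write, any other cache that held the address misses on its next read and reloads the new value from memory.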

Cache Snooping • Each cache (snoopy cache) monitors memory activity on the system bus • Appropriate action is taken when a memory request is encountered

Protocols for marking and manipulating data • MESI protocol most common • Each cache entry can be in one of the following states: • Modified: Cache contains memory value, which is different from value in shared memory • Exclusive: Only one cache contains memory value, which is same value in shared memory • Shared: Cache contains memory value corresponding to shared memory, other caches can hold this memory location • Invalid: Cache does not contain memory location

How the MESI Protocol Works • Four possible memory access scenarios: • Read hit • Read miss • Write hit • Write miss

MESI Protocol (cont.) • Read hit • Processor reads data • State unchanged

MESI Protocol (cont.)
• Read miss
  • Processor sends read request to shared memory via system bus
  • No cache contains data
    • MMU loads data from main memory into processor’s cache
    • Cache marked as E (exclusive)
  • One cache contains data, marked as E
    • Data loaded into cache, marked as S (shared)
    • Other cache changes from state E to S
  • More than one cache contains the data, marked as S
    • Data loaded into cache, marked as S
    • Other cache states with data remain unchanged
  • One cache contains data, marked as M (modified)
    • Cache with modified data temporarily blocks memory read request and updates main memory
    • Read request continues, both caches mark data as S

MESI Protocol (cont.)
• Write hit
  • Cache contains data in state M or E
    • Processor writes data to cache
    • State becomes M
  • Cache contains data in state S
    • Processor writes data, marked as M
    • All other caches mark this data as I (invalid)

MESI Protocol (cont.)
• Write miss
  • Begins by issuing a read with intent to modify (RWITM)
  • No cache holds data, one cache holds data marked as E, or one or more caches hold data marked S
    • Data loaded from main memory into cache, marked as M
    • Processor writes new data to cache
    • Other caches holding this data change states to I
  • One other cache holds data as M
    • That cache temporarily blocks request and writes its value back to main memory, marks data as I
    • Original cache loads data, marked as M
    • Processor writes new value to cache
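The four access scenarios can be put together in a small simulation. This is a sketch under simplifying assumptions, not a hardware description: `Bus`, `MesiCache`, and `states` are hypothetical names, each cache tracks single values rather than lines, a missing entry means state I, and snooping is modeled by the requesting cache directly inspecting the other caches on the bus.

```python
class Bus:
    """Hypothetical system bus connecting snoopy caches to shared memory."""
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

class MesiCache:
    def __init__(self, bus):
        self.bus = bus
        self.state = {}              # address -> "M", "E" or "S"; absent means "I"
        self.data = {}
        bus.caches.append(self)

    def _holders(self, address):
        """Other caches on the bus that hold a valid copy of the address."""
        return [c for c in self.bus.caches
                if c is not self and c.state.get(address, "I") != "I"]

    def _flush_if_modified(self, address):
        if self.state.get(address) == "M":   # M copy updates main memory first
            self.bus.memory[address] = self.data[address]

    def _invalidate(self, address):
        self.state.pop(address, None)
        self.data.pop(address, None)

    def read(self, address):
        if self.state.get(address, "I") == "I":          # read miss
            holders = self._holders(address)
            for other in holders:
                other._flush_if_modified(address)
                other.state[address] = "S"               # E or M -> S, S stays S
            self.data[address] = self.bus.memory[address]
            self.state[address] = "S" if holders else "E"
        return self.data[address]                        # read hit: state unchanged

    def write(self, address, value):
        current = self.state.get(address, "I")
        if current == "I":                               # write miss: RWITM
            for other in self._holders(address):
                other._flush_if_modified(address)
                other._invalidate(address)
            self.data[address] = self.bus.memory[address]
        elif current == "S":                             # write hit on shared data
            for other in self._holders(address):
                other._invalidate(address)
        self.data[address] = value                       # M or E hit needs no bus action
        self.state[address] = "M"

def states(caches, address):
    """Summarize the state of one address across all caches, e.g. 'SSI'."""
    return "".join(c.state.get(address, "I") for c in caches)
```

As a usage example with three caches and `memory = {0x10: 5}`: a read by the first cache gives states "EII", a read by the second gives "SSI", a write by the second gives "IMI", and a following read by the first forces the write-back and returns to "SSI".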

Four-processor system using cache snooping and the MESI protocol

Conclusion • Shared memory • Message passing • Semaphores • Interleaving • Cache coherence • Cache coherence problem • Solutions • Non-cacheable • Cache directory • Cache snooping • MESI protocol
