
CS 2200




Presentation Transcript


  1. CS 2200 Presentation 18a Parallel Processors

  2. Questions?

  3. Our Road Map • Processor • Memory Hierarchy • I/O Subsystem • Parallel Systems • Networking

  4. The Next Step • Create more powerful computers simply by interconnecting many small computers • Should be scalable • Should be fault tolerant • More economical • Multiprocessors • High throughput running independent tasks • Parallel Processing • Single program on multiple processors

  5. Key Questions • How do parallel processors share data? • How do parallel processors communicate? • How many processors?

  6. Sharing Data I (diagram: two processors attached to one memory) • Communication with memory via loads and stores • Same box • Single address space
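In a single-address-space machine, "communication" is nothing more than one processor storing to a location that another processor later loads. A minimal sketch in Python, with threads standing in for processors and a lock (my addition, not from the slides) guarding the load-modify-store sequence:

```python
import threading

counter = 0                      # one location in the shared address space
lock = threading.Lock()

def processor(n_increments: int) -> None:
    """Each 'processor' communicates only through loads/stores to counter."""
    global counter
    for _ in range(n_increments):
        with lock:               # make the load-add-store sequence atomic
            counter += 1

threads = [threading.Thread(target=processor, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4 * 10_000 = 40000
```

Without the lock, the four read-modify-write sequences can interleave and lose updates, which previews exactly the consistency problem the coherency slides below address.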

  7. Problems? (same diagram: two processors sharing one memory)

  8. Sharing Data I has Two Flavors! • Uniform Memory Access (UMA) • Symmetric Multiprocessors (SMP) • Non-Uniform Memory Access (NUMA)

  9. Sharing Data I: Uniform Memory Access (UMA), also known as a Symmetric Multiprocessor (SMP). (Diagram: three processors, each with its own cache, sharing one memory.)

  10. Sharing Data I: Non-Uniform Memory Access (NUMA). (Diagram: four nodes, each with 4 CPUs, a cache, a memory, and an I/O channel.)

  11. Sharing Data II (diagram: computers with private memory connected by a local area network) • Use message passing • Each machine is capable of • Send • Receive
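With private memories there are no shared loads and stores; the only primitives are Send and Receive. A hedged sketch in Python, using a `multiprocessing.Pipe` as a stand-in for the local area network (the `node` function and its doubling workload are illustrative assumptions, not from the slides):

```python
from multiprocessing import Pipe, Process

def node(conn) -> None:
    """A machine with private memory: it can only receive and send messages."""
    msg = conn.recv()       # Receive
    conn.send(msg * 2)      # Send (doubling is a stand-in workload)
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    worker = Process(target=node, args=(child_end,))
    worker.start()
    parent_end.send(21)         # Send
    print(parent_end.recv())    # Receive: prints 42
    worker.join()
```

Note that the two processes share nothing; all coordination flows through explicit messages, so there is no cache-coherency problem to solve, at the price of higher communication cost.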

  12. Connection Schemes • Single bus • Improved feasibility due to microprocessors • Caches can reduce bus traffic • Need to worry about cache coherency • Network

  13. Programming • As contrasted with instruction-level parallelism, which may be largely ignored by the programmer... • Writing efficient multiprocessor programs is hard. • Wizards write programs with a sequential interface (e.g. databases, file servers, CAD) • Communications overhead becomes a factor • Requires a lot of knowledge of the hardware!

  14. Speedup Challenge • To get the full benefit of parallelism we need to be able to parallelize the entire program! • Amdahl’s Law: Time_after = (Time_affected / Improvement) + Time_unaffected • Example: we want 100× speedup with 100 processors • This requires Time_unaffected = 0!!!
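The arithmetic on this slide is easy to check directly. A small sketch (the function name is mine), applying the slide's formula with Time_before normalized to 1:

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup = Time_before / Time_after, where
    Time_after = (Time_affected / Improvement) + Time_unaffected."""
    time_unaffected = 1.0 - parallel_fraction
    time_after = parallel_fraction / n_processors + time_unaffected
    return 1.0 / time_after

# 100x speedup on 100 processors demands Time_unaffected = 0:
print(amdahl_speedup(1.00, 100))   # 100x
print(amdahl_speedup(0.99, 100))   # even 1% serial code caps us near 50x
```

The second call shows why the challenge is so stark: leaving just 1% of the program serial roughly halves the achievable speedup on 100 processors.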

  15. Back to the Bus

  16. Multiprocessor Cache Coherency • Means that values in cache and memory are consistent, or that we know they are different and can act accordingly • Considered to be a good thing. • Becomes more difficult with multiple processors and multiple caches! • Popular technique: snooping! • Write-invalidate • Write-update

  17. Multi-Processor Cache Coherency

  18. P is one of many processors.

  19. The Addr field and R/W flags indicate what operation the processor is trying to perform, and with what address.

  20. The processor’s cache: a Tag (4 bits) for each of 4 lines (ID 00–11), plus Valid, Dirty, and Shared bits, all initially 0.

  21. Note: for this somewhat simplified example we won’t concern ourselves with how many bytes (or words) are in each line. Assume that it’s more than one.

  22. The bus, which also carries an address and an R/W indication.

  23. These bus operations come from other processors, which aren’t shown.

  24. Main memory is attached to the bus as well.

  25. The processor issues a read of address 101010.

  26. The cache reports a MISS.

  27. The cache reports a MISS because the tags don’t match!

  28. The data is read from memory into line 10: Tag = 1010, V = 1, D = 0, S = 1.

  29. The S bit indicates that this line is “shared,” which means other caches might have the same value.

  30. From now on we will show these as two-step operations. Step 1: the request (here, a read of 101010).

  31. Step 2: the result (a MISS) and the change to the cache (line 10 now holds Tag 1010 with V = 1, S = 1).

  32. A write to address 111100...

  33. A write miss. Line 00 is filled: Tag = 1111, V = 1, D = 1, S = 0.

  34. Keep in mind that since most cache configurations have multiple bytes per line, a write miss actually requires us to fetch the line from memory into the cache first, because we are writing only one byte into the line.

  35. Note: the dirty bit signifies that the data in the cache is not the same as in memory.

  36. Another read, this time of address 101010...

  37. ...this time a HIT!

  38. Now another write to 111100...

  39. ...to a dirty line! This is a write hit, and since the shared bit is 0 we know we are in the exclusive state.

  40. Now another processor, failing to find what it needs in its own cache, goes to the bus: a “bus read miss” (here, for address 010101).

  41. Our cache, which is monitoring (snooping on) the bus, sees the miss but can’t help: it holds no matching line.

  42. Another bus request, this time a read of 101010...

  43. Since we have this value in our cache, we can satisfy the request from our cache, assuming that this will be quicker than going to memory.

  44. And another request, this time a read of 111100: a dirty line.

  45. We have to supply the value out of our cache, since it is more current than the value in memory.

  46. We also mark the line as shared (S = 1). Why?

  47. If, for example, our next operation were a write to this line...

  48. ...we would have to note that it is again exclusive (S back to 0) and let the other caches know: ZAP (their copies are invalidated).

  49. We could then write repeatedly to this line, and since we have exclusive ownership no one else has to know!

  50. In a similar way we must respond to write misses by other caches (here, a bus write miss for 101010, which will invalidate our shared copy).
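Slides 18–50 trace a write-invalidate snooping protocol one event at a time. The same transitions can be condensed into a per-line state machine; this Python sketch is my own summary (the state and method names are not from the slides), where EXCLUSIVE stands for the slides’ V = 1, D = 1, S = 0 bit combination:

```python
INVALID, SHARED, EXCLUSIVE = "I", "S", "E"

class SnoopingLine:
    """One cache line under write-invalidate snooping."""

    def __init__(self) -> None:
        self.state = INVALID

    # --- Processor-side events ---
    def read(self) -> None:
        if self.state == INVALID:
            self.state = SHARED       # read miss: fetch from memory, S = 1
        # SHARED or EXCLUSIVE: read hit, no state change

    def write(self) -> None:
        # Write hit or miss: invalidate other copies ("ZAP"), own the line
        self.state = EXCLUSIVE

    # --- Bus-side (snooped) events from other caches ---
    def snoop_read_miss(self) -> None:
        if self.state == EXCLUSIVE:
            self.state = SHARED       # supply our newer data, mark shared

    def snoop_write_miss(self) -> None:
        self.state = INVALID          # another cache takes ownership

line = SnoopingLine()
line.read()              # slides 25-28: read miss -> SHARED
line.write()             # slides 32-33: write -> EXCLUSIVE (dirty)
line.snoop_read_miss()   # slides 44-46: supply data, back to SHARED
line.write()             # slides 47-48: ZAP others, EXCLUSIVE again
line.snoop_write_miss()  # slide 50: another cache's write miss -> INVALID
print(line.state)        # prints I
```

The driver lines at the bottom replay the walkthrough in order; each comment points at the slides that show the corresponding transition.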
