
CS 2200




Presentation Transcript


  1. CS 2200 Presentation 18a Parallel Processors

  2. Questions?

  3. Our Road Map • Processor • Memory Hierarchy • I/O Subsystem • Parallel Systems • Networking

  4. The Next Step • Create more powerful computers simply by interconnecting many small computers • Should be scalable • Should be fault tolerant • More economical • Multiprocessors • High throughput running independent tasks • Parallel Processing • Single program on multiple processors

  5. Key Questions • How do parallel processors share data? • How do parallel processors communicate? • How many processors?

  6. Sharing Data I (diagram: two processors attached to one memory) • Communication with memory via loads and stores • Same box • Single address space
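In a single-address-space machine, "communication" is nothing more than one processor storing to a location that another processor later loads. A minimal sketch in Python, with threads standing in for processors and a lock (my addition, not from the slides) guarding the load-modify-store sequence:

```python
import threading

counter = 0                      # one location in the shared address space
lock = threading.Lock()

def processor(n_increments: int) -> None:
    """Each 'processor' communicates only through loads/stores to counter."""
    global counter
    for _ in range(n_increments):
        with lock:               # make the load-add-store sequence atomic
            counter += 1

threads = [threading.Thread(target=processor, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4 * 10_000 = 40000
```

Without the lock, the four read-modify-write sequences can interleave and lose updates, which previews exactly the consistency problem the coherency slides below address.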

  7. Problems? (same diagram: two processors sharing one memory)

  8. Sharing Data I has Two Flavors! • Uniform Memory Access (UMA) • Symmetric Multiprocessors (SMP) • Non-Uniform Memory Access (NUMA)

  9. Sharing Data I: Uniform Memory Access (UMA), also known as a Symmetric Multiprocessor (SMP). (Diagram: three processors, each with its own cache, sharing one memory.)

  10. Sharing Data I: Non-Uniform Memory Access (NUMA). (Diagram: four nodes, each with 4 CPUs, a cache, a memory, and an I/O channel.)

  11. Sharing Data II (diagram: computers with private memory connected by a local area network) • Use message passing • Each machine is capable of • Send • Receive
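With private memories there are no shared loads and stores; the only primitives are Send and Receive. A hedged sketch in Python, using a `multiprocessing.Pipe` as a stand-in for the local area network (the `node` function and its doubling workload are illustrative assumptions, not from the slides):

```python
from multiprocessing import Pipe, Process

def node(conn) -> None:
    """A machine with private memory: it can only receive and send messages."""
    msg = conn.recv()       # Receive
    conn.send(msg * 2)      # Send (doubling is a stand-in workload)
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    worker = Process(target=node, args=(child_end,))
    worker.start()
    parent_end.send(21)         # Send
    print(parent_end.recv())    # Receive: prints 42
    worker.join()
```

Note that the two processes share nothing; all coordination flows through explicit messages, so there is no cache-coherency problem to solve, at the price of higher communication cost.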

  12. Connection Schemes • Single bus • Improved feasibility due to microprocessors • Caches can reduce bus traffic • Need to worry about cache coherency • Network

  13. Programming • As contrasted with instruction-level parallelism, which may be largely ignored by the programmer... • Writing efficient multiprocessor programs is hard. • Wizards write programs with a sequential interface (e.g. databases, file servers, CAD) • Communications overhead becomes a factor • Requires a lot of knowledge of the hardware!

  14. Speedup Challenge • To get the full benefit of parallelism we need to be able to parallelize the entire program! • Amdahl’s Law: Time_after = (Time_affected / Improvement) + Time_unaffected • Example: we want 100× speedup with 100 processors • This requires Time_unaffected = 0!!!
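The arithmetic on this slide is easy to check directly. A small sketch (the function name is mine), applying the slide's formula with Time_before normalized to 1:

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup = Time_before / Time_after, where
    Time_after = (Time_affected / Improvement) + Time_unaffected."""
    time_unaffected = 1.0 - parallel_fraction
    time_after = parallel_fraction / n_processors + time_unaffected
    return 1.0 / time_after

# 100x speedup on 100 processors demands Time_unaffected = 0:
print(amdahl_speedup(1.00, 100))   # 100x
print(amdahl_speedup(0.99, 100))   # even 1% serial code caps us near 50x
```

The second call shows why the challenge is so stark: leaving just 1% of the program serial roughly halves the achievable speedup on 100 processors.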

  15. Back to the Bus

  16. Multiprocessor Cache Coherency • Means that values in cache and memory are consistent, or that we know they are different and can act accordingly • Considered to be a good thing. • Becomes more difficult with multiple processors and multiple caches! • Popular technique: snooping! • Write-invalidate • Write-update

  17. Multi-Processor Cache Coherency

  18. P is one of many processors.

  19. The Addr field and R/W flags indicate what operation the processor is trying to perform, and with what address.

  20. The processor’s cache: a Tag (4 bits) for each of 4 lines (ID 00–11), plus Valid, Dirty, and Shared bits, all initially 0.

  21. Note: for this somewhat simplified example we won’t concern ourselves with how many bytes (or words) are in each line. Assume that it’s more than one.

  22. The bus, which also carries an address and an R/W indication.

  23. These bus operations come from other processors, which aren’t shown.

  24. Main memory is attached to the bus as well.

  25. The processor issues a read of address 101010.

  26. The cache reports a MISS.

  27. The cache reports a MISS because the tags don’t match!

  28. The data is read from memory into line 10: Tag = 1010, V = 1, D = 0, S = 1.

  29. The S bit indicates that this line is “shared,” which means other caches might have the same value.

  30. From now on we will show these as two-step operations. Step 1: the request (here, a read of 101010).

  31. Step 2: the result (a MISS) and the change to the cache (line 10 now holds Tag 1010 with V = 1, S = 1).

  32. A write to address 111100...

  33. A write miss. Line 00 is filled: Tag = 1111, V = 1, D = 1, S = 0.

  34. Keep in mind that since most cache configurations have multiple bytes per line, a write miss actually requires us to fetch the line from memory into the cache first, because we are writing only one byte into the line.

  35. Note: the dirty bit signifies that the data in the cache is not the same as in memory.

  36. Another read, this time of address 101010...

  37. ...this time a HIT!

  38. Now another write to 111100...

  39. ...to a dirty line! This is a write hit, and since the shared bit is 0 we know we are in the exclusive state.

  40. Now another processor, failing to find what it needs in its own cache, goes to the bus: a “bus read miss” (here, for address 010101).

  41. Our cache, which is monitoring (snooping on) the bus, sees the miss but can’t help: it holds no matching line.

  42. Another bus request, this time a read of 101010...

  43. Since we have this value in our cache, we can satisfy the request from our cache, assuming that this will be quicker than going to memory.

  44. And another request, this time a read of 111100: a dirty line.

  45. We have to supply the value out of our cache, since it is more current than the value in memory.

  46. We also mark the line as shared (S = 1). Why?

  47. If, for example, our next operation were a write to this line...

  48. ...we would have to note that it is again exclusive (S back to 0) and let the other caches know: ZAP (their copies are invalidated).

  49. We could then write repeatedly to this line, and since we have exclusive ownership no one else has to know!

  50. In a similar way we must respond to write misses by other caches (here, a bus write miss for 101010, which will invalidate our shared copy).
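Slides 18–50 trace a write-invalidate snooping protocol one event at a time. The same transitions can be condensed into a per-line state machine; this Python sketch is my own summary (the state and method names are not from the slides), where EXCLUSIVE stands for the slides’ V = 1, D = 1, S = 0 bit combination:

```python
INVALID, SHARED, EXCLUSIVE = "I", "S", "E"

class SnoopingLine:
    """One cache line under write-invalidate snooping."""

    def __init__(self) -> None:
        self.state = INVALID

    # --- Processor-side events ---
    def read(self) -> None:
        if self.state == INVALID:
            self.state = SHARED       # read miss: fetch from memory, S = 1
        # SHARED or EXCLUSIVE: read hit, no state change

    def write(self) -> None:
        # Write hit or miss: invalidate other copies ("ZAP"), own the line
        self.state = EXCLUSIVE

    # --- Bus-side (snooped) events from other caches ---
    def snoop_read_miss(self) -> None:
        if self.state == EXCLUSIVE:
            self.state = SHARED       # supply our newer data, mark shared

    def snoop_write_miss(self) -> None:
        self.state = INVALID          # another cache takes ownership

line = SnoopingLine()
line.read()              # slides 25-28: read miss -> SHARED
line.write()             # slides 32-33: write -> EXCLUSIVE (dirty)
line.snoop_read_miss()   # slides 44-46: supply data, back to SHARED
line.write()             # slides 47-48: ZAP others, EXCLUSIVE again
line.snoop_write_miss()  # slide 50: another cache's write miss -> INVALID
print(line.state)        # prints I
```

The driver lines at the bottom replay the walkthrough in order; each comment points at the slides that show the corresponding transition.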
