William Stallings, Computer Organization and Architecture, 5th Edition (PowerPoint Presentation)


  1. William Stallings, Computer Organization and Architecture, 5th Edition, Chapter 16: Parallel Processing • Traditionally, the computer has been viewed as a sequential machine, but this view has never been entirely true.

  2. Multiple Processor Organization • Flynn categories: • Single instruction, single data stream - SISD • Single instruction, multiple data stream - SIMD • Multiple instruction, single data stream - MISD • Multiple instruction, multiple data stream - MIMD

  3. Single Instruction, Single Data Stream - SISD • Single processor • Single instruction stream • Data stored in a single memory • Uniprocessor

  4. Single Instruction, Multiple Data Stream - SIMD • A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis • Each processing element has an associated data memory • Each instruction is executed on a different set of data by a different processor • Examples: vector and array processors
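The lockstep behaviour described above can be illustrated with a toy simulation. This is a minimal sketch only, assuming a hypothetical two-operation instruction set (`ADD`, `MUL`) and hypothetical names (`ProcessingElement`, `simd_run`); real array processors do this in hardware, with one control unit broadcasting to all processing elements at once.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical instruction set: opcode -> operation on two operands.
OPS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
}

# An instruction is (opcode, destination, source1, source2).
Instruction = Tuple[str, str, str, str]


class ProcessingElement:
    """One PE with its own associated data memory (three named words)."""

    def __init__(self, a: int, b: int) -> None:
        self.memory: Dict[str, int] = {"A": a, "B": b, "C": 0}

    def execute(self, instr: Instruction) -> None:
        op, dst, src1, src2 = instr
        self.memory[dst] = OPS[op](self.memory[src1], self.memory[src2])


def simd_run(pes: List[ProcessingElement], program: List[Instruction]) -> None:
    # The single control unit issues one instruction at a time; every PE
    # executes it on its own data before the next one is issued (lockstep).
    for instr in program:
        for pe in pes:
            pe.execute(instr)


pes = [ProcessingElement(a, b) for a, b in [(1, 10), (2, 20), (3, 30), (4, 40)]]
simd_run(pes, [("ADD", "C", "A", "B"), ("MUL", "C", "C", "A")])
print([pe.memory["C"] for pe in pes])  # [11, 44, 99, 176]
```

The same two-instruction program computes (A + B) * A on four different data sets; only the data differs between PEs, never the instruction.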

  5. Multiple Instruction, Single Data Stream - MISD • A sequence of data is transmitted to a set of processors • Each processor executes a different instruction sequence • Never been implemented • New material: network processors (Intel IXP1200)

  6. Multiple Instruction, Multiple Data Stream - MIMD • A set of processors • Simultaneously execute different instruction sequences • On different sets of data • Examples: SMPs (symmetric multiprocessors), clusters and NUMA systems

  7. Taxonomy of Parallel Processor Architectures

  8. MIMD - Overview • General-purpose processors • Each can process all the instructions necessary to perform the appropriate data transformation • Further classified by the method of processor communication

  9. Architectures of MIMD Multiprocessors • The first group • Called centralized shared-memory architectures • Small processor counts • Processors share a single centralized memory • Interconnected by a bus • Uniform access time from each processor • Called UMA (Uniform Memory Access) machines

  10. Architectures of MIMD Multiprocessors • Symmetric Multiprocessor (SMP): centralized shared-memory architecture

  11. Tightly Coupled - SMP • Processors share memory • Communicate via that shared memory • Symmetric Multiprocessor (SMP) • Share single memory or pool • Shared bus to access memory • Memory access time to given area of memory is approximately the same for each processor

  12. Architectures of MIMD Multiprocessors • The second group • Support larger processor counts • Processors share physically distributed memory • Interconnected by an interconnection network • Non-uniform access time from each processor • Called NUMA (Non-Uniform Memory Access) machines

  13. Architectures of MIMD Multiprocessors • A Distributed-Memory Multiprocessor

  14. Tightly Coupled - NUMA • Nonuniform memory access • Access times to different regions of memory may differ
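One way to see why non-uniform access times matter is to weight local and remote latencies by how often each is used. The sketch below assumes a simple two-level model (every reference is either local or remote); the function name and the latency figures are hypothetical, not taken from any real system.

```python
def numa_avg_access_time(local_fraction: float, t_local_ns: float, t_remote_ns: float) -> float:
    """Average access time when `local_fraction` of references go to local memory.

    Two-level model (an assumption): each reference is either local or remote.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns


# Hypothetical latencies: 100 ns for local memory, 300 ns for a remote node.
for f in (1.0, 0.9, 0.5):
    print(f"local fraction {f:.1f}: {numa_avg_access_time(f, 100.0, 300.0):.1f} ns")
```

Under these assumed numbers the average rises from 100 ns to 200 ns as the local fraction drops from 1.0 to 0.5, which is why NUMA systems try to place data near the processor that uses it.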

  15. Loosely Coupled - Clusters • Collection of independent uniprocessors or SMPs • Interconnected to form a cluster • Communication via fixed path or network connections

  16. Parallel Organizations - SISD

  17. Parallel Organizations - SIMD

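  18. Parallel Organizations - MISD (figure: control units CU1 to CUn issue instruction streams IS1 to ISn to processing units PU1 to PUn, which all operate on a single data stream DS from main memory MM)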

  19. Parallel Organizations - MIMD Shared Memory

  20. Parallel Organizations - MIMD Distributed Memory

  21. Symmetric Multiprocessors • A standalone computer with the following characteristics • Two or more similar processors of comparable capacity • Processors share same memory and I/O • Processors are connected by a bus or other internal connection • Memory access time is approximately the same for each processor

  22. Symmetric Multiprocessors • A standalone computer with the following characteristics • All processors share access to I/O • Either through same channels or different channels giving paths to same devices • All processors can perform the same functions (hence symmetric) • System controlled by integrated operating system • providing interaction between processors • Interaction at job, task, file and data element levels

  23. SMP Advantages • Performance • If some work can be done in parallel • Availability • Since all processors can perform the same functions, failure of a single processor does not halt the system • Incremental growth • User can enhance performance by adding more processors • Scaling • Vendors can offer a range of products based on the number of processors
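The condition "if some work can be done in parallel" can be quantified with Amdahl's law. This is not part of these slides; it is the standard model, assuming a fraction f of the work can be spread perfectly over N processors while the rest stays serial, giving speedup = 1 / ((1 - f) + f / N). The function name is illustrative.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup when `parallel_fraction` of the work is spread over n processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("need at least one processor")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Example: 90% of the work is parallelizable.
for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With f = 0.9, sixteen processors give a speedup of 6.4, and no number of processors can exceed 1 / (1 - 0.9) = 10; this bounds how far "incremental growth" can help a given workload.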

  24. Block Diagram of Tightly Coupled Multiprocessor

  25. Organization Classification • Time shared or common bus • Multiport memory • Central control unit

  26. Shared Bus

  27. Time Shared Bus • Simplest form • Structure and interface similar to single processor system • Following features provided • Addressing - distinguish modules on bus • Arbitration - any module can be temporary master • Time sharing - if one module has the bus, others must wait and may have to suspend • Now have multiple processors as well as multiple I/O modules
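The three features (addressing, arbitration, time sharing) can be sketched as a cycle-by-cycle simulation in which each module is identified by a number and only one module is bus master per cycle. The round-robin policy and all names here are assumptions for illustration; real buses may use fixed priorities or other schemes.

```python
from typing import List, Optional, Set


class RoundRobinArbiter:
    """Grants the shared bus to one requesting module per bus cycle (assumed round-robin)."""

    def __init__(self, n_modules: int) -> None:
        self.n = n_modules
        self.last = n_modules - 1  # so module 0 is checked first

    def grant(self, requests: Set[int]) -> Optional[int]:
        # Search starting just after the previous master, wrapping around.
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None  # no requests: bus idle this cycle


def simulate(transfers_needed: List[int]) -> List[int]:
    """Return which module is bus master in each cycle until all transfers finish."""
    arbiter = RoundRobinArbiter(len(transfers_needed))
    remaining = list(transfers_needed)
    timeline: List[int] = []
    while any(remaining):
        requests = {i for i, r in enumerate(remaining) if r > 0}
        master = arbiter.grant(requests)
        assert master is not None
        remaining[master] -= 1  # master uses the bus; every other requester waits
        timeline.append(master)
    return timeline


print(simulate([2, 1, 3]))  # [0, 1, 2, 0, 2, 2]
```

Six transfers take six bus cycles no matter how many modules request them, which is the time-sharing cost noted on the next slides.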

  28. Time Share Bus - Advantages • Simplicity • The simplest approach to multiprocessor organization • Flexibility • Easy to expand the system by attaching more processors to the bus • Reliability • The failure of any attached device should not cause failure of the whole system

  29. Time Share Bus - Disadvantage • Performance limited by bus cycle time • All memory references pass through the common bus • Each processor should have local cache • Reduce number of bus accesses • Leads to problems with cache coherence • Solved in hardware - see later
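A rough calculation shows why each processor needs a local cache on a shared bus. The sketch assumes (hypothetically) that only cache misses use the bus, that each miss costs one bus transfer, and that write-back and coherence traffic can be ignored; the bus and processor rates are made-up round numbers.

```python
def max_processors_on_bus(bus_transfers_per_s: float,
                          refs_per_s_per_cpu: float,
                          miss_rate: float) -> int:
    """Processors a shared bus can serve before saturating.

    Simplifying assumptions: only cache misses use the bus, each miss is one
    bus transfer, and write-back or coherence traffic is ignored.
    """
    if not 0.0 < miss_rate <= 1.0:
        raise ValueError("miss_rate must be in (0, 1]")
    bus_load_per_cpu = refs_per_s_per_cpu * miss_rate
    return int(bus_transfers_per_s // bus_load_per_cpu)


BUS = 100e6  # hypothetical bus capacity: 100 million transfers per second
CPU = 50e6   # hypothetical processor: 50 million memory references per second
print(max_processors_on_bus(BUS, CPU, 1.0))   # no cache: every reference uses the bus
print(max_processors_on_bus(BUS, CPU, 0.05))  # local cache with a 5% miss rate
```

With these numbers the bus saturates at 2 processors without caches but can serve about 40 with a 5% miss rate; the price is that several caches may now hold copies of the same data.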

  30. Multiport Memory Diagram

  31. Multiport Memory • Direct independent access of memory modules by each processor • Logic required to resolve conflicts • Little or no modification to processors or modules required
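The conflict-resolution logic can be sketched for a single memory cycle. The sketch assumes a fixed-priority scheme (the lowest-numbered port wins a contested module), which is one common approach but is not stated on the slide; the function name and request format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Request = Tuple[int, int]  # (processor port, memory module)


def resolve_cycle(requests: List[Request]) -> Tuple[Dict[int, int], List[Request]]:
    """Grant each memory module to one port this cycle; return (grants, stalled requests)."""
    ports_by_module: Dict[int, List[int]] = defaultdict(list)
    for port, module in requests:
        ports_by_module[module].append(port)
    # Fixed priority (assumed): the lowest-numbered port wins each module.
    grants = {module: min(ports) for module, ports in ports_by_module.items()}
    stalled = [(port, module) for port, module in requests if grants[module] != port]
    return grants, stalled


grants, stalled = resolve_cycle([(0, 1), (1, 0), (2, 1)])
print(grants)   # {1: 0, 0: 1}
print(stalled)  # [(2, 1)]
```

Ports 0 and 1 reach different modules in the same cycle because each has its own path; only port 2, which collides with port 0 on module 1, has to retry.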

  32. Multiport Memory - Advantages and Disadvantages • More complex • Extra logic in the memory system • Better performance • Each processor has a dedicated path to each module • Can configure portions of memory as private to one or more processors • Increased security • Write-through cache policy required, since there is no other means to alert other processors to a memory update

  33. Central Control Unit • Funnels separate data streams between independent modules • Can buffer requests • Performs arbitration and timing • Passes status and control messages • Performs cache update alerting • Interfaces to modules remain the same • e.g. IBM S/370 • Rarely seen today

  34. Operating System Issues • Simultaneous concurrent processes • Allow several processors to execute the same OS code simultaneously • Scheduling • Assign ready processes to available processors • Synchronization • Enforce mutual exclusion and event ordering • Memory management • Paging mechanisms on different processors • Reliability and fault tolerance • Provide graceful degradation
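The synchronization requirement (mutual exclusion plus event ordering) can be shown with threads sharing one counter. This is a minimal sketch using Python's standard `threading` module; under CPython's global interpreter lock the threads do not run bytecode truly in parallel, so it illustrates the pattern an SMP operating system must provide rather than real multiprocessor execution. The class and function names are hypothetical.

```python
import threading


class SharedCounter:
    """A counter in 'shared memory' updated by several threads."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self, times: int) -> None:
        for _ in range(times):
            with self._lock:  # mutual exclusion: one thread in the critical section
                self.value += 1


def run(n_threads: int = 4, times: int = 10_000) -> int:
    counter = SharedCounter()
    workers = [threading.Thread(target=counter.increment, args=(times,))
               for _ in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # event ordering: read the result only after every worker finishes
    return counter.value


print(run())  # 40000
```

Without the lock, the read-modify-write in `self.value += 1` could interleave between threads and lose updates.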

  35. IBM S/390 Mainframe SMP

  36. S/390 - Key components • Processor unit (PU) • CISC microprocessor • Frequently used instructions hard-wired • 64 KB unified L1 cache with 1-cycle access time • L2 cache • 384 KB • Bus switching network adapter (BSN) • Includes 2 MB of L3 cache • Memory card • 8 GB per card

  37. Switched Interconnection • S/390 copes with the single-bus bottleneck problem in two ways • Main memory is split into four separate cards, each with its own storage controller • Processors connect to each memory card through point-to-point links rather than a shared bus • BSN connects four physical links to one logical data bus • A signal arriving on any one link is echoed back to the other links

  38. Cache Coherence and MESI Protocol • Problem: multiple copies of the same data in different caches • Can result in an inconsistent view of memory • Write-back policy can lead to inconsistency • Write-through can also give problems unless caches monitor memory traffic

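  39. Cache Coherence and MESI Protocol (figure: processors P1 and P2 both cache x; panels Original, Write through, Write back; after P1 writes x’, P2 still holds the stale x, and under write back main memory also still holds x)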
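The inconsistency in this scenario can be reproduced with two deliberately naive private caches that do not monitor memory traffic. The sketch assumes single-word lines and does not model eviction; the class and function names are hypothetical.

```python
from typing import Dict, Tuple


class Cache:
    """A private cache with no coherence mechanism (deliberately naive)."""

    def __init__(self, memory: Dict[str, int], policy: str) -> None:
        if policy not in ("write-through", "write-back"):
            raise ValueError("policy must be 'write-through' or 'write-back'")
        self.memory = memory  # shared main memory
        self.policy = policy
        self.lines: Dict[str, int] = {}

    def read(self, addr: str) -> int:
        if addr not in self.lines:  # miss: load from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]     # hit: may return a stale value

    def write(self, addr: str, value: int) -> None:
        self.lines[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value  # memory updated immediately
        # write-back: memory is updated only on eviction (not modelled here)


def scenario(policy: str) -> Tuple[int, int, int]:
    memory = {"x": 1}
    p1 = Cache(memory, policy)
    p2 = Cache(memory, policy)
    p1.read("x")
    p2.read("x")       # both caches now hold x = 1
    p1.write("x", 2)   # P1 updates x
    return p1.read("x"), p2.read("x"), memory["x"]


print(scenario("write-through"))  # (2, 1, 2): P2 still sees the stale value
print(scenario("write-back"))     # (2, 1, 1): P2 and memory are both stale
```

Write-through keeps memory correct but not P2's cached copy, which is why it only works if caches monitor memory traffic.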

  40. Software Solutions • Compiler and operating system deal with problem • Overhead transferred to compile time • Design complexity transferred from hardware to software • However, software tends to make conservative decisions • Inefficient cache utilization • Analyze code to determine safe periods for caching shared variables

  41. Hardware Solution • Cache coherence protocols • Dynamic recognition of potential problems • Run time • More efficient use of cache • Transparent to programmer • Directory protocols • Snoopy protocols

  42. Directory Protocols • Collect and maintain information about copies of data in cache • Directory stored in main memory (figure: processors P1, P2, P3, each with a cache C, sharing main memory M and its directory D)

  43. Directory Protocols • Requests are checked against directory • Appropriate transfers are performed • Creates central bottleneck • Effective in large scale systems with complex interconnection schemes
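A directory protocol can be sketched as a central table that records, per line, which caches hold a copy and which (if any) holds a modified copy. This is a simplified sketch under stated assumptions: one directory entry per line, messages recorded as strings instead of real transfers, and hypothetical names (`Directory`, `read_request`, `write_request`).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DirEntry:
    sharers: Set[int] = field(default_factory=set)  # caches holding a copy
    owner: Optional[int] = None                      # cache holding a modified copy


class Directory:
    """Central directory kept with main memory; every request is checked against it."""

    def __init__(self) -> None:
        self.entries: Dict[str, DirEntry] = {}
        self.messages: List[str] = []  # transfers the directory causes

    def _entry(self, line: str) -> DirEntry:
        return self.entries.setdefault(line, DirEntry())

    def read_request(self, cpu: int, line: str) -> None:
        e = self._entry(line)
        if e.owner is not None and e.owner != cpu:
            # Modified copy elsewhere: owner writes it back and keeps a shared copy.
            self.messages.append(f"P{e.owner}: write back {line}")
            e.owner = None
        e.sharers.add(cpu)

    def write_request(self, cpu: int, line: str) -> None:
        e = self._entry(line)
        if e.owner is not None and e.owner != cpu:
            self.messages.append(f"P{e.owner}: write back and invalidate {line}")
        for other in sorted(e.sharers - {cpu, e.owner}):
            self.messages.append(f"P{other}: invalidate {line}")
        e.sharers = {cpu}
        e.owner = cpu


d = Directory()
d.read_request(1, "x")
d.read_request(2, "x")
d.write_request(3, "x")
d.read_request(1, "x")
print(d.messages)  # ['P1: invalidate x', 'P2: invalidate x', 'P3: write back x']
```

Because invalidations go only to the caches listed as sharers, no broadcast is needed, which suits complex interconnects; the cost is that every request passes through the directory, the central bottleneck noted above.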

  44. Snoopy Protocols • Distribute cache coherence responsibility among cache controllers • Cache recognizes that a line is shared • Updates announced to other caches • Suited to bus based multiprocessor • Increases bus traffic

  45. Write Invalidate • Multiple readers, one writer • When a write is required, all other cached copies of the line are invalidated • The writing processor then has exclusive access until the line is required by another processor • Used in Pentium II and PowerPC systems • The state of every line is marked as Modified, Exclusive, Shared or Invalid • MESI

  46. MESI Protocol • Modified • The line in the cache has been modified and is available only in this cache • Exclusive • The line in the cache is the same as that in main memory and is not present in any other cache • Shared • The line in the cache is the same as that in main memory and may be present in another cache • Invalid • The line in the cache does not contain valid data
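The four MESI states can be exercised with a small state-machine sketch for one line held in several snooping caches. It is simplified by assumption: data values and the timing of write-backs are not modelled, and the class and method names are hypothetical.

```python
from enum import Enum
from typing import List


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class MesiLine:
    """MESI state of one memory line in each of n snooping caches."""

    def __init__(self, n_caches: int) -> None:
        self.states: List[State] = [State.I] * n_caches

    def _others(self, cpu: int) -> List[int]:
        return [i for i in range(len(self.states)) if i != cpu]

    def read(self, cpu: int) -> None:
        if self.states[cpu] is not State.I:
            return  # read hit in M, E or S: no state change, no bus traffic
        others = self._others(cpu)
        shared = any(self.states[i] is not State.I for i in others)
        for i in others:
            # Snooped bus read: an M copy is written back; M and E copies drop to S.
            if self.states[i] in (State.M, State.E):
                self.states[i] = State.S
        self.states[cpu] = State.S if shared else State.E

    def write(self, cpu: int) -> None:
        if self.states[cpu] in (State.I, State.S):
            # The write is announced on the bus and every other copy is invalidated
            # (an M copy elsewhere would be written back first).
            for i in self._others(cpu):
                self.states[i] = State.I
        # From E no bus transaction is needed; in every case the line becomes M.
        self.states[cpu] = State.M

    def names(self) -> List[str]:
        return [s.name for s in self.states]


line = MesiLine(3)
line.read(0)
print(line.names())  # ['E', 'I', 'I']
line.read(1)
print(line.names())  # ['S', 'S', 'I']
line.write(2)
print(line.names())  # ['I', 'I', 'M']
line.read(0)
print(line.names())  # ['S', 'I', 'S']
```

The Exclusive state is what lets a processor write a line it alone holds without any bus traffic.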

  47. Write Update • Multiple readers and writers • Updated word is distributed to all other processors • Some systems use an adaptive mixture of both solutions

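  48. Snoopy Protocols Explanation Using Write-Through Policy (figure: processors P1 and P2 both cache x; panels Original, Write invalidate, Write update; after P1 writes x’, P2's copy is marked Invalid (I) under write invalidate and is updated to x’ under write update)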
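The two snoopy policies in this scenario can be compared with a few lines of code. The sketch assumes write-through caches modelled as dictionaries, with the marker `"I"` standing for an invalidated line as in the figure; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

INVALID = "I"  # marker for an invalidated line, as in the figure


def snoopy_write(caches: List[Dict[str, str]], memory: Dict[str, str],
                 writer: int, addr: str, value: str, protocol: str) -> None:
    """Write-through write by `writer`, snooped by every other cache."""
    if protocol not in ("invalidate", "update"):
        raise ValueError("protocol must be 'invalidate' or 'update'")
    caches[writer][addr] = value
    memory[addr] = value  # write-through: main memory is updated on every write
    for i, cache in enumerate(caches):
        if i == writer or addr not in cache:
            continue
        if protocol == "invalidate":
            cache[addr] = INVALID  # other copy must be re-fetched before use
        else:
            cache[addr] = value    # new value is broadcast to the other copy


def demo(protocol: str) -> Tuple[str, str, str]:
    memory = {"A": "x"}
    caches = [{"A": "x"}, {"A": "x"}]  # P1 and P2 both hold x
    snoopy_write(caches, memory, 0, "A", "x'", protocol)
    return caches[0]["A"], caches[1]["A"], memory["A"]


print(demo("invalidate"))
print(demo("update"))
```

Under write invalidate P2 ends up with `I` and must miss on its next read; under write update P2 already holds x', at the cost of sending the new value over the bus on every write.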

  49. Clusters • Alternative to SMP • High performance • High availability • Server applications • A group of interconnected whole computers • Working together as unified resource • Illusion of being one machine • Each computer called a node

  50. Cluster Benefits • Absolute scalability • Incremental scalability • High availability • Superior price/performance