
Parallel and Multiprocessor Architectures


Presentation Transcript


  1. Parallel and Multiprocessor Architectures Chapter 9.4 By Eric Neto

  2. Parallel & Multiprocessor Architecture • In making processors faster, we run into certain limitations. • Physical • Economic • Solution: When necessary, use more processors, working in sync.

  3. Parallel & Multiprocessor Limitations • Though parallel processing can speed up performance, the gain is limited. • Intuitively, you’d expect N processors to do the work in 1/N time, but parts of most programs must run in sequence, so idle processors sit waiting while the active processor finishes those parts. • Therefore, the more sequential a program is, the less cost-effective it is to implement parallelism.
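
This limit is commonly quantified with Amdahl's law. As a rough, hypothetical illustration (not part of the slides), the sketch below estimates speedup from a processor count and an assumed serial fraction:

```python
# Hypothetical illustration: estimated speedup with n processors when a
# fraction f of the work is inherently sequential (Amdahl's law).
def speedup(n_processors, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Even with 16 processors, a 20% sequential portion caps speedup near 4x.
print(speedup(16, 0.20))   # ~4.0
print(speedup(16, 0.05))   # ~9.1
```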

  4. Parallel and Multiprocessing Architectures • Superscalar • VLIW • Vector • Interconnection Networks • Shared Memory • Distributed Computing

  5. Superscalar Architecture • Allows multiple instructions to be executed simultaneously in each cycle. • Contains: • Execution units – each executes one instruction at a time, so several can run in parallel. • Specialized instruction fetch unit – fetches multiple instructions at once and sends them to the decoding unit. • Decoding unit – determines whether the given instructions are independent of one another.
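
A minimal sketch of the decoding unit's independence check, assuming a made-up (dest, sources) instruction encoding; the names and registers are invented for illustration, not taken from the slides:

```python
# Toy model of the decoding unit's check: two instructions may issue in the
# same cycle only if neither writes a register the other reads or writes.
def independent(a, b):
    dest_a, srcs_a = a
    dest_b, srcs_b = b
    return dest_a != dest_b and dest_a not in srcs_b and dest_b not in srcs_a

add_instr = ("R3", {"R1", "R2"})   # R3 = R1 + R2
mul_instr = ("R4", {"R5", "R6"})   # R4 = R5 * R6
sub_instr = ("R7", {"R3", "R1"})   # R7 = R3 - R1 (reads R3 written above)

print(independent(add_instr, mul_instr))  # True  -> same cycle is fine
print(independent(add_instr, sub_instr))  # False -> must be serialized
```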

  6. VLIW Architecture • Similar to superscalar, but relies on the compiler rather than dedicated hardware. • Packs independent instructions into one “Very Long Instruction Word”. • Advantages: • Simpler hardware. • Disadvantages: • Instruction bundles are fixed at compile time, so modifications to the code or hardware can affect how the instructions execute.
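
A sketch of that compile-time bundling idea, again using a hypothetical (dest, sources) instruction encoding and an assumed 4-slot word width:

```python
# Sketch of compile-time bundling: greedily pack consecutive, mutually
# independent instructions into one long instruction word of fixed width.
def conflicts(a, b):
    return a[0] == b[0] or a[0] in b[1] or b[0] in a[1]

def pack_vliw(instructions, slots=4):
    bundles, current = [], []
    for instr in instructions:
        if len(current) < slots and not any(conflicts(instr, c) for c in current):
            current.append(instr)        # fits into the word being built
        else:
            bundles.append(current)      # word full or a dependence found
            current = [instr]
    if current:
        bundles.append(current)
    return bundles

program = [("R3", {"R1", "R2"}), ("R4", {"R5", "R6"}), ("R7", {"R3", "R4"})]
print(pack_vliw(program))  # first two share one word; the third starts a new one
```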

  7. Vector Processors • Use vector pipelines to store and perform operations on many values at once, as opposed to scalar processing, which only performs operations on individual values. • Since fewer instructions are used, there is less decoding, control unit overhead, and memory bandwidth usage. • Can be SIMD or MIMD. • Example: the vector form LDV V1, R1; LDV V2, R2; ADDV R3, V1, V2; STV R3, V3 replaces the scalar form of one add per component: Xn = X1 + X2; Yn = Y1 + Y2; Zn = Z1 + Z2; Wn = W1 + W2; …
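
The same contrast in software terms, using NumPy arrays as a stand-in for vector hardware; the array size and values are arbitrary:

```python
import numpy as np

# Scalar style: one add per element, i.e. one operation per loop iteration.
def scalar_add(x, y):
    out = []
    for xi, yi in zip(x, y):
        out.append(xi + yi)
    return out

# Vector style: a single operation applied to whole arrays at once,
# analogous to the single ADDV instruction in the slide's example.
x = np.arange(1024, dtype=np.float64)
y = np.ones(1024)
z = x + y                      # one vectorized add instead of 1024 scalar adds
print(z[:3], scalar_add(x, y)[:3])
```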

  8. Interconnection Networks • Each processor has its own memory, which can be accessed and shared by other processors through an interconnection network. • The efficiency of messages sent through the network is limited by: • Bandwidth • Message latency • Transport latency • Overhead • In general, the number of messages sent and the distances they must travel should be minimized.
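
A rough cost model combining those factors; the formula and the parameter values are illustrative assumptions, not figures from the slides:

```python
# Illustrative cost model (an assumption, not a formula from the slides):
# total time = fixed software overhead + per-hop transport latency
#              + time to push the message's bytes through the link.
def message_time(size_bytes, bandwidth_bytes_per_s, hops, hop_latency_s, overhead_s):
    return overhead_s + hops * hop_latency_s + size_bytes / bandwidth_bytes_per_s

# A 4 KiB message crossing 3 hops on a 1 GB/s link, with 2 us per hop
# and 5 us of overhead, takes roughly 15 microseconds.
print(message_time(4096, 1e9, 3, 2e-6, 5e-6))
```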

  9. Topologies • Connections within the network can be either static or dynamic. • Different static configurations are more useful for different tasks. • Examples: completely connected, star, ring.

  10. More Topologies: tree, mesh, hypercube.
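
For a sense of the trade-offs, the sketch below computes link count and diameter (worst-case hop distance) for a few of these static topologies from their standard formulas; the node count of 16 is arbitrary, and the hypercube entries assume a power-of-two node count:

```python
import math

# Link count and diameter for some static topologies with n nodes.
def topology_stats(n):
    dims = int(math.log2(n))      # hypercube dimension, assumes n = 2**k
    return {
        "completely connected": (n * (n - 1) // 2, 1),
        "star":                 (n - 1, 2),
        "ring":                 (n, n // 2),
        "hypercube":            (n * dims // 2, dims),
    }

for name, (links, diameter) in topology_stats(16).items():
    print(f"{name:22s} links={links:4d} diameter={diameter}")
```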

  11. Dynamic Networks • Buses, crossbars, switches, multistage connections. • As more processors are added, these become rapidly more expensive; a crossbar, for example, needs a crosspoint switch for every processor–memory pair.

  12. Dynamic Networking: Crossbar Network • Efficient • Direct • Expensive
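
To make the expense concrete, a trivial calculation of crossbar crosspoint counts (n × n for n processors and n memory modules):

```python
# A crossbar connecting n processors to n memory modules needs n * n
# crosspoint switches, so its cost grows with the square of n.
for n in (4, 16, 64, 256):
    print(f"{n:4d} processors -> {n * n:6d} crosspoints")
```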

  13. Dynamic Networking: Switch Network • Complex • Moderately Efficient • Cheaper

  14. Dynamic Networking: Bus • Simple • Slow • Inefficient • Cheap

  15. Shared Memory Multiprocessors • Memory is shared globally, distributed locally, or organized as a combination of the two.

  16. Shared Memory Access • Uniform Memory Access systems use a shared memory pool, where all memory takes the same amount of time to access. • Quickly becomes expensive when more processors are added.

  17. Shared Memory Access • Non-Uniform Memory Access systems have memory distributed across all the processors, and it takes less time for a processor to read from its own local memory than from non-local memory. • Prone to cache coherence problems, which occur when a local cache isn’t in sync with non-local caches holding the same data. • Dealing with these problems requires extra mechanisms to ensure coherence.
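
One such mechanism, sketched very loosely: a toy write-invalidate scheme in which a write by one processor discards every other processor's cached copy of that address. The classes and addresses are invented for illustration and do not describe a real protocol such as MESI:

```python
# Toy write-invalidate scheme (illustrative only, not a real protocol).
class Cache:
    def __init__(self):
        self.lines = {}                     # address -> cached value

    def read(self, addr, memory):
        if addr not in self.lines:          # miss: fetch from shared memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

def write(writer, other_caches, memory, addr, value):
    memory[addr] = value
    writer.lines[addr] = value
    for cache in other_caches:              # invalidate stale copies elsewhere
        cache.lines.pop(addr, None)

memory = {0x10: 1}
p0, p1 = Cache(), Cache()
p1.read(0x10, memory)                       # p1 caches the old value (1)
write(p0, [p1], memory, 0x10, 99)           # p0's write invalidates p1's copy
print(p1.read(0x10, memory))                # 99: p1 re-fetches the new value
```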

  18. Distributed Computing • Multi-computer processing. • Works on the same principle as multiprocessing, on a larger scale. • Uses a large network of computers to solve small parts of a very large problem.
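
A small sketch of that divide-and-distribute idea, using local worker processes as stand-ins for the networked computers; the problem (summing a range), the chunk sizes, and the worker count are all arbitrary choices:

```python
from multiprocessing import Pool

# Split one large problem into chunks and farm the chunks out to workers.
# On a real cluster the chunks would go to other machines; here local
# worker processes stand in for them.
def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total == sum(range(n)))           # True: the pieces add up
```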

  19. The End
