
Parallel Computing, Department of Computer Engineering, Ferdowsi University

Parallel Computing, Department of Computer Engineering, Ferdowsi University. Hossain Deldari. Lecture organization: Parallel Processing; Supercomputer; Parallel Computer; Amdahl’s Law, Speedup, Efficiency; Parallel Machine Architecture; Computational Model; Concurrency Approach.


Presentation Transcript


  1. Parallel Computing • Department of Computer Engineering, Ferdowsi University • Hossain Deldari

  2. Lecture organization • Parallel Processing • Supercomputer • Parallel Computer • Amdahl’s Law, Speedup, Efficiency • Parallel Machine Architecture • Computational Model • Concurrency Approach • Parallel Programming • Cluster Computing

  3. What is Parallel Processing? • It is the division of work into smaller tasks, which are assigned to multiple workers to work on simultaneously • Parallel processing is the use of multiple processors to execute different parts of the same program simultaneously • Difficulties: coordinating, controlling and monitoring the workers • The main goal of parallel processing is to solve much bigger problems much faster: • to reduce the wall-clock execution time of computer programs • to increase the size of the computational problems that can be solved

  4. What is a Supercomputer? • A supercomputer is a computer that is a lot faster than the computers that normal people use • Note: this is a time-dependent definition • TOP500 Lists (supercomputer & parallel computer), June 1993: • Manufacturer: TMC • Computer/Procs: CM-5/1024 / 1024 • Rmax: 59.70 GFlop/s • Rpeak: 131.00 GFlop/s • Installation Site: Los Alamos National Laboratory • Country/Year: USA

  5. TOP500 List, June 2003: • Manufacturer: NEC • Computer/Procs: Earth-Simulator / 5120 • Rmax: 35860.00 GFlop/s • Rpeak: 40960.00 GFlop/s • Installation Site: Earth Simulator Center • Country/Year: Japan • Rmax: maximal LINPACK performance achieved (LINPACK is a benchmark) • Rpeak: theoretical peak performance

  6. Amdahl’s Law, Speedup, Efficiency • Amdahl’s Law
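The transcript names Amdahl's Law without stating it. In its standard form, if a fraction f of a program's run time can be parallelized over p processors, the speedup is S(p) = 1 / ((1 - f) + f / p). The sketch below evaluates that formula; the function and parameter names are illustrative and do not come from the slides.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law: 1 / ((1 - f) + f / p)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # A program that is 90% parallelizable, on a growing number of processors.
    for p in (1, 2, 10, 100, 1000):
        print(p, round(amdahl_speedup(0.9, p), 2))
```

The serial fraction bounds the achievable speedup: with 90% of the work parallelizable, no number of processors gives more than roughly a tenfold speedup.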

  7. Efficiency • Efficiency is a measure of the fraction of time that a processor spends performing useful work.
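The definition above is usually made concrete as E = S / p, where S = T_serial / T_parallel is the speedup on p processors. A minimal sketch under that common definition follows; the names and the timing values are illustrative, not taken from the lecture.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_serial / T_parallel."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Efficiency E = S / p: the share of processor time spent on useful work."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors


if __name__ == "__main__":
    # Hypothetical timings: 100 s serially, 25 s on 8 processors.
    print(speedup(100.0, 25.0))        # speedup
    print(efficiency(100.0, 25.0, 8))  # efficiency
```

An efficiency of 1.0 would mean every processor is busy with useful work for the entire run; coordination and communication overhead push the value below that.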

  8. Shunt Operation

  9. Parallel and Distributed Computers • SIMD • MIMD • MISD • Clusters

  10. SIMD (Single Instruction Multiple Data)

  11. MISD (Multiple Instruction Single Data)

  12. MIMD (Multiple Instruction Multiple Data)

  13. MIMD (cont.)

  14. Parallel machine architecture • Shared memory model: bus-based, switch-based, NUMA • Distributed memory model • Distributed shared memory model: page-based, object-based, hardware

  15. Shared memory model

  16. Shared memory model (cont.) • Shared memory or multiprocessor • OpenMP is a standard (C/C++/Fortran) • Advantage: easy programming • Disadvantages: design complexity, not scalable

  17. Bus-based shared memory model • The bus is a bottleneck • Not scalable

  18. Switch-based shared memory model • Maintenance is difficult • Expensive • Scalable

  19. NUMA model • NUMA stands for Non-Uniform Memory Access. • Simulated shared memory • Better scalability

  20. Distributed memory model • Multicomputer • MPI (Message Passing Interface) • Easy design • Low cost • High scalability • Difficult programming

  21. Examples of Network Topology • Ring • Linear Array • Fully Connected Mesh [Figure: example topologies with nodes numbered 1 to 6]

  22. Examples of Network Topology (cont.) • Hypercubes [Figure: hypercube of dimension d = 4, with nodes labelled by 4-bit binary addresses from 0000 to 1111]
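In a hypercube of dimension d, two nodes are directly connected exactly when their d-bit labels differ in one bit. The sketch below uses that property to list a node's neighbors and to build a route by correcting differing bits one at a time. The function names and the bit-by-bit routing order are illustrative choices, not something the slide specifies.

```python
from typing import List


def hypercube_neighbors(node: int, dimension: int) -> List[int]:
    """Neighbors of `node`: flip each of the `dimension` label bits in turn."""
    return [node ^ (1 << bit) for bit in range(dimension)]


def hypercube_route(source: int, destination: int, dimension: int) -> List[int]:
    """Route from source to destination, fixing differing bits from lowest to highest."""
    path = [source]
    current = source
    for bit in range(dimension):
        mask = 1 << bit
        if (current ^ destination) & mask:
            current ^= mask  # move along the link for this dimension
            path.append(current)
    return path


def label(node: int, dimension: int) -> str:
    """Binary label of a node, zero-padded to `dimension` digits."""
    return format(node, "0{}b".format(dimension))


if __name__ == "__main__":
    d = 4
    print([label(n, d) for n in hypercube_neighbors(0b0000, d)])
    print([label(n, d) for n in hypercube_route(0b0000, 0b1101, d)])
```

The route length equals the number of bits in which the two labels differ, so no route in a d-dimensional hypercube needs more than d hops.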

  23. Distributed shared memory model • Simpler abstraction • Sharing data • Easier portability • Easy design with easy programming • Low performance (under heavy communication)

  24. Parallel and Distributed Architecture (Leopold, 2001) [Figure: SIMD, SMP, NUMA and Cluster machines, classified as SIMD or MIMD and as shared memory or distributed memory, compared by degree of coupling (tight / loose), supported grain sizes (coarse / fine) and communication speed (fast / slow)]

  25. Computational Model • RAM • PRAM • BSP • LogP • MPI

  26. RAM Model

  27. PRAM Model (Parallel Random Access Machine) • Synchronized read-compute-write cycle • EREW • ERCW • CREW • CRCW [Figure: a control unit and processors P1, P2, ..., Pp, each with private memory, connected to a global memory]
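The synchronized read-compute-write cycle can be illustrated with a tree-based sum, simulated sequentially here. In every step each active "processor" reads two cells of the shared array and writes one, and no cell is touched by two processors in the same step, so the access pattern fits the EREW variant. This is a sketch of one classic PRAM-style algorithm chosen for illustration; the slides do not say which examples the lecture used.

```python
def pram_tree_sum(values):
    """Sum `values` in synchronous steps; returns (total, number_of_steps)."""
    cells = list(values)  # plays the role of the global shared memory
    n = len(cells)
    steps = 0
    stride = 1
    while stride < n:
        # Read + compute phase: processor i reads cells[i] and cells[i + stride].
        updates = {
            i: cells[i] + cells[i + stride]
            for i in range(0, n - stride, 2 * stride)
        }
        # Write phase: each processor writes only its own cell i.
        for i, value in updates.items():
            cells[i] = value
        stride *= 2
        steps += 1
    total = cells[0] if cells else 0
    return total, steps


if __name__ == "__main__":
    print(pram_tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))
```

For n values the loop runs ceil(log2 n) steps, which is the parallel time on an idealized PRAM with enough processors.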

  28. Bulk Synchronous Parallel (BSP) Model • Generalization of the PRAM model • Processor-memory pairs • Communication network • Barrier synchronization • Super-step: execute, communications, barrier synchronization [Figure: processes over one super-step]

  29. Complexity • Cost of a superstep = $w + \max(h_s, h_r) \cdot g + l$ • $w$: maximum number of local operations • $h_s$: maximum number of packets sent • $h_r$: maximum number of packets received • BSP space: $p$ (number of processors), $g$ (communication throughput), $l$ (synchronization latency)
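The superstep cost above is simple enough to evaluate directly. The sketch below computes it and, following the usual BSP convention, takes the cost of a whole program as the sum of its superstep costs. The function names and all numeric values are illustrative assumptions.

```python
def superstep_cost(w, hs, hr, g, l):
    """Cost of one superstep: w + max(hs, hr) * g + l."""
    return w + max(hs, hr) * g + l


def program_cost(supersteps, g, l):
    """Total cost of a BSP program given (w, hs, hr) for each superstep."""
    return sum(superstep_cost(w, hs, hr, g, l) for (w, hs, hr) in supersteps)


if __name__ == "__main__":
    # Hypothetical machine parameters and workload, in units of local operations.
    g, l = 4, 50
    steps = [(1000, 20, 30), (500, 10, 5)]
    print(superstep_cost(1000, 20, 30, g, l))
    print(program_cost(steps, g, l))
```

Because l is paid once per superstep, an algorithm that does the same work in fewer supersteps is cheaper on a machine with high synchronization latency.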

  30. LogP Model • Closely related to BSP • It models asynchronous execution • New parameters: • L: the message latency. • o: the overhead, defined as the length of time that a processor is engaged in the transmission or reception of each message. During this time the processor cannot perform other operations. • g: the gap, defined as the minimum time interval between consecutive message transmissions or receptions. The reciprocal of g corresponds to the available per-processor bandwidth. • P: the number of processor/memory modules.
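The parameters above combine into simple time estimates. For one small message, the sender pays o, the network adds L, and the receiver pays o, giving L + 2o. For n back-to-back messages from one sender, successive sends are separated by at least max(g, o). The sketch below encodes these commonly quoted small-message estimates; the function names and numbers are illustrative.

```python
def single_message_time(L, o):
    """End-to-end time of one small message: send overhead + latency + receive overhead."""
    return o + L + o


def pipelined_send_time(n, L, o, g):
    """Time until the last of n consecutive small messages has been received."""
    if n < 1:
        raise ValueError("n must be at least 1")
    interval = max(g, o)  # the sender cannot inject messages faster than this
    return (n - 1) * interval + single_message_time(L, o)


if __name__ == "__main__":
    # Hypothetical parameter values, in processor cycles.
    L, o, g = 6, 2, 4
    print(single_message_time(L, o))
    print(pipelined_send_time(4, L, o, g))
```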

  31. LogP (cont.)

  32. MPI (Message Passing Interface) • What is MPI? • A message-passing library specification: a message-passing model, not a compiler specification, not a specific product • For parallel computers, clusters, and heterogeneous networks • Full-featured • Designed to permit (unleash?) the development of parallel software libraries • Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers

  33. MPI Layer [Figure: Node 1 and Node 2, each with an Application layer (Task 1, Task 2), an MPI layer and a Comm. layer; the diagram distinguishes virtual communication from real communication]
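The message-passing model behind MPI can be illustrated without an MPI installation. In the sketch below, two tasks share no data and interact only through explicit send and receive operations. It uses Python threads and queues as a stand-in; the `Channel` class and `run_two_tasks` function are hypothetical names, not part of MPI or any MPI binding.

```python
import queue
import threading


class Channel:
    """Hypothetical one-way point-to-point link (a stand-in, not an MPI API)."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, message):
        self._queue.put(message)

    def recv(self):
        return self._queue.get()  # blocks until a message arrives


def run_two_tasks(numbers):
    """Task 1 sends work to Task 2, which replies with the sum."""
    to_task2, to_task1 = Channel(), Channel()
    outcome = {}  # used only to hand Task 1's final result back to the caller

    def task1():
        to_task2.send(list(numbers))       # explicit send of the input data
        outcome["total"] = to_task1.recv()  # explicit receive of the reply

    def task2():
        received = to_task2.recv()
        to_task1.send(sum(received))

    threads = [threading.Thread(target=task1), threading.Thread(target=task2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outcome["total"]


if __name__ == "__main__":
    print(run_two_tasks([1, 2, 3, 4]))
```

In a real MPI program the two tasks would be separate processes, possibly on different nodes, and the library would carry the messages over the network.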

  34. Matrix Multiplication Example
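The transcript does not preserve the algorithm shown on this slide. As an illustration only, the sketch below uses a common decomposition: the rows of A are split into blocks, and each worker computes the matching rows of C = A × B independently. The function names and the choice of a thread pool are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B."""
    b_columns = list(zip(*b))
    return [
        [sum(x * y for x, y in zip(row, column)) for column in b_columns]
        for row in a_rows
    ]


def parallel_matmul(a, b, workers=2):
    """Compute A x B by giving each worker one contiguous block of A's rows."""
    n = len(a)
    block = max(1, -(-n // workers))  # ceiling division: rows per worker
    blocks = [a[i:i + block] for i in range(0, n, block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda rows: multiply_rows(rows, b), blocks))
    # Concatenate the row blocks in their original order.
    return [row for part in parts for row in part]


if __name__ == "__main__":
    print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The row blocks need no communication while they are being computed, which is why this decomposition appears so often in introductory material. In CPython, threads mainly show the structure; real speedups for this kind of arithmetic usually require processes or native code.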

  35. PRAM Matrix Multiplication • Cost of the PRAM algorithm

  36. BSP Matrix Multiplication • Cost of the algorithm

  37. Concurrency Approach • Control Parallel • Data Parallel
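The two approaches can be contrasted in a few lines. Control parallelism runs different operations at the same time. Data parallelism applies the same operation to different pieces of the data. The sketch below follows those usual definitions; the function names and the thread pool are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor


def control_parallel(data):
    """Different operations (min, max, sum) run concurrently on the same data."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "min": pool.submit(min, data),
            "max": pool.submit(max, data),
            "sum": pool.submit(sum, data),
        }
        return {name: future.result() for name, future in futures.items()}


def data_parallel(data, workers=2):
    """The same operation (squaring) is applied to every element concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: x * x, data))


if __name__ == "__main__":
    print(control_parallel([3, 1, 2]))
    print(data_parallel([1, 2, 3]))
```

Data parallelism tends to scale with the size of the input, while control parallelism is limited by the number of distinct operations available to run concurrently.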

  38. Control Parallel

  39. Data Parallel

  40. The best granularity for programming

  41. Parallel Programming • Explicit parallel programming: Occam, MPI, PVM • Implicit parallel programming • Parallel functional programming: ML, … • Concurrent object-oriented programming: COOL, … • Data parallel programming: Fortran 90, HPF, …

  42. Cluster Computing • A cluster system is a parallel multicomputer built from high-end PCs and a conventional high-speed network • It supports parallel programming

  43. Cluster Computing (cont.): Applications • Scientific computing: simulation, CFD, CAD/CAM, weather prediction, processing large volumes of data • Super server systems: scalable Internet/web server, database server, multimedia (video, audio) server

  44. Cluster Computing (cont.): Cluster System Building Block [Figure: layered stack with Application Layer, System Tool Layer and Single System Image Layer above four nodes, each with its own OS and HW, joined by a High Speed Network]

  45. Cluster Computing (cont.): Why cluster computing? • Scalability: build a small system first, grow it later • Low cost: hardware based on the COTS (component off-the-shelf) model; software based on freeware from the research community • Easier to maintain • Vendor independent

  46. The End • Questions?
