Parallel Processing Comparative Study

gwen

Presentation Transcript


  1. Parallel Processing Comparative Study

  2. Context. How can a job be finished in a short time? Solution: use a faster worker. Drawbacks: • A worker's speed has a limit • Inadequate for long jobs

  3. Context. How can a calculation be finished in a short time? Solution: use a faster calculator (processor) [1960-2000]. Drawbacks: • Processor speed has reached a limit • Inadequate for long calculations

  4. Context. How can a job be finished in a short time? Solution: • Use a faster worker (inadequate for long jobs)

  6. Context. How can a job be finished in a short time? Solution: • Use a faster worker (inadequate for long jobs) • Use more than one worker concurrently

  7. Context. How can a calculation be finished in a short time? Solution: • Use a faster processor (inadequate for long calculations)

  9. Context. How can a calculation be finished in a short time? Solution: • Use a faster processor (inadequate for long calculations) • Use more than one processor concurrently

  10. Context. How can a calculation be finished in a short time? Solution: • Use a faster processor (inadequate for long calculations) • Use more than one processor concurrently: parallelism

  11. Context. Definition: Parallelism is the concurrent use of more than one processing unit (CPUs, processor cores, GPUs, or combinations of them) in order to carry out calculations more quickly.

  12. Project Goal. Parallelism needs: • A parallel computer (more than one processor) • Adapting the calculation to the parallel computer

  13. The Goal. Parallelism needs: • A parallel computer (more than one processor) • Adapting the calculation to the parallel computer

  14. The Goal. Parallel computer: • Several parallel computers are available on the hardware market • They differ in their architecture • Several classifications exist: • Based on the instruction and data streams (Flynn classification) • Based on the degree of memory sharing • …

  15. The Goal. Flynn Classification: A. Single Instruction, Single Data stream (SISD)

  16. The Goal. Flynn Classification: B. Single Instruction, Multiple Data stream (SIMD)

  17. The Goal. Flynn Classification: C. Multiple Instruction, Single Data stream (MISD)

  18. The Goal. Flynn Classification: D. Multiple Instruction, Multiple Data stream (MIMD)

  19. The Goal. Memory Sharing Degree Classification: A. Shared memory B. Distributed memory

  20. The Goal. Memory Sharing Degree Classification: C. Hybrid distributed-shared memory

  21. The Goal. Parallelism needs: • A parallel computer (more than one processor) • Adapting the calculation to the parallel computer: • Dividing the calculation and data between the processors • Defining the execution scenario (how the processors cooperate)

  23. The Goal. Parallelism needs: • A parallel computer (more than one processor) • Adapting the calculation to the parallel computer: • Dividing the calculation and data between the processors • Defining the execution scenario (how the processors cooperate)
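
One common way to divide data between processors is a block decomposition, where each processor owns one contiguous range of items. The following is a minimal sketch of that idea, not the scheme used in the study; the names `block_range` and `decompose` are hypothetical and chosen only for illustration.

```python
def block_range(n_items, n_workers, rank):
    """Return the half-open range [start, stop) of items owned by worker `rank`."""
    base, extra = divmod(n_items, n_workers)
    # The first `extra` workers each take one additional item,
    # so block sizes differ by at most one.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def decompose(n_items, n_workers):
    """List the block owned by every worker, in rank order."""
    return [block_range(n_items, n_workers, rank) for rank in range(n_workers)]


if __name__ == "__main__":
    # Ten items split over three workers.
    print(decompose(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

The blocks cover every item exactly once, so each processor can work on its own range without coordinating with the others until results are combined.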

  24. The Goal. Adapting the calculation to the parallel computer: • Is called parallel processing • Depends closely on the architecture

  25. The Goal. Goal: a comparative study between • The shared-memory parallel processing approach • The distributed-memory parallel processing approach

  26. Plan: • Distributed Memory Parallel Processing approach • Shared Memory Parallel Processing approach • Case study problems • Comparison results and discussion • Conclusion

  27. Distributed Memory Parallel Processing approach

  28. Distributed Memory Parallel Processing approach. Distributed-Memory Computers (DMC) = Distributed Memory System (DMS) = Massively Parallel Processor (MPP)

  29. Distributed Memory Parallel Processing approach. • Distributed-memory computer architecture

  30. Distributed Memory Parallel Processing approach. • Architecture of nodes. Nodes can be: • Identical processors: Pure DMC • Different types of processors: Hybrid DMC • Different types of nodes with different architectures: Heterogeneous DMC

  31. Distributed Memory Parallel Processing approach. • Architecture of the interconnection network • No shared memory space between nodes • The network is the only means of communication between nodes • Network performance directly influences the performance of a parallel program on a DMC • Network performance depends on: • Topology • Physical connectors (such as wires…) • Routing technique • The evolution of DMCs depends closely on the evolution of networking

  32. Distributed Memory Parallel Processing approach. The DMC used in our comparative study: • Heterogeneous DMC • Modest cluster of workstations • Three nodes: • Sony laptop: i3 processor • HP laptop: i3 processor • HP laptop: Core 2 Duo processor • Communication network: 100 MByte Ethernet

  33. Distributed Memory Parallel Processing approach. Parallel software development for DMC. The designer's main tasks: • Decomposing the global calculation and assigning tasks • Decomposing the data • Defining the communication scheme • Studying the synchronization

  34. Distributed Memory Parallel Processing approach. Parallel software development for DMC. Important considerations for efficiency: • Minimize communication • Avoid barrier synchronization

  35. Distributed Memory Parallel Processing approach. Implementation environments. Several implementation environments exist: • PVM (Parallel Virtual Machine) • MPI (Message Passing Interface)

  36. Distributed Memory Parallel Processing approach. MPI application anatomy: • All nodes execute the same code • Not all nodes do the same work • This apparent contradiction is resolved by the SPMD application form • SPMD: .... • The processes are organized as one controller and several workers
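
The controller/worker organization can be sketched without MPI itself. The example below is only an emulation under stated assumptions: Python threads stand in for processes, `queue.Queue` objects stand in for message passing, and the names `program`, `run`, and `inbox` are hypothetical. What it shows is the SPMD idea from the slide: every process runs the same function and branches on its rank.

```python
import queue
import threading


def program(rank, size, inbox, results):
    """The single program that every (emulated) process runs."""
    if rank == 0:
        # Controller: send one slice of the data to each worker,
        # then collect and add up the partial sums.
        data = list(range(1, 13))
        workers = size - 1
        for w in range(1, size):
            inbox[w].put(data[w - 1::workers])
        results["total"] = sum(inbox[0].get() for _ in range(workers))
    else:
        # Worker: receive a slice, compute a partial sum, send it back.
        chunk = inbox[rank].get()
        inbox[0].put(sum(chunk))


def run(size=4):
    """Start `size` emulated processes (size >= 2) and return the controller's result."""
    inbox = [queue.Queue() for _ in range(size)]  # one mailbox per rank
    results = {}
    threads = [
        threading.Thread(target=program, args=(rank, size, inbox, results))
        for rank in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]


if __name__ == "__main__":
    print(run(4))  # sum of 1..12 computed by three workers
```

In a real MPI program the rank would come from the MPI runtime and the queues would be replaced by send and receive calls; the branching structure is the part that carries over.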

  37. Shared Memory Parallel Processing approach. Several SMPCs are on the market: multi-core PCs (Intel i3, i5, i7; AMD). Which SMPC do we use? • GPU: originally for image processing • GPU now: a domestic supercomputer. Characteristics: • Cheapest and fastest shared-memory parallel computer • Hard parallel design

  38. Shared Memory Parallel Processing approach. • The GPU architecture • The implementation environment

  39. Shared Memory Parallel Processing approach. GPU architecture: like a classical processing unit, the graphics processing unit is composed of two main components: A. Calculation units B. Storage unit

  40. Shared memory parallel processing approach

  41. Shared memory parallel processing approach

  42. Shared Memory Parallel Processing. • The GPU architecture • The implementation environment: • CUDA: for GPUs manufactured by NVIDIA • OpenCL: independent of the GPU architecture

  43. Shared Memory Parallel Processing CUDA Program Anatomy

  44. Shared Memory Parallel Processing. Q: How are the code fragments to be parallelized executed on the GPU? A: By calling a kernel. Q: What is a kernel? A: A kernel is a function callable from the host and executed on the device simultaneously by many threads in parallel.
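
The kernel idea can be illustrated with a plain-Python emulation. This is not CUDA code and it runs sequentially; it is only a sketch, with hypothetical names (`vector_add_kernel`, `launch`, `grid_dim`, `block_dim`), of how the same function body is executed once per thread, each thread deriving the element it handles from its block and thread indices.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body executed by each emulated thread: handle one element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):                        # guard for threads past the end
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulated kernel launch: call the kernel once per (block, thread) pair.

    On a real GPU these calls would run in parallel; here they run in a loop.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


def vector_add(a, b, block_dim=4):
    """Host-side function: size the grid, launch the kernel, return the result."""
    out = [0] * len(a)
    grid_dim = (len(a) + block_dim - 1) // block_dim  # enough blocks to cover all elements
    launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
    return out


if __name__ == "__main__":
    print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # [11, 22, 33, 44, 55]
```

The bounds check matters because the grid is rounded up to a whole number of blocks, so some threads in the last block have no element to process.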

  45. Shared Memory Parallel Processing Kernel launch

  46. Shared Memory Parallel Processing Kernel launch

  47. Shared Memory Parallel Processing Kernel launch

  48. Shared Memory Parallel Processing. Design recommendations: • Use shared memory to reduce the time spent accessing global memory. • Reduce the number of idle threads (control divergence) to fully utilize the GPU's resources.

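  49. Case study problem. Square matrix multiplication problem: • ALGORITHM: MatrixMultiplication(A, B) // Input: Two n × n matrices A and B // Output: Matrix C = AB. for i ← 0 to n − 1 do for j ← 0 to n − 1 do C[i, j] ← 0; for k ← 0 to n − 1 do C[i, j] ← C[i, j] + A[i, k] · B[k, j]; return C • Complexity: In big-O notation, the three nested loops give O(n³).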
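
The triple-loop algorithm can be written directly in Python. The sketch below also includes a row-wise variant to show where the parallelism lies: each row of the result depends only on one row of the first matrix, so rows can be handed to different workers. The function names are hypothetical, and the thread pool is an illustrative stand-in for the processors; in CPython, threads do not speed up pure-Python arithmetic, so this shows the decomposition rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def mat_mul(a, b):
    """Sequential n x n matrix product using three nested loops."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def row_times_matrix(row, b):
    """Compute one row of the product: independent of all other rows."""
    n = len(b)
    return [sum(row[k] * b[k][j] for k in range(n)) for j in range(n)]


def mat_mul_by_rows(a, b, workers=2):
    """Row-wise decomposition: each task computes one row of the result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns the rows in the same order as the input rows.
        return list(pool.map(lambda row: row_times_matrix(row, b), a))


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(mat_mul(a, b))
    print(mat_mul_by_rows(a, b))
```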

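  50. Case study problem. Pi approximation: • ALGORITHM: PiApprox(n) // Input: n, the number of bins // Output: an approximation of π. A single loop runs over the n bins, adding each bin's contribution; the approximation is then returned. • Complexity: In big-O notation, the single loop over the bins gives O(n).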
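
The transcript does not preserve the formula inside the loop. The sketch below therefore assumes one common bin-based method, the midpoint rule applied to the integral of 4 / (1 + x²) over [0, 1], which equals π; this is an assumption, not necessarily the formula used in the study. The names `partial_sum` and `pi_approx` are hypothetical. The bins are split into contiguous blocks to show how the work could be divided among workers, with each block summed independently.

```python
import math


def partial_sum(start, stop, n_bins):
    """Midpoint-rule contribution of bins [start, stop) out of n_bins."""
    width = 1.0 / n_bins
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * width          # midpoint of bin i
        total += 4.0 / (1.0 + x * x)   # assumed integrand: 4 / (1 + x^2)
    return total * width


def pi_approx(n_bins, n_workers=1):
    """Approximate pi by summing the partial sums of n_workers blocks of bins."""
    # Block boundaries: 0 = bounds[0] <= ... <= bounds[n_workers] = n_bins.
    bounds = [n_bins * w // n_workers for w in range(n_workers + 1)]
    return sum(
        partial_sum(bounds[w], bounds[w + 1], n_bins) for w in range(n_workers)
    )


if __name__ == "__main__":
    estimate = pi_approx(1000, n_workers=4)
    print(estimate, abs(estimate - math.pi))
```

Here the blocks are evaluated one after another; in a parallel version each call to `partial_sum` would run on a different processor and only the final sum would require communication.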
