
Parallel Processing & Parallel Algorithm


Presentation Transcript


  1. Parallel Processing & Parallel Algorithm May 8, 2003 B4 Yuuki Horita

  2. Chapter 3 Principles of Parallel Programming

  3. Principles of Parallel Programming
  • Data Dependency
  • Processor Communication
  • Mapping
  • Granularity

  4. Data Dependency
  • data flow dependency
  • data anti-dependency
  • data output dependency
  • data input dependency
  • data control dependency

  5. Data flow dependency & data anti-dependency
  - data flow dependency (S2 reads a value that S1 writes):
      S1: A = B + C
      S2: D = A + E
  - data anti-dependency (S2 overwrites a value that S1 reads):
      S1: A = B + C
      S2: B = D + E

  6. Data output dependency & data input dependency
  - data output dependency (S1 and S2 write the same variable):
      S1: A = B + C
      S2: A = D + E
  - data input dependency (S1 and S2 read the same variable):
      S1: A = B + C
      S2: D = B + E

  7. Data control dependency
      S1: A = B - C
          if ( A > 0 )
      S2:   then D = 1;
      S3:   else D = 2;
          endif
  Whether S2 or S3 executes depends on the value S1 computes: S1 ⇒ S2, S3

  8. Dependency Graph
  G = ( V, E )   V (vertices): statements, E (edges): dependencies
  ex)
      S1: A = B + C
      S2: B = A * 3
      S3: A = 2 * C
      S4: P = B ≧ 0
          if ( P is True )
      S5:   then D = 1
      S6:   else D = 2
          endif
  Edges:
      S1 → S2 : flow dep. (A), anti-dep. (B)
      S1 → S3 : output dep. (A), input dep. (C)
      S2 → S3 : anti-dep. (A)
      S2 → S4 : flow dep. (B)
      S4 → S5, S6 : control dep.

  9. Elimination of the dependencies
  • data output dependency
  • data anti-dependency
  ⇒ renaming can remove these forms of dependency

  10. Elimination of the dependencies (2): Renaming
  ex)
      S1: A = B + C        S1, S2 : anti-dep. (B)
      S2: B = D + E        S1, S3 : output dep. (A)
      S3: A = F + G
  After renaming:
      S1: A  = B + C
      S2: B' = D + E
      S3: A' = F + G
  No dependency!
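
To make the renaming concrete, here is a minimal C sketch of the same transformation (B2 and A2 stand in for B' and A', which are not legal C identifiers; the operand values are illustrative):

    #include <stdio.h>

    int main(void) {
        int B = 1, C = 2, D = 3, E = 4, F = 5, G = 6;

        /* Before renaming, execution order is forced:
           S1: A = B + C;   S2 anti-depends on S1 through B,
           S2: B = D + E;   S3 output-depends on S1 through A.
           S3: A = F + G;                                      */

        /* After renaming, every statement writes a fresh name,
           so S1, S2', S3' carry no dependencies and could run
           in parallel. */
        int A  = B + C;  /* S1  */
        int B2 = D + E;  /* S2' */
        int A2 = F + G;  /* S3' */

        printf("A=%d B2=%d A2=%d\n", A, B2, A2);
        return 0;
    }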

  11. Processor Communication
  • Message-passing communication: processors communicate via communication links
  • Shared-memory communication: processors communicate via common memory

  12. Message Passing System
  [Diagram: processing elements PE1 … PEm, each with its own local memory M1 … Mm, connected by an Interconnection Network]

  13. Message Passing System (2)
  • Send and Receive operations: blocking or nonblocking
  • Synchronous system: blocking send and blocking receive operations
  • Asynchronous system: nonblocking send and blocking receive operations (the messages are buffered); see the sketch below
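
As an illustration, the blocking and nonblocking operations above map directly onto MPI's point-to-point calls; a minimal sketch for two processes (the value and tags are illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Blocking send: returns once the buffer may be reused
               (the runtime may buffer the message internally). */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

            /* Nonblocking send: MPI_Isend returns immediately and
               MPI_Wait completes the operation later. */
            MPI_Request req;
            MPI_Isend(&value, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* Blocking receives: each blocks until the matching
               message arrives. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(&value, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("PE1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }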

  14. Shared Memory System
  [Diagram: processing elements PE1 … PEm connected through an Interconnection Network to a Global Memory made of modules M1 … Mm]
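
A minimal pthreads sketch of shared-memory communication: the threads stand in for PEs and a mutex-protected counter stands in for a word of global memory (all names and counts are illustrative):

    #include <pthread.h>
    #include <stdio.h>

    /* One word of "global memory" shared by all PEs (threads). */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* serialize access to the shared word */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t pe[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&pe[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(pe[i], NULL);
        printf("counter = %ld\n", counter);  /* 400000 */
        return 0;
    }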

  15. Mapping … matching parallel algorithms to parallel architectures
  Mapping to:
  - asynchronous architecture
  - synchronous architecture
  - distributed architecture

  16. Mapping to Asynchronous Architecture
  Mapping a program to an asynchronous shared-memory computer has the following steps:
  • Allocate each statement in the program to a processor
  • Allocate each variable to a memory
  • Specify the control flow for each processor
  The sender may send many messages without the receiver removing them from the channel ⇒ messages need to be buffered (see the sketch below)
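
A sketch of such a buffered channel, assuming a hypothetical fixed capacity CAP and pthreads for mutual exclusion; the sender blocks only when the buffer is full and the receiver only when it is empty:

    #include <pthread.h>

    #define CAP 16  /* illustrative channel capacity */

    /* A buffered channel: the sender can deposit up to CAP messages
       before the receiver removes any of them. */
    typedef struct {
        int buf[CAP];
        int head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t not_full, not_empty;
    } channel;

    channel ch = { .lock      = PTHREAD_MUTEX_INITIALIZER,
                   .not_full  = PTHREAD_COND_INITIALIZER,
                   .not_empty = PTHREAD_COND_INITIALIZER };

    void channel_send(channel *c, int msg) {
        pthread_mutex_lock(&c->lock);
        while (c->count == CAP)               /* block only when full */
            pthread_cond_wait(&c->not_full, &c->lock);
        c->buf[c->tail] = msg;
        c->tail = (c->tail + 1) % CAP;
        c->count++;
        pthread_cond_signal(&c->not_empty);
        pthread_mutex_unlock(&c->lock);
    }

    int channel_recv(channel *c) {
        pthread_mutex_lock(&c->lock);
        while (c->count == 0)                 /* block until a message exists */
            pthread_cond_wait(&c->not_empty, &c->lock);
        int msg = c->buf[c->head];
        c->head = (c->head + 1) % CAP;
        c->count--;
        pthread_cond_signal(&c->not_full);
        pthread_mutex_unlock(&c->lock);
        return msg;
    }

With this channel, a send never blocks until CAP messages are outstanding, which is exactly the buffering the asynchronous mapping requires.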

  17. Mapping to Synchronous Architecture
  • Mapping a program is the same as for the asynchronous architecture
  • A common clock is used for synchronization purposes
  • Each processor executes an instruction at each step (at each clock tick)
  • Only one message exists at any one time on a channel ⇒ no buffering is needed

  18. Mapping to Distributed Architecture
  • Each local memory is accessible only by its owning processor
  • Only the pair of processors joined by a channel communicates along it
  • Mapping a program is the same as in the asynchronous shared-memory architecture, except that each variable is allocated either to a processor or to a channel

  19. Granularity … the ratio of the amount of computation to the amount of communication
  • fine : parallelism at the statement level
  • medium : parallelism at the procedure level
  • coarse : parallelism at the program level

  20. Program Level Parallelism
  • A program creates a new process by creating a complete copy of itself: fork() (UNIX), as in the sketch below
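
A minimal UNIX example: fork() duplicates the calling process, and both copies continue from the return point, with the return value telling them apart:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();       /* duplicate the whole process */
        if (pid == 0) {
            printf("child:  pid=%d\n", (int)getpid());  /* runs in the copy */
        } else if (pid > 0) {
            printf("parent: forked child %d\n", (int)pid);
            wait(NULL);           /* reap the child */
        } else {
            perror("fork");
            return 1;
        }
        return 0;
    }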

  21. Statement Level Parallelism
  ・ Parbegin-Parend block
      Par-begin
        Statement1
        Statement2
        :
        Statementn
      Par-end
  ⇒ the statements Statement1 … Statementn are executed in parallel

  22. Statement Level Parallelism (2)
  ex) (a + b) * (c + d) – (e / f)
      Par-begin
        Par-begin
          t1 = a + b
          t2 = c + d
        Par-end
        t4 = t1 * t2
        t3 = e / f
      Par-end
      t5 = t4 – t3
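
One way to realize this nesting is with threads: the outer Par-begin forks two branches, and the left branch itself forks the two additions before the sequential multiply. A minimal pthreads sketch with illustrative operand values:

    #include <pthread.h>
    #include <stdio.h>

    static double a = 1, b = 2, c = 3, d = 4, e = 8, f = 2;
    static double t1, t2, t3, t4;

    static void *add_ab(void *x) { (void)x; t1 = a + b; return NULL; }
    static void *add_cd(void *x) { (void)x; t2 = c + d; return NULL; }
    static void *div_ef(void *x) { (void)x; t3 = e / f; return NULL; }

    /* Left branch of the outer Par-begin: the inner Par-begin/Par-end
       pair followed by the sequential multiply. */
    static void *left_branch(void *x) {
        (void)x;
        pthread_t p1, p2;
        pthread_create(&p1, NULL, add_ab, NULL);   /* inner Par-begin */
        pthread_create(&p2, NULL, add_cd, NULL);
        pthread_join(p1, NULL);                    /* inner Par-end */
        pthread_join(p2, NULL);
        t4 = t1 * t2;
        return NULL;
    }

    int main(void) {
        pthread_t lb, rb;
        pthread_create(&lb, NULL, left_branch, NULL);  /* outer Par-begin */
        pthread_create(&rb, NULL, div_ef, NULL);
        pthread_join(lb, NULL);                        /* outer Par-end */
        pthread_join(rb, NULL);
        printf("t5 = %g\n", t4 - t3);  /* (1+2)*(3+4) - 8/2 = 17 */
        return 0;
    }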

  23. Statement Level Parallelism (3)
  ・ Fork, Join, Quit
  - Fork x : causes a new process to be created that starts executing at the instruction labeled x
  - Join t, y :
      t = t – 1
      if ( t = 0 ) then go to y
  - Quit : the process terminates

  24. Statement Level Parallelism (4)
  ex) (a + b) * (c + d) – (e / f)
      n = 2
      m = 2
      Fork P2
      Fork P3
      P1: t1 = a + b;   Join m, P4; Quit;
      P2: t2 = c + d;   Join m, P4; Quit;
      P4: t4 = t1 * t2; Join n, P5; Quit;
      P3: t3 = e / f;   Join n, P5; Quit;
      P5: t5 = t4 – t3
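
Join t, y decrements the shared counter t, and only the process that brings it to zero continues at y; every other process Quits. A sketch of the same computation using C11 atomics for the two join counters (operand values are illustrative):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static double t1, t2, t3, t4, t5;
    static atomic_int m = 2, n = 2;   /* the slide's join counters */

    static void P5(void) { t5 = t4 - t3; printf("t5 = %g\n", t5); }

    /* Join n, P5: the last arrival runs P5, earlier arrivals Quit. */
    static void P4(void) {
        t4 = t1 * t2;
        if (atomic_fetch_sub(&n, 1) == 1) P5();
    }

    static void *P1(void *x) {   /* t1 = a + b; Join m, P4; Quit */
        (void)x;
        t1 = 1 + 2;
        if (atomic_fetch_sub(&m, 1) == 1) P4();
        return NULL;
    }

    static void *P2(void *x) {   /* t2 = c + d; Join m, P4; Quit */
        (void)x;
        t2 = 3 + 4;
        if (atomic_fetch_sub(&m, 1) == 1) P4();
        return NULL;
    }

    static void *P3(void *x) {   /* t3 = e / f; Join n, P5; Quit */
        (void)x;
        t3 = 8.0 / 2;
        if (atomic_fetch_sub(&n, 1) == 1) P5();
        return NULL;
    }

    int main(void) {
        pthread_t w1, w2, w3;
        pthread_create(&w1, NULL, P1, NULL);   /* Fork P2, Fork P3 */
        pthread_create(&w2, NULL, P2, NULL);
        pthread_create(&w3, NULL, P3, NULL);
        pthread_join(w1, NULL);
        pthread_join(w2, NULL);
        pthread_join(w3, NULL);
        return 0;
    }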
