
Parallelization at a Glance



Presentation Transcript


  1. Parallelization at a Glance. Christian Terboven <terboven@rz.rwth-aachen.de>, 20.11.2012 / Aachen, Germany. As of: 19.11.2012, Version 2.3

  2. Agenda
  • Basic Concepts
  • Parallelization Strategies
  • Possible Issues
  • Efficient Parallelization

  3. Basic Concepts

  4. Processes and Threads
  • Process: instance of a computer program
  • Thread: a sequence of instructions (own program counter)
  • Parallel / concurrent execution:
  • Multiple processes
  • Multiple threads
  • Per process: address space, files, environment, …
  • Per thread: program counter, stack, …
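
To make the per-process vs. per-thread distinction concrete, here is a minimal sketch (my addition, not part of the slides) using POSIX threads in C: the global variable lives in the one address space shared by both threads, while each thread has its own stack-local counter.

```c
#include <pthread.h>
#include <stdio.h>

int shared_value = 0;                 /* per process: one address space shared by all threads */

static void *worker(void *arg)
{
    int local_counter = 0;            /* per thread: lives on this thread's own stack */
    local_counter += (int)(long)arg;
    __sync_fetch_and_add(&shared_value, local_counter);   /* GCC-style atomic add on the shared variable */
    printf("thread %ld: local_counter = %d\n", (long)arg, local_counter);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_value = %d\n", shared_value);   /* 3: both threads updated the same variable */
    return 0;
}
```

Compile with -pthread; the atomic add is used so the shared update itself is not a data race (races are discussed on a later slide).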

  5. Process and Thread Scheduling by the OS
  • Even on a multi-socket multi-core system you should not make any assumption about which process / thread is executed when and where!
  • Two threads on one core:
  • Two threads on two cores:
  [Figure: scheduling timelines for Thread 1, Thread 2 and a system thread, contrasting thread migration with "pinned" threads]
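
The slides do not show how to pin a thread; as one hedged example, on Linux a process or thread can be bound to a core with sched_setaffinity (OpenMP programs can achieve the same effect through environment variables such as OMP_PROC_BIND):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* Pin the calling process/thread to core 0 so the OS cannot migrate it. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to core 0\n");
    return 0;
}
```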

  6. Shared Memory Parallelization
  • Memory can be accessed by several threads running on different cores in a multi-socket multi-core system:
  [Figure: CPU1 executes a=4 while CPU2 computes c=3+a through the shared variable a]
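
A minimal OpenMP sketch of this shared-memory picture (the variable names a and c follow the figure; the barrier is my addition so that the read is well-defined rather than racy):

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    int a = 1, c = 0;
    #pragma omp parallel num_threads(2) shared(a, c)
    {
        if (omp_get_thread_num() == 0)
            a = 4;                 /* one thread writes the shared variable */
        #pragma omp barrier        /* make the write visible before anyone reads it */
        if (omp_get_thread_num() == 1)
            c = 3 + a;             /* the other thread reads the same memory location */
    }
    printf("c = %d\n", c);         /* 7 */
    return 0;
}
```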

  7. Parallelization Strategies

  8. Finding Concurrency
  • Chances for concurrent execution:
  • Look for tasks that can be executed simultaneously (task parallelism)
  • Decompose data into distinct chunks to be processed independently (data parallelism)
  • Pattern categories:
  • Organize by task: task parallelism, divide and conquer
  • Organize by data decomposition: geometric decomposition, recursive data
  • Organize by flow of data: pipeline, event-based coordination
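
The two decomposition styles can be contrasted in a short OpenMP sketch (the helper functions prepare_input and write_logfile are hypothetical placeholders, not from the slides):

```c
#include <omp.h>
#include <stdio.h>

#define N 1000

/* Hypothetical independent pieces of work, used only for illustration. */
static void prepare_input(void)  { /* ... */ }
static void write_logfile(void)  { /* ... */ }

int main(void)
{
    double x[N], sum = 0.0;

    /* Data parallelism: the same operation on distinct chunks of the array. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        x[i] = 0.5 * i;
        sum += x[i];
    }

    /* Task parallelism: different, independent activities run simultaneously. */
    #pragma omp parallel sections
    {
        #pragma omp section
        prepare_input();
        #pragma omp section
        write_logfile();
    }

    printf("sum = %f\n", sum);
    return 0;
}
```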

  9. Divide and conquer
  [Figure: a problem is split into subproblems, the subproblems are split further and solved concurrently, and the subsolutions are merged step by step into the overall solution]
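
A compact divide-and-conquer sketch with OpenMP tasks, here a recursive array sum (the cutoff value is an arbitrary choice for illustration, not from the slides):

```c
#include <omp.h>
#include <stdio.h>

#define N 1024
#define CUTOFF 64   /* below this size, solve sequentially instead of splitting further */

static long sum(const int *a, int lo, int hi)
{
    if (hi - lo <= CUTOFF) {               /* solve: subproblem small enough */
        long s = 0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;          /* split into two subproblems */
    long left, right;
    #pragma omp task shared(left)
    left = sum(a, lo, mid);
    right = sum(a, mid, hi);
    #pragma omp taskwait                   /* merge: wait for the child task, then combine */
    return left + right;
}

int main(void)
{
    int a[N];
    for (int i = 0; i < N; i++) a[i] = 1;
    long total;
    #pragma omp parallel
    #pragma omp single
    total = sum(a, 0, N);
    printf("total = %ld\n", total);        /* N */
    return 0;
}
```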

  10. Geometric decomposition
  • Example: ventricular assist device (VAD)
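
The VAD case is a flow simulation whose mesh is split into subdomains; as a much simpler stand-in (my illustration, not the course example), a 1-D stencil update can be geometrically decomposed by giving each thread one contiguous chunk of the array:

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double u[N], unew[N];
    for (int i = 0; i < N; i++) u[i] = (i % 100) * 0.01;

    /* Geometric decomposition: the 1-D domain is cut into contiguous chunks,
       one per thread; each thread updates only the points of its own chunk. */
    #pragma omp parallel for schedule(static)
    for (int i = 1; i < N - 1; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    printf("unew[N/2] = %f\n", unew[N / 2]);
    return 0;
}
```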

  11. Pipeline
  • Analogy: assembly line
  • Assign different stages to different PEs
  [Figure: timing diagram in which work items C1 to C5 flow through stages 1 to 4, so that different stages process different items at the same time]

  12. Further parallel design patterns
  • Recursive data pattern
  • Parallelize operations on recursive data structures
  • See example later on: Fibonacci
  • Event-based coordination pattern
  • Decomposition into semi-independent tasks interacting in an irregular fashion
  • See example later on: while-loop with tasks
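
The Fibonacci example referenced here is usually written with OpenMP tasks; the following is a minimal reconstruction under that assumption, not the course's original code:

```c
#include <omp.h>
#include <stdio.h>

/* Recursive Fibonacci with one OpenMP task per left branch.
   Illustrative only: without a cutoff this creates far too many tiny tasks
   to be efficient, which is exactly the overhead discussed later. */
static int fib(int n)
{
    if (n < 2) return n;
    int x, y;
    #pragma omp task shared(x)
    x = fib(n - 1);
    y = fib(n - 2);
    #pragma omp taskwait
    return x + y;
}

int main(void)
{
    int result;
    #pragma omp parallel
    #pragma omp single
    result = fib(20);
    printf("fib(20) = %d\n", result);   /* 6765 */
    return 0;
}
```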

  13. Possible Issues

  14. Data Races / Race Conditions
  • Data race: concurrent access of the same memory location by multiple threads without proper synchronization
  • Let variable x have the initial value 1; one thread executes x = 5; while another executes printf("%d", x);
  • Depending on which thread is faster, you will see either 1 or 5
  • Result is nondeterministic (i.e. depends on OS scheduling)
  • Data races (and how to detect and avoid them) will be covered in more detail later!
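
A small OpenMP sketch (my addition) that makes the difference observable: the unsynchronized counter is a data race and its final value is nondeterministic, while the atomic update is always correct.

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    int racy = 0, safe = 0;

    #pragma omp parallel num_threads(4)
    {
        for (int i = 0; i < 100000; i++) {
            racy++;                    /* data race: unsynchronized read-modify-write */

            #pragma omp atomic
            safe++;                    /* synchronized update */
        }
    }

    /* "racy" is nondeterministic and usually less than 400000; "safe" is exactly 400000. */
    printf("racy = %d, safe = %d\n", racy, safe);
    return 0;
}
```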

  15. Serialization
  • Serialization: when threads / processes wait "too much"
  • Limited scalability, if at all
  • Simple (and stupid) example:
  [Figure: two processes alternating Send / Recv / Wait / Calc so that data transfer and computation never overlap]
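
The slide's picture shows message-passing partners that spend most of their time waiting for each other. As a shared-memory analogue (a substitution of mine, not the slide's code), an oversized critical section has the same effect: the loop is nominally parallel, but the threads execute one after another.

```c
#include <omp.h>
#include <stdio.h>

/* Placeholder for some non-trivial amount of work. */
static double expensive_calc(int i)
{
    double s = 0.0;
    for (int k = 0; k < 100000; k++) s += (double)i * k * 1e-9;
    return s;
}

int main(void)
{
    double sum = 0.0;
    double t0 = omp_get_wtime();

    #pragma omp parallel for
    for (int i = 0; i < 64; i++) {
        /* Putting the whole body in a critical section forces the threads to
           execute one after another: the loop is "parallel" but runs serially. */
        #pragma omp critical
        sum += expensive_calc(i);
    }

    printf("sum = %f, time = %f s\n", sum, omp_get_wtime() - t0);
    return 0;
}
```

Moving expensive_calc() out of the critical section, or using a reduction clause for sum, removes most of the waiting.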

  16. Efficient Parallelization

  17. Parallelization Overhead
  • Overhead introduced by the parallelization:
  • Time to start / end / manage threads
  • Time to send / exchange data
  • Time spent in synchronization of threads / processes
  • With parallelization:
  • The total CPU time increases,
  • The wall time decreases,
  • The system time stays the same.
  • Efficient parallelization is about minimizing the overhead introduced by the parallelization itself!
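
One way to make this overhead visible (a minimal sketch, assuming an OpenMP compiler) is to time an empty parallel region with omp_get_wtime():

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    const int reps = 1000;
    double t0 = omp_get_wtime();

    for (int r = 0; r < reps; r++) {
        /* An empty parallel region: all the elapsed time is pure
           thread start/management/synchronization overhead. */
        #pragma omp parallel
        { }
    }

    double t1 = omp_get_wtime();
    printf("average overhead per parallel region: %e s\n", (t1 - t0) / reps);
    return 0;
}
```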

  18. Load Balancing
  • Perfect load balancing: all threads / processes finish at the same time
  • Load imbalance: some threads / processes take longer than others
  • But: all threads / processes have to wait for the slowest thread / process, which is thus limiting the scalability
  [Figure: two timelines, one with perfectly balanced threads and one with an imbalanced distribution of work]
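
OpenMP loop schedules are one common way to address such imbalance (an illustration of the general point; the slides do not prescribe this particular fix). With a dynamic schedule, chunks of iterations are handed out to threads as they become free:

```c
#include <omp.h>
#include <stdio.h>

/* Work whose cost grows with i, so a static block distribution is imbalanced. */
static double work(int i)
{
    double s = 0.0;
    for (int k = 0; k < i * 1000; k++) s += 1e-9 * k;
    return s;
}

int main(void)
{
    double total = 0.0;

    /* schedule(dynamic, 8): chunks of 8 iterations are assigned on demand,
       so fast threads pick up extra chunks instead of idling. */
    #pragma omp parallel for schedule(dynamic, 8) reduction(+:total)
    for (int i = 0; i < 2000; i++)
        total += work(i);

    printf("total = %f\n", total);
    return 0;
}
```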

  19. Speedup and Efficiency
  • Time using 1 CPU: T(1)
  • Time using p CPUs: T(p)
  • Speedup S: S(p) = T(1)/T(p)
  • Measures how much faster the parallel computation is!
  • Efficiency E: E(p) = S(p)/p
  • Example: T(1) = 6s, T(2) = 4s ⇒ S(2) = 1.5, E(2) = 0.75
  • Ideal case: T(p) = T(1)/p ⇒ S(p) = p, E(p) = 1.0
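
As a tiny numeric check of these definitions, using the values from the slide's example:

```c
#include <stdio.h>

int main(void)
{
    double t1 = 6.0, tp = 4.0;   /* T(1) = 6 s, T(2) = 4 s from the slide */
    int p = 2;

    double speedup    = t1 / tp;       /* S(p) = T(1)/T(p) */
    double efficiency = speedup / p;   /* E(p) = S(p)/p    */

    printf("S(%d) = %.2f, E(%d) = %.2f\n", p, speedup, p, efficiency);  /* 1.50, 0.75 */
    return 0;
}
```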

  20. Amdahl's Law
  • Describes the influence of the serial part on scalability (without taking any overhead into account).
  • S(p) = T(1)/T(p) = T(1)/(f*T(1) + (1-f)*T(1)/p) = 1/(f + (1-f)/p)
  • f: serial part (0 ≤ f ≤ 1)
  • T(1): time using 1 CPU
  • T(p): time using p CPUs
  • S(p): speedup; S(p) = T(1)/T(p)
  • E(p): efficiency; E(p) = S(p)/p
  • It is rather easy to scale to a small number of cores, but any parallelization is limited by the serial part of the program!
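
A short sketch that evaluates Amdahl's law for the 80% parallel / 20% serial case illustrated on the next slide (the serial fraction f = 0.2 is taken from that example):

```c
#include <stdio.h>

/* Amdahl's law: S(p) = 1 / (f + (1 - f) / p), where f is the serial fraction. */
static double amdahl_speedup(double f, double p)
{
    return 1.0 / (f + (1.0 - f) / p);
}

int main(void)
{
    double f = 0.2;   /* 20% of the runtime remains serial */
    int procs[] = {1, 2, 4, 8, 16, 1000000};

    for (int i = 0; i < 6; i++)
        printf("p = %7d  ->  S(p) = %.2f\n", procs[i], amdahl_speedup(f, procs[i]));

    /* The speedup approaches 1/f = 5, no matter how many processors are used. */
    return 0;
}
```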

  21. Amdahl's Law illustrated
  • If 80% (measured in program runtime) of your work can be parallelized and "just" 20% are still running sequentially, then your speedup will be:
  • 1 processor: time 100%, speedup 1
  • 2 processors: time 60%, speedup 1.7
  • 4 processors: time 40%, speedup 2.5
  • ∞ processors: time 20%, speedup 5

  22. Speedup in Practice
  • After the initial parallelization of a program, you will typically see speedup curves like this:
  [Figure: speedup over p = 1 to 8 processors, comparing the ideal speedup S(p)=p, the speedup predicted by Amdahl's law, and a realistic speedup curve]
