
Multithreading algorithms

Juan Mendivelso. Multithreading algorithms. Serial algorithms: suitable for running on a uniprocessor computer in which only one instruction executes at a time. Parallel algorithms: run on a multiprocessor computer that permits multiple instructions to execute concurrently.


Presentation Transcript


  1. Juan Mendivelso. Multithreading algorithms

  2. Serial algorithms: Suitable for running on a uniprocessor computer in which only one instruction executes at a time. Parallel algorithms: Run on a multiprocessor computer that permits multiple instructions to execute concurrently. Serial algorithms & parallel algorithms

  3. Computers with multiple processing units. • They can be: • Chip multiprocessors: Inexpensive laptops/desktops. They contain a single multicore integrated circuit that houses multiple processor “cores”, each of which is a full-fledged processor with access to common memory. PARALLEL COMPUTERS

  4. Computers with multiple processing units. • They can be: • Clusters: Built from individual computers with a dedicated network system interconnecting them. Intermediate price/performance. PARALLEL COMPUTERS

  5. Computers with multiple processing units. • They can be: • Supercomputers: Combinations of custom architectures and custom networks that deliver the highest performance (instructions per second). High price. PARALLEL COMPUTERS

  6. Although the random-access machine model was accepted early on for serial computing, no single model has been established for parallel computing. A major reason is that vendors have not agreed on a single architectural model for parallel computers. Models for parallel computing

  7. For example, some parallel computers feature shared memory, where all processors can access any location of memory. Others employ distributed memory, where each processor has a private memory. However, the trend appears to be toward shared-memory multiprocessors. Models for parallel computing

  8. Shared-memory parallel computers use static threading: a software abstraction of “virtual processors”, or threads, sharing a common memory. Each thread can execute code independently. For most applications, threads persist for the duration of a computation. Static threading
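
A minimal sketch of static threading in Python (the summation task, the function name, and the chunking scheme are illustrative assumptions, not from the slides): a fixed set of threads is created up front and the input is partitioned among them by hand.

```python
import threading

def static_sum(data, num_threads=4):
    # Static threading: a fixed set of threads, each handed a
    # hand-partitioned chunk of the input up front.
    chunk = (len(data) + num_threads - 1) // num_threads
    partial = [0] * num_threads

    def worker(i):
        partial[i] = sum(data[i * chunk:(i + 1) * chunk])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)

print(static_sum(list(range(1_000_000))))   # 499999500000
```

The load balance here is acceptable only because every chunk costs the same; with irregular work per element, the programmer would have to repartition by hand, which is exactly the difficulty the next slides describe.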

  9. Programming a shared-memory parallel computer directly using static threads is difficult and error prone. Dynamically partitioning the work among the threads so that each thread receives approximately the same load turns out to be complicated. PROBLEMS OF STATIC THREADING

  10. The programmer must use complex communication protocols to implement a scheduler that load-balances the work. This has led to the creation of concurrency platforms, which provide a layer of software that coordinates, schedules and manages the parallel-computing resources. PROBLEMS OF STATIC THREADING

  11. A class of concurrency platform. It allows programmers to specify parallelism in applications without worrying about communication protocols, load balancing, etc. The concurrency platform contains a scheduler that load-balances the computation automatically. DYNAMIC MULTITHREADING

  12. It supports: • Nested parallelism: allows a subroutine to be spawned, so the caller can proceed while the spawned subroutine computes its result. • Parallel loops: regular for loops, except that their iterations can execute concurrently (see the sketch below). DYNAMIC MULTITHREADING
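
A minimal sketch of a parallel loop in Python (the squaring function and the input are illustrative assumptions): `concurrent.futures` plays the role of the concurrency platform, and it, not the programmer, decides how the independent iterations are mapped onto workers.

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == "__main__":
    data = range(10)

    # Ordinary serial for loop.
    serial = [square(x) for x in data]

    # "Parallel loop": the same iterations, but the platform may
    # execute them concurrently on several worker processes.
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(square, data))

    assert serial == parallel
    print(parallel)
```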

  13. The user only specifies the logical parallelism. It is a simple extension of the serial model with the keywords parallel, spawn and sync. It gives a clean way to quantify parallelism. Many multithreaded algorithms involving nested parallelism follow naturally from the divide-and-conquer paradigm. ADVANTAGES OF DYNAMIC MULTITHREADING

  14. Fibonacci example • The serial algorithm: Fib(n) • Repeated work • Exponential complexity • However, the recursive calls are independent! • Parallel algorithm: P-Fib(n) (see the sketch below) BASICS OF MULTITHREADING
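
A minimal sketch of Fib and P-Fib in Python (the thread-per-spawn implementation is an illustrative stand-in for the spawn and sync keywords, not an efficient one): spawn is modeled by starting a thread, and sync by joining it before the child's value is used.

```python
import threading

def fib(n):
    # Serial algorithm: repeats work on overlapping subproblems,
    # so its running time grows exponentially in n.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

def p_fib(n):
    # The two recursive calls are independent, so the first can be
    # "spawned" while the parent computes the second one itself.
    if n <= 1:
        return n
    result = {}

    def child():
        result["x"] = p_fib(n - 1)       # spawned subcomputation

    t = threading.Thread(target=child)
    t.start()            # spawn: parent may run in parallel with its child
    y = p_fib(n - 2)     # parent's own work
    t.join()             # sync: wait for the spawned child before using its value
    return result["x"] + y

print(fib(10), p_fib(10))   # both print 55
```

Deleting the thread start and join (the stand-ins for spawn and sync) and calling p_fib(n - 1) directly turns p_fib back into the serial fib; this is the serialization described on the next slide.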

  15. Concurrency keywords: spawn, sync and parallel. The serialization of a multithreaded algorithm is the serial algorithm that results from deleting the concurrency keywords. Serialization

  16. It occurs when the keyword spawn precedes a procedure call. It differs from an ordinary procedure call in that the procedure instance that executes the spawn – the parent – may continue to execute in parallel with the spawned subroutine – its child – instead of waiting for the child to complete. NESTED PARALLELISM

  17. It doesn't say that a procedure must execute concurrently with its spawned children, only that it may! The concurrency keywords express the logical parallelism of the computation. At runtime, it is up to the scheduler to determine which subcomputations actually run concurrently by assigning them to processors. Keyword spawn

  18. A procedure cannot safely use the values returned by its spawned children until after it executes a sync statement. The keyword sync indicates that the procedure must wait until all its spawned children have completed before proceeding to the statement after the sync. Every procedure executes a sync implicitly before it returns. Keyword sync

  19. We can see a multithreaded computation as a directed acyclic graph G = (V, E) called a computational dag. The vertices are instructions and the edges represent dependencies between instructions, where (u, v) ∈ E means that instruction u must execute before instruction v. Computational dag

  20. If a chain of instructions contains no parallel control (no spawn, sync, or return), we may group them into a single strand, each of which represents one or more instructions. Instructions involving parallel control are not included in strands, but are represented in the structure of the dag. Computational dag

  21. For example, if a strand has two successors, one of them must have been spawned, and a strand with multiple predecessors indicates that the predecessors joined because of a sync. Thus, in the general case, the set V forms the set of strands, and the set E of directed edges represents dependencies between strands induced by parallel control. Computational dag

  22. If G has a directed path from strand u to strand v, we say that the two strands are (logically) in series. Otherwise, strands u and v are (logically) in parallel. We can picture a multithreaded computation as a dag of strands embedded in a tree of procedure instances. Example! Computational dag

  23. We can classify the edges: • Continuation edge: connects a strand u to its successor u' within the same procedure instance. • Call edges: represent normal procedure calls. • Return edges: when a strand u returns to its calling procedure and x is the strand immediately following the next sync in the calling procedure, a return edge connects u to x. • A computation starts with an initial strand and ends with a single final strand. Computational dag

  24. A parallel computer that consists of a set of processors and a sequentially consistent shared memory. Sequential consistency means that the shared memory behaves as if the multithreaded computation's instructions were interleaved to produce a linear order that preserves the partial order of the computation dag. IDEAL PARALLEL COMPUTER

  25. Depending on scheduling, the ordering could differ from one run of the program to another. • The ideal-parallel-computer model makes some performance assumptions: • Each processor in the machine has equal computing power. • It ignores the cost of scheduling. IDEAL PARALLEL COMPUTER

  26. Work: • Total time to execute the entire computation on one processor. • Sum of the times taken by each of the strands. • In the computational dag, it is the number of strands (assuming each strand takes one time unit). PERFORMANCE MEASURES

  27. Span: • Longest time to execute the strands along any path in the dag. • The span equals the number of vertices on a longest, or critical, path (assuming each strand takes one time unit). • Example! (see the sketch below) PERFORMANCE MEASURES
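
A minimal sketch of both measures on a made-up computation dag (the dag and its strand names are assumptions for illustration): with unit-time strands, the work is the number of vertices and the span is the number of vertices on a longest path.

```python
from functools import lru_cache

# Toy computation dag of unit-time strands: A spawns B and continues
# to C; B and C join at the sync strand D.
dag = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

work = len(dag)   # T1: one time unit per strand, so work = number of strands

def span(dag):
    # T-infinity: number of vertices on a longest (critical) path.
    @lru_cache(maxsize=None)
    def longest_from(u):
        return 1 + max((longest_from(v) for v in dag[u]), default=0)
    return max(longest_from(u) for u in dag)

print("work T1 =", work)          # 4
print("span Tinf =", span(dag))   # 3, e.g. the path A -> B -> D
```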

  28. The actual running time of a multithreaded computation also depends on how many processors are available and how the scheduler allocates strands to processors. Running time on P processors: TP. Work: T1. Span: T∞ (running time on an unlimited number of processors). PERFORMANCE MEASURES

  29. The work and span provide lower bounds on the running time TP of a multithreaded computation on P processors: • Work law: TP ≥ T1/P • Span law: TP ≥ T∞ PERFORMANCE MEASURES

  30. Speedup: • The speedup of a computation on P processors is the ratio T1/TP. • How many times faster the computation is on P processors than on one processor. • It is at most P. • Linear speedup: T1/TP = Θ(P) • Perfect linear speedup: T1/TP = P PERFORMANCE MEASURES

  31. Parallelism: • T1/T∞ • The average amount of work that can be performed in parallel for each step along the critical path. • As an upper bound, the parallelism gives the maximum possible speedup that can be achieved on any number of processors. • The parallelism provides a limit on the possibility of attaining perfect linear speedup. • A small worked example follows below. PERFORMANCE MEASURES
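
A small worked example tying the measures together (the values T1 = 100 and T∞ = 10 are made up for illustration; they do not come from the slides):

```python
T1, Tinf = 100, 10               # assumed work and span
parallelism = T1 / Tinf          # 10: ceiling on speedup for ANY number of processors

for P in (4, 32):
    lower_bound_TP = max(T1 / P, Tinf)   # work law and span law combined
    max_speedup = T1 / lower_bound_TP
    print(f"P = {P}: TP >= {lower_bound_TP}, speedup <= {max_speedup}")

# P = 4:  TP >= 25, speedup <= 4  (perfect linear speedup is still possible)
# P = 32: TP >= 10, speedup <= 10 (capped by the parallelism, not by P)
```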

  32. Good performance depends on more than minimizing the span and work. The strands must also be scheduled efficiently onto the processors of the parallel machine. Our multithreaded programming model provides no way to specify which strands to execute on which processors; instead, we rely on the concurrency platform's scheduler. SCHEDULING

  33. A multithreaded scheduler must schedule the computation with no advance knowledge of when strands will be spawned or when they will complete—it must operate on-line. Moreover, a good scheduler operates in a distributed fashion, where the threads implementing the scheduler cooperate to load-balance the computation. SCHEDULING

  34. To keep the analysis simple, we shall consider an on-line centralized scheduler, which knows the global state of the computation at any given time. In particular, we shall consider greedy schedulers, which assign as many strands to processors as possible in each time step. SCHEDULING

  35. If at least P strands are ready to execute during a time step, we say that the step is a complete step, and a greedy scheduler assigns any P of the ready strands to processors. Otherwise, fewer than P strands are ready to execute, in which case we say that the step is an incomplete step, and the scheduler assigns each ready strand to its own processor. SCHEDULING

  36. A greedy scheduler executes a multithreaded computation in time TP ≤ T1/P + T∞. Greedy scheduling is provably good because it achieves the sum of the lower bounds as an upper bound. Besides, it is within a factor of 2 of optimal. SCHEDULING
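
A minimal sketch of greedy scheduling on a unit-time strand dag (the dag and the processor count are made-up examples): at every step the scheduler runs up to P ready strands, and the resulting number of steps can be checked against the TP ≤ T1/P + T∞ bound.

```python
# Toy dag of unit-time strands: edges point from a strand to the
# strands that depend on it.
dag = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def greedy_schedule(dag, P):
    # Count how many predecessors each strand is still waiting for.
    waiting = {u: 0 for u in dag}
    for u in dag:
        for v in dag[u]:
            waiting[v] += 1

    ready = [u for u, c in waiting.items() if c == 0]
    steps = 0
    while ready:
        # Complete step: run any P ready strands; incomplete step: run them all.
        running, ready = ready[:P], ready[P:]
        steps += 1
        for u in running:
            for v in dag[u]:
                waiting[v] -= 1
                if waiting[v] == 0:
                    ready.append(v)
    return steps

P, T1, Tinf = 2, len(dag), 3    # span of this dag is 3 (A -> B -> D)
TP = greedy_schedule(dag, P)
assert TP <= T1 / P + Tinf      # greedy bound: TP <= T1/P + T-infinity
print(TP)                       # 3 steps for this dag with P = 2
```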
