
Principles of Parallel Algorithm Design



  1. Principles of Parallel Algorithm Design Carl Tropper Department of Computer Science

  2. What has to be done • Identify concurrency in program • Map concurrent pieces to parallel processes • Distribute input, output and intermediate data • Manage accesses to shared data by processors • Synchronize processors as program executes

  3. Vocabulary • Tasks • Task Dependency graph

  4. Matrix-vector multiplication
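A minimal sketch of the decomposition this slide illustrates: y = A*b with one task per output element, so task i needs row i of A and all of b. The matrix values are made up, and OpenMP (compile with -fopenmp) stands in for whatever process model is used; the pragmas are ignored by a serial compiler.

```c
/* Sketch: dense matrix-vector multiply y = A*b, decomposed into one
   task per output element y[i] (one row of A). Values are illustrative. */
#include <stdio.h>

#define N 4

int main(void) {
    double A[N][N] = {{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}};
    double b[N] = {1, 1, 1, 1};
    double y[N];

    /* Each iteration is an independent task: task i reads row i of A
       and all of b, and writes only y[i], so no synchronization is needed. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * b[j];
    }

    for (int i = 0; i < N; i++)
        printf("y[%d] = %g\n", i, y[i]);
    return 0;
}
```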

  5. Database Query • model=civic and year=2001 and (color=green or color=white)
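As a sketch of how this query decomposes, each predicate scan below is an independent leaf task producing a bitmap, and a root task combines them once its inputs are ready. The record layout and the toy database are assumptions, not from the slides.

```c
/* Sketch: the query as a task graph. Each predicate scan is an
   independent leaf task producing a bitmap; the AND/OR node combines
   them. Record layout and data are illustrative. */
#include <stdio.h>
#include <string.h>

#define N 5

typedef struct { const char *model; int year; const char *color; } Car;

int main(void) {
    Car db[N] = {
        {"civic", 2001, "green"}, {"civic", 2000, "white"},
        {"corolla", 2001, "white"}, {"civic", 2001, "red"},
        {"civic", 2001, "white"},
    };
    int m[N], y[N], c[N];

    /* Leaf tasks: the three scans write disjoint outputs and can run
       concurrently (here as OpenMP sections). */
    #pragma omp parallel sections
    {
        #pragma omp section
        for (int i = 0; i < N; i++) m[i] = !strcmp(db[i].model, "civic");
        #pragma omp section
        for (int i = 0; i < N; i++) y[i] = db[i].year == 2001;
        #pragma omp section
        for (int i = 0; i < N; i++)
            c[i] = !strcmp(db[i].color, "green") || !strcmp(db[i].color, "white");
    }

    /* Root task: combine the intermediate results. */
    for (int i = 0; i < N; i++)
        if (m[i] && y[i] && c[i]) printf("record %d matches\n", i);
    return 0;
}
```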

  6. Data Dependencies

  7. Another graph

  8. Task talk • Task granularity • Fine grained, coarse grained • Degree of concurrency • Average degree: the average number of tasks which can run in parallel over the execution • Maximum degree: the largest number of tasks which can run in parallel at any point • Critical path: the longest directed path through the task dependency graph • Length: sum of the weights of the nodes on the path • Average degree of concurrency = total work / critical path length
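A small worked example of the last formula, as a sketch: for a hand-coded 4-task dependency graph with unit weights, the code computes each task's earliest finish time, the critical path length, and total work / length. The graph itself is made up for illustration.

```c
/* Sketch: critical-path length and average degree of concurrency for a
   small task dependency graph (tasks listed in topological order). */
#include <stdio.h>

#define T 4

int main(void) {
    /* dep[i][j] = 1 means an edge i -> j: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3. */
    int dep[T][T] = {0};
    dep[0][1] = dep[0][2] = dep[1][3] = dep[2][3] = 1;
    double w[T] = {1, 1, 1, 1};   /* node weights (work per task) */

    double finish[T], total = 0, critical = 0;
    for (int j = 0; j < T; j++) {
        double start = 0;          /* longest path into task j */
        for (int i = 0; i < j; i++)
            if (dep[i][j] && finish[i] > start) start = finish[i];
        finish[j] = start + w[j];
        total += w[j];
        if (finish[j] > critical) critical = finish[j];
    }

    printf("total work           = %g\n", total);            /* 4 */
    printf("critical path length = %g\n", critical);         /* 3 */
    printf("average concurrency  = %g\n", total / critical); /* 1.33 */
    return 0;
}
```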

  9. Task interaction graph • Nodes are tasks • Edges indicate interaction of tasks • The task dependency graph is a subset of the task interaction graph

  10. Sparse matrix-vector multiplication • Tasks compute entries of the output vector • Task i owns row i and b(i) • Task i sends the non-zero elements of row i to the other tasks which need them
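A minimal sketch of this decomposition, assuming the matrix is stored in the standard CSR format: task i (row i) reads exactly the entries b[col_idx[k]] named by its non-zeros, which is where the task interactions come from. The matrix values are illustrative.

```c
/* Sketch: sparse matrix-vector multiply y = A*b with task i owning
   row i. In CSR form, the column indices of row i tell us exactly
   which entries of b task i needs from other tasks. */
#include <stdio.h>

int main(void) {
    /* 4x4 sparse matrix in CSR (standard row_ptr/col_idx/val arrays). */
    int    row_ptr[] = {0, 2, 3, 5, 6};
    int    col_idx[] = {0, 2, 1, 0, 3, 2};
    double val[]     = {4, 1, 3, 2, 5, 7};
    double b[4] = {1, 1, 1, 1}, y[4];

    /* Rows are independent tasks: task i reads b[col_idx[k]] (the edges
       of the task interaction graph) and writes only y[i]. */
    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
        y[i] = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            y[i] += val[k] * b[col_idx[k]];
    }

    for (int i = 0; i < 4; i++) printf("y[%d] = %g\n", i, y[i]);
    return 0;
}
```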

  11. Sparse matrix task interaction graph

  12. Process Mapping: Goals and illusions • Goals • Maximize concurrency by mapping independent tasks to different processors • Minimize completion time by ensuring a process is free to pick up a critical-path task as soon as it becomes ready • Map processes which communicate a lot to the same processor • Illusions • Can't do all of the above: they conflict

  13. Task Decomposition • Big idea • First decompose for message passing • Then decompose for the shared memory on each node • Decomposition Techniques • Recursive • Data • Exploratory • Speculative

  14. Recursive Decomposition • Good for problems which are amenable to a divide and conquer strategy • Quicksort - a natural fit
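A sketch of quicksort under recursive decomposition, using OpenMP tasks to stand in for the dynamically generated tasks of the dependency graph; the partition scheme (Lomuto) is a choice of this sketch, not of the slides.

```c
/* Sketch: quicksort as recursive decomposition. Each partition step
   spawns two independent subtasks, one per half. */
#include <stdio.h>

static void quicksort(int *a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi], i = lo;            /* Lomuto partition */
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;

    /* The two halves are independent: new tasks are generated at
       runtime, one per recursive call (dynamic task generation). */
    #pragma omp task
    quicksort(a, lo, i - 1);
    #pragma omp task
    quicksort(a, i + 1, hi);
    #pragma omp taskwait
}

int main(void) {
    int a[] = {5, 3, 8, 1, 9, 2, 7};
    int n = sizeof a / sizeof a[0];
    #pragma omp parallel
    #pragma omp single            /* one thread seeds the task tree */
    quicksort(a, 0, n - 1);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```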

  15. Quicksort Task Dependency Graph

  16. Sometimes we force the issue: we re-cast the problem into the divide-and-conquer paradigm
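The slide's figure isn't reproduced here; as one standard illustration of forcing a problem into divide and conquer, the sketch below finds the minimum of a list by recursively conquering two independent halves. The input array is made up.

```c
/* Sketch: a problem re-cast as divide and conquer. Finding a minimum
   has no obvious recursive structure, but splitting the list in half
   yields two independent subtasks at every level. */
#include <stdio.h>

static int min_rec(const int *a, int lo, int hi) {
    if (lo == hi) return a[lo];
    int mid = (lo + hi) / 2, left, right;

    /* The two halves are independent subproblems. */
    #pragma omp task shared(left)
    left = min_rec(a, lo, mid);
    #pragma omp task shared(right)
    right = min_rec(a, mid + 1, hi);
    #pragma omp taskwait

    return left < right ? left : right;
}

int main(void) {
    int a[] = {9, 4, 7, 1, 8, 3};
    #pragma omp parallel
    #pragma omp single
    printf("min = %d\n", min_rec(a, 0, 5));
    return 0;
}
```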

  17. Data Decomposition • Idea: partitioning of data leads to tasks • Can partition • Output data • Input data • Intermediate data • Whatever…

  18. Partitioning Output Data • Each element of the output is computed independently as a function of the input
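A minimal sketch of output partitioning for matrix-matrix multiplication: each C[i][j] depends only on row i of A and column j of B, so one task per output entry requires no task interaction at all. The matrix values are illustrative.

```c
/* Sketch: output-data partitioning for C = A*B, with one independent
   task per output element C[i][j]. */
#include <stdio.h>

#define N 2

int main(void) {
    double A[N][N] = {{1, 2}, {3, 4}};
    double B[N][N] = {{5, 6}, {7, 8}};
    double C[N][N];

    /* n*n independent tasks, one per output entry. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }

    for (int i = 0; i < N; i++)
        printf("%g %g\n", C[i][0], C[i][1]);
    return 0;
}
```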

  19. Other decompositions

  20. Output data again: frequency of itemsets

  21. Partition Input Data • Sometimes the more natural thing to do • Sum of n numbers: there is only one output • Divide the input into groups • One task per group • Get intermediate results • Create one task to combine the intermediate results
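A sketch of this input partitioning, with the block count chosen arbitrarily: each task sums its own block into a private slot (the intermediate result), and a final task combines the partials.

```c
/* Sketch: input partitioning for the sum of n numbers. Each task owns
   one block of the input and produces a partial sum; a follow-up task
   combines the partials into the single output. */
#include <stdio.h>

#define N 16
#define TASKS 4

int main(void) {
    double x[N], partial[TASKS] = {0};
    for (int i = 0; i < N; i++) x[i] = i + 1;   /* 1..16, sum = 136 */

    /* One task per input block; tasks write disjoint partial[] slots. */
    #pragma omp parallel for
    for (int t = 0; t < TASKS; t++)
        for (int i = t * (N / TASKS); i < (t + 1) * (N / TASKS); i++)
            partial[t] += x[i];

    /* Combining task: reduce the intermediate results. */
    double sum = 0;
    for (int t = 0; t < TASKS; t++) sum += partial[t];
    printf("sum = %g\n", sum);
    return 0;
}
```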

  22. Top: partition input. Bottom: partition input and output

  23. Partitioning of Intermediate Data • Good for multi-stage algorithms • May improve concurrency over a strictly input or strictly output partition

  24. Matrix Multiply Again

  25. Concurrency Picture • Max concurrency of 8 for the intermediate-data partition vs • Max concurrency of 4 for the output partition • The price is the storage for D
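A sketch of the intermediate-data version of matrix multiply behind these numbers: stage 1 computes the fully independent partial products D[k][i][j] = A[i][k]*B[k][j] (n^3 = 8 tasks for 2x2 matrices, versus n^2 = 4 output tasks), and stage 2 reduces over k. D is the extra storage the slide mentions.

```c
/* Sketch: intermediate-data partitioning for C = A*B via a 3D
   intermediate D. More concurrency than the output partition, paid
   for with the storage for D. */
#include <stdio.h>

#define N 2

int main(void) {
    double A[N][N] = {{1, 2}, {3, 4}};
    double B[N][N] = {{5, 6}, {7, 8}};
    double D[N][N][N], C[N][N] = {{0}};

    /* Stage 1: n^3 fully independent tasks (8 here). */
    #pragma omp parallel for collapse(3)
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                D[k][i][j] = A[i][k] * B[k][j];

    /* Stage 2: n^2 reduction tasks, one per output element. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += D[k][i][j];

    for (int i = 0; i < N; i++) printf("%g %g\n", C[i][0], C[i][1]);
    return 0;
}
```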

  26. Exploratory Decomposition • For search space type problems • Partition search space into small parts • Look for solution in each part
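A minimal sketch of exploratory decomposition, with a plain array standing in for the space of puzzle states: each task searches one part, and an atomically read flag lets every task stop as soon as any task finds the goal. The data and goal value are made up.

```c
/* Sketch: exploratory decomposition. The search space is split into
   parts, each searched by its own task; all tasks stop once any one
   of them finds the goal. */
#include <stdio.h>

#define N 16
#define TASKS 4

int main(void) {
    int space[N], goal = 11, found_at = -1, done = 0;
    for (int i = 0; i < N; i++) space[i] = i;

    #pragma omp parallel for
    for (int t = 0; t < TASKS; t++) {
        for (int i = t * (N / TASKS); i < (t + 1) * (N / TASKS); i++) {
            int stop;
            #pragma omp atomic read
            stop = done;                 /* did another task already win? */
            if (stop) break;
            if (space[i] == goal) {
                #pragma omp critical
                { if (!done) { done = 1; found_at = i; } }
                break;
            }
        }
    }

    printf("goal found at position %d\n", found_at);
    return 0;
}
```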

  27. Search Space Problem: the 15 puzzle

  28. Decomposition

  29. Parallel vs serial: is it worth it? It depends on where you find the answer

  30. Speculative Decomposition • Computation gambles at a branch point in the program • Takes a path before it knows the result • Win big or waste the work
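A sketch of the gamble, assuming OpenMP sections and placeholder functions for the branch condition and the two branch bodies: both bodies run before the condition is known, and the losing result is discarded as wasted work.

```c
/* Sketch: speculative decomposition. The expensive branch bodies run
   concurrently with the branch condition; when the condition arrives,
   one result is kept and the other is wasted. All functions here are
   illustrative placeholders. */
#include <stdio.h>

static int slow_condition(void)   { return 1; }    /* the gating computation */
static int branch_taken(void)     { return 100; }  /* speculative body 1 */
static int branch_not_taken(void) { return 200; }  /* speculative body 2 */

int main(void) {
    int cond, c_result, d_result;

    #pragma omp parallel sections
    {
        #pragma omp section
        cond = slow_condition();
        #pragma omp section
        c_result = branch_taken();      /* may be wasted work */
        #pragma omp section
        d_result = branch_not_taken();  /* may be wasted work */
    }

    printf("result = %d\n", cond ? c_result : d_result);
    return 0;
}
```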

  31. Speculative Example: parallel discrete event simulation • Idea: compute results at c, d, e before the output from a is known

  32. Hybrid • Sometimes better to put two ideas together

  33. Hybrid • Quicksort: recursion alone results in O(n) tasks but little concurrency near the root • First decompose, then recurse (a poem)

  34. Mapping • Tasks and their interactions influence the choice of mapping scheme

  35. Task Characteristics • Task generation • Static: know all tasks before the algorithm executes • Data decomposition leads to static generation • Dynamic: tasks are generated at runtime • Recursive decomposition leads to dynamic generation • Quicksort

  36. Task Characteristics • Task sizes • Uniform, non-uniform • Knowledge of task sizes • 15 puzzle: don’t know task sizes • Matrix multiplication: do know task sizes • Size of data associated with tasks • Big data can cause big communication

  37. Task interactions • Tasks share data, synchronization information, work • Static vs dynamic • Static: know the task interaction graph and when interactions happen before execution • Parallel matrix multiply • Dynamic • 15 puzzle problem

  38. More interactions • Regular versus irregular • Interaction may have structure which can be exploited • Regular: image dithering • Irregular: sparse matrix-vector multiplication • The access pattern for b depends on the structure of A

  39. Image dithering

  40. Data sharing • Read-only: parallel matrix multiply • Read-write • 15 puzzle • Heuristic search: estimate the number of moves to a solution from each state • Use a priority queue to store the states to be expanded • The priority queue contains shared data
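A sketch of the shared read-write data in this example: a lock-protected min-heap as the priority queue of a parallel best-first search, with states reduced to their heuristic priority for brevity. The expansion rule (push s-1) is a stand-in for generating puzzle successors; requires OpenMP (-fopenmp) for omp.h.

```c
/* Sketch: the shared priority queue of a parallel heuristic search.
   Every read-write access to the queue is serialized with a lock. */
#include <stdio.h>
#include <omp.h>

#define CAP 64

static int heap[CAP], size = 0;     /* min-heap of state priorities */
static omp_lock_t qlock;

static void push(int prio) {
    int i = size++;
    heap[i] = prio;
    while (i > 0 && heap[(i - 1) / 2] > heap[i]) {
        int t = heap[i]; heap[i] = heap[(i - 1) / 2]; heap[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
}

static int pop(void) {              /* returns -1 if empty */
    if (size == 0) return -1;
    int top = heap[0];
    heap[0] = heap[--size];
    for (int i = 0; ; ) {
        int l = 2 * i + 1, r = l + 1, m = i;
        if (l < size && heap[l] < heap[m]) m = l;
        if (r < size && heap[r] < heap[m]) m = r;
        if (m == i) break;
        int t = heap[i]; heap[i] = heap[m]; heap[m] = t;
        i = m;
    }
    return top;
}

int main(void) {
    omp_init_lock(&qlock);
    push(7);                        /* seed with the start state */

    #pragma omp parallel num_threads(4)
    for (int step = 0; step < 3; step++) {
        omp_set_lock(&qlock);       /* the queue is shared read-write data */
        int s = pop();
        if (s > 0) push(s - 1);     /* "expand" s: enqueue a successor */
        omp_unset_lock(&qlock);
    }

    omp_destroy_lock(&qlock);
    printf("states left in queue: %d\n", size);
    return 0;
}
```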

  41. Task interactions • One-way • Read-only • Two-way • Producer-consumer style • Read-write (15 puzzle)

  42. Mapping tasks to processes • Goal: reduce the overhead caused by parallel execution • So • Reduce communication between processes • Minimize task idling • Need to balance the load • But these goals can conflict

  43. Balancing load is not always enough to avoid idling • Task dependencies get in the way • Processes 9-12 can't proceed until 1-8 finish • MORAL: include task dependency information in the mapping

  44. Mappings can be • Static: distribute tasks before the algorithm executes • Depends on task size, size of data, task interactions • NP-complete for non-uniform tasks • Dynamic: distribute tasks during algorithm execution • Easier with shared memory

  45. Static Mapping • Data partitioning • Results in task decomposition • Arrays, graphs common ways to represent data • Task partitioning • Task dependency graph is static • Know task sizes
