CDP Tutorial 4 Basics of Parallel Algorithm Design

This tutorial uses some of the slides for chapters 3 and 5 accompanying “Introduction to Parallel Computing”, Addison Wesley, 2003. http://www-users.cs.umn.edu/~karypis/parbook

Preliminaries: Decomposition, Tasks, and Dependency Graphs
• A parallel algorithm should decompose the problem into tasks that can be executed concurrently.
• A decomposition can be illustrated as a directed graph (the task dependency graph).
• Nodes = tasks and edges = dependencies: the result of one task is required before the next can be processed.
• The degree of concurrency is the number of tasks that can actually run in parallel.
• The critical path is the longest path in the dependency graph; its length is a lower bound on the program's runtime.
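The definitions above can be made concrete with a small sketch. The task graph, task names, and work units below are hypothetical and chosen only for illustration. The last function computes the ratio of total work to critical path length, which the accompanying textbook calls the average degree of concurrency.

```python
from functools import lru_cache

# Hypothetical task graph: task name -> (work units, tasks it depends on).
TASKS = {
    "a": (10, []),
    "b": (10, []),
    "c": (10, []),
    "d": (10, []),
    "e": (6, ["a", "b"]),
    "f": (6, ["c", "d"]),
    "g": (8, ["e", "f"]),
}

def critical_path_length(tasks):
    """Total work along the longest dependency chain in the graph."""
    @lru_cache(maxsize=None)
    def finish(name):
        work, deps = tasks[name]
        # A task can finish no earlier than its slowest dependency plus its own work.
        return work + max((finish(d) for d in deps), default=0)
    return max(finish(name) for name in tasks)

def total_work(tasks):
    """Sum of the work of all tasks, i.e. the serial runtime."""
    return sum(work for work, _ in tasks.values())

def average_concurrency(tasks):
    """Total work divided by critical path length."""
    return total_work(tasks) / critical_path_length(tasks)

print(critical_path_length(TASKS))  # longest chain is a -> e -> g
print(average_concurrency(TASKS))
```

No schedule on any number of processors can finish this graph in less time than the critical path length, which is why the slide calls it a lower bound.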

Example: Multiplying a Dense Matrix with a Vector
The computation of each element of the output vector y is independent of the other elements. Based on this, a dense matrix-vector product can be decomposed into n independent tasks.
Observations:
• While tasks share data (namely, the vector b), they do not have any control dependencies, i.e., no task needs to wait for the (partial) completion of any other.
• All tasks are of the same size in terms of number of operations.
• Is this the maximum number of tasks we could decompose this problem into?
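A minimal sketch of the n-task decomposition, assuming a thread pool as the execution platform. The function names and the sample data are illustrative and not part of the original slides. In CPython the threads only illustrate the task structure; they will not speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def row_task(row, b):
    """One task: the dot product of a single matrix row with the shared vector b."""
    return sum(a_ij * b_j for a_ij, b_j in zip(row, b))

def parallel_matvec(A, b, workers=4):
    """Compute y = A b with one independent task per output element."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so y[i] corresponds to row i of A.
        return list(pool.map(lambda row: row_task(row, b), A))

# Illustrative data: every task reads b, and no task waits for another.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [1, 0, -1]
y = parallel_matvec(A, b)
print(y)
```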

Multiplying a dense matrix with a vector – 2n CPUs available
[Figure: the decomposition, with tasks labeled Task 1 and Task 2]
On what kind of platform will we have 2n processors?

Multiplying a dense matrix with a vector – 2n CPUs available
[Figure: task labels Task 1, Task 2, ..., Task 2n-1, Task 2n, Task 2n+1, ..., Task 3n]
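The task labels run up to 3n, which is consistent with a finer decomposition in which each row's dot product is split into two halves (2n tasks) followed by one addition per row (n tasks). The figure itself is not recoverable, so the sketch below is an assumption about what it showed, and all names are illustrative.

```python
import operator
from concurrent.futures import ThreadPoolExecutor

def half_dot(row, b, lo, hi):
    """One of the 2n first-phase tasks: a partial dot product over columns lo..hi-1."""
    return sum(row[j] * b[j] for j in range(lo, hi))

def matvec_3n_tasks(A, b, workers=4):
    """Compute y = A b using 2n half-row tasks followed by n addition tasks."""
    mid = len(b) // 2
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: 2n independent tasks, two per row.
        halves = [(pool.submit(half_dot, row, b, 0, mid),
                   pool.submit(half_dot, row, b, mid, len(b)))
                  for row in A]
        # Phase 2: n addition tasks; each one depends on the two halves of its row,
        # so it is submitted only after both partial results are available.
        sums = [pool.submit(operator.add, left.result(), right.result())
                for left, right in halves]
        return [s.result() for s in sums]

print(matvec_3n_tasks([[1, 2, 3, 4], [5, 6, 7, 8]], [1, 1, 1, 1]))
```

Compared with the n-task version, this exposes more concurrency in the first phase but introduces a dependency between the phases.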

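Granularity of Task Decompositions
• The number and size of the tasks into which a problem is decomposed determine its granularity: many small tasks give a fine-grained decomposition, a few large tasks a coarse-grained one.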

Parallelization efficiency
• Parallel algorithm scalability factors:
  • Amdahl’s law: parallel algorithm = parallel part + serial part; upper bound on speedup: S ≤ 1 / (s + (1 - s) / p), where s is the serial fraction of the runtime and p is the number of processors
  • Task interaction overhead
  • CPU utilization
    • stalls due to data dependencies
    • imperfect load balancing
  • Excess computations
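Amdahl's law bounds the speedup by 1 / (s + (1 - s) / p), where s is the serial fraction of the runtime and p the number of processors. A short sketch of how quickly the bound saturates, with the function name and the sample fractions chosen only for illustration:

```python
def amdahl_speedup(serial_fraction, processors):
    """Upper bound on speedup when serial_fraction of the runtime cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# With 10% serial work, the bound approaches 1 / 0.1 = 10 however many CPUs are added.
for p in (2, 8, 64, 1024):
    print(p, round(amdahl_speedup(0.1, p), 2))
```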

Guidelines for good parallel algorithm design
• Maximize concurrency.
• Spread tasks evenly between processors to avoid idling and achieve load balancing.
• Execute tasks on the critical path as soon as their dependencies are satisfied.
• Minimize communication between processors, and overlap it with computation where possible.

Evaluating a parallel algorithm
• Speedup: S = Tserial / Tparallel
• Efficiency: E = S / #processors
• Cost: C = #processors * Tparallel
• Scalability: speedup (efficiency) as a function of the number of processors for a fixed problem size.
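The three metrics translate directly into code. The timings in the example are hypothetical and serve only to show how the quantities relate.

```python
def speedup(t_serial, t_parallel):
    """S = Tserial / Tparallel."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, processors):
    """E = S / #processors."""
    return speedup(t_serial, t_parallel) / processors

def cost(t_parallel, processors):
    """C = #processors * Tparallel."""
    return processors * t_parallel

# Hypothetical measurement: 100 s serially, 16 s on 8 processors.
print(speedup(100, 16))        # 6.25
print(efficiency(100, 16, 8))  # 0.78125
print(cost(16, 8))             # 128
```

A cost above the serial time (here 128 versus 100) means the processors are collectively doing or waiting for more work than the serial program performs.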

Simple example: summing n numbers on p CPUs

Summing n numbers on p CPUs
• Parallel Time = Θ((n/p) log p)
• Serial Time = Θ(n)
• Speedup = Θ(p / log p)
• Efficiency = Θ(1 / log p)
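The speedup and efficiency follow from the two running times by the definitions S = Tserial / Tparallel and E = S / p. The cost line is an extra step derived from C = p * Tparallel and does not appear on the original slide.

```latex
\begin{align*}
S &= \frac{T_{\text{serial}}}{T_{\text{parallel}}}
   = \frac{\Theta(n)}{\Theta\!\left(\frac{n}{p}\log p\right)}
   = \Theta\!\left(\frac{p}{\log p}\right) \\
E &= \frac{S}{p} = \Theta\!\left(\frac{1}{\log p}\right) \\
C &= p \cdot T_{\text{parallel}} = \Theta(n \log p)
\end{align*}
```

Since the cost grows as n log p while the serial time is only Θ(n), efficiency falls as processors are added for a fixed n.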

Parallel multiplication of 3 matrices
• A, B, C: square matrices, N x N
• We want to compute in parallel: A x B x C
• How do we achieve the maximum performance?

Try 1
2-CPU parallelization:
[Figure: A x B = TEMP, then TEMP x C = Result]

Try 2
3-CPU parallelization:
[Figure: two multiplication diagrams involving A, B, C, the intermediate C', and Result]
Is this a good way?

Dynamic task creation - Simple Master-Worker
• Identify independent computations:
  [Figure: dependency tree for one block of the result. T(1,1) = A(1,1)*B(1,1) + A(1,2)*B(2,1); T(1,2) = A(1,1)*B(1,2) + A(1,2)*B(2,2); Result(1,1) = T(1,1)*C(1,1) + T(1,2)*C(2,1)]
• Create a queue of tasks.
• Each process picks a task from the queue and may insert a new task into the queue.
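A minimal sketch of the queue-based scheme for the single block Result(1,1) = T(1,1)*C(1,1) + T(1,2)*C(2,1), where T = A x B. It assumes 2 x 2 block matrices and uses scalar entries in place of real blocks to keep the code short. The task tuple layout and all names are hypothetical. Workers pull multiplication tasks from a shared queue, and the worker that completes a sum of T inserts the follow-up multiplication.

```python
import queue
import threading

def result_block_11(A, B, C, workers=3):
    """Compute Result(1,1) of (A x B) x C with dynamically created tasks."""
    tasks = queue.Queue()
    lock = threading.Lock()
    products = {}   # target name -> products collected so far (each target needs 2)
    result = {}

    def on_sum_done(name, value):
        # A finished T block creates a new task; a finished R11 is the answer.
        if name == "T11":
            tasks.put((value, C[0][0], "R11"))
        elif name == "T12":
            tasks.put((value, C[1][0], "R11"))
        else:
            result[name] = value

    def worker():
        while True:
            task = tasks.get()
            if task is None:          # sentinel: no more work
                tasks.task_done()
                return
            x, y, target = task
            with lock:
                products.setdefault(target, []).append(x * y)
                ready = len(products[target]) == 2
            if ready:
                on_sum_done(target, sum(products[target]))
            tasks.task_done()         # after any follow-up task was inserted

    # Initial independent computations.
    tasks.put((A[0][0], B[0][0], "T11"))
    tasks.put((A[0][1], B[1][0], "T11"))
    tasks.put((A[0][0], B[0][1], "T12"))
    tasks.put((A[0][1], B[1][1], "T12"))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    tasks.join()                      # wait until the queue has drained
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return result["R11"]

print(result_block_11([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]))
```

Inserting the follow-up task before calling `task_done` keeps the queue's unfinished-task count above zero, so `join` cannot return while work is still pending.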

Hypothetical implementation with multiple threads
• Let's assign a separate thread to each multiply/sum operation.
• How do we coordinate the threads?
  • Initialize the threads to know their dependencies.
  • Build “producer-consumer”-like logic for every thread.
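One way to sketch this idea, under the assumption that each dependency edge is a one-shot channel: every thread is created knowing its input channels, consumes from them, and produces into its own output channel. The `Slot` class, `spawn`, and `constant` are hypothetical helpers, and scalars stand in for matrix blocks.

```python
import operator
import threading

class Slot:
    """One-shot producer-consumer channel: one thread puts, any number may get."""
    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def put(self, value):
        self._value = value
        self._ready.set()

    def get(self):
        self._ready.wait()   # block until the producer has published a value
        return self._value

def constant(value):
    """A slot that is already filled, used for the input operands."""
    slot = Slot()
    slot.put(value)
    return slot

def spawn(op, *inputs):
    """Start one thread for a single operation and return its output slot."""
    out = Slot()
    def body():
        args = [s.get() for s in inputs]   # consumer side: wait for dependencies
        out.put(op(*args))                 # producer side: publish the result
    threading.Thread(target=body).start()
    return out

# T(1,1) = A(1,1)*B(1,1) + A(1,2)*B(2,1), one thread per multiply and per sum.
p1 = spawn(operator.mul, constant(1), constant(5))
p2 = spawn(operator.mul, constant(2), constant(7))
t11 = spawn(operator.add, p1, p2)
value = t11.get()
print(value)
```

A thread per operation is simple to reason about but expensive when operations are small; the task queue of the previous slide bounds the number of threads instead.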
