Chapter 3

Parallel Algorithm Design


Outline

  • Task/channel model

  • Algorithm design methodology

  • Case studies



Task/Channel Model

  • Parallel computation = set of tasks

  • Task

    • Program

    • Local memory

    • Collection of I/O ports

  • Tasks interact by sending messages through channels
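The model can be made concrete with a small sketch (the code and its names are illustrative, not from the chapter): a channel is a one-way message queue, and a task owns local memory plus ports onto its channels.

```python
from queue import Queue

class Channel:
    """One-way message channel from a producing task to a consuming task."""
    def __init__(self):
        self._q = Queue()
    def send(self, msg):
        self._q.put(msg)
    def receive(self):
        return self._q.get()

class Task:
    """A task: a program with local memory and a collection of I/O ports."""
    def __init__(self, name, in_ports=None, out_ports=None):
        self.name = name
        self.local_memory = {}
        self.in_ports = in_ports or []
        self.out_ports = out_ports or []

# Two tasks connected by one channel: producer -> consumer.
ch = Channel()
producer = Task("producer", out_ports=[ch])
consumer = Task("consumer", in_ports=[ch])

producer.out_ports[0].send(42)
consumer.local_memory["value"] = consumer.in_ports[0].receive()
print(consumer.local_memory["value"])  # 42
```

The point of the abstraction is that a task touches only its local memory and its own ports; all interaction is an explicit message on a channel.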


Task/Channel Model

[Figure: tasks connected by channels]



Foster’s Design Methodology

  • Partitioning

    • Dividing the Problem into Tasks

  • Communication

    • Determine what needs to be communicated between the Tasks over Channels

  • Agglomeration

    • Group or Consolidate Tasks to improve efficiency or simplify the programming solution

  • Mapping

    • Assign tasks to the Computer Processors


Foster's Methodology

[Figure: partitioning → communication → agglomeration → mapping]


Step 1: Partitioning
Divide Computation & Data into Pieces

  • Domain Decomposition – Data Centric Approach

    • Divide up most frequently used data

    • Associate the computations with the divided data

  • Functional Decomposition – Computation Centric Approach

    • Divide up the computation

    • Associate the data with the divided computations

  • Primitive Tasks: the resulting pieces from either decomposition

    • The goal is to have as many of these as possible
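As a sketch of domain decomposition (the array, block sizes, and task count below are invented for the example), here is a block partition of a data array into primitive tasks, with the associated computation (a local sum) attached to each block:

```python
def block_decompose(data, n_tasks):
    """Domain decomposition: split data into contiguous blocks, one block
    per primitive task (the first len(data) % n_tasks blocks get one
    extra element so sizes stay balanced)."""
    n = len(data)
    base, extra = divmod(n, n_tasks)
    blocks, start = [], 0
    for t in range(n_tasks):
        size = base + (1 if t < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks

data = list(range(10))
blocks = block_decompose(data, 4)
print(blocks)  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]

# The computation is associated with the divided data:
# each primitive task sums its own block.
partial_sums = [sum(b) for b in blocks]
print(partial_sums)  # [3, 12, 13, 17]
```

Functional decomposition inverts the association: the computation is divided first (e.g. into pipeline stages), and the data follows the computations.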



Example Domain Decompositions



Example Functional Decomposition



Partitioning Checklist

  • Lots of Tasks

    • e.g., at least 10x more primitive tasks than processors in the target computer

  • Minimize redundant computations and data

  • Load Balancing

    • Primitive tasks roughly the same size

  • Scalable

    • Number of tasks an increasing function of problem size


Step 2: Communication
Determine Communication Patterns between Primitive Tasks

  • Local Communication

    • When Tasks need data from a small number of other Tasks

    • A channel is created from the producing task to the consuming task

  • Global Communication

    • When a Task needs data from many or all other Tasks

    • Channels for this type of communication are not created during this step



Communication Checklist

  • Balanced

    • Communication operations balanced among tasks

  • Small degree

    • Each task communicates with only small group of neighbors

  • Concurrency

    • Tasks can perform communications concurrently

    • Tasks can perform computations concurrently


Step 3: Agglomeration
Group Tasks to Improve Efficiency or Simplify Programming

  • Increase Locality

    • Remove communication by agglomerating Tasks that communicate with one another

    • Combine groups of sending & receiving tasks

      • Send fewer, larger messages rather than many short messages, which incur more message latency.

  • Maintain Scalability of the Parallel Design

    • Be careful not to agglomerate Tasks so much that moving to a machine with more processors will not be possible

  • Reduce Software Engineering costs

    • Leveraging existing sequential code can reduce the expense of engineering a parallel algorithm
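The latency argument can be quantified with the standard cost model (per-message latency plus size over bandwidth; the numeric constants below are assumed for illustration): k separate messages pay the latency k times, while one agglomerated message pays it once.

```python
def send_time(n_messages, bytes_per_message, latency, bandwidth):
    """Time to send n messages under the latency + size/bandwidth model."""
    return n_messages * (latency + bytes_per_message / bandwidth)

latency = 1e-4     # seconds per message (assumed value)
bandwidth = 1e8    # bytes per second (assumed value)
k, size = 100, 1024

separate = send_time(k, size, latency, bandwidth)      # k small messages
combined = send_time(1, k * size, latency, bandwidth)  # one large message

# The same bytes move either way; agglomeration saves (k-1) latencies.
print(separate > combined)  # True
```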



Agglomeration Can Improve Performance

  • Eliminate communication between primitive tasks agglomerated into consolidated task

  • Combine groups of sending and receiving tasks



Agglomeration Checklist

  • Locality of parallel algorithm has increased

  • Tradeoff between agglomeration and code modifications costs is reasonable

  • Agglomerated tasks have similar computational and communications costs

  • Number of tasks increases with problem size

  • Number of tasks suitable for likely target systems


Step 4: Mapping
Assigning Tasks to Processors

  • Maximize Processor Utilization

    • Ensure computation is evenly balanced across all processors

  • Minimize Interprocess Communication



Optimal Mapping

  • Finding optimal mapping is NP-hard

  • Must rely on heuristics



Decision Tree for Parallel Algorithm Design



Mapping Goals

  • Mappings based on one task per processor and on multiple tasks per processor have been considered

  • Both static and dynamic allocation of tasks to processors have been evaluated

  • If a dynamic allocation of tasks to processors is chosen, the Task allocator is not a bottleneck

  • If Static allocation of tasks to processors is chosen, the ratio of tasks to processors is at least 10 to 1



Case Studies

  • Boundary value problem

  • Finding the maximum

  • The n-body problem

  • Adding data input



Boundary Value Problem

[Figure: insulated rod with its ends in ice water]



Rod Cools as Time Progresses



Finite Difference Approximation


Partitioning

  • One data item per grid point

  • Associate one primitive task with each grid point

  • Two-dimensional domain decomposition


Communication

  • Identify communication pattern between primitive tasks:

    • Each interior primitive task has three incoming and three outgoing channels


Agglomeration and Mapping



Execution Time Estimate

  • χ – time to update element

  • n – number of elements

  • m – number of iterations

  • Sequential execution time: mnχ

  • p – number of processors

  • λ – message latency

  • Parallel execution time: m(χ⌈n/p⌉ + 2λ)
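These estimates can be checked numerically; a sketch assuming the chapter's model, where χ is the per-element update time, λ the message latency, and each processor updates ⌈n/p⌉ elements and exchanges two boundary values per iteration (the numeric values of χ, λ, n, m, p are invented):

```python
import math

def sequential_time(m, n, chi):
    """m iterations, each updating all n elements."""
    return m * n * chi

def parallel_time(m, n, p, chi, lam):
    """Each iteration: update ceil(n/p) local elements, then send/receive
    two boundary values (two messages of latency lam)."""
    return m * (chi * math.ceil(n / p) + 2 * lam)

chi, lam = 1e-8, 1e-6        # assumed machine constants
n, m, p = 10_000, 1_000, 8   # assumed problem and machine sizes

t_seq = sequential_time(m, n, chi)
t_par = parallel_time(m, n, p, chi, lam)
print(f"speedup = {t_seq / t_par:.2f}")
```

Note how the 2λ term per iteration caps the achievable speedup: past some p, the latency term dominates the shrinking compute term.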


Finding the Maximum Error

[Figure: grid of per-element errors; maximum error 6.25%]


Reduction

  • Given associative operator ⊕

  • a0 ⊕ a1 ⊕ a2 ⊕ … ⊕ an-1

  • Examples

    • Add

    • Multiply

    • And, Or

    • Maximum, Minimum


Parallel Reduction Evolution


Binomial Trees

[Figure: binomial trees; each is a subgraph of a hypercube]


Finding Global Sum

[Figure: initial values on 16 tasks: 4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]


Finding Global Sum

[Figure: after step 1, 8 partial sums: 1, 7, -6, 4, 4, 5, 8, 2]


Finding Global Sum

[Figure: after step 2, 4 partial sums: 8, -2, 9, 10]


Finding Global Sum

[Figure: after step 3, 2 partial sums: 17, 8]


Finding Global Sum

[Figure: after step 4, the global sum 25 on the root of the binomial tree]
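The combining steps can be simulated directly. A sketch of the pairwise (binomial-tree) reduction: the number of active tasks halves each step, so n values are summed in log2(n) steps. The pairing order below differs from the slide's picture, but the result is the same because the operator is associative.

```python
def tree_sum(values):
    """Pairwise tree reduction; len(values) is assumed a power of two."""
    vals = list(values)
    while len(vals) > 1:
        half = len(vals) // 2
        # Each task i in the upper half sends its partial sum to
        # task i - half, which adds it to its own.
        vals = [vals[i] + vals[i + half] for i in range(half)]
    return vals[0]

values = [4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]
print(tree_sum(values))  # 25, in log2(16) = 4 combining steps
```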


Agglomeration

[Figure: each agglomerated task computes a local sum before the tree reduction]


The n-body Problem


Partitioning

  • Domain partitioning

  • Assume one task per particle

  • Task has particle’s position, velocity vector

  • Iteration

    • Get positions of all other particles

    • Compute new position, velocity


Gather


All-gather



Complete Graph for All-gather



Hypercube for All-gather
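A sketch of the hypercube all-gather pattern (p is assumed a power of two; the dictionary-based bookkeeping is illustrative): at step i, each task exchanges everything it has gathered so far with the partner whose rank differs in bit i, so after log2(p) steps every task holds every item.

```python
def hypercube_allgather(items):
    """Simulate all-gather on p = 2^d tasks; returns, per task, the
    mapping from source rank to that rank's item."""
    p = len(items)
    gathered = [{i: items[i]} for i in range(p)]  # each task starts with its own item
    step = 1
    while step < p:
        new = []
        for rank in range(p):
            partner = rank ^ step          # flip one address bit
            merged = dict(gathered[rank])
            merged.update(gathered[partner])  # simultaneous exchange
            new.append(merged)
        gathered = new
        step *= 2
    return gathered

result = hypercube_allgather(["a", "b", "c", "d"])
print(all(len(r) == 4 for r in result))  # True: everyone has all 4 items
```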


Communication Time

[Figure: communication-time expressions for the complete-graph and hypercube all-gather]
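The comparison can be restated under the usual model (λ per message, β bytes per second, n bytes of data spread evenly over p tasks). The expressions below are my restatement of the standard analysis, since the slide's own formulas did not survive extraction, and the numeric constants are assumed:

```python
import math

def allgather_complete_graph(p, n_bytes, lam, beta):
    """Each task sends its n/p-byte block to each of the other p-1 tasks."""
    return (p - 1) * (lam + n_bytes / (p * beta))

def allgather_hypercube(p, n_bytes, lam, beta):
    """log2(p) steps; the amount exchanged doubles each step, for
    n(p-1)/p bytes total but only log2(p) message latencies."""
    return lam * math.log2(p) + n_bytes * (p - 1) / (p * beta)

p, n_bytes = 64, 1_000_000
lam, beta = 1e-4, 1e8  # assumed latency and bandwidth

print(allgather_complete_graph(p, n_bytes, lam, beta))
print(allgather_hypercube(p, n_bytes, lam, beta))
```

Both move the same total data; the hypercube wins by paying the latency log2(p) times instead of p-1 times.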


Adding Data Input


Scatter


Scatter in log p Steps

[Figure: 12345678 → 1234, 5678 → 12, 34, 56, 78 → one item per task]
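The halving pattern above can be simulated; a sketch (p is assumed a power of two, and len(data) == p) in which, at each step, every task holding data sends the upper half of its block to the task `span` ranks above it:

```python
def binomial_scatter(data, p):
    """Scatter p items from task 0 to p tasks in log2(p) steps."""
    holdings = {0: list(data)}  # task 0 starts with everything
    span = p // 2
    while span >= 1:
        for rank in list(holdings):  # snapshot: new holders act next step
            chunk = holdings[rank]
            keep, send = chunk[:len(chunk) // 2], chunk[len(chunk) // 2:]
            holdings[rank] = keep
            holdings[rank + span] = send  # one message per holder per step
        span //= 2
    return [holdings[r][0] for r in range(p)]

print(binomial_scatter([1, 2, 3, 4, 5, 6, 7, 8], 8))
# [1, 2, 3, 4, 5, 6, 7, 8]: task r ends up with item r, after 3 steps
```

Compare with p-1 direct sends from task 0: the binomial version finishes in log2(p) steps because the number of senders doubles each step.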


Summary: Task/Channel Model

  • Parallel computation

    • Set of tasks

    • Interactions through channels

  • Good designs

    • Maximize local computations

    • Minimize communications

    • Scale up


Summary: Design Steps

  • Partition computation

  • Agglomerate tasks

  • Map tasks to processors

  • Goals

    • Maximize processor utilization

    • Minimize inter-processor communication


Summary: Fundamental Algorithms

  • Reduction

  • Gather and scatter

  • All-gather

