Chapter 3

Parallel Algorithm Design


Outline

  • Task/channel model

  • Algorithm design methodology

  • Case studies


Task/Channel Model

  • Parallel computation = set of tasks

  • Task

    • Program

    • Local memory

    • Collection of I/O ports

  • Tasks interact by sending messages through channels
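As a minimal sketch of the model, with threads standing in for tasks and a Queue for a channel (these names are illustrative, not from the chapter):

```python
from queue import Queue
from threading import Thread

def producer(out_channel):
    # A task: program + local memory + an outgoing I/O port.
    local_data = [1, 2, 3]
    out_channel.put(sum(local_data))   # send a message through the channel

def consumer(in_channel, results):
    # A task with an incoming port; blocks until the message arrives.
    results.append(in_channel.get())

channel = Queue()   # a channel: a one-way message queue between two tasks
results = []
tasks = [Thread(target=producer, args=(channel,)),
         Thread(target=consumer, args=(channel, results))]
for t in tasks:
    t.start()
for t in tasks:
    t.join()
print(results)  # [6]
```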


Task/Channel Model

[Figure: a directed graph of tasks connected by channels]


Foster’s Design Methodology

  • Partitioning

    • Dividing the Problem into Tasks

  • Communication

    • Determine what needs to be communicated between the Tasks over Channels

  • Agglomeration

    • Group or Consolidate Tasks to improve efficiency or simplify the programming solution

  • Mapping

    • Assign tasks to the Computer Processors



Step 1: Partitioning
Divide Computation & Data into Pieces

  • Domain Decomposition – Data Centric Approach

    • Divide up most frequently used data

    • Associate the computations with the divided data

  • Functional Decomposition – Computation Centric Approach

    • Divide up the computation

    • Associate the data with the divided computations

  • Primitive Tasks: the pieces resulting from either decomposition

    • The goal is to have as many of these as possible
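A sketch of domain decomposition on a 1-D array (the function name and the even-split policy are illustrative assumptions): the data is divided into pieces, and a primitive task would then be associated with each piece.

```python
def domain_decompose(data, num_tasks):
    """Split data into num_tasks contiguous pieces (domain decomposition)."""
    n = len(data)
    pieces, start = [], 0
    for i in range(num_tasks):
        # Spread the remainder so piece sizes differ by at most one element.
        size = n // num_tasks + (1 if i < n % num_tasks else 0)
        pieces.append(data[start:start + size])
        start += size
    return pieces

pieces = domain_decompose(list(range(10)), 4)
print(pieces)  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```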




Partitioning Checklist

  • Lots of Tasks

    • e.g., at least 10x more primitive tasks than processors in the target computer

  • Minimize redundant computations and data

  • Load Balancing

    • Primitive tasks roughly the same size

  • Scalable

    • Number of tasks an increasing function of problem size


Step 2: Communication
Determine Communication Patterns between Primitive Tasks

  • Local Communication

    • When Tasks need data from a small number of other Tasks

    • A channel is created from each producing Task to its consuming Task

  • Global Communication

    • When a Task needs data from many or all other Tasks

    • Channels for this type of communication are not created during this step


Communication Checklist

  • Balanced

    • Communication operations balanced among tasks

  • Small degree

    • Each task communicates with only small group of neighbors

  • Concurrency

    • Tasks can perform communications concurrently

    • Tasks can perform computations concurrently


Step 3: Agglomeration
Group Tasks to Improve Efficiency or Simplify Programming

  • Increase Locality

    • Remove communication by agglomerating Tasks that communicate with one another

    • Combine groups of sending & receiving Tasks

      • Send fewer, larger messages rather than many short messages, which incur more message latency

  • Maintain Scalability of the Parallel Design

    • Be careful not to agglomerate Tasks so much that moving to a machine with more processors will not be possible

  • Reduce Software Engineering costs

    • Leveraging existing sequential code can reduce the expense of engineering a parallel algorithm


Agglomeration Can Improve Performance

  • Eliminate communication between primitive tasks agglomerated into consolidated task

  • Combine groups of sending and receiving tasks


Agglomeration Checklist

  • Locality of parallel algorithm has increased

  • Tradeoff between agglomeration and code modification costs is reasonable

  • Agglomerated tasks have similar computational and communications costs

  • Number of tasks increases with problem size

  • Number of tasks suitable for likely target systems


Step 4: Mapping
Assigning Tasks to Processors

  • Maximize Processor Utilization

    • Ensure computation is evenly balanced across all processors

  • Minimize Interprocess Communication


Optimal Mapping

  • Finding optimal mapping is NP-hard

  • Must rely on heuristics



Mapping Goals

  • Mappings based on one task per processor and on multiple tasks per processor have been considered

  • Both static and dynamic allocation of tasks to processors have been evaluated

  • If dynamic allocation of tasks to processors is chosen, ensure the task allocator is not a bottleneck

  • If static allocation of tasks to processors is chosen, the ratio of tasks to processors should be at least 10 to 1
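One simple static allocation consistent with the goals above is a block mapping; in this sketch (the function name and formula are an assumption, not from the slides) task i of n goes to processor ⌊i·p/n⌋, which keeps per-processor loads within one task of each other:

```python
def block_map(num_tasks, num_procs):
    """Static block mapping: task i goes to processor i * p // n."""
    return [i * num_procs // num_tasks for i in range(num_tasks)]

# 10 tasks on 3 processors: loads of 4, 3, and 3 tasks.
print(block_map(10, 3))  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```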


Case Studies

  • Boundary value problem

  • Finding the maximum

  • The n-body problem

  • Adding data input


Boundary Value Problem

[Figure: a rod surrounded by insulation, with ice water at both ends]




Partitioning

  • One data item per grid point

  • Associate one primitive task with each grid point

  • Two-dimensional domain decomposition


Communication

  • Identify communication pattern between primitive tasks:

    • Each interior primitive task has three incoming and three outgoing channels


Agglomeration and Mapping

[Figure: the grid of primitive tasks agglomerated into one consolidated task per processor]

Execution time estimate
Execution Time Estimate

  • χ – time to update element

  • n – number of elements

  • m – number of iterations

  • Sequential execution time: mnχ

  • p – number of processors

  • λ – message latency

  • Parallel execution time: m(χ⌈n/p⌉ + 2λ)
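Plugging the symbols into a quick model (the χ and λ values below are made-up placeholders, chosen only to show the shape of the estimate):

```python
from math import ceil

def seq_time(m, n, chi):
    # m iterations, each updating n elements at chi seconds apiece: m*n*chi
    return m * n * chi

def par_time(m, n, p, chi, lam):
    # each iteration: ceil(n/p) local updates plus two message latencies
    return m * (chi * ceil(n / p) + 2 * lam)

chi, lam = 1e-6, 1e-4          # assumed per-update cost and message latency
m, n, p = 1000, 10_000, 8
speedup = seq_time(m, n, chi) / par_time(m, n, p, chi, lam)
# Latency keeps the speedup below the linear bound of p.
```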



Reduction

  • Given associative operator ⊕

  • a0 ⊕ a1 ⊕ a2 ⊕ … ⊕ an-1

  • Examples

    • Add

    • Multiply

    • And, Or

    • Maximum, Minimum
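Each of these examples is the same pattern with a different associative operator; a quick sketch using functools.reduce (the sample operands are arbitrary):

```python
from functools import reduce
import operator

values = [4, 2, 0, 7, -3, 5, -6, -3]               # sample operands
print(reduce(operator.add, values))                # add: 6
print(reduce(operator.mul, [1, 2, 3, 4]))          # multiply: 24
print(reduce(operator.and_, [True, True, False]))  # and: False
print(reduce(max, values))                         # maximum: 7
```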





Binomial Trees

  • Subgraph of hypercube

[Figure: binomial trees of 1, 2, 4, and 8 nodes]

Finding Global Sum

[Figure: 16 values, one per task: 4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]


Finding Global Sum

[Figure: after the first exchange step, 8 partial sums remain: 1, 7, -6, 4, 4, 5, 8, 2]




Finding Global Sum

[Figure: binomial tree; after the final exchange a single task holds the global sum, 25]
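The global-sum pictures can be simulated: partial sums are combined pairwise over log₂ p steps. The 16 values are the ones shown in the figure; the pairing order below (each task combines with its partner half the range away) is one valid choice and may differ from the figure's, but the result is the same.

```python
def global_sum(values):
    """Binomial-tree reduction: combine pairs over log2(p) steps."""
    partial = list(values)          # one partial sum per task
    stride = len(partial) // 2
    while stride >= 1:
        for i in range(stride):
            # task i receives from task i + stride and combines
            partial[i] += partial[i + stride]
        stride //= 2
    return partial[0]

values = [4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]
print(global_sum(values))  # 25
```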



Agglomeration

[Figure: each consolidated task computes a local sum before the global reduction]




The n-Body Problem

Partitioning

  • Domain partitioning

  • Assume one task per particle

  • Task has particle’s position, velocity vector

  • Iteration

    • Get positions of all other particles

    • Compute new position, velocity
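One iteration of a particle's task can be sketched as follows; the 1-D setting, unit masses, softened force law, and time step are all illustrative assumptions, not the chapter's.

```python
def nbody_step(positions, velocities, dt=0.01, g=1.0, eps=1e-3):
    """Each task gathers all positions, then updates its own particle."""
    new_pos, new_vel = [], []
    for i, (x, v) in enumerate(zip(positions, velocities)):
        # attractive pull from every other particle (softened to avoid blow-up)
        force = sum(g * (xj - x) / (abs(xj - x) + eps) ** 2
                    for j, xj in enumerate(positions) if j != i)
        v_new = v + force * dt                 # compute new velocity...
        new_vel.append(v_new)
        new_pos.append(x + v_new * dt)         # ...then new position
    return new_pos, new_vel

# Three equally spaced particles at rest: forces cancel on the middle one.
pos, vel = nbody_step([0.0, 1.0, 2.0], [0.0, 0.0, 0.0])
```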






Communication Time

[Figure: communication time for the all-gather on a complete graph vs. a hypercube]




Scatter in log p Steps

[Figure: 12345678 → 1234 | 5678 → 12 | 34 | 56 | 78 → one item per task]
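The scatter can be simulated: at each of the log₂ p steps, every task holding a block sends the upper half of it to a partner half-way across its range, exactly as in the figure (the dictionary layout and task numbering are illustrative):

```python
def scatter(data, p):
    """Scatter p items from task 0 among p tasks in log2(p) steps."""
    holdings = {0: list(data)}       # task 0 starts with everything
    step = p // 2
    while step >= 1:
        for src in list(holdings):
            block = holdings[src]
            # send the upper half of the block to the partner task
            holdings[src + step] = block[len(block) // 2:]
            holdings[src] = block[:len(block) // 2]
        step //= 2
    return [holdings[i][0] for i in range(p)]

print(scatter([1, 2, 3, 4, 5, 6, 7, 8], 8))  # [1, 2, 3, 4, 5, 6, 7, 8]
```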


Summary: Task/Channel Model

  • Parallel computation

    • Set of tasks

    • Interactions through channels

  • Good designs

    • Maximize local computations

    • Minimize communications

    • Scale up


Summary: Design Steps

  • Partition computation

  • Determine communication

  • Agglomerate tasks

  • Map tasks to processors

  • Goals

    • Maximize processor utilization

    • Minimize inter-processor communication


Summary: Fundamental Algorithms

  • Reduction

  • Gather and scatter

  • All-gather