CS 584
Designing Parallel Algorithms
Designing Parallel Algorithms
  • Designing a parallel algorithm is not easy.
  • There is no recipe or magical ingredient
    • Except creativity
  • We can benefit from a methodical approach.
    • Framework for algorithm design
  • Most problems have several parallel solutions, and these may be totally different from the best sequential algorithm.
PCAM Algorithm Design
  • 4 Stages to designing a parallel algorithm
    • Partitioning
    • Communication
    • Agglomeration
    • Mapping
  • P & C focus on concurrency and scalability.
  • A & M focus on locality and performance.
PCAM Algorithm Design
  • Partitioning
    • Computation and data are decomposed.
  • Communication
    • Coordinate task execution
  • Agglomeration
    • Combining of tasks for performance
  • Mapping
    • Assignment of tasks to processors
Partitioning
  • Ignore the number of processors and the target architecture.
  • Expose opportunities for parallelism.
  • Divide up both the computation and data
  • Can take two approaches
    • domain decomposition
    • functional decomposition
Domain Decomposition
  • Start algorithm design by analyzing the data
  • Divide the data into small pieces
    • Approximately equal in size
  • Then partition the computation by associating it with the data.
  • Communication requirements arise when one task needs data owned by another task.
Domain Decomposition
  • Evaluate the definite integral:

        ∫₀¹ 4 / (1 + x²) dx
Split up the domain at 0, 0.25, 0.5, 0.75, and 1. Now each task simply evaluates the integral in its own range. All that is left is to sum up each task's answer for the total (see the sketch below).
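As a concrete illustration (not from the slides themselves), here is a minimal Python sketch of this decomposition: the interval [0, 1] is split into one subrange per task, each task integrates its own range, and the partial answers are summed. Since ∫₀¹ 4/(1+x²) dx = π, the printed total should approach 3.14159…

```python
from multiprocessing import Pool

def integrate_range(args):
    """Midpoint-rule estimate of the integral of 4/(1+x^2) over [a, b]."""
    a, b, n = args
    h = (b - a) / n
    return h * sum(4.0 / (1.0 + (a + (i + 0.5) * h) ** 2) for i in range(n))

if __name__ == "__main__":
    tasks = 4                                    # one task per subrange
    bounds = [(i / tasks, (i + 1) / tasks, 250_000) for i in range(tasks)]
    with Pool(tasks) as pool:
        partials = pool.map(integrate_range, bounds)  # each task does its range
    print(sum(partials))                         # ~3.141592653589...
```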
Domain Decomposition
  • Consider dividing up a 3-D grid
    • What issues arise?
  • Other issues?
    • What if your problem has more than one data structure?
    • Different problem phases?
    • Replication?
Functional Decomposition
  • Focus on the computation
  • Divide the computation into disjoint tasks
    • Avoid data dependencies among tasks
  • After dividing the computation, examine the data requirements of each task.
Functional Decomposition
  • Not as natural as domain decomposition
  • Consider search problems
  • Often functional decomposition is very useful at a higher level.
    • Climate modeling
      • Ocean simulation
      • Hydrology
      • Atmosphere, etc.
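A minimal sketch of this higher-level functional decomposition (with stand-in component names, not real model code): the atmosphere and ocean components are disjoint tasks that advance concurrently each timestep and then exchange whatever data each requires.

```python
from concurrent.futures import ThreadPoolExecutor

def atmosphere_step(state):        # stand-in for the real atmospheric solver
    return state + 1

def ocean_step(state):             # stand-in for the real ocean solver
    return state * 2

atm, ocn = 0, 1
with ThreadPoolExecutor(max_workers=2) as pool:
    for step in range(3):
        # The two components are disjoint tasks that execute concurrently...
        f_atm = pool.submit(atmosphere_step, atm)
        f_ocn = pool.submit(ocean_step, ocn)
        # ...then exchange the data each requires before the next step.
        atm, ocn = f_atm.result(), f_ocn.result()
print(atm, ocn)                    # 3 8
```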
Partitioning Checklist
  • Define a LOT of tasks?
  • Avoid redundant computation and storage?
  • Are tasks approximately equal in size?
  • Does the number of tasks scale with the problem size?
  • Have you identified several alternative partitioning schemes?
Communication
  • The information flow between tasks is specified in this stage of the design
  • Remember:
    • Tasks execute concurrently.
    • Data dependencies may limit concurrency.
Communication
  • Define Channel
    • Link the producers with the consumers.
    • Consider the costs
      • Intellectual
      • Physical
    • Distribute the communication.
  • Specify the messages that are sent.
Communication Patterns
  • Local vs. Global
  • Structured vs. Unstructured
  • Static vs. Dynamic
  • Synchronous vs. Asynchronous
Local Communication
  • Communication within a neighborhood.

Algorithm choice determines communication.

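For example, a three-point stencil (a hypothetical Jacobi-style smoother, not from the slides): each task owns a block of cells and needs only one "ghost" cell from each neighbor, so all communication stays within the neighborhood.

```python
def stencil_step(block, left_ghost, right_ghost):
    """One three-point averaging step over a task's block of cells."""
    padded = [left_ghost] + block + [right_ghost]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
left, right = data[:3], data[3:]          # two tasks, half the domain each

# Local communication: each task sends a single boundary cell to its
# neighbor; the physical boundary simply reuses the task's own end cell.
new_left = stencil_step(left, left_ghost=left[0], right_ghost=right[0])
new_right = stencil_step(right, left_ghost=left[-1], right_ghost=right[-1])
print(new_left + new_right)
```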
Global Communication
  • Not localized.
  • Examples
    • All-to-All
    • Master-Worker

[Figure: master-worker summation of the values 1, 5, 2, 3, 7]
Avoiding Global Communication
  • Distribute the communication and computation (see the sketch below).

[Figure: the same summation restructured as partial sums passed between tasks]
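A minimal sketch of the idea, using the values from the slide's figure: rather than every task sending its value to one master, each task adds its value to a running partial sum and forwards it, so both the additions and the traffic are spread across all tasks.

```python
values = [1, 5, 2, 3, 7]           # the values from the slide's figure

# Centralized: all five values converge on a single master task.
master_total = sum(values)

# Distributed: task i receives a partial sum from task i-1, adds its
# own value, and forwards it -- similar message count, but no hotspot.
partial = 0
for v in values:                    # each iteration models one task
    partial += v                    # "receive, add, forward"

print(master_total, partial)        # both print 18
```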
Divide and Conquer
  • Partition the problem into two or more subproblems
  • Partition each subproblem, etc.

Results in a structured nearest-neighbor communication pattern (sketched below).

[Figure: recursive partitioning of the values 7, 8, 3, 5, 1, 5, 3, 2, 1 into a tree of subproblems]
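A sketch of the pattern, with sequential Python standing in for concurrent tasks: each recursive half is an independent subproblem, and results combine pairwise back up the tree.

```python
def parallel_sum(xs):
    """Divide-and-conquer sum; each half could run as a concurrent task."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    left = parallel_sum(xs[:mid])       # independent subproblem
    right = parallel_sum(xs[mid:])      # independent subproblem
    return left + right                 # combine: one neighbor exchange

print(parallel_sum([7, 8, 3, 5, 1, 5, 3, 2, 1]))   # 35
```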
Structured Communication
  • Each task’s communication resembles every other task’s
  • Is there a pattern?
Unstructured Communication
  • No regular pattern that can be exploited.
  • Examples
    • Unstructured Grid
    • Resolution changes

Complicates the next stages of design

Synchronous Communication
  • Both producers and consumers are aware when communication is required
  • Explicit and simple

[Figure: producer/consumer exchanges at t = 1, t = 2, t = 3]
Asynchronous Communication
  • Timing of send/receive is unknown.
    • No pattern
  • Consider: very large data structure
    • Distribute among computational tasks (polling)
    • Define a set of read/write tasks
    • Shared Memory
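One way to realize the "set of read/write tasks" option is sketched below (the request format is hypothetical): a dedicated server task owns part of the large data structure and polls a queue, so computation tasks can read or write at times the owner cannot predict.

```python
import queue, threading

store = {"x": 42}                      # the owned piece of a large structure
requests, replies = queue.Queue(), queue.Queue()

def data_server():
    """Dedicated read/write task: polls for requests arriving at unknown times."""
    while True:
        op, key, val = requests.get()
        if op == "stop":
            break
        if op == "read":
            replies.put(store.get(key))
        else:                          # "write"
            store[key] = val

t = threading.Thread(target=data_server)
t.start()
requests.put(("write", "y", 7))        # computation tasks issue requests...
requests.put(("read", "x", None))      # ...whenever they need remote data
print(replies.get())                   # 42
requests.put(("stop", None, None))
t.join()
```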
Problems to Avoid
  • A centralized algorithm
    • Distribute the computation
    • Distribute the communication
  • A sequential algorithm
    • Seek concurrency
    • Divide and conquer
      • Small, equal sized subproblems
Communication Design Checklist
  • Is communication balanced?
    • All tasks communicate about the same amount
  • Is communication limited to neighborhoods?
    • Restructure global to local if possible.
  • Can communications proceed concurrently?
  • Can the algorithm proceed concurrently?
    • Find the algorithm with most concurrency.
      • Be careful!!!
Agglomeration
  • The Partitioning and Communication steps were abstract.
  • Agglomeration makes the design concrete.
  • Combine tasks to execute efficiently on some parallel computer.
  • Consider replication.
Agglomeration Goals
  • Reduce communication costs by
    • increasing computation
    • increasing granularity (fewer, larger tasks)
  • Retain flexibility for mapping and scaling.
  • Reduce software engineering costs.
Changing Granularity
  • A large number of tasks does not necessarily produce an efficient algorithm.
  • We must consider the communication costs.
  • Reduce communication by
    • having fewer tasks
    • sending fewer messages (batching)
Surface to Volume Effects
  • Communication is proportional to the surface of the subdomain.
  • Computation is proportional to the volume of the subdomain.
  • Increasing computation will often decrease communication.
[Figures: two alternative agglomerations of the same grid. For each one: How many messages total? How much data is sent?]
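These questions can be answered by counting. The sketch below (assuming an N×N grid, p tasks, and one ghost-cell exchange per shared edge per step) compares a 1-D strip agglomeration with a 2-D block agglomeration: the blocks send more, smaller messages but less total data, which is the surface-to-volume effect.

```python
import math

N, p = 1024, 16                        # assumed grid size and task count

# 1-D strips: p-1 shared edges, each exchanged in both directions,
# and every message carries a whole row of N cells.
strip_msgs = 2 * (p - 1)               # 30 messages
strip_data = strip_msgs * N            # 30720 cells sent

# 2-D blocks: an s-by-s grid of tasks (s = sqrt(p)) has 2*s*(s-1)
# interior edges; each message carries one block edge of N/s cells.
s = math.isqrt(p)
block_msgs = 2 * (2 * s * (s - 1))     # 48 messages
block_data = block_msgs * (N // s)     # 12288 cells sent

print(strip_msgs, strip_data)          # fewer messages, more data
print(block_msgs, block_data)          # more messages, less data
```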
Replicating Computation
  • Trade-off replicated computation for reduced communication.
  • Replication will often reduce execution time as well.
Summation of N Integers

[Figure: a task tree that sums (s) and then broadcasts (b) the result. How many steps?]
Using Replication

[Figure: Butterfly to Hypercube]
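A sketch of the butterfly (hypercube) pattern, modeled sequentially with a list standing in for p tasks: at stage k, task i exchanges partial sums with partner i XOR 2^k, so after log₂(p) stages every task holds the replicated total and no separate broadcast is needed.

```python
p = 8                                   # number of tasks (a power of two)
partial = list(range(1, p + 1))         # task i starts with its own integer

for stage in range(p.bit_length() - 1): # log2(p) stages
    d = 1 << stage
    # Every task exchanges its partial sum with partner (i XOR d);
    # both sides add, so the summation work is replicated.
    partial = [partial[i] + partial[i ^ d] for i in range(p)]

print(partial)                          # every task holds 36; no broadcast step
```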
Avoid Communication
  • Look for tasks that cannot execute concurrently because of communication requirements.
  • Replication can help accomplish two operations at the same time, such as:
    • Summation
    • Broadcast
Preserve Flexibility
  • Create more tasks than processors.
  • Overlap communication and computation.
  • Don't incorporate unnecessary limits on the number of tasks.
Agglomeration Checklist
  • Reduce communication costs by increasing locality.
  • Do benefits of replication outweigh costs?
  • Does replication compromise scalability?
  • Does the number of tasks still scale with problem size?
  • Is there still sufficient concurrency?
Mapping
  • Specify where each task is to operate.
  • Mapping may need to change depending on the target architecture.
  • Finding an optimal mapping is NP-complete.
Mapping
  • Goal: Reduce Execution Time
    • Concurrent tasks ---> Different processors
    • High communication ---> Same processor
  • Mapping is a game of trade-offs.
Mapping
  • Many domain-decomposition problems make mapping easy.
    • Grids
    • Arrays
    • etc.
Mapping
  • Algorithms based on unstructured or complex domain decompositions are difficult to map.
Other Mapping Problems
  • Variable amounts of work per task
  • Unstructured communication
  • Heterogeneous processors
    • different speeds
    • different architectures
  • Solution: LOAD BALANCING
Load Balancing
  • Static
    • Determined a priori
    • Based on work, processor speed, etc.
  • Probabilistic
    • Random
  • Dynamic
    • Restructure load during execution
  • Task Scheduling (functional decomp.)
Static Load Balancing
  • Based on a priori knowledge.
  • Goal: Equal WORK on all processors
  • Algorithms:
    • Basic
    • Recursive Bisection
Basic
  • Divide up the work based on
    • Work required
    • Processor speed

r_i = R · ( p_i / Σ_i p_i )

where R is the total work, p_i is the relative speed of processor i, and r_i is the work assigned to processor i (sketched in code below).

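The formula above in executable form (a minimal sketch; the work and speed units are arbitrary):

```python
def work_shares(total_work, speeds):
    """r_i = R * p_i / sum(p): work proportional to processor speed."""
    total_speed = sum(speeds)
    return [total_work * s / total_speed for s in speeds]

print(work_shares(1000, [1.0, 1.0, 2.0]))   # [250.0, 250.0, 500.0]
```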
Recursive Bisection
  • Divide work in half recursively.
  • Based on physical coordinates.
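A minimal sketch of recursive coordinate bisection on 2-D points, with point count standing in for work:

```python
def recursive_bisection(points, depth, axis=0):
    """Recursively halve the points along alternating physical coordinates."""
    if depth == 0 or len(points) <= 1:
        return [points]
    pts = sorted(points, key=lambda q: q[axis])
    mid = len(pts) // 2                       # equal work on each side
    nxt = (axis + 1) % 2                      # alternate x and y cuts
    return (recursive_bisection(pts[:mid], depth - 1, nxt) +
            recursive_bisection(pts[mid:], depth - 1, nxt))

pts = [(0, 0), (1, 3), (2, 1), (3, 2), (4, 4), (5, 0), (6, 3), (7, 1)]
for part in recursive_bisection(pts, depth=2):    # four partitions of two
    print(part)
```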
Dynamic Algorithms
  • Adjust load when an imbalance is detected.
  • Local or Global
Task Scheduling
  • Many tasks with weak locality requirements.
  • Manager-Worker model.
Task Scheduling
  • Manager-Worker
  • Hierarchical Manager-Worker
    • Uses submanagers
  • Decentralized
    • No central manager
    • Task pool on each processor
    • Less of a bottleneck
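A minimal manager-worker sketch using a shared task queue (threads stand in for processors, and squaring stands in for a real task): the manager hands out tasks, and each worker pulls the next one whenever it finishes.

```python
import queue, threading

tasks, results = queue.Queue(), queue.Queue()

def worker():
    while True:
        n = tasks.get()
        if n is None:                 # manager's shutdown signal
            break
        results.put(n * n)            # stand-in for a real unit of work

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for n in range(10):                   # the manager hands out tasks...
    tasks.put(n)
for _ in workers:                     # ...then tells each worker to stop
    tasks.put(None)
for w in workers:
    w.join()

print(sorted(results.get() for _ in range(10)))
```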
Mapping Checklist
  • Is the load balanced?
  • Are there communication bottlenecks?
  • Is it necessary to adjust the load dynamically?
  • Can you adjust the load if necessary?
  • Have you evaluated the costs?
PCAM Algorithm Design
  • Partition
    • Domain or Functional Decomposition
  • Communication
    • Link producers and consumers
  • Agglomeration
    • Combine tasks for efficiency
  • Mapping
    • Divide up the tasks for balanced execution
Example: Atmosphere Model
  • Simulate atmospheric processes
    • Wind
    • Clouds, etc.
  • Solves a set of partial differential equations describing the fluid behavior