CS 584: Designing Parallel Algorithms

Designing Parallel Algorithms

  • Designing a parallel algorithm is not easy.

  • There is no recipe or magical ingredient

    • Except creativity

  • We can benefit from a methodical approach.

    • Framework for algorithm design

  • Most problems have several parallel solutions, and the best parallel solution may differ completely from the best sequential algorithm.


PCAM Algorithm Design

  • 4 Stages to designing a parallel algorithm

    • Partitioning

    • Communication

    • Agglomeration

    • Mapping

  • P & C focus on concurrency and scalability.

  • A & M focus on locality and performance.


PCAM Algorithm Design

  • Partitioning

    • Computation and data are decomposed.

  • Communication

    • Coordinate task execution

  • Agglomeration

    • Combining of tasks for performance

  • Mapping

    • Assignment of tasks to processors


Partitioning

  • Ignore the number of processors and the target architecture.

  • Expose opportunities for parallelism.

  • Divide up both the computation and data

  • Can take two approaches

    • domain decomposition

    • functional decomposition


Domain Decomposition

  • Start algorithm design by analyzing the data

  • Divide the data into small pieces

    • Approximately equal in size

  • Then partition the computation by associating it with the data.

  • Communication issues may arise as one task needs the data from another task.


Domain Decomposition

  • Evaluate the definite integral

    $\int_0^1 \frac{4}{1+x^2}\,dx$

  • Split up the domain, e.g. into the subintervals [0, 0.25], [0.25, 0.5], [0.5, 0.75], [0.75, 1].

  • Each task simply evaluates the integral over its own range; all that is left is to sum up each task's answer for the total.
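As a concrete illustration, here is a minimal Python sketch (not from the original slides) of this decomposition: each worker process integrates 4/(1+x²) over its own subinterval with the midpoint rule, and the partial results are summed for the total, which approximates pi.

```python
from concurrent.futures import ProcessPoolExecutor

def partial_integral(bounds, steps=100_000):
    """Midpoint-rule integral of 4/(1+x^2) over [a, b]."""
    a, b = bounds
    h = (b - a) / steps
    return sum(4.0 / (1.0 + (a + (i + 0.5) * h) ** 2) for i in range(steps)) * h

if __name__ == "__main__":
    # Split [0, 1] into four subdomains, one per task.
    edges = [0.0, 0.25, 0.5, 0.75, 1.0]
    subdomains = list(zip(edges, edges[1:]))
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(partial_integral, subdomains))
    print(total)  # ~3.14159265...
```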


Domain Decomposition

  • Consider dividing up a 3-D grid

    • What issues arise?

  • Other issues?

    • What if your problem has more than one data structure?

    • Different problem phases?

    • Replication?


Functional Decomposition

  • Focus on the computation

  • Divide the computation into disjoint tasks

    • Avoid data dependency among tasks

  • After dividing the computation, examine the data requirements of each task.


Functional Decomposition

  • Not as natural as domain decomposition

  • Consider search problems

  • Often functional decomposition is very useful at a higher level.

    • Climate modeling

      • Ocean simulation

      • Hydrology

      • Atmosphere, etc.
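A minimal Python sketch of this higher-level functional decomposition (the component functions and their state fields are invented for illustration): each model component is a separate task, and the components of one timestep run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def atmosphere(state):
    return {"wind": state["wind"] * 0.99}

def ocean(state):
    return {"sst": state["sst"] + 0.01}

def hydrology(state):
    return {"runoff": state["runoff"] * 1.01}

state = {"wind": 10.0, "sst": 15.0, "runoff": 1.0}
components = [atmosphere, ocean, hydrology]
with ThreadPoolExecutor() as pool:
    for _ in range(3):  # three timesteps
        # Each component reads the shared state and writes a disjoint
        # field, so the three tasks of one timestep are independent.
        updates = list(pool.map(lambda f: f(state), components))
        for update in updates:      # exchange results between timesteps
            state.update(update)
print(state)
```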


Partitioning Checklist

  • Have you defined a LOT of tasks?

  • Have you avoided redundant computation and storage?

  • Are the tasks approximately equal in size?

  • Does the number of tasks scale with the problem size?

  • Have you identified several alternative partitioning schemes?


Communication

  • The information flow between tasks is specified in this stage of the design

  • Remember:

    • Tasks execute concurrently.

    • Data dependencies may limit concurrency.


Communication

  • Define Channel

    • Link the producers with the consumers.

    • Consider the costs

      • Intellectual

      • Physical

    • Distribute the communication.

  • Specify the messages that are sent.
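A minimal Python sketch of a channel (the tagged-message format is an assumption for illustration): a multiprocessing.Queue links a producer task to a consumer task, and the messages sent over it are specified explicitly.

```python
from multiprocessing import Process, Queue

def producer(channel: Queue):
    for i in range(5):
        channel.put(("value", i))   # message: a tagged payload
    channel.put(("done", None))     # message: end-of-stream marker

def consumer(channel: Queue):
    while True:
        tag, payload = channel.get()
        if tag == "done":
            break
        print("consumed", payload)

if __name__ == "__main__":
    ch = Queue()
    tasks = [Process(target=producer, args=(ch,)),
             Process(target=consumer, args=(ch,))]
    for t in tasks: t.start()
    for t in tasks: t.join()
```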


Communication Patterns

  • Local vs. Global

  • Structured vs. Unstructured

  • Static vs. Dynamic

  • Synchronous vs. Asynchronous


Local Communication

  • Communication within a neighborhood.

Algorithm choice determines communication.
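A minimal Python sketch (a single-process simulation, not real message passing) of local communication in a 1-D stencil: each task owns a slice of the array and exchanges only its boundary cells with its two neighbors each iteration.

```python
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # one slice per task

def jacobi_step(chunks):
    new_chunks = []
    for t, chunk in enumerate(chunks):
        # "Receive" ghost cells from the two neighboring tasks only.
        left_ghost = chunks[t - 1][-1] if t > 0 else chunk[0]
        right_ghost = chunks[t + 1][0] if t < len(chunks) - 1 else chunk[-1]
        padded = [left_ghost] + chunk + [right_ghost]
        new_chunks.append([(padded[i - 1] + padded[i + 1]) / 2.0
                           for i in range(1, len(padded) - 1)])
    return new_chunks

print(jacobi_step(chunks))
```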


Global Communication

  • Not localized.

  • Examples

    • All-to-All

    • Master-Worker

[Figure: every task sends its value (1, 5, 2, 3, 7) to a single task, which computes the sum.]


Avoiding Global Communication

  • Distribute the communication and computation

[Figure: the same values (1, 5, 2, 3, 7) summed by passing partial sums from task to task, so no single task handles all of the communication or computation.]
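A minimal Python sketch of the distributed sum (a single-process simulation, not message-passing code): neighboring tasks combine partial sums in rounds, so both the communication and the computation are spread across the tasks.

```python
values = [1, 5, 2, 3, 7]

round_no = 0
while len(values) > 1:
    # Each pair of neighboring tasks combines concurrently (one message
    # per pair); an unpaired value simply passes through to the next round.
    pairs = [values[i:i + 2] for i in range(0, len(values), 2)]
    values = [sum(p) for p in pairs]
    round_no += 1
    print(f"after round {round_no}: {values}")

print("total:", values[0])  # 18, in ceil(log2(5)) = 3 rounds
```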


Divide and Conquer

[Figure: a set of values recursively split in half, forming a tree of subproblems.]

  • Partition the problem into two or more subproblems

  • Partition each subproblem, etc.

Results in structured nearest neighbor communication pattern.


Structured Communication

  • Each task's communication resembles every other task's communication.

  • Is there a pattern?


Unstructured Communication

  • No regular pattern that can be exploited.

  • Examples

    • Unstructured Grid

    • Resolution changes

Complicates the next stages of design


Synchronous Communication

  • Both consumers and producers are aware when communication is required

  • Explicit and simple

[Figure: matched send/receive pairs occurring at known times t = 1, t = 2, t = 3.]


Asynchronous Communication

  • Timing of send/receive is unknown.

    • No pattern

  • Consider: very large data structure

    • Distribute among computational tasks (polling)

    • Define a set of read/write tasks

    • Shared Memory
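A minimal Python sketch of the read/write-task idea (the task names and data are invented for illustration): one task owns a piece of the large data structure and polls a request queue, while compute tasks issue reads whenever they happen to need data, with no globally known communication schedule.

```python
from multiprocessing import Process, Queue

def owner(requests: Queue, replies: Queue):
    data = {i: i * i for i in range(1000)}  # this task's piece of the structure
    while True:
        key = requests.get()   # poll for incoming requests
        if key is None:        # shutdown marker
            break
        replies.put(data[key])

def worker(requests: Queue, replies: Queue):
    for key in (3, 14, 159):   # the timing of these reads is data-dependent
        requests.put(key)
        print("read", key, "->", replies.get())

if __name__ == "__main__":
    req, rep = Queue(), Queue()
    o = Process(target=owner, args=(req, rep))
    o.start()
    worker(req, rep)
    req.put(None)
    o.join()
```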


Problems to Avoid

  • A centralized algorithm

    • Distribute the computation

    • Distribute the communication

  • A sequential algorithm

    • Seek concurrency

    • Divide and conquer

      • Small, equal sized subproblems


Communication Design Checklist

  • Is communication balanced?

    • All tasks should communicate about the same amount.

  • Is communication limited to neighborhoods?

    • Restructure global to local if possible.

  • Can communications proceed concurrently?

  • Can the computation proceed concurrently?

    • Find the algorithm with most concurrency.

      • Be careful!!!


Agglomeration

  • The partitioning and communication stages were abstract.

  • Agglomeration moves the design toward the concrete.

  • Combine tasks to execute efficiently on some parallel computer.

  • Consider replication.


Agglomeration Goals

  • Reduce communication costs by

    • increasing computation

    • decreasing/increasing granularity

  • Retain flexibility for mapping and scaling.

  • Reduce software engineering costs.


Changing Granularity

  • A large number of tasks does not necessarily produce an efficient algorithm.

  • We must consider the communication costs.

  • Reduce communication by

    • having fewer tasks

    • sending fewer messages (batching)


Surface-to-Volume Effects

  • Communication is proportional to the surface of the subdomain.

  • Computation is proportional to the volume of the subdomain.

  • Increasing computation will often decrease communication.


[Figures: two decompositions of the same grid, one fine-grained and one coarse-grained. For each, ask: How many messages total? How much data is sent?]
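A minimal Python sketch of this trade-off (the grid size and task counts are assumptions): for an n × n grid, compare the message count and data volume of a 1-D strip decomposition against a 2-D block decomposition.

```python
import math

def strip_costs(n, p):
    # p strips have p - 1 interior boundaries, each exchanged in both
    # directions; every message carries one n-cell strip edge.
    messages = 2 * (p - 1)
    return messages, messages * n

def block_costs(n, p):
    q = math.isqrt(p)                # q x q blocks, each of side n // q
    boundaries = 2 * q * (q - 1)     # interior vertical + horizontal boundaries
    messages = 2 * boundaries        # both directions per boundary
    return messages, messages * (n // q)

# Block data volume grows like sqrt(p); strip data volume grows like p.
for p in (4, 16, 64):
    print(p, "strips:", strip_costs(1024, p), "blocks:", block_costs(1024, p))
```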


Replicating Computation

  • Trade-off replicated computation for reduced communication.

  • Replication will often reduce execution time as well.


Summation of N Integers

[Figure: an array of tasks sums N integers with sum steps (s) and then distributes the result with broadcast steps (b).]

How many steps?



Using Replication

Butterfly to Hypercube
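A minimal Python sketch of the butterfly pattern (a single-process simulation, assuming the task count is a power of two): in step s, task i exchanges partial sums with task i XOR 2^s, its hypercube neighbor, so every task holds the total after log2(p) steps, accomplishing the summation and the broadcast at once.

```python
p = 8
partial = [i + 1 for i in range(p)]   # task i starts with the value i + 1

steps = p.bit_length() - 1            # log2(p) for a power of two
for s in range(steps):
    new = partial[:]
    for i in range(p):
        partner = i ^ (1 << s)        # hypercube neighbor in dimension s
        new[i] = partial[i] + partial[partner]
    partial = new
    print(f"after step {s}: {partial}")

assert all(v == p * (p + 1) // 2 for v in partial)  # every task holds 36
```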


Avoid Communication

  • Look for tasks that cannot execute concurrently because of communication requirements.

  • Replication can help accomplish two operations at the same time, such as:

    • Summation

    • Broadcast


Preserve Flexibility

  • Create more tasks than processors.

  • Overlap communication and computation.

  • Don't incorporate unnecessary limits on the number of tasks.


Agglomeration Checklist

  • Has agglomeration reduced communication costs by increasing locality?

  • Do benefits of replication outweigh costs?

  • Does replication compromise scalability?

  • Does the number of tasks still scale with problem size?

  • Is there still sufficient concurrency?


Mapping

  • Specify where each task is to operate.

  • Mapping may need to change depending on the target architecture.

  • The general mapping problem is NP-complete.


Mapping

  • Goal: Reduce Execution Time

    • Concurrent tasks ---> Different processors

    • High communication ---> Same processor

  • Mapping is a game of trade-offs.


Mapping

  • Many domain-decomposition problems make mapping easy.

    • Grids

    • Arrays

    • etc.


Mapping

  • Algorithms based on unstructured or complex domain decompositions are difficult to map.


Other Mapping Problems

  • Variable amounts of work per task

  • Unstructured communication

  • Heterogeneous processors

    • different speeds

    • different architectures

  • Solution: LOAD BALANCING


Load Balancing

  • Static

    • Determined a priori

    • Based on work, processor speed, etc.

  • Probabilistic

    • Random

  • Dynamic

    • Restructure load during execution

  • Task Scheduling (functional decomp.)


Static Load Balancing

  • Based on a priori knowledge.

  • Goal: Equal WORK on all processors

  • Algorithms:

    • Basic

    • Recursive Bisection


Basic

  • Divide up the work based on

    • Work required

    • Processor speed

$r_i = R \left( \frac{p_i}{\sum_j p_j} \right)$

where R is the total amount of work, p_i is the speed of processor i, and r_i is the share of the work assigned to processor i.
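A minimal Python sketch of this proportional split (the function name and values are invented for illustration):

```python
def proportional_split(R, speeds):
    """Give processor i the share r_i = R * p_i / sum(p) of the work R."""
    total_speed = sum(speeds)
    return [R * p / total_speed for p in speeds]

# Example: 120 units of work over three processors with relative speeds
# 1, 2, and 3 -> shares of 20, 40, and 60 units.
print(proportional_split(120, [1, 2, 3]))
```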


Recursive Bisection

  • Divide work in half recursively.

  • Based on physical coordinates.
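A minimal Python sketch of recursive bisection (the point coordinates are invented for illustration): split the points at the median of their widest physical dimension, then recurse on each half.

```python
def bisect(points, depth):
    if depth == 0 or len(points) <= 1:
        return [points]
    # Pick the coordinate (x or y) with the widest physical spread...
    dim = max((0, 1), key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2   # ...and split the work in half at the median.
    return bisect(pts[:mid], depth - 1) + bisect(pts[mid:], depth - 1)

points = [(0, 0), (1, 5), (2, 1), (8, 3), (9, 9), (4, 4), (7, 2), (3, 8)]
for part in bisect(points, depth=2):  # 2 levels -> 4 equal partitions
    print(part)
```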


Dynamic Algorithms

  • Adjust load when an imbalance is detected.

  • Local or Global


Task Scheduling

  • Many tasks with weak locality requirements.

  • Manager-Worker model.


Task Scheduling

  • Manager-Worker

  • Hierarchical Manager-Worker

    • Uses submanagers

  • Decentralized

    • No central manager

    • Task pool on each processor

    • Avoids a central bottleneck
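A minimal Python sketch of the manager-worker model using the standard library (the task payload is invented for illustration): the manager hands tasks to whichever worker is idle, so faster workers complete more tasks.

```python
from multiprocessing import Pool

def do_task(task_id):
    # Stand-in for real work with weak locality requirements.
    return task_id, sum(i * i for i in range(10_000 + task_id))

if __name__ == "__main__":
    tasks = range(20)
    with Pool(processes=4) as workers:
        # imap_unordered hands out tasks as workers become free.
        for task_id, result in workers.imap_unordered(do_task, tasks):
            print("finished task", task_id)
```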


Mapping Checklist

  • Is the load balanced?

  • Are there communication bottlenecks?

  • Is it necessary to adjust the load dynamically?

  • Can you adjust the load if necessary?

  • Have you evaluated the costs?


PCAM Algorithm Design

  • Partition

    • Domain or Functional Decomposition

  • Communication

    • Link producers and consumers

  • Agglomeration

    • Combine tasks for efficiency

  • Mapping

    • Divide up the tasks for balanced execution


Example: Atmosphere Model

  • Simulate atmospheric processes

    • Wind

    • Clouds, etc.

  • Solves a set of partial differential equations describing the fluid behavior







