CS 584


Designing Parallel Algorithms

  • Designing a parallel algorithm is not easy.

  • There is no recipe or magical ingredient

    • Except creativity

  • We can benefit from a methodical approach.

    • Framework for algorithm design

  • Most problems have several parallel solutions, which may be totally different from the best sequential algorithm.


PCAM Algorithm Design

  • 4 Stages to designing a parallel algorithm

    • Partitioning

    • Communication

    • Agglomeration

    • Mapping

  • P & C focus on concurrency and scalability.

  • A & M focus on locality and performance.


PCAM Algorithm Design

  • Partitioning

    • Computation and data are decomposed.

  • Communication

    • Coordinate task execution

  • Agglomeration

    • Combining of tasks for performance

  • Mapping

    • Assignment of tasks to processors


Partitioning

  • Ignore the number of processors and the target architecture.

  • Expose opportunities for parallelism.

  • Divide up both the computation and data

  • Can take two approaches

    • domain decomposition

    • functional decomposition


Domain Decomposition

  • Start algorithm design by analyzing the data

  • Divide the data into small pieces

    • Approximately equal in size

  • Then partition the computation by associating it with the data.

  • Communication issues may arise as one task needs the data from another task.


Domain Decomposition

  • Evaluate the definite integral ∫₀¹ 4 / (1 + x²) dx over the interval [0, 1] (its value is π).

Split up the domain

  • Divide [0, 1] into subranges (the figure splits it at 0.25, 0.5, and 0.75).

  • Now each task simply evaluates the integral in its range.

  • All that is left is to sum up each task's answer for the total.
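A minimal sketch of this strategy in C with MPI (the slides name no particular library, so MPI here is an assumption): each rank integrates its own contiguous slice of [0, 1] with the midpoint rule, and the partial answers are summed for the total.

    /* Sketch: each rank integrates 4/(1+x^2) over its slice of [0,1]
       with the midpoint rule; partial sums are combined on rank 0.
       The exact value of the integral is pi. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long n = 1000000;              /* total midpoint samples */
        const double h = 1.0 / n;
        long lo = (n * rank) / size;         /* this rank's subrange */
        long hi = (n * (rank + 1)) / size;

        double local = 0.0;
        for (long i = lo; i < hi; i++) {
            double x = (i + 0.5) * h;        /* midpoint of sample i */
            local += 4.0 / (1.0 + x * x);
        }
        local *= h;

        double total;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("integral ~= %.12f\n", total);
        MPI_Finalize();
        return 0;
    }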


Domain Decomposition

  • Consider dividing up a 3-D grid

    • What issues arise?

  • Other issues?

    • What if your problem has more than one data structure?

    • Different problem phases?

    • Replication?


Functional Decomposition

  • Focus on the computation

  • Divide the computation into disjoint tasks

    • Avoid data dependency among tasks

  • After dividing the computation, examine the data requirements of each task.


Functional Decomposition

  • Not as natural as domain decomposition

  • Consider search problems

  • Often functional decomposition is very useful at a higher level.

    • Climate modeling

      • Ocean simulation

      • Hydrology

      • Atmosphere, etc.


Partitioning Checklist

  • Have you defined a LOT of tasks?

  • Do you avoid redundant computation and storage?

  • Are tasks approximately equal in size?

  • Does the number of tasks scale with the problem size?

  • Have you identified several alternative partitioning schemes?


Communication

  • The information flow between tasks is specified in this stage of the design

  • Remember:

    • Tasks execute concurrently.

    • Data dependencies may limit concurrency.


Communication

  • Define Channel

    • Link the producers with the consumers.

    • Consider the costs

      • Intellectual

      • Physical

    • Distribute the communication.

  • Specify the messages that are sent.


Communication Patterns

  • Local vs. Global

  • Structured vs. Unstructured

  • Static vs. Dynamic

  • Synchronous vs. Asynchronous


Local Communication

  • Communication within a neighborhood.

Algorithm choice determines communication.
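For instance, here is a sketch of nearest-neighbor communication in C with MPI, assuming a 1-D block decomposition with one ghost cell on each side (both the library and the layout are assumptions, not from the slide):

    /* Sketch: halo exchange in a 1-D block decomposition. Each rank
       swaps one boundary cell with each neighbor; MPI_PROC_NULL makes
       the exchange a no-op at the ends of the chain. */
    #include <mpi.h>

    void halo_exchange(double *u, int n, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* u[0] and u[n+1] are ghost cells; u[1..n] is the local block. */
        MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,
                     &u[n + 1], 1, MPI_DOUBLE, right, 0,
                     comm, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[n],     1, MPI_DOUBLE, right, 1,
                     &u[0],     1, MPI_DOUBLE, left,  1,
                     comm, MPI_STATUS_IGNORE);
    }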


Global Communication

  • Not localized.

  • Examples

    • All-to-All

    • Master-Worker

[Figure: a central task gathers the values 1, 5, 2, 3, 7 from the other tasks to compute their sum.]


Avoiding Global Communication

  • Distribute the communication and computation

[Figure: the same values 1, 5, 2, 3, 7 summed by passing partial sums along a chain of tasks, so both the communication and the computation are distributed.]
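A sketch of the distributed version in C with MPI (an assumption; each rank's value stands in for the figure's numbers): the running sum is accumulated along a chain of tasks instead of funneling every value into one central task.

    /* Sketch: chain summation. Rank r waits for the running sum from
       rank r-1, adds its own value, and forwards the result; the last
       rank ends up holding the total. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double value = rank + 1.0;   /* stand-in for this task's value */
        double sum = value;
        if (rank > 0) {              /* receive running sum from the left */
            double partial;
            MPI_Recv(&partial, 1, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            sum += partial;
        }
        if (rank < size - 1)         /* forward running sum to the right */
            MPI_Send(&sum, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);
        else
            printf("total = %f\n", sum);

        MPI_Finalize();
        return 0;
    }

A tree reduction (e.g., MPI_Reduce) distributes the same work with an O(log P) critical path instead of the chain's O(P).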


Divide and Conquer

[Figure: values combined pairwise in a tree of partial sums.]

  • Partition the problem into two or more subproblems

  • Partition each subproblem, etc.

Results in a structured nearest-neighbor communication pattern.


Structured Communication

  • Each task’s communication resembles each other task’s communication

  • Is there a pattern?


Unstructured Communication

  • No regular pattern that can be exploited.

  • Examples

    • Unstructured Grid

    • Resolution changes

Complicates the next stages of design


Synchronous Communication

  • Both consumers and producers are aware when communication is required

  • Explicit and simple

[Figure: producer and consumer exchanging data at times t = 1, t = 2, t = 3.]


Asynchronous Communication

  • Timing of send/receive is unknown.

    • No pattern

  • Consider: very large data structure

    • Distribute among computational tasks (polling)

    • Define a set of read/write tasks

    • Shared Memory


Problems to Avoid

  • A centralized algorithm

    • Distribute the computation

    • Distribute the communication

  • A sequential algorithm

    • Seek concurrency

    • Divide and conquer

      • Small, equal sized subproblems


Communication Design Checklist

  • Is communication balanced?

    • All tasks about the same

  • Is communication limited to neighborhoods?

    • Restructure global to local if possible.

  • Can communications proceed concurrently?

  • Can the algorithm proceed concurrently?

    • Find the algorithm with most concurrency.

      • Be careful!!!


Agglomeration

  • The Partitioning and Communication steps were abstract.

  • Agglomeration moves the design toward the concrete.

  • Combine tasks to execute efficiently on some parallel computer.

  • Consider replication.


Agglomeration Goals

  • Reduce communication costs by

    • increasing computation

    • decreasing/increasing granularity

  • Retain flexibility for mapping and scaling.

  • Reduce software engineering costs.


Changing Granularity

  • A large number of tasks does not necessarily produce an efficient algorithm.

  • We must consider the communication costs.

  • Reduce communication by

    • having fewer tasks

    • sending fewer messages (batching)


Surface to Volume Effects

  • Communication is proportional to the surface of the subdomain.

  • Computation is proportional to the volume of the subdomain.

  • Increasing computation will often decrease communication.
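  • Example: for a cubic subdomain with side n, computation grows with the volume n³ while communication grows with the surface area 6n², so the communication-to-computation ratio shrinks like 6/n as the subdomain grows.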


How many messages total?

How much data is sent?

Replicating Computation

  • Trade-off replicated computation for reduced communication.

  • Replication will often reduce execution time as well.


Summation of N Integers

[Figure: a summation tree (s = sum) followed by a broadcast (b) of the result.]

How many steps?


Using Replication (Butterfly)
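A tree summation followed by a broadcast takes roughly 2·log₂ P steps; the butterfly folds the two together into log₂ P steps by replicating the additions. A sketch in C with MPI (an assumption): in each stage, every rank swaps its partial sum with the partner whose rank differs in one bit.

    /* Sketch: butterfly (recursive-doubling) summation. After log2(P)
       pairwise exchanges every rank holds the global sum, replicating
       the computation to avoid a separate broadcast.
       Assumes the number of ranks is a power of two. */
    #include <mpi.h>

    double butterfly_sum(double value, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        double sum = value;
        for (int mask = 1; mask < size; mask <<= 1) {
            int partner = rank ^ mask;   /* differs from rank in one bit */
            double incoming;
            MPI_Sendrecv(&sum, 1, MPI_DOUBLE, partner, 0,
                         &incoming, 1, MPI_DOUBLE, partner, 0,
                         comm, MPI_STATUS_IGNORE);
            sum += incoming;             /* every rank performs the add */
        }
        return sum;                      /* identical on all ranks */
    }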


Using Replication

Butterfly to Hypercube


Avoid Communication

  • Look for tasks that cannot execute concurrently because of communication requirements.

  • Replication can help accomplish two tasks at the same time, like:

    • Summation

    • Broadcast


Preserve Flexibility

  • Create more tasks than processors.

  • Overlap communication and computation.

  • Don't incorporate unnecessary limits on the number of tasks.


Agglomeration Checklist

  • Has agglomeration reduced communication costs by increasing locality?

  • Do benefits of replication outweigh costs?

  • Does replication compromise scalability?

  • Does the number of tasks still scale with problem size?

  • Is there still sufficient concurrency?


Mapping

  • Specify where each task is to operate.

  • Mapping may need to change depending on the target architecture.

  • The general mapping problem is NP-complete.


Mapping

  • Goal: Reduce Execution Time

    • Concurrent tasks ---> Different processors

    • High communication ---> Same processor

  • Mapping is a game of trade-offs.


Mapping

  • Many domain-decomposition problems make mapping easy.

    • Grids

    • Arrays

    • etc.


Mapping

  • Algorithms based on unstructured or complex domain decompositions are difficult to map.


Other Mapping Problems

  • Variable amounts of work per task

  • Unstructured communication

  • Heterogeneous processors

    • different speeds

    • different architectures

  • Solution: LOAD BALANCING


Load Balancing

  • Static

    • Determined a priori

    • Based on work, processor speed, etc.

  • Probabilistic

    • Random

  • Dynamic

    • Restructure load during execution

  • Task Scheduling (functional decomp.)


Static Load Balancing

  • Based on a priori knowledge.

  • Goal: Equal WORK on all processors

  • Algorithms:

    • Basic

    • Recursive Bisection


Basic

  • Divide up the work based on

    • Work required

    • Processor speed

    r_i = R · ( p_i / Σ_j p_j )

  where R is the total work, p_i is the relative speed of processor i, and r_i is the work assigned to processor i.
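A small sketch of that formula in C (the function name and signature are mine, purely illustrative):

    /* Sketch: static work division r_i = R * p_i / sum(p).
       'speed' holds each processor's relative speed p_i; the function
       fills 'work' with each processor's share r_i of the total work R. */
    #include <stddef.h>

    void divide_work(double R, const double *speed, double *work,
                     size_t nproc) {
        double total_speed = 0.0;
        for (size_t i = 0; i < nproc; i++)
            total_speed += speed[i];
        for (size_t i = 0; i < nproc; i++)
            work[i] = R * speed[i] / total_speed;
    }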


Recursive Bisection

  • Divide work in half recursively.

  • Based on physical coordinates.


Dynamic Algorithms

  • Adjust load when an imbalance is detected.

  • Local or Global


Task Scheduling

  • Many tasks with weak locality requirements.

  • Manager-Worker model.


Task Scheduling

  • Manager-Worker

  • Hierarchical Manager-Worker

    • Uses submanagers

  • Decentralized

    • No central manager

    • Task pool on each processor

    • Less of a bottleneck
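A minimal manager-worker sketch in C with MPI (the tags, message layout, and task count are illustrative assumptions): workers ask rank 0 for a task index whenever they are idle, so faster workers automatically receive more tasks.

    /* Sketch: manager-worker task scheduling. Rank 0 hands out task
       indices one at a time; a worker sends its rank to request work
       and receives either a task index or a stop signal (-1). */
    #include <mpi.h>

    #define NTASKS 100
    #define TAG_REQUEST 1
    #define TAG_WORK    2

    static void do_task(int t) { (void)t; /* placeholder for real work */ }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                     /* manager */
            int next = 0, stopped = 0;
            while (stopped < size - 1) {
                int who;
                MPI_Recv(&who, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                int task = (next < NTASKS) ? next++ : -1;
                if (task < 0) stopped++;
                MPI_Send(&task, 1, MPI_INT, who, TAG_WORK, MPI_COMM_WORLD);
            }
        } else {                             /* worker */
            for (;;) {
                int task;
                MPI_Send(&rank, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
                MPI_Recv(&task, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                if (task < 0) break;
                do_task(task);
            }
        }
        MPI_Finalize();
        return 0;
    }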


Mapping Checklist

  • Is the load balanced?

  • Are there communication bottlenecks?

  • Is it necessary to adjust the load dynamically?

  • Can you adjust the load if necessary?

  • Have you evaluated the costs?


PCAM Algorithm Design

  • Partition

    • Domain or Functional Decomposition

  • Communication

    • Link producers and consumers

  • Agglomeration

    • Combine tasks for efficiency

  • Mapping

    • Divide up the tasks for balanced execution


Example: Atmosphere Model

  • Simulate atmospheric processes

    • Wind

    • Clouds, etc.

  • Solves a set of partial differential equations describing the fluid behavior


Representation of Atmosphere


Data Dependencies


Partition & Communication


Agglomeration


Mapping

