Advanced algorithms
This presentation is the property of its rightful owner.
Sponsored Links
1 / 33

Advanced Algorithms PowerPoint PPT Presentation


  • 84 Views
  • Uploaded on
  • Presentation posted in: General

Advanced Algorithms. Piyush Kumar (Lecture 12: Parallel Algorithms). Courtesy Baker 05. Welcome to COT5405. Parallel Models. An abstract description of a real world parallel machine. Attempts to capture essential features (and suppress details?) What other models have we seen so far?.

Download Presentation

Advanced Algorithms

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Advanced algorithms

Advanced Algorithms

Piyush Kumar

(Lecture 12: Parallel Algorithms)

Courtesy Baker 05.

Welcome to COT5405


Parallel models

Parallel Models

  • An abstract description of a real world parallel machine.

  • Attempts to capture essential features (and suppress details?)

  • What other models have we seen so far?

RAM?

External Memory Model?


Advanced algorithms 1367882

RAM

  • Random Access Machine Model

    • Memory is a sequence of bits/words.

    • Each memory access takes O(1) time.

    • Basic operations take O(1) time: Add/Mul/Xor/Sub/AND/not…

    • Instructions can not be modified.

    • No consideration of memory hierarchies.

    • Has been very successful in modelling real world machines.


Parallel ram aka pram

Parallel RAM aka PRAM

  • Generalization of RAM

  • P processors with their own programs (and unique id)

  • MIMD processors : At each point in time the processors might be executing different instructions on different data.

  • Shared Memory

  • Instructions are synchronized among the processors


Advanced algorithms 1367882

PRAM

Shared Memory

EREW/ERCW/CREW/CRCW

EREW: A program isnt allowed to access the same memory location

at the same time.


Variants of crcw

Variants of CRCW

  • Common CRCW: CW iff processors write same value.

  • Arbitrary CRCW

  • Priority CRCW

  • Combining CRCW


Why pram

Why PRAM?

  • Lot of literature available on algorithms for PRAM.

  • One of the most “clean” models.

  • Focuses on what communication is needed ( and ignores the cost/means to do it)


Pram algorithm design

PRAM Algorithm design.

  • Problem 1: Produce the sum of an array of n numbers.

  • RAM = ?

  • PRAM = ?


Problem 2 prefix computation

1st prefix

2nd prefix

3rd prefix

...

(n-1)th prefix

s0

s0Ä s1

s0Ä s1Ä s2

...

s0Ä s1Ä ... Ä sn-1

Problem 2: Prefix Computation

Let X = {s0, s1, …, sn-1} be in a set S

Let Ä be a binary, associative, closed operator with respect to S

(usually Q(1) time – MIN, MAX, AND, +, ...)

The result of s0Ä s1Ä…Ä sk is called the k-th prefix

Computing all such n prefixes is the parallel prefix computation


Prefix computation

Prefix computation

  • Suffix computation is a similar problem.

  • Assumes Binary op takes O(1)

  • In RAM = ?


Prefix computation akl

Prefix Computation (Akl)


Erew pram prefix computation

EREW PRAM Prefix computation

  • Assume PRAM has n processors and n is a power of 2.

  • Input: si for i = 0,1, ... , n-1.

  • Algorithm Steps:

    forj = 0 to (lg n) -1, do

    for i = 2j to n-1 do

    h = i - 2j

    si= shÄsi

    endfor

    endfor

Total time in EREW PRAM?


Problem 3 array packing

Problem 3: Array packing

  • Assume that we have

    • an array of n elements, X = {x1, x2, ... , xn}

    • Some array elements are marked (or distinguished).

  • The requirements of this problem are to

    • pack the marked elements in the front part of the array.

    • place the remaining elements in the back of the array.

  • While not a requirement, it is also desirable to

    • maintain the original order between the marked elements

    • maintain the original order between the unmarked elements


In ram

In RAM?

  • How would you do this?

  • Inplace?

  • Running time?

  • Any ideas on how to do this in PRAM?


Erew pram algorithm

EREW PRAM Algorithm

  • Set si in Pi to 1 if xi is marked and set si = 0 otherwise.

    2. Perform a prefix sum on S =(s1, s2 ,..., sn) to obtain destination di = si for each marked xi .

    3. All PEs set m = sn , the total nr of marked elements.

    4. Pi sets si to 0 if xi is marked and otherwise sets si = 1.

    5. Perform a prefix sum on S and set di = si + m for each unmarked xi .

    6. Each Pi copies array element xi into address di in X.


Array packing

Array Packing

  • Assume n processors are used above.

  • Optimal prefix sums requires O(lg n) time.

  • The EREW broadcast of sn needed in Step 3 takes O(lg n) time using a binary tree in memory

  • All and other steps require constant time.

  • Runs in O(lg n) time and is cost optimal.

  • Maintains original order in unmarked group as well

    Notes:

  • Algorithm illustrates usefulness of Prefix Sums

  • There many applications for Array Packing algorithm


Problem 4 pram mergesort

Problem 4: PRAM MergeSort

  • RAM Merge Sort Recursion?

  • PRAM Merge Sort recursion?

  • Can we speed up the merging?

    • Merging n elements with n processors can be done in O(log n) time.

    • Assume all elements are distinct

    • Rank(a, A) = number of elements in A smaller than a. For example rank(8, {1,3,5,7,9}) = 4


Pram merging

1

2

3

8

10

12

14

15

16

19

PRAM Merging

A = 2,3,10,15,16

B = 1,8,12,14,19

Rank(2)=1 Rank(3)=1 Rank(10)=2 Rank(15)=4 Rank(16)=4

+1 +2 +3 +4 +5

+1 +2 +3 +4 +5

Rank(1)=0 Rank(8)=2 Rank(12)=3 Rank(14)=3 Rank(19)=5


Pram merge sort

PRAM Merge Sort

  • T(n) = T(n/2) + O(log n)

  • Using the idea of pipelined d&c PRAM Mergesort can be done in O(log n).

  • D&C is one of the most powerful techniques to solve problems in parallel.


Problem 5 closest pair

21

12

Problem 5: Closest Pair

  • RAM Version ?

L

7

6

5

4

 = min(12, 21)

3

2

1


Closest pair ram version

Closest Pair: RAM Version

Closest-Pair(p1, …, pn) {

Compute separation line L such that half the points are on one side and half on the other side.

1 = Closest-Pair(left half)

2 = Closest-Pair(right half)

= min(1, 2)

Delete all points further than  from separation line L

Sort remaining points by y-coordinate.

Scan points in y-order and compare distance between each point and next 11 neighbors. If any of these distances is less than , update .

return.

}

O(n log n)

2T(n / 2)

O(n)

O(n log n)

O(n)


Closest pair pram version

Closest Pair: PRAM Version?

Closest-Pair(p1, …, pn) {

Compute separation line L such that half the points are on one side and half on the other side.

1 = Closest-Pair(left half)

2 = Closest-Pair(right half)

= min(1, 2)

Delete all points further than  from separation line L

Sort remaining points by y-coordinate.

Scan points in y-order and compare distance between each point and next 11 neighbors.

Find min of all these distances, update .

return.

}

O(1)

Use sorted lists

In parallel

T(n / 2)

Use presorting and prefix computation.

O(log n)

O(1)

O(log n)

Again use prefix computation.

Recurrence : T(n) = T(n/2) + O(log n)


Problem 6 planar convex hulls

Problem 6: Planar Convex hulls

MergeHull (P)

  • HL = MergeHull( Left of median)

  • HR = MergeHull( Right of median)

  • Return JoinHulls(HL,HR)

Time complexity in RAM?

Time complexity in PRAM?


Join hulls

Join_Hulls


Towards a better planar convex hull

Towards a betterPlanar Convex hull

  • Let Q = {q1, q2, . . . , qn} be a set of points in the Euclidean plane (i.e., E2-space).

  • The convex hull of Q is denoted by CH(Q) and is the smallest convex polygon containing Q.

    • It is specified by listing its corner points (which are from Q) in order (e.g., clockwise order).

  • Usual Computational Geometry Assumptions:

    • No three points lie on the same straight line.

    • No two points have the same x or y coordinate.

    • There are at least 4 points, as CH(Q) = Q for n  3.


Pram convex hull n q ch q

PRAM CONVEX HULL(n,Q, CH(Q))

  • Sort the points of Q by x-coordinate.

  • Partition Q into k =n subsets Q1,Q2,. . . ,Qkof k points each such that a vertical line can separate Qi from Qj

    • Also, if i <j, then Qi is left of Qj.

  • For i = 1 to k , compute the convex hulls of Qi in parallel, as follows:

    • if |Qi|  3, then CH(Qi) = Qi

    • else (using k=n PEs) call PRAM CONVEX HULL(k, Qi, CH(Qi))

  • Merge the convex hulls in {CH(Q1),CH(Q2), . . . ,CH(Qk)}together.


Basic idea

Basic Idea


Last step

Last Step

  • The upper hull is found first. Then, the lower hull is found next using the same method.

    • Only finding the upper hull is described here

    • Upper & lower convex hull points merged into ordered set

  • Each CH(Qi) has n PEs assigned to it.

  • The PEs assigned to CH(Qi) (in parallel) compute the upper tangent from CH(Qi) to another CH(Qj) .

    • A total of n-1 tangents are computed for each CH(Qi)

    • Details for computing the upper tangents will be separately


Last step1

Last Step

  • Among the tangent lines to CH(Qi) , and polygons to the left of CH(Qi), let Libe the one with the smallest slope.

  • Among the tangent lines to CH(Qi) and polygons to the right, let Ribe the one with the largest slope.

  • If the angle between Liand Riis less than 180 degrees, no point of CH(Qi) is in CH(Q).

    • See Figure 5.13 on next slide (from Akl’s Online text)

    • Otherwise, all points in CH(Q)between where Litouches CH(Qi) and where Ritouches CH(Qi) are in CH(Q).

  • Array Packing is used to combine all convex hull points of CH(Q) after they are identified.


Complexity

Complexity

  • Step 1: The sort takes O(lg n) time.

  • Step 2: Partition of Q into subsets takes O(1) time.

  • Step 3: The recursive calculations of CH(Qi) for 1  i n in parallel takes t(n) time (using n PEs for each Qi).

  • Step 4: The big steps here require O(lgn)and are

    • Finding the upper tangent from CH(Qi) to CH(Qj) for each i, j pair.

    • Array packing used to form the ordered sequence of upper convex hull points for Q.

  • Above steps find the upper convex hull. The lower convex hull is found similarly.

    • Upper & lower hulls merged in O(1) time to ordered set


Complexity1

Complexity

  • Cost for Step 3: Solving the recurrance relation

    t(n) = t(n) + lg n

    yields

    t(n) = O(lg n)

  • Running time for PRAM Convex Hull is O(lg n) since this is maximum cost for each step.

  • Then the cost for PRAM Convex Hull is

    C(n) = O(n lg n).


  • Login