- 97 Views
- Uploaded on
- Presentation posted in: General

Advanced Algorithms

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Advanced Algorithms

Piyush Kumar

(Lecture 12: Parallel Algorithms)

Courtesy Baker 05.

Welcome to COT5405

- An abstract description of a real world parallel machine.
- Attempts to capture essential features (and suppress details?)
- What other models have we seen so far?

RAM?

External Memory Model?

- Random Access Machine Model
- Memory is a sequence of bits/words.
- Each memory access takes O(1) time.
- Basic operations take O(1) time: Add/Mul/Xor/Sub/AND/not…
- Instructions can not be modified.
- No consideration of memory hierarchies.
- Has been very successful in modelling real world machines.

- Generalization of RAM
- P processors with their own programs (and unique id)
- MIMD processors : At each point in time the processors might be executing different instructions on different data.
- Shared Memory
- Instructions are synchronized among the processors

Shared Memory

EREW/ERCW/CREW/CRCW

EREW: A program isnt allowed to access the same memory location

at the same time.

- Common CRCW: CW iff processors write same value.
- Arbitrary CRCW
- Priority CRCW
- Combining CRCW

- Lot of literature available on algorithms for PRAM.
- One of the most “clean” models.
- Focuses on what communication is needed ( and ignores the cost/means to do it)

- Problem 1: Produce the sum of an array of n numbers.
- RAM = ?
- PRAM = ?

1st prefix

2nd prefix

3rd prefix

...

(n-1)th prefix

s0

s0Ä s1

s0Ä s1Ä s2

...

s0Ä s1Ä ... Ä sn-1

Let X = {s0, s1, …, sn-1} be in a set S

Let Ä be a binary, associative, closed operator with respect to S

(usually Q(1) time – MIN, MAX, AND, +, ...)

The result of s0Ä s1Ä…Ä sk is called the k-th prefix

Computing all such n prefixes is the parallel prefix computation

- Suffix computation is a similar problem.
- Assumes Binary op takes O(1)
- In RAM = ?

- Assume PRAM has n processors and n is a power of 2.
- Input: si for i = 0,1, ... , n-1.
- Algorithm Steps:
forj = 0 to (lg n) -1, do

for i = 2j to n-1 do

h = i - 2j

si= shÄsi

endfor

endfor

Total time in EREW PRAM?

- Assume that we have
- an array of n elements, X = {x1, x2, ... , xn}
- Some array elements are marked (or distinguished).

- The requirements of this problem are to
- pack the marked elements in the front part of the array.
- place the remaining elements in the back of the array.

- While not a requirement, it is also desirable to
- maintain the original order between the marked elements
- maintain the original order between the unmarked elements

- How would you do this?
- Inplace?
- Running time?
- Any ideas on how to do this in PRAM?

- Set si in Pi to 1 if xi is marked and set si = 0 otherwise.
2. Perform a prefix sum on S =(s1, s2 ,..., sn) to obtain destination di = si for each marked xi .

3. All PEs set m = sn , the total nr of marked elements.

4. Pi sets si to 0 if xi is marked and otherwise sets si = 1.

5. Perform a prefix sum on S and set di = si + m for each unmarked xi .

6. Each Pi copies array element xi into address di in X.

- Assume n processors are used above.
- Optimal prefix sums requires O(lg n) time.
- The EREW broadcast of sn needed in Step 3 takes O(lg n) time using a binary tree in memory
- All and other steps require constant time.
- Runs in O(lg n) time and is cost optimal.
- Maintains original order in unmarked group as well
Notes:

- Algorithm illustrates usefulness of Prefix Sums
- There many applications for Array Packing algorithm

- RAM Merge Sort Recursion?
- PRAM Merge Sort recursion?
- Can we speed up the merging?
- Merging n elements with n processors can be done in O(log n) time.
- Assume all elements are distinct
- Rank(a, A) = number of elements in A smaller than a. For example rank(8, {1,3,5,7,9}) = 4

1

2

3

8

10

12

14

15

16

19

A = 2,3,10,15,16

B = 1,8,12,14,19

Rank(2)=1 Rank(3)=1 Rank(10)=2 Rank(15)=4 Rank(16)=4

+1 +2 +3 +4 +5

+1 +2 +3 +4 +5

Rank(1)=0 Rank(8)=2 Rank(12)=3 Rank(14)=3 Rank(19)=5

- T(n) = T(n/2) + O(log n)
- Using the idea of pipelined d&c PRAM Mergesort can be done in O(log n).
- D&C is one of the most powerful techniques to solve problems in parallel.

21

12

- RAM Version ?

L

7

6

5

4

= min(12, 21)

3

2

1

Closest-Pair(p1, …, pn) {

Compute separation line L such that half the points are on one side and half on the other side.

1 = Closest-Pair(left half)

2 = Closest-Pair(right half)

= min(1, 2)

Delete all points further than from separation line L

Sort remaining points by y-coordinate.

Scan points in y-order and compare distance between each point and next 11 neighbors. If any of these distances is less than , update .

return.

}

O(n log n)

2T(n / 2)

O(n)

O(n log n)

O(n)

Closest-Pair(p1, …, pn) {

Compute separation line L such that half the points are on one side and half on the other side.

1 = Closest-Pair(left half)

2 = Closest-Pair(right half)

= min(1, 2)

Delete all points further than from separation line L

Sort remaining points by y-coordinate.

Scan points in y-order and compare distance between each point and next 11 neighbors.

Find min of all these distances, update .

return.

}

O(1)

Use sorted lists

In parallel

T(n / 2)

Use presorting and prefix computation.

O(log n)

O(1)

O(log n)

Again use prefix computation.

Recurrence : T(n) = T(n/2) + O(log n)

MergeHull (P)

- HL = MergeHull( Left of median)
- HR = MergeHull( Right of median)
- Return JoinHulls(HL,HR)

Time complexity in RAM?

Time complexity in PRAM?

- Let Q = {q1, q2, . . . , qn} be a set of points in the Euclidean plane (i.e., E2-space).
- The convex hull of Q is denoted by CH(Q) and is the smallest convex polygon containing Q.
- It is specified by listing its corner points (which are from Q) in order (e.g., clockwise order).

- Usual Computational Geometry Assumptions:
- No three points lie on the same straight line.
- No two points have the same x or y coordinate.
- There are at least 4 points, as CH(Q) = Q for n 3.

- Sort the points of Q by x-coordinate.
- Partition Q into k =n subsets Q1,Q2,. . . ,Qkof k points each such that a vertical line can separate Qi from Qj
- Also, if i <j, then Qi is left of Qj.

- For i = 1 to k , compute the convex hulls of Qi in parallel, as follows:
- if |Qi| 3, then CH(Qi) = Qi
- else (using k=n PEs) call PRAM CONVEX HULL(k, Qi, CH(Qi))

- Merge the convex hulls in {CH(Q1),CH(Q2), . . . ,CH(Qk)}together.

- The upper hull is found first. Then, the lower hull is found next using the same method.
- Only finding the upper hull is described here
- Upper & lower convex hull points merged into ordered set

- Each CH(Qi) has n PEs assigned to it.
- The PEs assigned to CH(Qi) (in parallel) compute the upper tangent from CH(Qi) to another CH(Qj) .
- A total of n-1 tangents are computed for each CH(Qi)
- Details for computing the upper tangents will be separately

- Among the tangent lines to CH(Qi) , and polygons to the left of CH(Qi), let Libe the one with the smallest slope.
- Among the tangent lines to CH(Qi) and polygons to the right, let Ribe the one with the largest slope.
- If the angle between Liand Riis less than 180 degrees, no point of CH(Qi) is in CH(Q).
- See Figure 5.13 on next slide (from Akl’s Online text)
- Otherwise, all points in CH(Q)between where Litouches CH(Qi) and where Ritouches CH(Qi) are in CH(Q).

- Array Packing is used to combine all convex hull points of CH(Q) after they are identified.

- Step 1: The sort takes O(lg n) time.
- Step 2: Partition of Q into subsets takes O(1) time.
- Step 3: The recursive calculations of CH(Qi) for 1 i n in parallel takes t(n) time (using n PEs for each Qi).
- Step 4: The big steps here require O(lgn)and are
- Finding the upper tangent from CH(Qi) to CH(Qj) for each i, j pair.
- Array packing used to form the ordered sequence of upper convex hull points for Q.

- Above steps find the upper convex hull. The lower convex hull is found similarly.
- Upper & lower hulls merged in O(1) time to ordered set

- Cost for Step 3: Solving the recurrance relation
t(n) = t(n) + lg n

yields

t(n) = O(lg n)

- Running time for PRAM Convex Hull is O(lg n) since this is maximum cost for each step.
- Then the cost for PRAM Convex Hull is
C(n) = O(n lg n).