1 / 14

On-line adaptive parallel prefix computation

On-line adaptive parallel prefix computation. Jean-Louis Roch, Daouda Traoré and Julien Bernard Presented by Andreas Söderström, ITN. The prefix problem. Given X = x 1 ,x 2 ,…,x n compute the n products π k =x 0 о x 1 о … ο x k for 1 ≤ k ≤ n where ο is some associative operation

avent
Download Presentation

On-line adaptive parallel prefix computation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. On-line adaptive parallel prefix computation Jean-Louis Roch, Daouda Traoré and Julien Bernard Presented by Andreas Söderström, ITN

  2. The prefix problem • Given X = x1,x2,…,xn compute the n productsπk=x0 о x1 о … ο xk for 1 ≤ k ≤ nwhere ο is some associative operation • Example:o = + (i.e. addition)X = 1,3,5,7π1 = 1π2 = 1+3 = 4 π3 = 1+3+5 = 9 π4 = 1+3+5+7 = 16

  3. Parallel prefix sum (first pass) Step 3 36 Step 2 10 26 Step 1 3 7 11 15 Step 0 1 2 3 4 5 6 7 8

  4. Step 0 36 Step 1 10 26 Step 2 3 7 11 15 Step 3 1 2 3 4 5 6 7 8 Parallel prefix sum (second pass) • For every even position use the value of the parent node • For evey odd position pn compute pn-1+ pn 36 10 21 36 3 6 10 15 21 28 36

  5. Parallel prefix computation • Parallel time: 2n/p + O(log n) for p < n/(log n) • Lower bound for parallel time: 2n/(p+1) for n > p(p+1)/2 • Assumes identical processors!

  6. Parallel prefix computation • Potential practical problems: • Processor setup may be heterogenous • Processor load may vary due to other users computing on the same machine • Off-line optimal scheduling potentially not optimal anymore! • Solution: • Use on-line scheduling!

  7. The basic idea • Combine a sequentially optimal algorithm with fine-grained parallellism using work stealing P0 P1 P2 … Pn Steal work Steal work

  8. The algorithm Sequential process Ps: • The sequential process Ps starts working on [π1, πk], i.e. value indices [1,k] where indices [k+1,m] has been stolen • When Ps reaches the index k it communicates πk to the parallel process Pv that has stolen [k+1,m] and recoveres the last index n computed by Pv together with the local prefix result rn • Ps uses associativity to calculate πn+1 = πko rnand continues with the computation from index n+1

  9. The algorithm Parallel process Pv • Pv scans for active processes (can be Ps or another Pv) and steals part of the work from that process. • Pv computes the local prefix operation on the stolen interval • The computation of Pv depends on a previous value and need to be finalized when that value is known

  10. Jump 4 5 6 7 8 9 10 11 12 Stealable Result Finalize The algorithm P0 1 2 3 13 14 15 16 P1 P2

  11. Performance • If a processor is or becomes slow part of its work can be stolen by an idle processor • Asymptotic optimality (proof provided in the paper)

  12. Performance P homogenous processeors

  13. Performance P heterogenous processors

  14. Questions?

More Related