Some tricks for XMT Programming with Reductions and Linear Recurrences

Jonathan Berry, Scalable Algorithms Department, Sandia National Laboratories, July 24, 2008

Presentation Transcript


  1. Some tricks for XMT Programming with Reductions and Linear Recurrences
     Jonathan Berry
     Scalable Algorithms Department
     Sandia National Laboratories
     July 24, 2008

  2. Recall the PageRank and Community Detection Discussions
     • We saw that the way loops are parallelized is crucial.
     • We scaled once the compiler merged the loops in our rank accumulation method and removed a reduction from the resulting single loop.
     • Strong scaling stopped when this didn't happen.
     • The kernel of our facility-location-based community detection approaches also requires removing reductions and processing linear recurrences.

  3. Reductions: Code Carefully
     • Consider adding up absolute values of integers:
     Attempt 1:
         int total = 0;
         for (int i = 0; i < n; i++) {
             if (v[i] < 0) {
                 total += -v[i];
             } else {
                 total += v[i];
             }
         }
     The compiler has trouble with this branched loop body; the reduction isn't removed.

  4. Reductions: Code Carefully
     • Consider adding up absolute values of integers:
     Attempt 2:
         int total = 0;
         for (int i = 0; i < n; i++) {
             int incr = (v[i] < 0) * -v[i] + (v[i] >= 0) * v[i];
             total += incr;
         }
     Each comparison evaluates to 0 or 1 and exactly one of them is 1, so incr equals |v[i]| without a branch. This loop body has no branches, and the reduction is removed correctly.
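A self-contained C sketch of the two attempts, useful for checking that the branch-free rewrite computes the same total. `abs_sum_blocked` is a hypothetical serial emulation of what removing the reduction amounts to conceptually (a private partial sum per worker, combined at the end); it assumes at most 64 workers and is not the code the XMT compiler actually generates. All function names here are illustrative.

```c
#include <stddef.h>

/* Attempt 1: branched loop body (the slides report the reduction is not removed). */
int abs_sum_branched(const int *v, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0) {
            total += -v[i];
        } else {
            total += v[i];
        }
    }
    return total;
}

/* Attempt 2: branch-free increment; each comparison is 0 or 1 and exactly one is 1. */
int abs_sum_branchfree(const int *v, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        int incr = (v[i] < 0) * -v[i] + (v[i] >= 0) * v[i];
        total += incr;
    }
    return total;
}

/* Serial emulation of reduction removal: each of nblocks hypothetical workers
   sums one contiguous chunk into a private partial total (the chunks are
   independent), and the partial totals are combined at the end. */
int abs_sum_blocked(const int *v, size_t n, size_t nblocks) {
    int partial[64] = {0};                 /* assumption: at most 64 workers */
    if (nblocks == 0) nblocks = 1;
    if (nblocks > 64) nblocks = 64;
    size_t chunk = (n + nblocks - 1) / nblocks;
    for (size_t b = 0; b < nblocks; b++) {
        size_t lo = b * chunk, hi = lo + chunk < n ? lo + chunk : n;
        for (size_t i = lo; i < hi; i++)
            partial[b] += (v[i] < 0) * -v[i] + (v[i] >= 0) * v[i];
    }
    int total = 0;
    for (size_t b = 0; b < nblocks; b++)
        total += partial[b];
    return total;
}
```

For example, with v = {3, -4, 0, -1, 7}, all three functions return 15. Integer addition is associative, so regrouping the sum into per-worker partials cannot change the result.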

  5. Reductions: Code Carefully
     • Consider a conditional reduction:
     Attempt 1:
         int max = 0;
         for (int i = 0; i < n; i++) {
             if (mask[i] && v[i] > max) {
                 max = v[i];
             }
         }
     The compound conditional, with its short-circuit evaluation, can turn off reduction removal (and has, in my experience).
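One way to see why the compound condition complicates the loop: under C's short-circuit rules, `mask[i] && v[i] > max` evaluates the comparison only when `mask[i]` is nonzero, so the body is really two nested branches. The hypothetical functions below write both forms so they can be checked against each other; this illustrates C semantics only and makes no claim about the XMT compiler's internals.

```c
#include <stddef.h>

/* Attempt 1 as written on the slide. */
int masked_max_shortcircuit(const int *mask, const int *v, size_t n) {
    int max = 0;
    for (size_t i = 0; i < n; i++) {
        if (mask[i] && v[i] > max) {
            max = v[i];
        }
    }
    return max;
}

/* The same loop with the short-circuit spelled out as two nested branches. */
int masked_max_nested(const int *mask, const int *v, size_t n) {
    int max = 0;
    for (size_t i = 0; i < n; i++) {
        if (mask[i]) {            /* first branch: test the mask */
            if (v[i] > max) {     /* second branch: reached only if mask[i] != 0 */
                max = v[i];
            }
        }
    }
    return max;
}
```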

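  6. Reductions: Code Carefully
     • Consider a conditional reduction:
     Attempt 2:
         int max = 0;
         for (int i = 0; i < n; i++) {
             int candidate = mask[i] * v[i];
             if (candidate > max) {
                 max = candidate;
             }
         }
     This works! The reduction is removed from the loop. Precomputing the candidate this way assumes each mask[i] is 0 or 1: a masked-out entry then yields 0, which never exceeds the initial max of 0, so the result matches Attempt 1.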
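A minimal C sketch of the precomputation trick, assuming every `mask[i]` is exactly 0 or 1 (as a C comparison produces) and, as on the slide, that `max` starts at 0. Under those assumptions a masked-out element contributes a candidate of 0, which can never raise `max` above its starting value, so the result matches the short-circuit version. The function name is hypothetical.

```c
#include <stddef.h>

/* Fold the mask into the value first, leaving one simple comparison per iteration.
   Assumes mask[i] is 0 or 1 and that max starts at 0, so masked-out entries
   contribute 0 and never change the result. */
int masked_max_precomputed(const int *mask, const int *v, size_t n) {
    int max = 0;
    for (size_t i = 0; i < n; i++) {
        int candidate = mask[i] * v[i];   /* 0 when masked out */
        if (candidate > max) {
            max = candidate;
        }
    }
    return max;
}
```

If `mask` might hold nonzero values other than 1, normalizing it with `(mask[i] != 0) * v[i]` keeps the same single-comparison shape.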

  7. Linear Recurrences
     • The compiler will generate efficient code to parallelize linear recurrences, but you must keep the structure simple.
     • Suppose that you want to condition the additive term upon some test:
     Attempt 1:
         /* f[0] is set before the loop */
         for (int i = 1; i < n; i++) {
             if (v[i] < 0) {
                 f[i] = f[i-1] + -v[i];
             } else {
                 f[i] = f[i-1] + v[i];
             }
         }
     Like the branched reduction on slide 3, this branched body breaks the simple structure the compiler needs to recognize the recurrence.
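Why can linear recurrences be parallelized at all? The usual reason (general background, not a description of the XMT compiler's internals) is that each step of f[i] = a[i]*f[i-1] + b[i] applies the affine map x -> a[i]*x + b[i], and composing affine maps is associative, so the steps can be grouped in any order, for example as a balanced tree. The additive recurrences on these slides are the special case a[i] = 1. The type `affine` and the functions below are hypothetical names for illustration.

```c
#include <stddef.h>

/* One step of f[i] = a[i]*f[i-1] + b[i] is the affine map x -> a*x + b. */
typedef struct { long a, b; } affine;

/* Apply `first`, then `second`:
   second(first(x)) = (second.a*first.a)*x + (second.a*first.b + second.b).
   Function composition is associative, so steps may be grouped in any order. */
affine compose(affine first, affine second) {
    affine r = { second.a * first.a, second.a * first.b + second.b };
    return r;
}

/* Serial reference: the caller sets f[0]; f[i] = a[i]*f[i-1] + b[i] for i >= 1. */
void recurrence_serial(const long *a, const long *b, long *f, size_t n) {
    for (size_t i = 1; i < n; i++)
        f[i] = a[i] * f[i - 1] + b[i];
}

/* Computes f[n-1] by composing the per-step maps into one map and applying it to f0.
   The fold runs left to right here; because compose() is associative it could be
   regrouped as a balanced tree whose independent halves run in parallel. */
long recurrence_last_by_composition(const long *a, const long *b, long f0, size_t n) {
    affine acc = { 1, 0 };                  /* identity map x -> x */
    for (size_t i = 1; i < n; i++) {
        affine step = { a[i], b[i] };
        acc = compose(acc, step);           /* acc now maps f[0] to f[i] */
    }
    return acc.a * f0 + acc.b;
}
```

With a = {0, 2, 1, 3}, b = {0, 1, 5, -2} and f[0] = 1, both approaches give f[3] = 22 (a[0] and b[0] are unused).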

  8. Linear Recurrences
     • Suppose that you want to condition the additive term upon some test:
     Attempt 2:
         /* f[0] is set before the loop */
         for (int i = 1; i < n; i++) {
             int incr = (v[i] < 0) * -v[i] + (v[i] >= 0) * v[i];
             f[i] = f[i-1] + incr;
         }
     This works! The linear recurrence is parallelized.
     We saw some nastier examples in the discussion, but they reduce to the same rule: compute the increment first, then match the simple template.
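The branch-free recurrence as a runnable C sketch, assuming the caller sets `f[0]` and the loop starts at `i = 1` so that `f[i-1]` stays in range. Because `f[i] = f[i-1] + incr` is a prefix sum, it can be split into blocks; `abs_prefix_blocked` is a hypothetical serial emulation of such a blocked scan, meant only to show why the recurrence parallelizes, not to reproduce the XMT compiler's generated code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Branch-free |x|, written as on the slides: exactly one comparison is 1. */
static int abs_incr(int x) { return (x < 0) * -x + (x >= 0) * x; }

/* Attempt 2: the caller sets f[0]; starting at i = 1 keeps f[i-1] in range. */
void abs_prefix_serial(const int *v, int *f, size_t n) {
    for (size_t i = 1; i < n; i++) {
        int incr = abs_incr(v[i]);
        f[i] = f[i - 1] + incr;   /* simple additive recurrence */
    }
}

/* Serial emulation of a blocked scan that fills f[1..n-1] like abs_prefix_serial.
   Phases 1 and 3 loop over independent blocks, which is where parallelism would
   come from; phase 2 is a short serial pass over one value per block. */
void abs_prefix_blocked(const int *v, int *f, size_t n, size_t chunk) {
    if (n < 2) return;                              /* only f[0]: nothing to fill */
    if (chunk == 0) chunk = 1;
    size_t nblocks = (n - 1 + chunk - 1) / chunk;   /* blocks cover indices 1..n-1 */
    int *carry_in = malloc(nblocks * sizeof *carry_in);
    if (carry_in == NULL) { abs_prefix_serial(v, f, n); return; }

    /* Phase 1: total increment of each block (blocks are independent). */
    for (size_t blk = 0; blk < nblocks; blk++) {
        size_t lo = 1 + blk * chunk, hi = lo + chunk < n ? lo + chunk : n;
        int sum = 0;
        for (size_t i = lo; i < hi; i++) sum += abs_incr(v[i]);
        carry_in[blk] = sum;
    }

    /* Phase 2: exclusive scan of block totals, seeded with f[0]. */
    int running = f[0];
    for (size_t blk = 0; blk < nblocks; blk++) {
        int sum = carry_in[blk];
        carry_in[blk] = running;                    /* value just before this block */
        running += sum;
    }

    /* Phase 3: fill each block from its carry-in (blocks are independent). */
    for (size_t blk = 0; blk < nblocks; blk++) {
        size_t lo = 1 + blk * chunk, hi = lo + chunk < n ? lo + chunk : n;
        int acc = carry_in[blk];
        for (size_t i = lo; i < hi; i++) {
            acc += abs_incr(v[i]);
            f[i] = acc;
        }
    }
    free(carry_in);
}
```

With v = {0, 3, -4, 0, -1, 7, -2} and f[0] = 10, both versions fill f with 10, 13, 17, 17, 18, 25, 27 for any block size. The branched form on the previous slide has no single increment to precompute per block, which is one way to see why keeping the body in the f[i] = f[i-1] + incr shape matters.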

  9. Acknowledgements
     Thanks to John Feo (Microsoft Research, formerly Cray) for suggesting the precomputation trick for conditional reductions with compound Boolean expressions.
