
CS 584


Presentation Transcript


  1. CS 584

  2. Algorithm Analysis Assumptions
  • Consider ring, mesh, and hypercube topologies.
  • Each process can either send or receive a single message at a time.
  • No special communication hardware.
  • When discussing a mesh architecture we will consider a square toroidal mesh.
  • The message startup latency is ts and the per-word transfer time is tw, so sending an m-word message over one link takes ts + tw·m.

  3. Basic Algorithms
  • Broadcast algorithms
    • one to all (scatter)
    • all to one (gather)
    • all to all
  • Reduction
    • all to one
    • all to all

  4. Broadcast (ring)
  • Distribute a message of size m to all nodes.
  [Figure: ring of nodes with the source node marked]

  5. Broadcast (ring)
  • Distribute a message of size m to all nodes.
  • Start the message both ways around the ring.
  [Figure: the message travels clockwise and counterclockwise from the source; hop labels 1–4 mark the step at which each node receives]
  T = (ts + twm)(p/2)
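The two-way ring broadcast above can be checked with a short simulation. This is a minimal sketch: the function names are illustrative, and the time function just evaluates the slide's formula T = (ts + twm)(p/2) with floor(p/2) steps.

```python
def ring_broadcast_steps(p, source=0):
    """Simulate sending a message both ways around a p-node ring.
    Returns the number of steps until every node has the message."""
    has_msg = {source}
    left = right = source
    steps = 0
    while len(has_msg) < p:
        left = (left - 1) % p    # one hop counterclockwise...
        right = (right + 1) % p  # ...and one hop clockwise, in the same step
        has_msg.update((left, right))
        steps += 1
    return steps

def ring_broadcast_time(p, ts, tw, m):
    """T = (ts + tw*m) * floor(p/2), matching the slide's formula."""
    return (ts + tw * m) * (p // 2)

print(ring_broadcast_steps(8))               # 4 steps on an 8-node ring
print(ring_broadcast_time(8, 1.0, 0.1, 10))  # (1 + 1) * 4 = 8.0
```

Each step moves the message one hop in both directions, so the two fronts meet after about p/2 steps, which is where the (p/2) factor comes from.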

  6. Broadcast (mesh)

  7. Broadcast (mesh)
  • Broadcast to the source row using the ring algorithm.

  8. Broadcast (mesh)
  • Broadcast to the source row using the ring algorithm.
  • Broadcast to the rest using the ring algorithm from the source row.

  9. Broadcast (mesh)
  • Broadcast to the source row using the ring algorithm.
  • Broadcast to the rest using the ring algorithm from the source row.
  T = 2(ts + twm)(√p/2)
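The two-phase mesh broadcast can also be simulated. This sketch (function name illustrative) runs a two-way ring along the source row, then the same ring in every column in parallel, counting steps; each phase costs floor(√p/2) steps, giving the factor of 2 in the formula above.

```python
import math

def mesh_broadcast_steps(p, src=(0, 0)):
    """Two-phase broadcast on a sqrt(p) x sqrt(p) toroidal mesh:
    ring broadcast along the source row, then along every column."""
    q = math.isqrt(p)
    assert q * q == p, "p must be a perfect square"
    steps = 0
    # Phase 1: two-way ring broadcast along the source row.
    cols = {src[1]}
    l = r = src[1]
    while len(cols) < q:
        l, r = (l - 1) % q, (r + 1) % q
        cols.update((l, r))
        steps += 1
    # Phase 2: every column runs the same two-way ring in parallel,
    # so the whole phase adds the step count only once.
    rows = {src[0]}
    l = r = src[0]
    while len(rows) < q:
        l, r = (l - 1) % q, (r + 1) % q
        rows.update((l, r))
        steps += 1
    return steps

print(mesh_broadcast_steps(16))  # 2 * (4 // 2) = 4 steps on a 4x4 torus
```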

  10. Broadcast (hypercube)

  11. Broadcast (hypercube)
  [Figure: 3-dimensional hypercube; edge labels 1–3 give the step at which each message is sent]
  • A message is sent along each dimension of the hypercube.
  • Parallelism grows as a binary tree.

  12. Broadcast (hypercube)
  [Figure: the same hypercube broadcast with its cost annotated]
  • A message is sent along each dimension of the hypercube.
  • Parallelism grows as a binary tree.
  T = (ts + twm) log2 p
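The log2 p factor follows from a doubling argument: every node that already holds the message forwards it along a fresh dimension, so the holder set doubles each step. A minimal sketch (function names and sample parameter values are illustrative):

```python
import math

def steps_needed(p):
    """Holders double each step, so log2(p) steps suffice."""
    holders, steps = 1, 0
    while holders < p:
        holders *= 2  # every holder forwards along a new dimension
        steps += 1
    return steps

def hypercube_broadcast_time(p, ts, tw, m):
    """T = (ts + tw*m) * log2(p), the slide's hypercube broadcast cost."""
    return (ts + tw * m) * math.log2(p)

print(steps_needed(8))                            # 3
print(hypercube_broadcast_time(8, 1.0, 0.1, 10))  # (1 + 1) * 3 = 6.0
```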

  13. Broadcast
  • Mesh algorithm was based on embedding rings in the mesh.
  • Can we do better on the mesh?
  • Can we embed a tree in a mesh?
  • Exercise for the reader. (-: hint, hint ;-)

  14. Other Broadcasts
  • Many algorithms for all-to-one and all-to-all communication are simply reversals and duals of the one-to-all broadcast.
  • Examples
    • All-to-one: reverse the algorithm and concatenate
    • All-to-all: butterfly and concatenate

  15. Reduction Algorithms
  • Reduce or combine a set of values on each processor to a single set.
    • Summation
    • Max/Min
  • Many reduction algorithms simply use the all-to-one broadcast algorithm.
  • The operation is performed at each node.

  16. Reduction
  • If the goal is to have only one processor with the answer, use the broadcast algorithms.
  • If all must know, use the butterfly.
  • Reduces the algorithm from 2 log p steps (reduce to one node, then broadcast the result) to log p steps.
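The butterfly pattern mentioned above can be sketched as a recursive-doubling all-reduce: at step i every node exchanges its partial sum with the partner across dimension i, so after log2 p steps every node holds the full result in a single log p pass. This is a minimal simulation, not a message-passing implementation; the function name is illustrative.

```python
def butterfly_allreduce(values):
    """All-reduce (sum) via butterfly exchange on p = 2**d nodes:
    step i pairs each node with its partner across dimension i."""
    p = len(values)
    d = p.bit_length() - 1
    assert 1 << d == p, "p must be a power of two"
    sums = list(values)
    for i in range(d):
        nxt = sums[:]
        for node in range(p):
            partner = node ^ (1 << i)
            nxt[node] = sums[node] + sums[partner]  # pairwise exchange + combine
        sums = nxt
    return sums

print(butterfly_allreduce([0, 1, 2, 3, 4, 5, 6, 7]))  # every node holds 28
```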

  17. How'd they do that?
  [Figure: 3-dimensional hypercube with nodes 0–7 labeled in binary, 000 through 111]
  • Broadcast and reduction algorithms are based on the Gray code numbering of nodes.
  • Consider a hypercube: neighboring nodes differ in only one bit position.

  18. How'd they do that?
  • Start with the most significant bit.
  • Flip the bit and send to that processor.
  • Proceed with the next most significant bit.
  • Continue until all bits have been used.
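The bit-flipping steps above can be turned into an explicit send schedule. This sketch (names illustrative) walks the bits from most significant to least, and at each step every node that already holds the message flips the current bit and sends to that partner:

```python
def msb_first_schedule(d, source=0):
    """Broadcast send schedule on a d-dimensional hypercube, built by
    flipping bits from the most significant bit down, as on the slide."""
    holders = {source}
    schedule = []  # (sender, receiver) pairs in order
    for i in reversed(range(d)):       # most significant bit first
        for node in sorted(holders):
            partner = node ^ (1 << i)  # flip bit i and send there
            schedule.append((node, partner))
        holders |= {n ^ (1 << i) for n in holders}
    return schedule

for src, dst in msb_first_schedule(3):
    print(f"{src:03b} -> {dst:03b}")
```

On a 3-cube from node 000 this yields 000→100, then 000→010 and 100→110, then the four flips of the last bit: 7 sends in 3 steps, covering all 8 nodes.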

  19. Procedure SingleNodeAccum(d, my_id, m, X, sum)
      /* d: hypercube dimension; m: message length; X: local values */
      for j := 0 to m-1 do
          sum[j] := X[j]
      endfor
      mask := 0
      for i := 0 to d-1 do
          if (my_id AND mask) = 0 then
              if (my_id AND 2^i) ≠ 0 then
                  msg_dest := my_id XOR 2^i
                  send(sum, msg_dest)
              else
                  msg_src := my_id XOR 2^i
                  recv(S, msg_src)        /* receive partner's partial sums into S */
                  for j := 0 to m-1 do
                      sum[j] := sum[j] + S[j]
                  endfor
              endif
          endif
          mask := mask XOR 2^i
      endfor
  end
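The procedure can be exercised with a runnable sketch that simulates all 2^d processes in one loop, using a dict as a stand-in for send/recv. The function name, the mailbox, and the sample data are simulation details, not part of the slide.

```python
def single_node_accum(d, m, X):
    """Simulate SingleNodeAccum on p = 2**d nodes.
    X[my_id] is each node's local m-vector; returns node 0's sums."""
    p = 2 ** d
    sums = {node: list(X[node]) for node in range(p)}
    for i in range(d):
        mailbox = {}
        for my_id in range(p):
            mask = (1 << i) - 1            # bits already retired by earlier rounds
            if my_id & mask == 0 and my_id & (1 << i):
                mailbox[my_id ^ (1 << i)] = sums[my_id]  # send(sum, msg_dest)
        for msg_dst, S in mailbox.items():
            for j in range(m):
                sums[msg_dst][j] += S[j]   # recv(S, msg_src); accumulate
    return sums[0]

# Each of the 8 nodes contributes the vector [node, 1].
X = [[node, 1] for node in range(8)]
print(single_node_accum(3, 2, X))  # [28, 8]: sum of 0..7 and a count of 8
```

In round i, exactly the nodes whose low i bits are zero are still active; those with bit i set send their partial sums across dimension i and drop out, so after d rounds node 0 holds the full accumulation.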
