
Computer Science 320




  1. Computer Science 320: Broadcasting

  2. Floyd’s Algorithm on SMP

     for i = 0 to n – 1
        parallel for r = 0 to n – 1
           for c = 0 to n – 1
              drc = min(drc, dri + dic)
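The parallel middle loop can be sketched in plain Java using `java.util.stream` (the course itself uses the Parallel Java library; `FloydSmpSketch` and its test matrix are illustrative names only):

```java
import java.util.stream.IntStream;

public class FloydSmpSketch {

    // Floyd's algorithm with the middle (row) loop parallelized, as on an SMP.
    // Parallelizing over r is safe on pass i because row i is effectively
    // read-only during that pass: d[i][c] = min(d[i][c], d[i][i] + d[i][c])
    // leaves row i unchanged when the diagonal d[i][i] is 0.
    static void floyd(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; ++i) {
            final int ii = i;
            IntStream.range(0, n).parallel().forEach(r -> {
                for (int c = 0; c < n; ++c)
                    d[r][c] = Math.min(d[r][c], d[r][ii] + d[ii][c]);
            });
        }
    }

    public static void main(String[] args) {
        double INF = Double.POSITIVE_INFINITY;
        double[][] d = {
            { 0,   3, INF },
            { 3,   0,   1 },
            { INF, 1,   0 }
        };
        floyd(d);
        System.out.println(d[0][2]); // shortest path 0 -> 1 -> 2 = 4.0
    }
}
```

Because all threads share the single `d` array, no data movement is needed between passes; this is exactly the property the cluster version lacks.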

  3. Floyd’s Algorithm on Cluster
     • The root node reads the distance matrix from the input file and scatters row slices to the other nodes
     • The other nodes compute distances and update their slices
     • The slices are gathered back to the root node for output
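Dividing n rows into one contiguous slice per node might be sketched as follows (a stand-in for the range partitioning the Parallel Java library performs; `SliceSketch` and `slice` are hypothetical names):

```java
public class SliceSketch {

    // Split n rows into `size` contiguous slices, one per node, handing the
    // first (n % size) nodes one extra row. Returns {lb, ub}, inclusive,
    // for the node with the given rank.
    static int[] slice(int n, int size, int rank) {
        int base  = n / size;
        int extra = n % size;
        int lb = rank * base + Math.min(rank, extra);
        int ub = lb + base + (rank < extra ? 1 : 0) - 1;
        return new int[] { lb, ub };
    }

    public static void main(String[] args) {
        // 10 rows over 4 nodes: slices 0-2, 3-5, 6-7, 8-9.
        for (int rank = 0; rank < 4; ++rank) {
            int[] s = slice(10, 4, rank);
            System.out.println("rank " + rank + ": rows " + s[0] + "-" + s[1]);
        }
    }
}
```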

  4. Parallel I/O File Pattern
     • Eliminate the gather of data by having each node write its slice to a separate file
     • Eliminate the scatter of data by having each node read its own slice from the input file
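A sketch of the read side of this pattern, assuming the matrix is stored row-major as raw doubles so a node can seek directly to its slice (`SliceIoSketch` and the file layout are assumptions for illustration, not the course's actual file format):

```java
import java.io.File;
import java.io.RandomAccessFile;

public class SliceIoSketch {

    // Read only rows lb..ub (inclusive) of an n-by-n matrix of raw doubles.
    // Each node seeks past the rows it does not own, so no scatter is needed.
    static double[][] readSlice(File f, int n, int lb, int ub) throws Exception {
        double[][] slice = new double[ub - lb + 1][n];
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek((long) lb * n * Double.BYTES);
            for (double[] row : slice)
                for (int c = 0; c < n; ++c)
                    row[c] = raf.readDouble();
        }
        return slice;
    }

    public static void main(String[] args) throws Exception {
        // Demo: write a 4x4 matrix with entry r*4+c, then read rows 2..3 only.
        int n = 4;
        File f = File.createTempFile("dist", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            for (int r = 0; r < n; ++r)
                for (int c = 0; c < n; ++c)
                    raf.writeDouble(r * n + c);
        }
        double[][] slice = readSlice(f, n, 2, 3);
        System.out.println(slice[0][0]); // row 2, column 0 = 8.0
    }
}
```

The write side is symmetric: each node writes its slice to its own output file, eliminating the gather.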

  5. Execution Timeline

  6. Sharing Data in Computation
     • On each pass through the outer loop, row i must be available to all of the processes, since they all use it in the same line of code in the inner loop
     • On an SMP this is automatic, because the threads share the entire matrix
     • On a cluster it is not, because the processes do not share memory

     for i = 0 to n – 1
        parallel for r = 0 to n – 1
           for c = 0 to n – 1
              drc = min(drc, dri + dic)

  7. Share Row via a Broadcast Message
     • On each pass through the outer loop, the process that owns row i broadcasts it before the parallel loop is run
     • The process that owns the row acts as the root for the broadcast, setting up the source buffer
     • The other processes set up a destination buffer
     • The broadcast also enforces synchronization: all processes wait for it

     for i = 0 to n – 1
        broadcast row i of d
        parallel for r = 0 to n – 1
           for c = 0 to n – 1
              drc = min(drc, dri + dic)

  8. // Allocate storage for row broadcast from another process.
     row_i = new double [n];
     row_i_buf = DoubleBuf.buffer (row_i);

     int i_root = 0;
     for (int i = 0; i < n; ++i) {
        double[] d_i = d[i];

        // Determine which process owns row i.
        if (! ranges[i_root].contains (i)) ++i_root;

        // Broadcast row i from owner process to all processes.
        if (rank == i_root)
           world.broadcast (i_root, DoubleBuf.buffer (d_i));
        else {
           world.broadcast (i_root, row_i_buf);
           d_i = row_i;
        }

        // Inner loops over rows in my slice and over all columns.
        for (int r = mylb; r <= myub; ++r) {
           double[] d_r = d[r];
           for (int c = 0; c < n; ++c)
              d_r[c] = Math.min (d_r[c], d_r[i] + d_i[c]);
        }
     }

  9. Problem: Too Many Messages
     • The program sends one broadcast per outer-loop pass, n messages in all, so the time spent in communication is too high compared to the time spent in computation
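A back-of-the-envelope model makes the imbalance concrete. All three constants below are assumed, illustrative cluster parameters (not measurements from the course), and `MessageCostSketch` is a hypothetical name:

```java
public class MessageCostSketch {

    // Assumed, illustrative parameters: per-message latency, network
    // bandwidth, and the cost of one min/add update in the inner loop.
    static final double LATENCY     = 100e-6; // seconds per message
    static final double BANDWIDTH   = 1e8;    // bytes per second
    static final double UPDATE_TIME = 2e-9;   // seconds per matrix update

    // n broadcasts, one per outer-loop pass, each carrying a row of n doubles.
    static double commTime(int n) {
        return n * (LATENCY + 8.0 * n / BANDWIDTH);
    }

    // Each of k nodes updates n/k rows of n entries on each of n passes.
    static double compTime(int n, int k) {
        return (double) n * n * n / k * UPDATE_TIME;
    }

    public static void main(String[] args) {
        int n = 1000, k = 8;
        System.out.printf("comm = %.3f s, comp = %.3f s%n",
                          commTime(n), compTime(n, k));
    }
}
```

Under these assumptions, for n = 1000 and 8 nodes, communication (about 0.18 s, more than half of it pure message latency) is comparable to each node's computation (about 0.25 s), so the n small broadcasts dominate scaling; sending fewer, larger messages attacks the latency term directly.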
