
Basic Communication Operations



  1. Basic Communication Operations • Carl Tropper, Department of Computer Science

  2. Building Blocks in Parallel Programs • Common interactions between processes in parallel programs: broadcast, reduction, prefix sum, … • Study their behavior on standard networks: mesh, hypercube, linear array • Derive expressions for their time complexity • Assumptions: a node can send/receive only one message on a link at a time, it can send on one link while receiving on another, cut-through routing, bidirectional links • Message transfer time on a link: t_s + t_w m, where t_s is the startup time, t_w the per-word transfer time, and m the message size in words

  3. Broadcast and Reduction • One-to-all broadcast • All-to-one reduction • p processes have m words apiece • An associative operation is applied to each word: sum, product, max, min • Both are used in • Gaussian elimination • Shortest paths • Inner product of vectors

  4. One to all Broadcast on Ring or Linear Array • Naïve approach: the source sends p − 1 separate messages, one to each other process. Inefficient. • Recursive doubling (below): first send to the node halfway across, then each holder covers half the remaining distance, and so on. log p steps.
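A minimal Python sketch of the recursive-doubling schedule, assuming p is a power of two and node 0 is the source (both assumptions for illustration; it tracks only which nodes hold the message after each step):

```python
p = 8            # assumed: number of nodes, a power of two
has = {0}        # assumed source: node 0
dist = p // 2
step = 0
while dist >= 1:
    step += 1
    # every node holding the message sends to the node `dist` positions away
    has |= {(i + dist) % p for i in has}
    print(f"step {step}: {sorted(has)}")
    dist //= 2
assert has == set(range(p))  # all p nodes reached in log2(p) steps
```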

  5. Reduction on Linear Array • Odd-numbered nodes send to the preceding even-numbered nodes, which combine the values • Then 2 sends to 0 and 6 sends to 4, concurrently • Finally 4 sends to 0, which holds the result
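A sketch of the same schedule run in reverse as a sum-reduction, for a hypothetical 8-node array with illustrative per-node values 1..8:

```python
p = 8
vals = {i: i + 1 for i in range(p)}  # hypothetical per-node values 1..8
dist = 1
while dist < p:
    # nodes at odd multiples of `dist` send their partial sum `dist`
    # positions to the left; the receiver accumulates it
    for i in range(dist, p, 2 * dist):
        vals[i - dist] += vals.pop(i)
    dist *= 2
assert vals == {0: p * (p + 1) // 2}  # node 0 ends with the full sum (36)
```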

  6. Matrix vector multiplication on an n×n mesh • Broadcast the input vector to each row • Each row then does a reduction to produce one element of the result vector

  7. Broadcast on a Mesh • Build on the linear array algorithm • One-to-all broadcast from the source (node 0) along its row (to nodes 4, 8, 12) • Then a one-to-all broadcast down each of the columns

  8. Broadcast on a Hypercube • The source sends to its neighbor along the highest dimension (msb); then both holders send along the next lower dimension; then all four send along the next lower dimension, …, until everyone who has the message sends along the lowest dimension
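A sketch of this dimension-by-dimension schedule, assuming node 0 is the source and p = 2^d; only the set of holders is simulated, not actual message passing:

```python
d = 3
p = 2 ** d
has = {0}  # assumed source: node 0
for i in reversed(range(d)):        # highest dimension (msb) first
    # every current holder sends across dimension i (flip bit i of its label)
    has |= {node ^ (1 << i) for node in has}
    print(f"after dimension {i}: {sorted(has)}")
assert has == set(range(p))  # done in d = log2(p) steps
```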

  9. Broadcast on a Tree • Same idea as the hypercube • Grey circles are switches • Start from 0 and go to 4; the rest is the same as the hypercube algorithm

  10. All to all broadcast on linear array/ring • Idea: keep each node busy in a circular transfer; in every step, each node forwards to its neighbor the message it received in the previous step
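A sketch of the circular schedule for a hypothetical 5-node ring: in each of the p − 1 steps, every node passes to its right neighbor the message that arrived in the previous step.

```python
p = 5
collected = {i: {i} for i in range(p)}   # message j originates at node j
forwarding = {i: i for i in range(p)}    # what each node sends next
for _ in range(p - 1):
    received = {(i + 1) % p: forwarding[i] for i in range(p)}
    for node, msg in received.items():
        collected[node].add(msg)
    forwarding = received                # pass along what just arrived
assert all(collected[i] == set(range(p)) for i in range(p))
```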

  11. All to all broadcast on ring

  12. All to all broadcast • Mesh: each row does an all-to-all broadcast; then the nodes do an all-to-all broadcast of the collected messages in their columns. • Hypercube: pairs of nodes exchange messages in each dimension. Requires log p steps.

  13. All to all broadcast on a mesh

  14. All to all broadcast on a hypercube • A hypercube of dimension d (p = 2^d nodes) is composed of two hypercubes of dimension d − 1 • Start with the dimension-one hypercubes: adjacent nodes exchange messages • Then adjacent nodes in the dimension-two hypercubes exchange their accumulated messages • In general, adjacent nodes in each next-higher-dimensional hypercube exchange everything gathered so far • Requires log p steps (see the sketch below)
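A sketch of the exchange schedule on an assumed 3-dimensional hypercube: in step i every node swaps its entire accumulated buffer with its dimension-i neighbor, so the buffer doubles each step.

```python
d = 3
p = 2 ** d
buf = {node: {node} for node in range(p)}  # each node starts with its own message
for i in range(d):
    # pairwise exchange along dimension i: both ends end up with the union
    buf = {node: buf[node] | buf[node ^ (1 << i)] for node in range(p)}
assert all(buf[node] == set(range(p)) for node in range(p))  # log2(p) steps
```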

  15. All to all broadcast on a hypercube

  16. All to all broadcast time • Ring or linear array: T = (t_s + t_w m)(p − 1) • Mesh • 1st phase: √p simultaneous all-to-all broadcasts among the √p nodes of each row take time (t_s + t_w m)(√p − 1) • 2nd phase: each message now has size m√p, so this phase takes time (t_s + t_w m√p)(√p − 1) • Total time: T = 2 t_s(√p − 1) + t_w m(p − 1) • Hypercube • The message size in the i-th step is 2^(i−1) m • A pair of nodes takes t_s + 2^(i−1) t_w m to send and receive messages • T = Σ_{i=1}^{log p} (t_s + 2^(i−1) t_w m) = t_s log p + t_w m(p − 1)
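Plugging hypothetical numbers (p = 16, m = 1, t_s = 10, t_w = 1) into the three expressions shows that only the startup terms differ; the t_w m(p − 1) term is common to all three:

```python
from math import isqrt, log2

p, m, ts, tw = 16, 1, 10.0, 1.0                     # hypothetical parameters
ring = (ts + tw * m) * (p - 1)                      # 165.0
mesh = 2 * ts * (isqrt(p) - 1) + tw * m * (p - 1)   # 75.0
cube = ts * log2(p) + tw * m * (p - 1)              # 55.0
print(ring, mesh, cube)  # the tw*m*(p-1) = 15 term appears in all three
```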

  17. Observation on all to all broadcast time • Neglecting startup, the communication time is the same for all 3 architectures: t_w m(p − 1) • This is a lower bound on communication time for all 3 architectures: each node must receive m(p − 1) words, so this term cannot be improved

  18. All Reduce Operation • All-reduce: every node starts with a buffer of size m; an associative operation is performed across all buffers, and every node gets the same result. • Semantically equivalent to an all-to-one reduction followed by a one-to-all broadcast. • All-reduce on one word implements barrier synchronization on message-passing machines: no node can finish the reduction before all nodes have contributed to it. • Implement all-reduce using the all-to-all broadcast pattern, but add message contents instead of concatenating messages. • On a hypercube, T = (t_s + t_w m) log p for the log p steps, because the message size does not double in each dimension.
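A sketch of the one-word (m = 1) case on an assumed 3-dimensional hypercube: the same pairwise exchanges as all-to-all broadcast, but incoming values are added rather than concatenated, so the message never grows.

```python
d = 3
p = 2 ** d
val = {node: float(node) for node in range(p)}  # hypothetical operands
for i in range(d):
    # exchange the current partial result with the dimension-i neighbor and add
    val = {node: val[node] + val[node ^ (1 << i)] for node in range(p)}
assert all(v == sum(range(p)) for v in val.values())  # every node has the sum
```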

  19. Prefix Sum Operation • Prefix sums (scans) are all the partial sums s_k = n_0 + … + n_k of p numbers n_0, …, n_{p−1}, one number living on each node • Node k starts out with n_k and winds up with s_k • Modify all-to-all broadcast: each node adds into its result only the partial sums coming from nodes with smaller labels (see the sketch below)
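A sketch of the hypercube scan, with hypothetical inputs n_k = k + 1 on an assumed 3-dimensional cube: each node keeps a running subcube total that is always forwarded, and a separate result that only absorbs contributions from lower-numbered partners.

```python
d = 3
p = 2 ** d
n = {k: k + 1 for k in range(p)}  # hypothetical inputs n_k = k + 1
result = dict(n)                  # becomes s_k
total = dict(n)                   # running subcube total, always forwarded
for i in range(d):
    incoming = {k: total[k ^ (1 << i)] for k in range(p)}  # exchange along dim i
    for k in range(p):
        total[k] += incoming[k]
        if (k ^ (1 << i)) < k:    # only lower-labeled partners feed s_k
            result[k] += incoming[k]
assert all(result[k] == sum(range(1, k + 2)) for k in range(p))
```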

  20. Prefix Sum on Hypercube

  21. Prefix sum on hypercube

  22. Scatter and gather • Scatter (one-to-all personalized communication): the source sends a unique message to each destination (vs. broadcast: the same message for every destination) • Hypercube: use the one-to-all broadcast pattern. At each step a node transfers half of the messages it holds to a neighbor, so data moves from one subcube to the other. log p steps. • Gather: one node collects a unique message from every other node; the reverse of scatter • Hypercube: odd-numbered nodes send their buffers to the even-numbered nodes of the lower-dimensional cube; continue dimension by dimension
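A sketch of the scatter schedule on an assumed 3-dimensional hypercube with node 0 as the source: at each step every holder keeps the messages destined for its own half and hands the other half across the current (highest remaining) dimension.

```python
d = 3
p = 2 ** d
hold = {0: set(range(p))}  # assumed source node 0 holds one message per destination
for i in reversed(range(d)):  # msb dimension first
    nxt = {}
    for node, msgs in hold.items():
        peer = node ^ (1 << i)
        # keep messages whose destination lies in this node's subcube
        nxt[node] = {m for m in msgs if (m & (1 << i)) == (node & (1 << i))}
        nxt[peer] = msgs - nxt[node]      # send the rest across dimension i
    hold = nxt
assert all(hold[node] == {node} for node in range(p))  # log2(p) steps
```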

  23. Scatter operation on hypercube

  24. All to all personalized communication • Each node sends a distinct message to every other node • Used in FFT, matrix transpose, sample sort • Transpose of A[i,j] is A^T[i,j] = A[j,i], 0 ≤ i, j < n • Put row i, {(i,0), (i,1), …, (i,n−1)}, on processor P_i • Transpose: (i,0) goes to P_0, (i,1) goes to P_1, …, (i,n−1) goes to P_{n−1} • Every processor sends a distinct element to every other processor! • All-to-all personalized communication on ring, mesh, hypercube follows

  25. All to all on ring, mesh, hypercube • Ring: all nodes send their messages in the same direction • Each node sends p − 1 messages of size m to its neighbor • Nodes extract the messages addressed to them and forward the rest • Mesh: a √p × √p mesh of p nodes. Each node groups its messages by the column of their destination • All-to-all personalized communication within each row • Upon arrival, messages are sorted by destination row • All-to-all personalized communication within each column • Hypercube: p-node hypercube • The p/2 links of a given dimension connect two subcubes of p/2 nodes each • Each node exchanges p/2 of its messages across a given dimension

  26. All to all personalized communication: optimal algorithm on a hypercube • Nodes choose their exchange partners so that no link suffers congestion • In step j, node i exchanges messages with node i XOR j • First step: all nodes differing in the lsb exchange messages • Last step: all nodes differing in the msb exchange messages • E-cube routing in the hypercube implements this • The (Hamming) distance between nodes i and j is the number of non-zero bits in i XOR j • Sort the links corresponding to the non-zero bits in ascending order • Route messages along these links in that order (see the sketch below)
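A sketch of the pairing schedule and of e-cube routing for an assumed 8-node hypercube; `ecube_route` is an illustrative helper name, not from the slides.

```python
p, d = 8, 3  # assumed hypercube size

# step j: node i exchanges its personalized message with node i XOR j
delivered = set()
for j in range(1, p):
    for i in range(p):
        delivered.add((i, i ^ j))
assert len(delivered) == p * (p - 1)  # p-1 contention-free exchange steps

def ecube_route(src, dst):
    """E-cube routing: fix the differing bits in ascending order."""
    path, node = [src], src
    for b in range(d):
        if (src ^ dst) & (1 << b):
            node ^= 1 << b
            path.append(node)
    return path

print(ecube_route(2, 7))  # [2, 3, 7]: flip bit 0, then bit 2
```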

  27. Optimal algorithm picture

  28. Optimal algorithm • T = (t_s + t_w m)(p − 1)

  29. Circular q-Shift • Node i sends a packet to node (i + q) mod p • Useful in string and image pattern matching, matrix computations, … • Example: a 5-shift on a 4×4 mesh • Everyone moves 1 step to the right along the rows (q mod √p steps) • Everyone moves 1 step upward along the columns (⌊q/√p⌋ steps) • The left column (the packets that wrapped around in the first step) moves up one extra step before the general upshift, to compensate for the wraparound
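A sketch checking the two-phase route for the 5-shift on a 4×4 mesh, assuming row-major node labels: a row shift of q mod √p steps, one compensating extra step for packets that wrapped, then a column shift of ⌊q/√p⌋ steps.

```python
p, q = 16, 5
sp = 4                      # sqrt(p)
s, t = q % sp, q // sp      # row steps, column steps
pos = {i: (i // sp, i % sp) for i in range(p)}  # packet i starts at node i
for i in range(p):
    r, c = pos[i]
    wrapped = c + s >= sp
    r2, c2 = r, (c + s) % sp          # phase 1: shift s steps along the row
    if wrapped:
        r2 = (r2 + 1) % sp            # compensating extra step for wraparound
    pos[i] = ((r2 + t) % sp, c2)      # phase 2: shift t steps along the column
dest = lambda i: (((i + q) % p) // sp, ((i + q) % p) % sp)
assert all(pos[i] == dest(i) for i in range(p))  # everyone reaches (i+q) mod p
```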

  30. Mesh Circular Shift

  31. Circular shift on hypercube • Map the linear array onto the hypercube: map node i to node j, where j is the d-bit binary reflected Gray code of i • Array neighbors then sit on hypercube neighbors, so the array shift algorithm carries over
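The binary reflected Gray code has the closed form G(i) = i XOR (i >> 1); a quick sketch verifying that consecutive array positions (including the wraparound) land on hypercube neighbors:

```python
def gray(i):
    # binary reflected Gray code: successive values differ in exactly one bit
    return i ^ (i >> 1)

d = 3
p = 2 ** d
for i in range(p):
    diff = gray(i) ^ gray((i + 1) % p)  # labels hosting array neighbors i, i+1
    assert bin(diff).count("1") == 1    # they are hypercube neighbors
```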

  32. What happens if we split messages into packets? • One-to-all broadcast becomes a scatter operation followed by an all-to-all broadcast • Scatter: t_s log p + t_w (m/p)(p − 1) • All-to-all broadcast of the m/p-word pieces: t_s log p + t_w (m/p)(p − 1) on a hypercube • T ≈ 2 (t_s log p + t_w m) on a hypercube • Bottom line: double the startup cost, but the t_w cost is reduced by a factor of about (log p)/2 • All-to-one reduction is the dual of one-to-all broadcast, so it becomes an all-to-all reduction followed by a gather • All-reduce combines the above two
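Comparing the plain hypercube broadcast time (t_s + t_w m) log p with the split version, for hypothetical values p = 16, m = 64, t_s = 10, t_w = 1:

```python
from math import log2

p, m, ts, tw = 16, 64.0, 10.0, 1.0                    # hypothetical parameters
plain = (ts + tw * m) * log2(p)                       # 296.0
split = 2 * (ts * log2(p) + tw * (m / p) * (p - 1))   # 200.0
print(plain, split)  # startup term doubles; tw term shrinks ~log2(p)/2 times
```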
