A Taste of Parallel Algorithms

We examine five simple building-block parallel operations and look at the corresponding algorithms on four simple parallel architectures: linear array, binary tree, 2D mesh, and a simple shared-variable computer.


Presentation Transcript


  1. A Taste of Parallel Algorithms Part I

  2. We examine five simple building-block parallel operations and look at the corresponding algorithms on four simple parallel architectures: linear array, binary tree, 2D mesh, and a simple shared-variable computer.

  3. Semigroup Computation

  4. Parallel Prefix Computation

  5. Packet Routing • A packet of information resides at Processor i and must be sent to Processor j. The problem is to route the packet through intermediate processors, if needed, such that it gets to the destination as quickly as possible. • The problem becomes more challenging when multiple packets reside at different processors, each with its own destination. • When each processor has at most one packet to send and one packet to receive, the packet routing problem is called one-to-one communication or 1-1 routing.

  6. Broadcasting • Given a value a known at a certain processor i, disseminate it to all p processors as quickly as possible, so that at the end, every processor has access to, or "knows," the value. This is sometimes referred to as one-to-all communication. • The more general case, one-to-many communication, is known as multicasting.

  7. Sorting • Rather than sorting a set of records, each with a key and data elements, we focus on sorting a set of keys for simplicity.

  8. Linear Array • Diameter: D = p - 1 • Maximum node degree: d = 2 • Ring?

  9. Binary Tree • If all leaves are at the same level and every nonleaf processor has two children, the binary tree is said to be complete. • D = 2 log2(p + 1) - 2 for a complete tree with p processors • d = 3

  10. 2D Mesh • D = 2√p - 2 for a square √p × √p mesh • d = 4 • Torus?

  11. Shared memory • A shared-memory multiprocessor can be modeled as a complete graph, in which every node is connected to every other node. • D = 1 • d = p - 1
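The diameter D (longest shortest path between two processors) and maximum node degree d of the four architectures can be checked by brute force on small instances. The Python sketch below is illustrative only: the builder names (`linear_array`, `binary_tree`, `mesh_2d`, `complete_graph`), the helper `diameter_and_degree`, and the instance sizes are assumptions made for this example and do not come from the slides.

```python
from collections import deque


def linear_array(p):
    """Processors 0..p-1, each linked to its left and right neighbor."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < p] for i in range(p)}


def binary_tree(levels):
    """Complete binary tree with 2**levels - 1 nodes, numbered 1.. like a heap."""
    p = 2 ** levels - 1
    adj = {i: [] for i in range(1, p + 1)}
    for child in range(2, p + 1):
        parent = child // 2
        adj[child].append(parent)
        adj[parent].append(child)
    return adj


def mesh_2d(rows, cols):
    """rows x cols mesh without wraparound links (not a torus)."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adj[(r, c)] = [(a, b) for a, b in candidates
                           if 0 <= a < rows and 0 <= b < cols]
    return adj


def complete_graph(p):
    """Model of shared memory: every node is adjacent to every other node."""
    return {i: [j for j in range(p) if j != i] for i in range(p)}


def diameter_and_degree(adj):
    """Return (D, d) using a breadth-first search from every node."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())

    diameter = max(eccentricity(s) for s in adj)
    max_degree = max(len(neighbors) for neighbors in adj.values())
    return diameter, max_degree
```

For example, `diameter_and_degree(linear_array(8))` gives D = 7 = p - 1 and d = 2, while `diameter_and_degree(complete_graph(5))` gives D = 1 and d = 4 = p - 1.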

  12. Algorithms for a Linear Array (1) • Semigroup Computation • Let us consider first a special case of semigroup computation, namely, that of maximum finding. Each of the p processors holds a value initially and our goal is for every processor to know the largest of these values.
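One simple way to picture maximum finding on a linear array is a synchronous simulation in which, at every step, each processor keeps the largest of its own value and its neighbors' values. This is a sketch under assumptions, not necessarily the exact scheme shown on the original slide; the function name `linear_array_max` is hypothetical. Because a value moves at most one position per step, p - 1 steps are enough for the maximum to reach every processor.

```python
def linear_array_max(values):
    """Simulate p processors on a linear array; return what each one knows at the end."""
    p = len(values)
    known = list(values)
    for _ in range(p - 1):          # p - 1 = diameter of the linear array
        prev = known[:]             # snapshot: all processors act simultaneously
        for i in range(p):
            left = prev[i - 1] if i > 0 else prev[i]
            right = prev[i + 1] if i < p - 1 else prev[i]
            known[i] = max(left, prev[i], right)
    return known
```

Replacing `max` with any other associative binary operator does not work directly in this neighbor-merging form, since values would be combined more than once; `max` is safe because it is idempotent.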

  13. Algorithms for a Linear Array (2) • Parallel Prefix Computation (Case 1)

  14. Algorithms for a Linear Array (3) • Parallel Prefix Computation (Case 2, more than one value)

  15. Algorithms for a Linear Array (4) • Packet Routing

  16. Algorithms for a Linear Array (5) • Broadcasting • If Processor i wants to broadcast a value a to all processors, it sends an rbcast(a) (read r-broadcast) message to its right neighbor and an lbcast(a) message to its left neighbor.
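The rbcast/lbcast scheme can be simulated step by step. In the sketch below, the function name `linear_array_broadcast` and the assumption that each processor copies the value and forwards the same message type in the same direction are illustrative choices, since the slide only describes the two initial messages.

```python
def linear_array_broadcast(p, source, value):
    """Return (received, arrival): the value held by each processor and the step it arrived."""
    received = [None] * p
    arrival = [None] * p
    received[source] = value
    arrival[source] = 0
    # Each in-flight message is (kind, index of the processor that last handled it).
    messages = [("rbcast", source), ("lbcast", source)]
    step = 0
    while messages:
        step += 1
        forwarded = []
        for kind, pos in messages:
            target = pos + 1 if kind == "rbcast" else pos - 1
            if 0 <= target < p:      # messages fall off at the ends of the array
                received[target] = value
                arrival[target] = step
                forwarded.append((kind, target))
        messages = forwarded
    return received, arrival
```

With p = 6 and source 2, the arrival steps are `[2, 1, 0, 1, 2, 3]`, so the broadcast finishes in max(i, p - 1 - i) steps.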

  17. Algorithms for a Linear Array (6) • Sorting (Case 1)

  18. Algorithms for a Linear Array (7) • Sorting (Case 2, odd-even transposition) (efficiency?)
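Odd-even transposition sort can be sketched as a sequential simulation of the p processors, one key per processor. The function name `odd_even_transposition_sort` is an assumption for this example. In even-numbered steps the pairs (0,1), (2,3), ... compare and exchange; in odd-numbered steps the pairs (1,2), (3,4), ... do so. All pairs within a step are disjoint, so on a real linear array they would operate in parallel.

```python
def odd_even_transposition_sort(keys):
    """Sort keys in ascending order using p compare-exchange steps on p simulated processors."""
    a = list(keys)
    p = len(a)
    for step in range(p):                 # p steps are sufficient for p keys
        start = step % 2                  # alternate between even and odd pairs
        for i in range(start, p - 1, 2):  # disjoint neighbor pairs (parallel on hardware)
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

On the efficiency question: the simulation uses p processors for p steps, a processor-time product of p^2, whereas a good sequential sort needs on the order of p log2 p comparisons.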

  19. Algorithms for a Binary Tree (1) • In algorithms for a binary tree of processors, we will assume that the data elements are initially held by the leaf processors only. • The nonleaf (inner) processors participate in the computation, but do not hold data elements of their own.

  20. Algorithms for a Binary Tree (2) • Semigroup Computation • Each inner node receives two values from its children, applies the operator to them, and passes the result upward to its parent.
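The upward pass of a semigroup computation on a binary tree can be simulated level by level. This is a minimal sketch assuming a complete tree whose number of leaves is a power of two; the function name `tree_semigroup` and the returned step count are illustrative additions.

```python
from operator import add


def tree_semigroup(leaf_values, op):
    """Combine the leaf values pairwise, level by level; return (root result, upward steps)."""
    level = list(leaf_values)
    if not level or len(level) & (len(level) - 1):
        raise ValueError("this sketch assumes a power-of-two number of leaves")
    steps = 0
    while len(level) > 1:
        # Every inner node on this level combines its left and right child in parallel.
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps
```

For eight leaves, `tree_semigroup([5, 2, 8, 6, 3, 7, 9, 1], max)` reaches the root in 3 steps, and `tree_semigroup([5, 2, 8, 6, 3, 7, 9, 1], add)` computes the sum in the same number of steps. Left and right operands are kept in leaf order, so non-commutative operators are handled as well. If every processor must know the result, the root still has to send it back down the tree.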

  21. Algorithms for a Binary Tree (3) • Parallel Prefix Computation

  22. Algorithms for a Binary Tree (4) • Packet Routing • The routing algorithm depends on the processor numbering scheme used. • Example: preorder numbering

  23. Algorithms for a Binary Tree (5) • Broadcasting • Processor i sends the desired data upwards to the root processor, which then broadcasts the data downwards to all processors.

  24. Algorithms for a Binary Tree (6) • Sorting

  25. Algorithms for 2D Mesh (1) • In all of the 2D mesh algorithms presented in this section, we use the linear-array algorithms of Section 2.3 as building blocks. • This leads to simple algorithms, but not necessarily the most efficient ones. Mesh-based architectures and their algorithms will be discussed in great detail in Part III.

  26. Algorithms for 2D Mesh (2) • Semigroup Computation • For example, in finding the maximum of a set of p values, stored one per processor, the row maximums are computed first and made available to every processor in the row. Then column maximums are identified.
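The two-phase row-then-column scheme can be written down directly. In this sketch the mesh is a list of rows, the function name `mesh_max` is hypothetical, and Python's built-in `max` stands in for the linear-array maximum-finding step that each row and column would actually run.

```python
def mesh_max(grid):
    """Return a grid of the same shape in which every processor holds the global maximum."""
    rows, cols = len(grid), len(grid[0])
    # Phase 1: each row finds its maximum and makes it known to every processor in the row.
    after_rows = [[max(row)] * cols for row in grid]
    # Phase 2: each column finds the maximum of the row maximums it now holds.
    col_max = [max(after_rows[r][c] for r in range(rows)) for c in range(cols)]
    return [[col_max[c] for c in range(cols)] for _ in range(rows)]
```

After phase 1 every column contains all the row maximums, which is why the column phase yields the global maximum in every column.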

  27. Algorithms for 2D Mesh (3) • Parallel Prefix Computation • (1) do a parallel prefix computation on each row, • (2) do a diminished parallel prefix computation in the rightmost column, and • (3) broadcast the results in the rightmost column to all of the elements in the respective rows and combine with the initially computed row prefix value.
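The three phases can be sketched as follows, assuming the elements are ordered row by row (row-major order). The function name `mesh_prefix` is hypothetical, and `itertools.accumulate` stands in for the linear-array prefix computation. "Diminished" is read here as: the entry for row r combines the totals of rows 0 to r - 1 only, so the first row has nothing to add.

```python
from itertools import accumulate
from operator import add


def mesh_prefix(grid, op=add):
    """Row-major parallel prefix on a mesh given as a list of rows."""
    # Phase 1: prefix computation within each row; the rightmost entry is the row total.
    row_prefix = [list(accumulate(row, op)) for row in grid]
    totals = [prefix[-1] for prefix in row_prefix]
    # Phase 2: diminished prefix down the rightmost column (excludes the row's own total).
    offsets = [None] + list(accumulate(totals[:-1], op))
    # Phase 3: broadcast each offset along its row and combine with the row prefix values.
    return [[x if off is None else op(off, x) for x in prefix]
            for prefix, off in zip(row_prefix, offsets)]
```

For the 3 × 3 mesh `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` with addition, the result is `[[1, 3, 6], [10, 15, 21], [28, 36, 45]]`, the running sums of 1 through 9.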

  28. Algorithms for 2D Mesh (4) • Packet Routing • To route a data packet from the processor in Row r, Column c, to the processor in Row r', Column c', we first route it within Row r to Column c'. Then, we route it in Column c' from Row r to Row r'. (row-first routing)
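Row-first routing for a single packet can be traced with a few lines of code. The function name `row_first_route` and the (row, column) tuple representation are assumptions for this sketch, which ignores contention between multiple packets.

```python
def row_first_route(src, dst):
    """Return the list of (row, col) processors visited, from src to dst inclusive."""
    r, c = src
    r_dst, c_dst = dst
    path = [(r, c)]
    # Leg 1: move within Row r until the destination column c' is reached.
    step = 1 if c_dst > c else -1
    while c != c_dst:
        c += step
        path.append((r, c))
    # Leg 2: move within Column c' from Row r to Row r'.
    step = 1 if r_dst > r else -1
    while r != r_dst:
        r += step
        path.append((r, c))
    return path
```

The number of hops is `len(path) - 1`, which equals |c' - c| + |r' - r|; for example, routing from (0, 0) to (2, 3) takes 5 hops.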

  29. Algorithms for 2D Mesh (5) • Broadcasting • (1) broadcast the packet to every processor in the source node's row and • (2) broadcast in all columns.

  30. Algorithms for 2D Mesh (6) • Sorting

  31. Algorithms for Shared Variables • Semigroup Computation • Parallel Prefix Computation • Packet Routing (trivial in view of the direct communication path between any pair of processors) • Broadcasting (trivial, as each processor can send a data item to all processors directly) • Sorting
