
CSCI-455/552

CSCI-455/552. Introduction to High Performance Computing, Lecture 11. Bucket Sort. One “bucket” is assigned to hold the numbers that fall within each region. The numbers in each bucket are sorted using a sequential sorting algorithm. Sequential sorting time complexity: O(n log(n/m)).


Presentation Transcript


  1. CSCI-455/552 Introduction to High Performance Computing Lecture 11

  2. Bucket Sort: One “bucket” is assigned to hold the numbers that fall within each region. The numbers in each bucket are sorted using a sequential sorting algorithm. Sequential sorting time complexity: O(n log(n/m)), for n numbers and m buckets. Works well if the original numbers are uniformly distributed across a known interval, say 0 to a - 1.
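
The sequential algorithm on this slide can be sketched as follows. This is a minimal illustration, assuming non-negative integers in the range 0 to a - 1; the name `bucket_sort` is hypothetical, and Python's built-in `sorted` stands in for whichever sequential sort is used inside each bucket. With roughly n/m numbers per bucket, each bucket costs about (n/m) log(n/m) to sort, which gives O(n log(n/m)) over m buckets.

```python
def bucket_sort(values, m, a):
    """Sort integers in the range 0 .. a-1 using m buckets."""
    buckets = [[] for _ in range(m)]
    for v in values:
        # Bucket i holds the numbers that fall in the i-th of m equal regions.
        buckets[v * m // a].append(v)
    result = []
    for b in buckets:
        # Any sequential sorting algorithm works here.
        result.extend(sorted(b))
    return result


data = [29, 3, 72, 44, 15, 98, 61, 7]
print(bucket_sort(data, m=4, a=100))  # [3, 7, 15, 29, 44, 61, 72, 98]
```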

  3. Parallel Version of Bucket Sort: Simple approach: assign one processor to each bucket.

  4. Further Parallelization: Partition the sequence into m regions, one region for each processor. Each processor maintains p “small” buckets and separates the numbers in its region into its own small buckets. The small buckets are then emptied into p final buckets for sorting, which requires each processor to send one small bucket to each of the other processors (bucket i to processor i).
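
The data movement described on this slide can be simulated in a single process. The sketch below assumes the number of regions equals the number of processors (m = p) and integers in 0 to a - 1; `parallel_bucket_sort_sim` is a hypothetical name, and plain lists stand in for the messages a real message-passing program would send.

```python
def parallel_bucket_sort_sim(values, p, a):
    """Simulate p processors sorting integers in the range 0 .. a-1."""
    n = len(values)
    # Step 1: each simulated processor receives one region of the input.
    regions = [values[r * n // p:(r + 1) * n // p] for r in range(p)]

    # Step 2: each processor separates its region into p small buckets.
    small = [[[] for _ in range(p)] for _ in range(p)]
    for r, region in enumerate(regions):
        for v in region:
            small[r][v * p // a].append(v)

    # Step 3: small bucket i of every processor is sent to processor i.
    final = [[v for r in range(p) for v in small[r][i]] for i in range(p)]

    # Step 4: each processor sorts its final bucket; the buckets are
    # already in order relative to each other, so concatenating them
    # gives the sorted sequence.
    return [v for bucket in final for v in sorted(bucket)]


data = [29, 3, 72, 44, 15, 98, 61, 7]
print(parallel_bucket_sort_sim(data, p=4, a=100))
```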

  5. Another Parallel Version of Bucket Sort: Introduces a new message-passing operation: all-to-all broadcast.

  6. “all-to-all” Broadcast Routine: Sends data from each process to every other process.

  7. The “all-to-all” routine actually transfers rows of an array to columns: it transposes a matrix.
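
The transpose view can be made concrete with a small single-process simulation. Here `send[i][j]` is assumed to be the item process i sends to process j; the function name `all_to_all` is hypothetical and only mimics the communication pattern (MPI provides a collective with this pattern, `MPI_Alltoall`).

```python
def all_to_all(send):
    """send[i][j] is what process i sends to process j.

    Returns recv, where recv[j][i] is what process j received from process i.
    """
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]


send = [["a0", "a1", "a2"],   # row 0: data held by process 0
        ["b0", "b1", "b2"],   # row 1: data held by process 1
        ["c0", "c1", "c2"]]   # row 2: data held by process 2
recv = all_to_all(send)
print(recv[0])  # ['a0', 'b0', 'c0']
```

Each row of `send` becomes a column of `recv`, which is exactly a matrix transpose.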

  8. Parallel Bucket and Sample Sort • The critical aspect of the above algorithm is assigning ranges to processors. This is done by suitable splitter selection. • The splitter selection method divides the n elements into p blocks of size n/p each and sorts each block using quicksort. • From each sorted block it chooses p – 1 evenly spaced elements. • The p(p – 1) elements selected from all the blocks form the sample used to determine the buckets. • This scheme guarantees that the number of elements ending up in each bucket is roughly uniform (less than 2n/p).
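
The splitter selection steps above can be sketched as follows. This is a single-process illustration, not a definitive implementation: `sample_sort_sim` is a hypothetical name, the exact positions of the "evenly spaced" elements are one possible choice among several, and `sorted` stands in for quicksort. It assumes at least p input elements.

```python
import random
from bisect import bisect_right


def sample_sort_sim(values, p):
    """Return (splitters, buckets) for a simulated sample sort on p processes."""
    n = len(values)
    # Divide the n elements into p blocks of about n/p each and sort each block.
    blocks = [sorted(values[r * n // p:(r + 1) * n // p]) for r in range(p)]

    # From each sorted block, choose p - 1 evenly spaced elements.
    sample = sorted(b[k * len(b) // p] for b in blocks for k in range(1, p))

    # From the p(p - 1) sample elements, choose p - 1 evenly spaced splitters.
    splitters = [sample[k * (p - 1)] for k in range(1, p)]

    # Each element goes to the bucket whose range contains it.
    buckets = [[] for _ in range(p)]
    for b in blocks:
        for v in b:
            buckets[bisect_right(splitters, v)].append(v)
    return splitters, [sorted(b) for b in buckets]


data = random.Random(0).sample(range(100), 24)
splitters, buckets = sample_sort_sim(data, p=3)
print(splitters, [len(b) for b in buckets])
```

Concatenating the sorted buckets in order yields the fully sorted sequence, and printing the bucket sizes shows how evenly the splitters divide the input.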

  9. Parallel Bucket and Sample Sort An example of the execution of sample sort on an array with 24 elements on three processes.

  10. Parallel Bucket and Sample Sort • The splitter selection scheme can itself be parallelized. • Each processor generates its p – 1 local splitters in parallel. • All processors share their splitters using a single all-to-all broadcast operation. • Each processor sorts the p(p – 1) elements it receives and selects p – 1 uniformly spaced splitters from them.
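
A small simulation shows why this parallel scheme needs no further coordination: every process receives the same p(p – 1) elements, so every process independently arrives at the same global splitters. The helper names below are hypothetical, the index choices for "uniformly spaced" are an assumption, and list copies stand in for the all-to-all broadcast.

```python
def local_splitters(block, p):
    """Choose p - 1 evenly spaced elements from one process's sorted block."""
    s = sorted(block)
    return [s[k * len(s) // p] for k in range(1, p)]


def splitters_on_each_process(blocks):
    """Return the global splitters as computed separately by each process."""
    p = len(blocks)
    # Each process generates its local splitters (in parallel on a real machine).
    local = [local_splitters(b, p) for b in blocks]
    # All-to-all broadcast: every process receives all p(p - 1) local splitters.
    received = [[x for part in local for x in part] for _ in range(p)]
    # Each process sorts what it received and picks p - 1 spaced splitters.
    result = []
    for r in range(p):
        s = sorted(received[r])
        result.append([s[k * (p - 1)] for k in range(1, p)])
    return result


blocks = [[22, 7, 13, 18, 2, 17, 1, 14],
          [20, 6, 10, 24, 15, 9, 21, 3],
          [16, 19, 23, 4, 11, 12, 5, 8]]
per_process = splitters_on_each_process(blocks)
print(per_process)
```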

  11. Parallel Complexity Analysis

  12. Numerical Integration Using Rectangles: Each region is calculated using an approximation given by rectangles. (Figure: aligning the rectangles.)
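
A minimal sketch of the rectangle method with static assignment, in which each of p simulated processes handles one region. The function names are hypothetical, and taking each rectangle's height at the midpoint of its interval is an assumption about how the rectangles are aligned.

```python
def region_area(f, start, end, n):
    """Approximate the area under f on [start, end] with n rectangles.

    Each rectangle's height is f evaluated at the midpoint of its interval.
    """
    delta = (end - start) / n
    return delta * sum(f(start + (i + 0.5) * delta) for i in range(n))


def integrate_rectangles(f, a, b, p, n):
    """Split [a, b] into p regions (one per process) and add the partial areas."""
    width = (b - a) / p
    partial = [region_area(f, a + r * width, a + (r + 1) * width, n)
               for r in range(p)]
    return sum(partial)


print(integrate_rectangles(lambda x: x * x, 0.0, 1.0, p=4, n=250))
```

For f(x) = x², the result should be close to the exact value 1/3.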

  13. Numerical Integration Using Trapezoidal Method: May not be better!
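
The trapezoidal method can be sketched as below; `integrate_trapezoid` is a hypothetical name. For smooth functions, the trapezoid rule's error is typically about twice that of midpoint-aligned rectangles with the same interval width, which is one way to read "may not be better".

```python
def integrate_trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] with n trapezoids."""
    delta = (b - a) / n
    # Interior points are shared by two trapezoids; end points by only one.
    inner = sum(f(a + i * delta) for i in range(1, n))
    return delta * ((f(a) + f(b)) / 2 + inner)


print(integrate_trapezoid(lambda x: x * x, 0.0, 1.0, n=1000))
```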

  14. Adaptive Quadrature: The solution adapts to the shape of the curve. Use three areas, A, B, and C. Computation is terminated when the larger of A and B is sufficiently close to the sum of the remaining two areas.
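
The idea of refining only where the curve demands it can be sketched with a common variant: compare one trapezoid over an interval with two trapezoids over its halves, and subdivide when they disagree. This is not the slide's exact A/B/C test, whose geometry depends on a figure not reproduced here; the function name, the tolerance handling, and the depth limit are all assumptions.

```python
def adaptive_trapezoid(f, a, b, tol=1e-6, depth=0, max_depth=30):
    """Adaptively integrate f over [a, b] by recursive interval halving."""
    mid = (a + b) / 2
    whole = (b - a) * (f(a) + f(b)) / 2
    left = (mid - a) * (f(a) + f(mid)) / 2
    right = (b - mid) * (f(mid) + f(b)) / 2
    # Stop when refining changes the estimate very little (or at max depth).
    if depth >= max_depth or abs(left + right - whole) <= tol:
        return left + right
    # Otherwise refine each half separately, splitting the tolerance.
    return (adaptive_trapezoid(f, a, mid, tol / 2, depth + 1, max_depth)
            + adaptive_trapezoid(f, mid, b, tol / 2, depth + 1, max_depth))


print(adaptive_trapezoid(lambda x: x * x, 0.0, 1.0))
```

Flat parts of the curve terminate after few subdivisions, while strongly curved parts are split further.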

  15. Adaptive Quadrature with False Termination: Some care might be needed in choosing when to terminate. The test might cause us to terminate early when the two large regions are the same (i.e., C = 0).
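
False termination is easy to reproduce with a convergence test that compares one trapezoid against two half-width trapezoids. The sketch below uses that test as a stand-in for the slide's A/B/C criterion (an assumption), with the hypothetical helper `looks_converged` and a function chosen so that it is zero at both end points and at the midpoint.

```python
import math


def looks_converged(f, a, b, tol=1e-6):
    """Return (converged, estimate) for a single refinement test on [a, b]."""
    mid = (a + b) / 2
    whole = (b - a) * (f(a) + f(b)) / 2
    halves = ((mid - a) * (f(a) + f(mid)) / 2
              + (b - mid) * (f(mid) + f(b)) / 2)
    return abs(halves - whole) <= tol, halves


def f(x):
    # Zero at x = 0, 1, and 2, but clearly non-zero in between.
    return math.sin(math.pi * x) ** 2


done, estimate = looks_converged(f, 0.0, 2.0)
print(done, estimate)  # the test passes, yet the true integral over [0, 2] is 1
```

One possible safeguard, offered only as a suggestion, is to force a minimum number of subdivisions or to sample extra points before accepting termination.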
