Explore the efficient parallelization of Quicksort using a hypercube algorithm to enhance scalability and performance. Learn about pivot selection strategies and domain decomposition for optimal sorting results.
Quicksort • Simple, low overhead • O(n log n) on average • Divide and conquer • Divide recursively into smaller subsequences.
Quicksort • n elements stored in A[1…n] • Divide • Divide a sequence into two parts • A[q…r] becomes A[q…s] and A[s+1…r] • make all elements of A[q…s] smaller than or equal to all elements of A[s+1…r] • Conquer • Recursively apply Quicksort
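The divide and conquer steps above can be sketched in a few lines of Python. This is a minimal sketch, not the slides' own code: it uses 0-based indices instead of A[1…n], and the middle-element pivot and the Hoare-style partition are assumptions chosen for brevity.

```python
def partition(a, q, r):
    """Rearrange a[q..r] in place and return s such that every element
    of a[q..s] is <= every element of a[s+1..r] (Hoare-style partition)."""
    pivot = a[(q + r) // 2]          # assumed pivot rule: middle element
    i, j = q - 1, r + 1
    while True:
        i += 1
        while a[i] < pivot:          # skip elements already on the left side
            i += 1
        j -= 1
        while a[j] > pivot:          # skip elements already on the right side
            j -= 1
        if i >= j:
            return j                 # j plays the role of s in the slides
        a[i], a[j] = a[j], a[i]


def quicksort(a, q=0, r=None):
    """Sort a[q..r] in place by dividing and then recursing on both parts."""
    if r is None:
        r = len(a) - 1
    if q < r:
        s = partition(a, q, r)       # divide
        quicksort(a, q, s)           # conquer the left part
        quicksort(a, s + 1, r)       # conquer the right part
```

Calling `quicksort(data)` on `data = [5, 2, 9, 1, 5, 6]` leaves `data` equal to `[1, 2, 5, 5, 6, 9]`.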
Quicksort • Partition the sequence A[q…r] by picking a pivot. • Performance is greatly affected by the choice of the pivot. • If we pick a bad pivot, we end up with an O(n^2) algorithm.
Parallelizing Quicksort • Task parallelism • At each step of the algorithm 2 recursive calls are made. • Farm out one of the recursive calls to another processor. • Problems • The work of partitioning is done by one processor.
Parallelizing Quicksort • Consider domain decomposition. • Hypercube • A d-dimensional hypercube can be split into two (d-1)-dimensional hypercubes such that each processor in one cube is connected to exactly one processor in the other cube. • If all processors know the pivot, neighbors split their respective lists and exchange halves, so that all elements larger than the pivot end up in one cube and the remaining elements in the other cube.
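The hypercube split can be made concrete with processor ranks written in binary: two processors are neighbors along a dimension when their ranks differ only in that bit. The helper names `neighbor` and `subcubes` below are hypothetical, chosen only for this sketch.

```python
def neighbor(rank, dim):
    """Rank of the processor connected to `rank` along dimension `dim`
    (the two ranks differ only in bit `dim`)."""
    return rank ^ (1 << dim)


def subcubes(d, dim):
    """Split a d-dimensional hypercube into the two (d-1)-dimensional
    subcubes whose ranks have bit `dim` equal to 0 and to 1."""
    low = [r for r in range(2 ** d) if not (r >> dim) & 1]
    high = [r for r in range(2 ** d) if (r >> dim) & 1]
    return low, high


low, high = subcubes(3, 2)               # [0, 1, 2, 3] and [4, 5, 6, 7]
pairs = [(r, neighbor(r, 2)) for r in low]
# pairs == [(0, 4), (1, 5), (2, 6), (3, 7)]: one partner per processor
```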
Parallelizing Quicksort • After we go through each dimension, if n>p the numbers are not totally sorted. • Why? • Each processor then sorts its own sublist using a sequential quicksort. • Pivot selection is particularly important • A bad pivot leaves some processors with few or no elements, effectively eliminating them from the computation
Pivot Selection • Random selection • During the ith split, one of the processors in each subcube picks a random element from its list and broadcasts it to the others. • Problem • What if a bad pivot is selected at first?
Pivot Selection • Median selection • If the distribution is uniform, then each processor's list is a representative sample, and thus its median is representative of the full list • Problem • Is the distribution really uniform? • Can we assume that a single processor's list has the same distribution as the full list?
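The concern about representativeness can be illustrated with a toy example: 16 values spread over 4 processors in two different layouts. The helper names and both layouts are assumptions made for this sketch; the pivot is always taken as the median of processor 0's sorted list.

```python
def local_median_pivot(sorted_block):
    """Median of one processor's sorted list, used as the global pivot."""
    return sorted_block[len(sorted_block) // 2]


def fraction_at_most(blocks, pivot):
    """Fraction of all elements that are <= pivot (0.5 is a perfect split)."""
    total = sum(len(b) for b in blocks)
    at_most = sum(1 for b in blocks for v in b if v <= pivot)
    return at_most / total


# Layout 1: sorted input dealt out in contiguous blocks, so processor 0
# holds only the smallest values and is not a representative sample.
block_layout = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
# Layout 2: the same values dealt out round-robin, so every processor
# holds a sample of the whole range.
cyclic_layout = [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]

skewed = fraction_at_most(block_layout, local_median_pivot(block_layout[0]))
balanced = fraction_at_most(cyclic_layout, local_median_pivot(cyclic_layout[0]))
# skewed == 0.1875 (pivot 2), balanced == 0.5625 (pivot 8)
```

In the first layout the pivot sends under a fifth of the data to one subcube, so the processors in the other subcube carry most of the load.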
Procedure HypercubeQuickSort(B)
  sort B using sequential quicksort
  for i = 1 to d
    select pivot and broadcast it, or receive pivot
    partition B into B1 and B2 such that B1 <= pivot < B2
    if ith bit of iproc is zero then
      send B2 to neighbor along ith dimension
      C = subsequence received along ith dimension
      merge B1 and C into B
    else
      send B1 to neighbor along ith dimension
      C = subsequence received along ith dimension
      merge B2 and C into B
    endif
  endfor
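The procedure can be tried out without a parallel machine by simulating all p = 2^d processors in one Python process. This is a sketch under stated assumptions rather than the slides' implementation: `blocks[rank]` stands for the list B held by processor `rank`, message passing is replaced by list assignments, the pivot is the median of the first non-empty block in each subcube, and dimensions are visited from the highest bit down so that reading the blocks in rank order gives the sorted sequence.

```python
import bisect
from heapq import merge


def split(block, pivot):
    """Binary-search a sorted block into (elements <= pivot, elements > pivot)."""
    k = bisect.bisect_right(block, pivot)
    return block[:k], block[k:]


def hypercube_quicksort(blocks):
    """Simulate hypercube quicksort; returns the final block of each rank."""
    p = len(blocks)
    d = p.bit_length() - 1
    if p != 1 << d:
        raise ValueError("number of blocks must be a power of two")
    blocks = [sorted(b) for b in blocks]           # initial local sort
    for dim in reversed(range(d)):                 # highest dimension first
        size = 1 << (dim + 1)                      # processors per subcube
        for base in range(0, p, size):
            donors = [blocks[r] for r in range(base, base + size) if blocks[r]]
            if not donors:
                continue                           # empty subcube, nothing to do
            pivot = donors[0][len(donors[0]) // 2]  # assumed pivot rule
            for low in range(base, base + size // 2):  # bit `dim` is 0 here
                high = low | (1 << dim)            # neighbor along `dim`
                b1_low, b2_low = split(blocks[low], pivot)
                b1_high, b2_high = split(blocks[high], pivot)
                # low keeps B1 and receives the neighbor's B1;
                # high keeps B2 and receives the neighbor's B2.
                blocks[low] = list(merge(b1_low, b1_high))
                blocks[high] = list(merge(b2_low, b2_high))
    return blocks


result = hypercube_quicksort([[7, 3], [9, 1], [4, 8], [2, 6]])
# result == [[1, 2, 3, 4], [6, 7], [8], [9]]
```

The uneven block sizes in the result show the load imbalance that a poorly chosen pivot introduces, even though the concatenated output is sorted.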
Analysis • Iterations = log_2 p • Select a pivot = O(1) • keep the sublist sorted, so the median is the middle element • Broadcast pivot = O(log_2 p) • Split the sequence • split own sequence = O(log(n/p)) • exchange blocks with neighbor = O(n/p) • merge blocks = O(n/p)
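The per-iteration costs can be combined into a rough total. The derivation below is a sketch that assumes every split is balanced (each processor keeps about n/p elements throughout), that the pivot is read off a sorted sublist in constant time, and that n ≥ p.

```latex
\begin{align*}
T_{\text{iteration}} &= \underbrace{O(\log p)}_{\text{broadcast pivot}}
  + \underbrace{O(\log(n/p))}_{\text{split own sequence}}
  + \underbrace{O(n/p)}_{\text{exchange and merge}} \\
T_{\text{total}} &= \underbrace{O\!\left(\tfrac{n}{p}\log\tfrac{n}{p}\right)}_{\text{initial local sort}}
  + \log_2 p \cdot T_{\text{iteration}} \\
  &= O\!\left(\tfrac{n}{p}\log n + \log^2 p\right)
\end{align*}
```

The last step uses (n/p) log(n/p) + (n/p) log p = (n/p) log n, and the fact that log p · log(n/p) is at most (n/p) log p. With unbalanced pivots the n/p terms no longer hold, and the bound degrades accordingly.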
Analysis • Quicksort appears very scalable • Depends heavily on the pivot • Easy to parallelize • Hypercube sorting algorithms depend on the ability to map a hypercube onto the node communication architecture.