
This class

CS 575 Design and Analysis of Computer Algorithms, Professor Michal Cutler, Lecture #7, September 20, 2005. This class: Quicksort; lower bound on the number of comparisons done by comparison-based sorts; linear sorts: pigeonhole sort O(n + k), counting sort O(n + k).


Presentation Transcript


  1. CS 575 Design and Analysis of Computer Algorithms, Professor Michal Cutler, Lecture #7, September 20, 2005

  2. This class • Quicksort • Lower bound on the number of comparisons done by comparison-based sorts • Linear sorts • Pigeonhole sort O(n + k) • Counting sort O(n + k) • Bucket sort O(n) • Radix sort $O((b/r)(n + 2^r))$

  3. Quick Sort • Divide-and-conquer idea: divide the problem into two smaller sorting problems. • Divide and conquer: • Select a splitting element (pivot) • PARTITION the array/list • No need to combine results

  4. Quick Sort • PARTITION result: • All elements to the left of the pivot are less than or equal to the pivot, and • All elements to the right of the pivot are greater than or equal to the pivot • The pivot is in its correct place in the sorted array/list • Clever split procedure (Hoare)

  5. QUICKSORT(A, p, r)
     if p < r
         q <- PARTITION(A, p, r)
         QUICKSORT(A, p, q-1)
         QUICKSORT(A, q+1, r)

  6. Partition(A, p, r)
     // "left" partition contains A[k] <= x for k = p to i
     // "right" partition contains all A[k] > x for k = i+1 to r-1
     x <- A[r]                    // pivot x is the last element
     i <- p - 1                   // "left" partition is empty
     for j <- p to r-1
         if A[j] <= x             // store in "left" partition
             i <- i + 1
             exchange A[i] <-> A[j]
     exchange A[i+1] <-> A[r]     // store pivot
     return i+1
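A minimal Python sketch of QUICKSORT and the PARTITION above, assuming 0-based Python lists (the slides index from 1), so the returned pivot index is one less than on the slides. Running PARTITION on the array 2 8 7 1 3 5 6 4 reproduces the partition example that follows.

```python
def partition(a, p, r):
    """PARTITION from the slide: pivot x = a[r].

    Rearranges a[p..r] in place and returns the pivot's final index q,
    with a[p..q-1] <= x and a[q+1..r] > x.
    """
    x = a[r]                     # pivot is the last element
    i = p - 1                    # "left" partition a[p..i] starts empty
    for j in range(p, r):        # j runs over p .. r-1
        if a[j] <= x:            # a[j] belongs in the "left" partition
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # store the pivot just after the left partition
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place (0-based, inclusive bounds)."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)


example = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition(example, 0, len(example) - 1)
print(q, example)   # 3 [2, 1, 3, 4, 7, 5, 6, 8]
```

The pivot 4 ends at 0-based index 3, i.e. position 4 in the slides' 1-based numbering.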

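  7. Partition Example (A = 2 8 7 1 3 5 6 4, p = 1, r = 8, pivot x = A[r] = 4)
     initial (i = p-1, j = p):                   2 8 7 1 3 5 6 4
     j = 1: 2 <= 4, i = 1, exchange A[1] with itself  2 8 7 1 3 5 6 4
     j = 2: 8 > 4                                2 8 7 1 3 5 6 4
     j = 3: 7 > 4                                2 8 7 1 3 5 6 4
     j = 4: 1 <= 4, i = 2, exchange A[2], A[4]   2 1 7 8 3 5 6 4
     j = 5: 3 <= 4, i = 3, exchange A[3], A[5]   2 1 3 8 7 5 6 4
     j = 6: 5 > 4                                2 1 3 8 7 5 6 4
     j = 7: 6 > 4                                2 1 3 8 7 5 6 4
     exchange A[i+1] = A[4] with A[r] = A[8]     2 1 3 4 7 5 6 8   (return 4)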

  8. Performance of quicksort • Worst-case partitioning? Asymptotic growth rate? • Best-case partitioning? Asymptotic growth rate? • Balanced partitioning • Assume the partitioning element always produces an 8-to-1 split • We will see that quicksort is then O(n lg n) • In fact, a 99-to-1 split would also give O(n lg n)

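  9. Recursion Tree for Best Case (number of partition comparisons, total per depth)
     • Depth 0: one subproblem of size n; total cn
     • Depth 1: two subproblems of size n/2; total cn
     • Depth 2: four subproblems of size n/4; total cn
     • Depth 3: eight subproblems of size n/8; total cn
     • ...
     • Sum = O(n lg n): T(n) <= 2T(n/2) + cn, so T(n) = O(n lg n)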

  10. Balanced partitioning • Assume after each application of PARTITION: • n/9 elements are to the left of the pivot and • 8n/9 elements are to the right of the pivot. • The longest path of calls to Quicksort is proportional to lg n and not n • The longest path of calls = $1 + \log_{9/8} n = 1 + \lg n / \lg(9/8) \approx 1 + 6 \lg n$ • Let n = 1,000,000. The longest path has about 118 calls to Quicksort. • Note: the shortest path has $1 + \lceil \log_9 n \rceil = 1 + 7 = 8$ calls
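The path lengths quoted above can be checked numerically. A small Python sketch; the variable names are illustrative only:

```python
import math

n = 1_000_000
# 1 + log_{9/8} n: the path that always follows the 8n/9 side
longest = 1 + math.log(n, 9 / 8)
# 1 + log_9 n: the path that always follows the n/9 side
shortest = 1 + math.log(n, 9)
# the constant 1 / lg(9/8) in 1 + lg n / lg(9/8); it is slightly below 6
factor = 1 / math.log2(9 / 8)

print(longest, shortest, factor)
```

With n = 1,000,000 the longest path is a little over 118 calls, and log_9 n is about 6.3, which rounds up to the 7 used on the slide.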

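  11. Recursion tree for the 1-to-8 split: T(n) <= T(n/9) + T(8n/9) + cn (total per depth)
     • Depth 0: n; total cn
     • Depth 1: n/9, 8n/9; total cn
     • Depth 2: n/81, 8n/81, 8n/81, 64n/81; total cn
     • Depth 3: n/729, 8n/729, ..., 512n/729; total cn
     • The leftmost path (always n/9) reaches size 1 at depth $\log_9 n$; below that depth some subproblems have already bottomed out, so each level costs <= cn
     • The rightmost path (always 8n/9) reaches size 1 at depth $\log_{9/8} n$, so the total is O(n lg n)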

  12. Intuition for the Average case
     • One bad and one best split: n splits into 1 and n-1; then n-1 splits into (n-1)/2 and (n-1)/2
     • Vs. one best split: n splits into 1+(n-1)/2 and (n-1)/2
     • The bad split is "absorbed" by the good split: after 2 calls to PARTITION the array is split "evenly"
     • We expect the average run time to be O(n lg n)

  13. A randomized version of quicksort
     RANDOMIZED-PARTITION(A, p, r)
         i <- RANDOM(p, r)
         exchange A[r] <-> A[i]
         return PARTITION(A, p, r)

  14. RANDOMIZED-QUICKSORT(A, p, r)
     if p < r
         q <- RANDOMIZED-PARTITION(A, p, r)
         RANDOMIZED-QUICKSORT(A, p, q-1)
         RANDOMIZED-QUICKSORT(A, q+1, r)
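A minimal Python sketch of RANDOMIZED-PARTITION and RANDOMIZED-QUICKSORT, assuming 0-based lists and using Python's `random.randint(p, r)` (which includes both endpoints) in place of RANDOM(p, r). PARTITION is repeated so the block runs on its own.

```python
import random


def partition(a, p, r):
    """PARTITION with the last element a[r] as pivot; returns the pivot's index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a, p, r):
    """Swap a uniformly random element of a[p..r] into position r, then partition."""
    i = random.randint(p, r)     # like RANDOM(p, r): any index in p..r
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)


def randomized_quicksort(a, p=0, r=None):
    """Sort a[p..r] in place, choosing a random pivot at every call."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)


data = list(range(20, 0, -1))    # reverse-sorted input
randomized_quicksort(data)
print(data == sorted(data))      # True
```

The random pivot means no fixed input (for example an already sorted list with distinct keys) forces the worst-case split at every call; with this partition, many equal keys can still produce unbalanced splits.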

  15. Sort • Sorting algorithms such as heapsort and mergesort run in Θ(n lg n) time in the worst case. • CAN WE DO BETTER? • No: if the sort is a comparison sort (based only on comparisons of keys) • Yes: if we allow arithmetic operations on the keys or take advantage of additional restrictions on the keys

  16. Sort We will show: 1) How to represent the execution of each comparison-sort algorithm with a decision tree in which we: • model only the comparisons • ignore all other aspects of the algorithm 2) Why a decision tree for a correct sort algorithm has at least n! leaves 3) That the depth of the decision tree is Ω(n lg n) • The analysis assumes all keys are distinct

  17. Decision Trees • A decision tree is a binary tree that shows the execution of a comparison based algorithm on all possible inputs of a given size. • Each internal node contains the pair of elements which are compared (<, or <=) • Each leaf contains an output. • Each branch is labeled by the result of the comparison (<= or >, yes or no)

  18. Decision tree for sortThree
     Input: a, b, c
     if a < b
         if b < c then a,b,c
         else if a < c then a,c,b
         else c,a,b
     else if b < c
         if a < c then b,a,c
         else b,c,a
     else c,b,a
     • Tree: the root compares a<b. yes: compare b<c (yes: a,b,c; no: compare a<c, yes: a,c,b, no: c,a,b). no: compare b<c (yes: compare a<c, yes: b,a,c, no: b,c,a; no: c,b,a).
     • 3! = 6 leaves representing the 6 permutations of 3 distinct numbers
     • 2 paths with 2 comparisons, 4 paths with 3 comparisons
     • Total 5 comparisons (internal nodes)
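The sortThree decision tree translates directly into nested conditionals. A minimal Python sketch; the `stats` parameter is a hypothetical addition that records how many comparisons were made on the path taken, i.e. the depth of the leaf reached.

```python
from itertools import permutations


def sort_three(a, b, c, stats=None):
    """Sort three distinct keys by following the sortThree decision tree.

    If a dict is passed as `stats`, stats["comparisons"] receives the number
    of key comparisons made on this path.
    """
    count = 0

    def less(x, y):
        nonlocal count
        count += 1
        return x < y

    if less(a, b):
        if less(b, c):
            result = (a, b, c)
        elif less(a, c):
            result = (a, c, b)
        else:
            result = (c, a, b)
    else:
        if less(b, c):
            if less(a, c):
                result = (b, a, c)
            else:
                result = (b, c, a)
        else:
            result = (c, b, a)
    if stats is not None:
        stats["comparisons"] = count
    return result


# Run every permutation of three distinct keys through the tree.
depths = []
for perm in permutations((1, 2, 3)):
    stats = {}
    assert sort_three(*perm, stats=stats) == (1, 2, 3)
    depths.append(stats["comparisons"])
print(sorted(depths))   # [2, 2, 3, 3, 3, 3]
```

Each of the 3! = 6 inputs reaches a different leaf: two leaves sit at depth 2 and four at depth 3.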

  19. Exchange Sort
     for (i = 1; i <= n-1; i++)
         for (j = i + 1; j <= n; j++)
             if (S[j] < S[i])
                 swap(S[i], S[j])
     • At end of i = 1: S[1] = min S[i] over 1 <= i <= n
     • At end of i = 2: S[2] = min S[i] over 2 <= i <= n
     • At end of i = 3: S[3] = min S[i] over 3 <= i <= n
     • ...
     • At end of i = n-1: S[n-1] = min S[i] over n-1 <= i <= n

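  20. Decision Tree for Exchange Sort for n = 3, example input (7, 3, 5)
     For clarity we show the swaps and the current state of the list. Each comparison has a yes branch (swap) and a no branch.
     • i = 1, j = 2, compare s[2] < s[1] on a,b,c: yes, swap a,b → b,a,c; no → a,b,c
     • i = 1, j = 3, compare s[3] < s[1]: from b,a,c: yes, swap c,b → c,a,b; no → b,a,c. From a,b,c: yes, swap c,a → c,b,a; no → a,b,c
     • i = 2, j = 3, compare s[3] < s[2]: from c,a,b: yes → c,b,a; no → c,a,b. From b,a,c: yes → b,c,a; no → b,a,c. From c,b,a: yes → c,a,b; no → c,b,a. From a,b,c: yes → a,c,b; no → a,b,c
     • Example (7, 3, 5): 3 < 7, swap → 3 7 5; 5 < 3 is false → 3 7 5; 5 < 7, swap → 3 5 7
     • Every path has 3 comparisons. Total 7 comparisons, 8 leaves. (c,b,a) and (c,a,b) appear twice.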
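A minimal Python sketch of exchange sort, assuming 0-based lists; the comparison counter is an addition for illustration. Exchange sort always makes n(n-1)/2 comparisons, which is why every path in its decision tree for n = 3 has exactly 3.

```python
def exchange_sort(s):
    """Exchange sort from the slide (0-based).

    Returns a sorted copy of s and the number of key comparisons made.
    """
    s = list(s)
    n = len(s)
    comparisons = 0
    for i in range(n - 1):            # slide: i = 1 .. n-1
        for j in range(i + 1, n):     # slide: j = i+1 .. n
            comparisons += 1
            if s[j] < s[i]:
                s[i], s[j] = s[j], s[i]
        # s[i] now holds the minimum of s[i..n-1]
    return s, comparisons


print(exchange_sort([7, 3, 5]))   # ([3, 5, 7], 3)
```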

  21. A decision tree for sort has depth Ω(n lg n). Assume the depth of the tree is d (i.e., there are d comparisons on the longest path from the root to a leaf). • A binary tree of depth d can have at most $2^d$ leaves: $l \le 2^d$. • A decision tree for a correct algorithm must have at least n! leaves (outputs): $l \ge n!$. • Thus, $n! \le l \le 2^d$. • Taking lg of both sides we get $d \ge \lg(n!)$. • It can be shown that $\lg(n!) = \Theta(n \lg n)$.
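One standard way to show that lg(n!) = Θ(n lg n) is to bound the sum of logarithms from both sides. The lower bound keeps only the terms from ⌈n/2⌉ to n, of which there are at least n/2, each at least lg(n/2):

```latex
\lg(n!) = \sum_{i=1}^{n} \lg i \le \sum_{i=1}^{n} \lg n = n \lg n

\lg(n!) = \sum_{i=1}^{n} \lg i \ge \sum_{i=\lceil n/2 \rceil}^{n} \lg i
        \ge \frac{n}{2} \lg \frac{n}{2} = \frac{n}{2} \lg n - \frac{n}{2}
        \ge \frac{n}{4} \lg n \quad \text{for } n \ge 4
```

Combined with d ≥ lg(n!), this gives d = Ω(n lg n).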

  22. Pigeonhole sort • Problem: sort n keys in the range 1 to k • Main idea: 1) Count the number of keys of value i and maintain the counts in an auxiliary array, C 2) Use the counts to overwrite the input • Example (k = 4): input A = 2 2 1 4 3 2 1 • After step 1): Aux C[1..4] = 2 3 1 1 • After step 2): output A = 1 1 2 2 2 3 4 • Analysis?

  23. Pigeonhole Sort
     Pigeonhole-Sort(A, k)
         for i <- 1 to k              // initialize C
             C[i] <- 0
         for j <- 1 to length[A]      // count keys
             C[A[j]] <- C[A[j]] + 1
         q <- 1
         for j <- 1 to k              // rewrite A
             while C[j] > 0
                 A[q] <- j
                 C[j] <- C[j] - 1
                 q <- q + 1
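A minimal Python sketch of Pigeonhole-Sort, assuming integer keys in 1..k stored in a 0-based Python list. Index 0 of the count array is left unused so that C[i] matches the slide's numbering.

```python
def pigeonhole_sort(a, k):
    """Sort a list of integer keys in the range 1..k in place in O(n + k) time."""
    c = [0] * (k + 1)            # c[v] counts keys equal to v; index 0 unused
    for key in a:                # step 1: count the keys
        c[key] += 1
    q = 0                        # next position of a to overwrite
    for v in range(1, k + 1):    # step 2: rewrite a from the counts
        while c[v] > 0:
            a[q] = v
            c[v] -= 1
            q += 1
    return a


print(pigeonhole_sort([2, 2, 1, 4, 3, 2, 1], 4))   # [1, 1, 2, 2, 2, 3, 4]
```

Because only the keys are rewritten, this works for bare keys but cannot carry satellite data along with them.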

  24. Counting sort • Problem: Sort n records stored in A[1..n] • Each record contains a key and data • All keys are in the range of 1 to k

  25. Counting sort • Main idea: • Count in C[i] the number of records with key = i, i = 1, …, k. • Use the counts in C to compute the offset in the sorted array B of the records with key i, for i = 1, …, k. • Copy A into the sorted array B, using and updating (decrementing) the computed offsets. To make the sort stable, we start at the last position of A.

  26. Counting sort • Additional Space • The sorted list is stored in B • Additional array C of size k • Note: Pigeonhole sort does not require array B

  27. How shall we compute the offsets? • Assume C[1] = 3 (then the 3 records with key 1 should be stored in positions 1, 2, 3 of the sorted array B). We keep 3 as the offset for key 1. • Let C[2] = 2 (then the 2 records with key 2 should be stored in positions 4, 5 of B). • We compute the offset for key 2 as (C[2] + offset for key 1) = 2 + 3 = 5. • In general, the offset for key i is (C[i] + offset for key i-1).

  28. Counting Sort
     Counting-Sort(A, B, k)
         for i <- 1 to k                  // initialize C
             C[i] <- 0
         for j <- 1 to length[A]          // count keys
             C[A[j]] <- C[A[j]] + 1
         for i <- 2 to k                  // compute offsets
             C[i] <- C[i] + C[i-1]
         for j <- length[A] downto 1      // copy
             B[C[A[j]]] <- A[j]
             C[A[j]] <- C[A[j]] - 1       // update offset

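  29. Counting sort example (k = 4)
     • Original list A[1..11]: 3 Clinton, 4 Smith, 1 Xu, 2 Adams, 3 Dunn, 4 Yi, 2 Baum, 1 Fu, 3 Gold, 1 Lu, 1 Land
     • C after counting (final counts), keys 1..4: 4 2 3 2
     • C after computing the "offsets": 4 6 9 11
     • Copying from the end of A: 1 Land → B[4], 1 Lu → B[3], 3 Gold → B[9]; the offsets are now 2 6 8 11 (key 1: 4 → 3 → 2, key 3: 9 → 8)
     • Sorted list B[1..11]: 1 Xu, 1 Fu, 1 Lu, 1 Land, 2 Adams, 2 Baum, 3 Clinton, 3 Dunn, 3 Gold, 4 Smith, 4 Yi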
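A minimal Python sketch of Counting-Sort on (key, name) records, assuming integer keys in 1..k. The `key` parameter is an illustrative addition for extracting the key from a record. The offsets in C stay 1-based as on the slides, so each record is stored at index C[key] - 1 of the 0-based output list.

```python
def counting_sort(a, k, key=lambda rec: rec[0]):
    """Stable counting sort of records whose keys are integers in 1..k.

    Follows Counting-Sort(A, B, k): returns a new list b and leaves a unchanged.
    """
    c = [0] * (k + 1)              # c[i] counts records with key i (index 0 unused)
    for rec in a:
        c[key(rec)] += 1
    for i in range(2, k + 1):      # c[i] becomes the last 1-based position for key i
        c[i] += c[i - 1]
    b = [None] * len(a)
    for rec in reversed(a):        # copy from the last position of a to keep it stable
        v = key(rec)
        b[c[v] - 1] = rec          # offsets are 1-based, Python lists are 0-based
        c[v] -= 1                  # the next record with key v goes one slot earlier
    return b


records = [(3, "Clinton"), (4, "Smith"), (1, "Xu"), (2, "Adams"), (3, "Dunn"),
           (4, "Yi"), (2, "Baum"), (1, "Fu"), (3, "Gold"), (1, "Lu"), (1, "Land")]
print([name for _, name in counting_sort(records, k=4)])
# ['Xu', 'Fu', 'Lu', 'Land', 'Adams', 'Baum', 'Clinton', 'Dunn', 'Gold', 'Smith', 'Yi']
```

Records with equal keys (Xu, Fu, Lu, Land) keep their original relative order, which is the stability property asked about in the analysis.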

  30. Analysis: • O(k + n) time • What if k = O(n)? • Requires k + n extra storage. • Stable sort: preserves the original order of equal keys. • Is counting sort stable? • Is counting sort good for sorting records with 32-bit keys?

  31. Radix sort • Radix sort is used in card sorting machines

  32. Hollerith’s punched cards • Hollerith devised what was to become the computer punched card • Each card has 12 rows and 80 columns • Each column represents a single alphanumeric character or symbol. • The card punching machine punches holes in some of the 12 positions of each column

  33. A punched card

  34. IBM card punching machine

  35. Hollerith’s tabulating machines • As the cards were fed through a "tabulating machine," pins passed through the positions where holes were punched, completing an electrical circuit and registering a value. • The 1880 census in the U.S. took seven years to complete. • With Hollerith's "tabulating machines," the 1890 census took the Census Bureau six weeks. • Through mergers, the company's name eventually became IBM.

  36. IBM’s card sorting machine

  37. Radix sort • Main idea • Break the key into a "digit" representation: key = $i_d, i_{d-1}, \ldots, i_2, i_1$ ($i_1$ is the least significant "digit") • A "digit" can be a number in any base, a character, etc. • Radix sort: for i = 1 to d, sort on "digit" i using a stable sort • Analysis: Θ(d · (stable sort time)), where d is the number of "digits"

  38. Radix sort • Which stable sort? • Since the range of values of a digit is small, the best stable sort to use is counting sort. • When counting sort is used, the time complexity is Θ(d(n + k)), where k is the range of a "digit". • When k = O(n), the time is Θ(dn).

  39. Radix sort example
     Input list:                           178 139 326 572 294 321 910 368
     After sorting on digit 1 (ones):      910 321 572 294 326 178 368 139
     After sorting on digit 2 (tens):      910 321 326 139 368 572 178 294
     After sorting on digit 3 (hundreds):  139 178 294 321 326 368 572 910 (sorted list)
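A minimal Python sketch of least-significant-digit radix sort with counting sort as the stable sort, assuming non-negative integers in base 10. The helper names `counting_sort_by_digit` and `radix_sort` are illustrative. `radix_sort` returns the list after every pass so the intermediate orders of the example can be checked.

```python
def counting_sort_by_digit(a, exp, base=10):
    """Stable counting sort of non-negative ints on the digit (x // exp) % base."""
    c = [0] * base
    for x in a:
        c[(x // exp) % base] += 1
    for d in range(1, base):          # c[d]: number of keys whose digit is <= d
        c[d] += c[d - 1]
    b = [0] * len(a)
    for x in reversed(a):             # right-to-left scan keeps equal digits in order
        d = (x // exp) % base
        c[d] -= 1
        b[c[d]] = x
    return b


def radix_sort(a, digits, base=10):
    """Sort on digit 1 (least significant) up to digit `digits`.

    Returns the list produced by each pass; the last one is sorted.
    """
    passes = []
    exp = 1
    for _ in range(digits):
        a = counting_sort_by_digit(a, exp, base)
        passes.append(a)
        exp *= base
    return passes


for p in radix_sort([178, 139, 326, 572, 294, 321, 910, 368], digits=3):
    print(p)
```

The three printed lines are the rows after digit 1, digit 2 and digit 3 of the example. If the digit sort were not stable, the order established by earlier passes (e.g. 321 before 326) could be lost.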

  40. Lemma 8.4 • Given n b-bit numbers and any positive integer r <= b, radix sort correctly sorts these numbers in $\Theta((b/r)(n + 2^r))$ time. • Proof • Divide each number into $\lceil b/r \rceil$ "digits". • Each "digit" has r bits and a range 0 to $2^r - 1$. • Radix sort executes $\lceil b/r \rceil$ counting sorts. • Each counting sort takes $\Theta(n + 2^r)$ time. • So the total is $\Theta((b/r)(n + 2^r))$.
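A minimal Python sketch of the radix sort in Lemma 8.4, assuming non-negative b-bit integers. The function name `radix_sort_bits` is illustrative. Each pass is a counting sort on one r-bit "digit", extracted with a shift and a mask, so it uses a count array of size $2^r$ and takes $\Theta(n + 2^r)$ time.

```python
def radix_sort_bits(a, b, r):
    """Sort non-negative b-bit integers with ceil(b/r) counting-sort passes.

    Each pass sorts stably on one r-bit digit, least significant first.
    """
    mask = (1 << r) - 1                  # an r-bit digit ranges over 0 .. 2^r - 1
    for shift in range(0, b, r):         # ceil(b/r) passes
        c = [0] * (1 << r)
        for x in a:
            c[(x >> shift) & mask] += 1
        for d in range(1, 1 << r):       # prefix sums give the final offsets
            c[d] += c[d - 1]
        out = [0] * len(a)
        for x in reversed(a):            # right-to-left keeps each pass stable
            d = (x >> shift) & mask
            c[d] -= 1
            out[c[d]] = x
        a = out
    return a


nums = [3_000_000_000, 17, 4_294_967_295, 0, 65_536, 17, 2_147_483_648]
print(radix_sort_bits(nums, b=32, r=8))
# [0, 17, 17, 65536, 2147483648, 3000000000, 4294967295]
```

The choice of r trades the number of passes (b/r) against the size of each count array ($2^r$). For 32-bit keys, r = 8 gives 4 passes over 256 counters each.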

  41. Bucket sort • Assumption: keys are distributed uniformly in the interval [0, 1) • Main idea • The n records are distributed into n buckets (O(n)): insert A[i] into the list B[⌊n·A[i]⌋] • Buckets are sorted with insertion sort • Buckets are combined (O(n))

  42. BUCKET-SORT(A)
     n <- length[A]
     for i <- 1 to n
         insert A[i] into list B[⌊n·A[i]⌋]
     for i <- 0 to n-1
         sort list B[i] with insertion sort
     concatenate the lists B[0], B[1], …, B[n-1] together in order

  43. Bucket sort - example (n = 10)
     • A[1..10]: .78 .17 .39 .26 .72 .94 .21 .12 .23 .68
     • Step 1, distribute into B[0..9]: B[1]: .12 .17; B[2]: .23 .21 .26; B[3]: .39; B[6]: .68; B[7]: .72 .78; B[9]: .94; B[0], B[4], B[5], B[8] are empty (/)
     • Step 2, sort each list: B[2] becomes .21 .23 .26; the other lists are unchanged
     • Concatenation: .12 .17 .21 .23 .26 .39 .68 .72 .78 .94
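A minimal Python sketch of BUCKET-SORT, assuming float keys in [0, 1). Python lists stand in for the linked lists B[i], and new keys are appended at the end of their bucket. The `min(..., n - 1)` guard is an addition that keeps a floating-point product n·x from ever landing outside B[0..n-1].

```python
def insertion_sort(lst):
    """Sort a (small) list in place with insertion sort."""
    for j in range(1, len(lst)):
        key = lst[j]
        i = j - 1
        while i >= 0 and lst[i] > key:
            lst[i + 1] = lst[i]
            i -= 1
        lst[i + 1] = key


def bucket_sort(a):
    """Bucket sort for keys in [0, 1); expected O(n) time for uniform keys."""
    n = len(a)
    buckets = [[] for _ in range(n)]            # B[0..n-1]
    for x in a:
        buckets[min(int(n * x), n - 1)].append(x)   # bucket floor(n * x)
    for bucket in buckets:
        insertion_sort(bucket)
    return [x for bucket in buckets for x in bucket]   # concatenate B[0..n-1]


print(bucket_sort([.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```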

  44. Analysis • Let $n_i$ be the number of elements in B[i] • Let $X_{ij} = I\{A[j] \text{ falls in bucket } i\}$

  45. Analysis – $E[n_i^2] = 2 - 1/n$
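A sketch of the derivation, assuming (as the uniformity assumption intends) that the n keys are independent and each falls in bucket i with probability 1/n. Then $E[X_{ij}^2] = E[X_{ij}] = 1/n$ and, for $k \ne j$, $E[X_{ij} X_{ik}] = 1/n^2$:

```latex
n_i = \sum_{j=1}^{n} X_{ij}

E[n_i^2] = \sum_{j=1}^{n} E[X_{ij}^2]
         + \sum_{j=1}^{n} \sum_{\substack{k=1 \\ k \ne j}}^{n} E[X_{ij} X_{ik}]
         = n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n^2}
         = 1 + \frac{n-1}{n} = 2 - \frac{1}{n}
```

Since insertion sort on bucket i takes $O(n_i^2)$ time, the expected running time is $\Theta(n) + \sum_{i=0}^{n-1} O(2 - 1/n) = \Theta(n)$.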

  46. Analysis – $E[n_i^2] = 2 - 1/n$ • What is the worst-case run time of bucket sort?
