CS 575 Design and Analysis of Computer Algorithms Professor Michal Cutler Lecture # 7 September 20, 2005. This class. Quicksort Lower bound on number of comparisons done by comparison based sorts Linear sorts Pigeonhole sort O(n +k) Counting sort O(n + k)
CS 575 Design and Analysis of Computer Algorithms, Professor Michal Cutler, Lecture #7, September 20, 2005
This class • Quicksort • Lower bound on number of comparisons done by comparison based sorts • Linear sorts • Pigeonhole sort O(n + k) • Counting sort O(n + k) • Bucket sort O(n) • Radix sort O((b/r)(n + 2^r))
Quick Sort • Divide and conquer idea: Divide problem into two smaller sorting problems. • Divide and conquer: • Select a splitting element (pivot) • PARTITION the array/list • No need to combine results
Quick Sort • PARTITION result: • All elements to the left of pivot are smaller than or equal to pivot, and • All elements to the right of pivot are greater than or equal to pivot • pivot is in its correct place in the sorted array/list • Clever split procedure (Hoare)
QUICKSORT(A, p, r)
  if p < r
    q <- PARTITION(A, p, r)
    QUICKSORT(A, p, q-1)
    QUICKSORT(A, q+1, r)
Partition(A, p, r)
  // "left" partition contains A[k] <= x for k = p to i
  // "right" partition contains all A[k] > x for k = i+1 to r-1
  x <- A[r]                      // pivot x is last element
  i <- p - 1                     // "left" partition is empty
  for j <- p to r-1
    if A[j] <= x                 // store in "left" partition
      i <- i + 1
      exchange A[i] <-> A[j]
  exchange A[i+1] <-> A[r]       // store pivot
  return i+1
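Partition Example (p = 1, r = 8, pivot x = A[r] = 4)
  initial, i = 0:            2 8 7 1 3 5 6 4
  j = 1 (2 <= 4), i = 1:     2 8 7 1 3 5 6 4
  j = 2 (8 > 4), i = 1:      2 8 7 1 3 5 6 4
  j = 3 (7 > 4), i = 1:      2 8 7 1 3 5 6 4
  j = 4 (1 <= 4), i = 2:     2 1 7 8 3 5 6 4
  j = 5 (3 <= 4), i = 3:     2 1 3 8 7 5 6 4
  j = 6 (5 > 4), i = 3:      2 1 3 8 7 5 6 4
  j = 7 (6 > 4), i = 3:      2 1 3 8 7 5 6 4
  exchange A[4] <-> A[8]:    2 1 3 4 7 5 6 8   (returns 4)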
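The QUICKSORT and PARTITION pseudocode can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: it assumes 0-based indexing (the slides use 1-based), and the default arguments of `quicksort` are added for convenience. It runs `partition` on the example array 2 8 7 1 3 5 6 4.

```python
def partition(a, p, r):
    """Partition a[p..r] (inclusive) around the pivot a[r]; return the pivot's final index."""
    x = a[r]                          # pivot is the last element
    i = p - 1                         # "left" partition a[p..i] starts empty
    for j in range(p, r):             # j runs over p..r-1
        if a[j] <= x:                 # a[j] belongs in the "left" partition
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # place the pivot between the two partitions
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place (0-based, inclusive bounds)."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)        # pivot a[q] is already in its final place
        quicksort(a, q + 1, r)


example = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition(example, 0, len(example) - 1)
print(q, example)   # 3 [2, 1, 3, 4, 7, 5, 6, 8]
```

The returned index 3 is the 0-based counterpart of position 4 in the 1-based slides.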
Performance of quicksort • Worst case partitioning? Asymptotic growth rate? • Best case partitioning? Asymptotic growth rate? • Balanced partitioning • Assume the partitioning element always produces 8 to 1 split • We will see that quicksort is O(n lg n) • In fact a 99 to 1 split would also be O(n lg n)
Recursion Tree for Best Case • Subproblem sizes by depth: n; n/2, n/2; n/4 (4 times); n/8 (8 times); ... • Number of partition comparisons, total per depth: cn at every depth • About lg n depths, so Sum = O(n lg n) • T(n) <= 2T(n/2) + cn • T(n) = O(n lg n)
Balanced partitioning • Assume after each application of PARTITION: • n/9 elements are to the left of the pivot and • 8n/9 elements are to the right of the pivot. • The longest path of calls to Quicksort is proportional to lg n and not n • The longest path of calls = 1 + log_{9/8} n = 1 + lg n / lg(9/8) ≈ 1 + 6 lg n • Let n = 1,000,000. The longest path has about 118 calls to Quicksort. • Note: shortest path has 1 + log_9 n ≈ 1 + 7 = 8 calls
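The path-length arithmetic for the 1/9 to 8/9 split can be checked numerically. The helper below is a hypothetical illustration (the name `path_lengths` and its `small` parameter are not from the lecture); it evaluates 1 + log_{9/8} n and 1 + log_9 n for n = 1,000,000.

```python
import math


def path_lengths(n, small=1/9):
    """Return (longest, shortest) call-path estimates when every split
    puts a fraction `small` of the elements on one side."""
    large = 1 - small
    longest = 1 + math.log(n) / math.log(1 / large)    # always follow the larger side
    shortest = 1 + math.log(n) / math.log(1 / small)   # always follow the smaller side
    return longest, shortest


longest, shortest = path_lengths(1_000_000)
print(round(longest), math.ceil(shortest))   # 118 8
```

Both paths are logarithmic in n; only the base of the logarithm changes with the split ratio.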
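Recursion tree for the 1/9 to 8/9 split • T(n) <= T(n/9) + T(8n/9) + cn • Subproblem sizes by depth: n; n/9, 8n/9; n/81, 8n/81, 8n/81, 64n/81; n/729, 8n/729, ..., 512n/729; ... • Total per depth: cn for each depth down to log_9 n, then <= cn for each depth down to log_{9/8} n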
Intuition for the Average case • One bad split followed by one best split: n -> 1 and n-1, then n-1 -> (n-1)/2 and (n-1)/2 • Vs. a single good split: n -> 1 + (n-1)/2 and (n-1)/2 • Bad split "absorbed" by good split. • After 2 calls to PARTITION the array is split "evenly" • Expect average run time is O(n lg n)
A randomized version of quicksort
RANDOMIZED-PARTITION(A, p, r)
  i <- RANDOM(p, r)
  exchange A[r] <-> A[i]
  return PARTITION(A, p, r)
RANDOMIZED-QUICKSORT(A, p, r)
  if p < r
    q <- RANDOMIZED-PARTITION(A, p, r)
    RANDOMIZED-QUICKSORT(A, p, q-1)
    RANDOMIZED-QUICKSORT(A, q+1, r)
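A possible Python rendering of the randomized version is sketched below. It assumes 0-based indexing and uses `random.randint` to play the role of RANDOM(p, r); `partition` repeats the deterministic PARTITION so that the block is self-contained.

```python
import random


def partition(a, p, r):
    """Deterministic PARTITION: pivot is a[r]; returns the pivot's final index."""
    x, i = a[r], p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a, p, r):
    i = random.randint(p, r)          # uniform over p..r, both ends included
    a[r], a[i] = a[i], a[r]           # move the random pivot to the last position
    return partition(a, p, r)


def randomized_quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)


keys = list(range(20))                # already sorted: a bad case for a fixed last-element pivot
randomized_quicksort(keys)
print(keys == sorted(keys))
```

With a random pivot, no particular input order (such as an already sorted array) is consistently bad; the worst case depends only on the random choices.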
Sort • Sort algorithms (heapsort, mergesort) run in Θ(n lg n) in the worst case. • CAN WE DO BETTER? • No: If the sort is a comparison-sort (based only on comparison of keys) • Yes: If we allow arithmetic operations or take advantage of additional restrictions on the keys
Sort We will show: 1) How to represent the execution of each comparison-sort algorithm with a decision tree in which we: • Model only the comparisons • Ignore all other aspects of the algorithm 2) Why a decision tree for a correct sort algorithm has at least n! leaves 3) That the depth of the decision tree is Ω(n lg n) • Analysis assumes all keys are distinct
Decision Trees • A decision tree is a binary tree that shows the execution of a comparison based algorithm on all possible inputs of a given size. • Each internal node contains the pair of elements which are compared (<, or <=) • Each leaf contains an output. • Each branch is labeled by the result of the comparison (<= or >, yes or no)
Decision tree for sortThree
Input: a, b, c
if a < b
  if b < c
    a,b,c
  else if a < c
    a,c,b
  else
    c,a,b
else if b < c
  if a < c
    b,a,c
  else
    b,c,a
else
  c,b,a
Tree: the root compares a<b. On yes, b<c leads to leaf a,b,c (yes) or to the test a<c (no), whose leaves are a,c,b (yes) and c,a,b (no). On no, b<c leads to the test a<c (yes), whose leaves are b,a,c (yes) and b,c,a (no), or to leaf c,b,a (no).
• 3! = 6 leaves representing 6 permutations of 3 distinct numbers. • 2 paths with 2 comparisons • 4 paths with 3 comparisons • Total 5 comparisons (internal nodes)
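The sortThree decision tree can be exercised directly. The sketch below is an illustration rather than part of the lecture: the function name `sort_three` and the returned comparison count are assumptions added to make the path lengths visible. It runs the tree on all 3! orderings of three distinct keys.

```python
from itertools import permutations


def sort_three(a, b, c):
    """Follow the sortThree decision tree; return (sorted tuple, comparisons used)."""
    if a < b:
        if b < c:
            return (a, b, c), 2
        elif a < c:
            return (a, c, b), 3
        else:
            return (c, a, b), 3
    else:
        if b < c:
            if a < c:
                return (b, a, c), 3
            else:
                return (b, c, a), 3
        else:
            return (c, b, a), 2


counts = []
for perm in permutations((1, 2, 3)):
    result, used = sort_three(*perm)
    assert result == (1, 2, 3)        # every leaf outputs the sorted order
    counts.append(used)
print(sorted(counts))   # [2, 2, 3, 3, 3, 3]
```

Each of the six input orderings reaches a different leaf, which is why a correct tree needs at least 3! leaves.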
Exchange Sort
for (i = 1; i <= n-1; i++)
  for (j = i + 1; j <= n; j++)
    if (S[j] < S[i])
      swap(S[i], S[j])
• At end of i = 1: S[1] = min{S[i] : 1 <= i <= n} • At end of i = 2: S[2] = min{S[i] : 2 <= i <= n} • At end of i = 3: S[3] = min{S[i] : 3 <= i <= n} • At end of i = n-1: S[n-1] = min{S[i] : n-1 <= i <= n}
Decision Tree for Exchange Sort for n=3 • Figure: decision tree whose nodes show the current order of the list, starting from a,b,c. The root compares s[2]<s[1] (i=1), the next level compares s[3]<s[1], and the last level compares s[3]<s[2]. Edges are labeled with the swap performed, if any. • Example = (7,3,5): the list becomes 3 7 5 after the first comparison, stays 3 7 5 after the second, and becomes 3 5 7 after the third. • For clarity we show the swaps and the current state of the list • Every path has 3 comparisons • Total 7 comparisons • 8 leaves. (c,b,a) and (c,a,b) appear twice.
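Exchange sort, together with the observation that every path makes the same number of comparisons, can be sketched in Python. This is an illustrative version assuming 0-based indexing; the comparison counter is added here and is not part of the slide's code.

```python
def exchange_sort(s):
    """Sort list s in place; return the number of key comparisons made."""
    n = len(s)
    comparisons = 0
    for i in range(n - 1):            # slide's i = 1..n-1, shifted to 0-based
        for j in range(i + 1, n):     # slide's j = i+1..n
            comparisons += 1
            if s[j] < s[i]:
                s[i], s[j] = s[j], s[i]
        # invariant: s[i] is now the minimum of s[i..n-1]
    return comparisons


data = [7, 3, 5]
print(exchange_sort(data), data)   # 3 [3, 5, 7]
```

The comparison count is n(n-1)/2 for every input of size n, which for n = 3 gives the 3 comparisons on every path of the tree.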
A decision tree for sort has depth Ω(n lg n) • Assume depth of tree is d (i.e. there are d comparisons on the longest path from the root to a leaf). • A binary tree of depth d has l <= 2^d leaves. • A decision tree for a correct algorithm must have l >= n! leaves (outputs) • Thus, n! <= 2^d • Taking lg of both sides we get d >= lg(n!). • It can be shown that lg(n!) = Θ(n lg n).
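The bound d >= lg(n!) can be tabulated for small n. The function below is a hypothetical helper (its name is not from the lecture) that finds the smallest d with 2^d >= n! using exact integer arithmetic and prints it next to n lg n for comparison.

```python
import math


def comparison_lower_bound(n):
    """Smallest d such that 2**d >= n!, i.e. ceil(lg(n!))."""
    target = math.factorial(n)
    d, leaves = 0, 1
    while leaves < target:            # a tree of depth d has at most 2**d leaves
        leaves *= 2
        d += 1
    return d


for n in (3, 4, 5, 10, 100):
    print(n, comparison_lower_bound(n), round(n * math.log2(n)))
```

For n = 3 the bound is 3, so no comparison sort of three distinct keys can use fewer than 3 comparisons in the worst case.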
Pigeonhole sort • Problem: sort n keys in the range 1 to k • Main idea: 1) Count the number of keys of value i, maintain count in an auxiliary array, C 2) Use counts to overwrite input • Analysis?
Example (k = 4):
  Input A:                            2 2 1 4 3 2 1
  After step 1), Aux C (indices 1..4): 2 3 1 1
  After step 2), Output A:            1 1 2 2 2 3 4
Pigeonhole Sort
Pigeonhole-Sort(A, k)
  for i <- 1 to k              // initialize C
    C[i] <- 0
  for j <- 1 to length[A]      // count keys
    C[A[j]] <- C[A[j]] + 1
  q <- 1
  for j <- 1 to k              // rewrite A
    while C[j] > 0
      A[q] <- j
      C[j] <- C[j] - 1
      q <- q + 1
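A Python sketch of Pigeonhole-Sort is given below. It assumes integer keys in 1..k and 0-based list positions, and leaves slot 0 of the count array unused so that keys index it directly.

```python
def pigeonhole_sort(a, k):
    """Sort in place a list of integer keys, each in the range 1..k."""
    c = [0] * (k + 1)                 # c[v] counts keys equal to v; c[0] is unused
    for key in a:                     # step 1: count keys, O(n)
        c[key] += 1
    q = 0
    for v in range(1, k + 1):         # step 2: overwrite the input, O(n + k)
        for _ in range(c[v]):
            a[q] = v
            q += 1
    return a


print(pigeonhole_sort([2, 2, 1, 4, 3, 2, 1], 4))   # [1, 1, 2, 2, 2, 3, 4]
```

Because the output is regenerated from the counts alone, this works for bare keys but not for records that carry extra data.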
Counting sort • Problem: Sort n records stored in A[1..n] • Each record contains a key and data • All keys are in the range of 1 to k
Counting sort • Main idea: • Count in C the number of records with key = i, i = 1,…, k. • Use counts in C to compute the offset in sorted B of record with key i for i = 1,…, k. • Copy A into sorted B using and updating (decrementing) the computed offsets. To make the sort stable we start at the last position of A.
Counting sort • Additional Space • The sorted list is stored in B • Additional array C of size k • Note: Pigeonhole sort does not require array B
How shall we compute the offsets? • Assume C[1] = 3 (then 3 records with key = 1 should be stored in positions 1, 2, 3 in the sorted array B). The offset for key 1 is 3. • Let C[2] = 2 (then 2 records with key = 2 should be stored in positions 4, 5 in B). • We compute the offset for key 2 to be (C[2] + offset for key 1) = 2 + 3 = 5 • In general offset for key i is (C[i] + offset for key i-1).
Counting Sort
Counting-Sort(A, B, k)
  for i <- 1 to k                  // initialize C
    C[i] <- 0
  for j <- 1 to length[A]          // count keys
    C[A[j]] <- C[A[j]] + 1
  for i <- 2 to k                  // compute offsets
    C[i] <- C[i] + C[i-1]
  for j <- length[A] downto 1      // copy
    B[C[A[j]]] <- A[j]
    C[A[j]] <- C[A[j]] - 1         // update offset
Counting sort example
  Original list A (positions 1..11): 3 Clinton, 4 Smith, 1 Xu, 2 Adams, 3 Dunn, 4 Yi, 2 Baum, 1 Fu, 3 Gold, 1 Lu, 1 Land
  C (keys 1..4), initial:       0 0 0 0
  C, final counts:              4 2 3 2
  C, "offsets":                 4 6 9 11
  Sorted list B after copying the last three records of A: B[4] = 1 Land (offset for key 1: 4 -> 3), B[3] = 1 Lu (offset for key 1: 3 -> 2), B[9] = 3 Gold (offset for key 3: 9 -> 8)
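Counting sort on records can be sketched in Python as follows. This is a minimal version under stated assumptions: records are (key, data) tuples, keys are integers in 1..k, the `key` parameter is added for illustration, and because Python lists are 0-based the offset is decremented before it is used rather than after.

```python
def counting_sort(a, k, key=lambda rec: rec[0]):
    """Return a new list with the records of a in stable key order (keys in 1..k)."""
    c = [0] * (k + 1)                 # c[0] is unused
    for rec in a:                     # count records per key
        c[key(rec)] += 1
    for i in range(2, k + 1):         # c[i] = number of records with key <= i
        c[i] += c[i - 1]
    b = [None] * len(a)
    for rec in reversed(a):           # last to first keeps equal keys in original order
        c[key(rec)] -= 1              # decrement first because b is 0-based
        b[c[key(rec)]] = rec
    return b


records = [(3, "Clinton"), (4, "Smith"), (1, "Xu"), (2, "Adams"), (3, "Dunn"),
           (4, "Yi"), (2, "Baum"), (1, "Fu"), (3, "Gold"), (1, "Lu"), (1, "Land")]
for rec in counting_sort(records, 4):
    print(rec)
```

In the output the key-1 records appear as Xu, Fu, Lu, Land, the same relative order as in the original list, which is what stability means here.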
Analysis: • O(k + n) time • What if k = O(n)? • Requires k + n extra storage. • Stable sort: Preserves the original order of equal keys. • Is counting sort stable? • Is counting sort good for sorting records with 32 bit keys?
Radix sort • Radix sort is used in card sorting machines
Hollerith’s punched cards • Hollerith devised what was to become the computer punched card • Each card has 12 rows and 80 columns • Each column represents a single alphanumeric character or symbol. • The card punching machine punches holes in some of the 12 positions of each column
Card punching machine • Figure: IBM card punching machine
Hollerith’s tabulating machines • As the cards were fed through a "tabulating machine," pins passed through the positions where holes were punched, completing an electrical circuit and registering a value. • The 1880 census in the U.S. took seven years to complete • With Hollerith's "tabulating machines" the 1890 census took the Census Bureau six weeks • Through mergers, Hollerith's company eventually became IBM
Card sorting machine • Figure: IBM’s card sorting machine
Radix sort • Main idea • Break key into “digit” representation: key = i_d, i_{d-1}, …, i_2, i_1 • "digit" can be a number in any base, a character, etc • Radix sort: for i = 1 to d, sort “digit” i using a stable sort • Analysis: Θ(d · (stable sort time)) where d is the number of “digit”s
Radix sort • Which stable sort? • Since the range of values of a digit is small the best stable sort to use is Counting Sort. • When counting sort is used the time complexity is Θ(d(n + k)) where k is the range of a "digit". • When k = O(n), this is Θ(dn)
Radix sort - example
  Input list:                          178 139 326 572 294 321 910 368
  After sorting on digit 1 (ones):     910 321 572 294 326 178 368 139
  After sorting on digit 2 (tens):     910 321 326 139 368 572 178 294
  After sorting on digit 3 (hundreds): 139 178 294 321 326 368 572 910   (sorted list)
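The passes of radix sort can be reproduced with a short Python sketch. It is an illustration under assumptions not stated in the lecture: keys are non-negative integers, digits are taken in base 10 by default, and the optional `trace` list is a hypothetical parameter added only to record the list after each pass. Each pass is a stable counting sort on one digit, least significant first.

```python
def radix_sort(keys, d, base=10, trace=None):
    """Sort non-negative integers with at most d base-`base` digits; return a new list."""
    for pos in range(d):                         # digit 1 (least significant) first
        divisor = base ** pos
        count = [0] * base
        for x in keys:                           # count occurrences of each digit value
            count[(x // divisor) % base] += 1
        for v in range(1, base):                 # prefix sums give the offsets
            count[v] += count[v - 1]
        out = [0] * len(keys)
        for x in reversed(keys):                 # last to first keeps the pass stable
            digit = (x // divisor) % base
            count[digit] -= 1
            out[count[digit]] = x
        keys = out
        if trace is not None:
            trace.append(keys)                   # record the list after this pass
    return keys


passes = []
radix_sort([178, 139, 326, 572, 294, 321, 910, 368], 3, trace=passes)
for p in passes:
    print(p)
```

Stability is what makes this work: when two keys share the current digit, the order established by the earlier, less significant digits is preserved.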
Lemma 8.4 • Given n b-bit numbers and any positive integer r <= b, radix sort correctly sorts these numbers in Θ((b/r)(n + 2^r)) time • Proof • Divide each number into b/r “digits”. • Each “digit” has r bits and a range 0 to 2^r - 1. • Radix sort executes b/r counting sorts. • Each counting sort is Θ(n + 2^r) • So the total is Θ((b/r)(n + 2^r))
Bucket sort • Assumption: Keys are distributed uniformly in interval [0, 1) • Main idea • n records are distributed into n buckets (O(n)) • insert A[i] into list of B[⌊n·A[i]⌋] • Buckets are sorted with insertion sort • Buckets are combined (O(n))
BUCKET-SORT(A)
  n <- length[A]
  for i <- 1 to n
    insert A[i] into list B[⌊n·A[i]⌋]
  for i <- 0 to n - 1
    sort list B[i] with insertion sort
  concatenate the lists B[0], B[1], …, B[n-1] together in order
Bucket sort - example
  A (positions 1..10): .78 .17 .39 .26 .72 .94 .21 .12 .23 .68
  Step 1: distribute into buckets B[0..9]. Step 2: sort each bucket. Buckets after sorting:
  B[0]: empty
  B[1]: .12 .17
  B[2]: .21 .23 .26
  B[3]: .39
  B[4]: empty
  B[5]: empty
  B[6]: .68
  B[7]: .72 .78
  B[8]: empty
  B[9]: .94
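BUCKET-SORT can be sketched in Python as below. This is a minimal version assuming keys are floats in [0, 1) and using Python lists as the buckets; the `min(..., n - 1)` guard is an added safeguard against floating-point rounding and is not part of the pseudocode.

```python
def insertion_sort(lst):
    """Sort lst in place; quadratic in len(lst), which is fine for small buckets."""
    for i in range(1, len(lst)):
        x = lst[i]
        j = i - 1
        while j >= 0 and lst[j] > x:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = x


def bucket_sort(a):
    """Return a sorted copy of a, whose keys are assumed to lie in [0, 1)."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        index = min(int(n * x), n - 1)    # floor(n * x), guarded against rounding up to n
        buckets[index].append(x)
    result = []
    for b in buckets:                     # sort each bucket, then concatenate in order
        insertion_sort(b)
        result.extend(b)
    return result


print(bucket_sort([.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]))
```

On the example keys, buckets 1, 2 and 7 receive more than one element and are the only ones where insertion sort has any work to do.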
Analysis • Let n_i be the number of elements in B[i] • Let x_ij = I{A[j] falls in bucket i}
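With these definitions, the second moment of n_i can be evaluated by expanding the square of a sum of indicators. The derivation below is a sketch of the usual textbook argument; it assumes each key falls into any given bucket with probability 1/n, independently of the other keys.

```latex
n_i = \sum_{j=1}^{n} x_{ij}, \qquad
E[x_{ij}^2] = E[x_{ij}] = \frac{1}{n}, \qquad
E[x_{ij}\, x_{ik}] = \frac{1}{n^2} \quad (j \ne k)

E[n_i^2]
  = \sum_{j=1}^{n} E[x_{ij}^2] + \sum_{j \ne k} E[x_{ij}\, x_{ik}]
  = n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n^2}
  = 2 - \frac{1}{n}
```

Since insertion sort on bucket i costs O(n_i^2), the expected total time is Θ(n) + n · O(2 - 1/n) = Θ(n).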
Analysis • E(n_i^2) = 2 - 1/n • What is the worst case run time of bucket sort?