Deterministic Selection and Sorting

Deterministic Selection and Sorting Analysis of Algorithms Prepared by John Reif, Ph.D.

Readings • Main Reading Selections: • CLR, Chapters 2, 8, 9

Deterministic Selection and Sorting: • Selection Algorithms and Lower Bounds • Sorting Algorithms and Lower Bounds

Problem P size n • Divide into sub problems size n1, …, nk solve these and “glue” together solutions Time to combine solutions

Examples • 1st lecture’s mult • Fast fourier transform • Binary search • Merge sorting

Selection, and Sorting on Decision Tree Model • Input a, b, c

Time • Time = (# comparisons on longest path) • Binary tree with L Leaves • Facts: • (1) has = L – 1 internal nodes • (2) max height

Merging • 2 lists with total of n keys • Input • X1 < X2 < … < Xk and • Y1 < Y2 < … < Yn-k • Output • Ordered merge of two key lists

Goal • Provably asymptotically optimal algorithm in Decision Tree Model • Use this Model because it • Allows simple proofs of lower bounds • time = # comparisons so easy to bound time costs

Algorithm Insert • Input (X1 < X2 < … < Xk), (Y1) • Case k = n – 1 • Algorithm: Binary Search by Divide-and-Conquer • (1) Compare • (2) If insert Y1 into • else and insert Y1 into

Total Comparison Cost: • Since a binary tree with n = k + 1 leaves has depth > , this is optimal!

Case: Merging equal length lists • Input • (X1 < X2 < … < Xk) • (Y1 < Y2 < … < Yn-k) • Where

Algorithm • i  1, j  1 • while i < k and j < k do if Xi < Yj then output(Xi) and set i  i + 1 else output (Yj) and set j  j + 1 • output remaining elements Algorithm clearly uses 2k – 1 = n – 1 comparisons

Lower Bound: • Consider case X1 < Y1 < X2 < Y2 < …< Xk < Yk • any merge algorithm must compare Claim: Xi with Yi for i = 1,…, k Yi with Xi+1 for j = 1,…, k-1 Otherwise we could flip Yi < Xi with no change)  so requires > 2k-1 = n-1 comparisons!

Sorting by Divide-and-Conquer Algorithm Merge Sort Input set S of n keys • Partition S into set x of keys and set Y of keys

Sorting by Divide-and-Conquer (cont’d) • Recursively compute Merge Sort Merge Sort • Merge above sequences using n-1 comparisons • Output merged sequence

Time Analysis

Lower Bounds on Sorting • (on decision tree model) n! distinct leaves

Easy Approximation (via Integration)

A Better Bound • Using Sterling Approximation

Selection Problems • Input X1 , X2 , … , Xn and index k  {1, …, n} • Output X(k) = the k’th best

History • Rev C. L. Dodge (Lewis Carol) wrote article on lawn tennis tournament in James Gazett, 1883 • Felt prizes unjust because • although winner X(1) always gets 1st prize • second X(2) may not get 2nd prize

History (cont’d) • Carol proposed his own (nonoptimal) tournament…

Selection of the Champion X(1) • X(1) is easily determined in n-1 comparison • X(1) requires n-1 comparisons x1

Selection of the Champion X(1) (cont’d) • Proof • Everyone except the champion X(1) must lose at least once!

Selection of the Second Best X(2) • Using comparisons • Algorithm • form a balanced binary tree for tournament to find X(1) using n-1 comparisons

Selection of the Second Best X(2) (cont’d)

Selection of the Second Best X(2) (cont’d) • Let Y be the set of players knocked out by champion X(1) • Play a tournament among the Y’s • output X(2) = champion of the Y’s using more comparisons

Lower Bounds on Finding X(2) • Requires comparisons • Proof • # comparisons > m1 + m2 + … • Where mi = # players who lost i or more matches

Lower Bounds on Finding X(2) (cont’d) • Claim • m1> n -1, since at end we must know X(1) as well as X(2) • Claim • m2> (# who lost to X(1)) -1, since everyone (except X(2)) who lost to X(1) must also have lost one more time.

Lower Bounds on Finding X(2) (cont’d) • lemma (# who lost to X(1)) > in worst case • proof • Use oracle who “fixes” results of games so that champion X(1) plays > matches

Lower Bounds on Finding X(2) (cont’d) • Declare Xi > Xj if • Xi previously undefeated and Xj lost at least once • Both undefeated but Xi played more matches • Otherwise, decide consistently with previous decisions

Selection by Divide-and-Conquer • Algorithm • Select k (X) • Input • set X of n keys and index k

Deriving Recurrence for Time Bounds • Use a sufficiently large constant c1 (assuming d is constant) • If say d = 5, T(n) < 20 c1 n = O(n)

Deriving Recurrence for Time Bounds

Lower Bounds for Selecting X(k) • Input • X = {x1, …, xn}, index k • Theorem • Every leaf of Decision Tree has depth > n - 1

Proof of Lower Bound for Selecting X(k) • Fix a path p from root to leaf • The comparisons done on p define a relation Rp • Let Rp+ = transitive closure of Rp

Lemma • If path p determines Xm = X(k) then for all im either xi Rp+ xm or xmRp+xi • Proof • Suppose xi is unrelated to xm by Rp+ • They can replace xi in linear order either before or after xm to violate xm=x(k)

Let the “key” comparison for xi be when xi is compared with xj where either • j=m • xiRpxj and xjRp+ xm, or • xjRpxi and xmRp+ xj • Fact • Xi has unique “key” comparison determining either xiRp+ xm or xmRp+ xi  So there are n-1 “key” comparisons, each distinct!

RADIXSORT: Linear Time Sort for RAM • Assume keys over integers 1,…,n • Costs O(n) time on unit cost RAM • Avoids (n log n) lower bound on SORT by avoiding comparisons instead uses indexing of RAM • Generalizes (in c passes) to key domains {1, …, nc}

Procedure RADIXSORT • Input • for j = 1, …, n doinitialize B[j] to be the empty list • for I=1, …, n doadd I to B[xi] • Let L = (i1, i2, …, in) be the concatenation of B[1],…,B[n] • output

Deterministic Selection and Sorting Analysis of Algorithms Prepared by John Reif, Ph.D.

Deterministic Selection and Sorting