Chapter 9: Median and Order Statistics Overview: What is an order statistic?

quinn-rosa

Presentation Transcript


  1. Chapter 9: Median and Order Statistics
     Overview:
     What is an order statistic?
     Selection means finding a particular order statistic
     Selection by sorting
     Selection in linear time: best case, worst case, average case

  2. What is an order statistic?
     Given a set of n elements, the ith order statistic is the ith smallest element.
     It generalizes the minimum (1st order statistic) and the maximum (nth order statistic).
     The parity of a set is whether n is even or odd.
     The median is roughly halfway between min and max in sorted order.
     For odd n the median is unique: it is the ith order statistic with $i = (n+1)/2$.
     For even n there are two medians, at $i = n/2$ and $i = n/2 + 1$.
     Regardless of parity, lower median means that $i = \lfloor (n+1)/2 \rfloor$ and upper median means that $i = \lceil (n+1)/2 \rceil$.
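
The definitions above can be checked with a small sketch. The function names `order_statistic` and `median_ranks` are illustrative and not from the slides, and the order statistic is found here by plain sorting rather than by a selection algorithm.

```python
def order_statistic(values, i):
    """Return the i-th smallest element (1-based rank) by sorting a copy."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]


def median_ranks(n):
    """Return the ranks (lower, upper) = (floor((n+1)/2), ceil((n+1)/2))."""
    lower = (n + 1) // 2
    upper = (n + 2) // 2  # integer form of ceil((n+1)/2)
    return lower, upper


data = [7, 3, 9, 1]                      # n = 4, even parity
lower, upper = median_ranks(len(data))   # ranks 2 and 3
print(order_statistic(data, lower), order_statistic(data, upper))
```

For odd n the two ranks coincide, which matches the claim that the median is unique in that case.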

  3. Selection problem
     Find the ith order statistic in a set of n (distinct) elements.
     Input: set $A = \langle a_1, a_2, \ldots, a_n \rangle$ and an integer i such that $1 \le i \le n$.
     Output: the element $x \in A$ that is larger than exactly $i - 1$ other elements of A.
     The selection problem can be solved in $O(n \lg n)$ time by sorting.
     Faster solutions are possible. Can min and max be found in linear time?
     Section 9.2 presents an algorithm that is O(n) in the average case.
     Section 9.3 discusses an algorithm that is O(n) in the worst case.
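
As one possible answer to the question about min and max, here is a minimal single-pass sketch. The name `min_and_max` is an assumption for illustration; it uses at most 2(n - 1) comparisons, so it runs in linear time.

```python
def min_and_max(values):
    """Find the smallest and largest element in one pass over the input."""
    if not values:
        raise ValueError("values must be non-empty")
    lo = hi = values[0]
    for x in values[1:]:
        if x < lo:        # new minimum
            lo = x
        elif x > hi:      # new maximum (cannot also be a new minimum)
            hi = x
    return lo, hi


print(min_and_max([7, 3, 9, 1]))
```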

  4. Select by partition pseudocode

     Select-by-Partition(A, p, r, i)
         if p = r then return A[p]              (a single element is the ith smallest by default)
         q ← Partition(A, p, r)                 (get upper and lower sub-arrays)
         k ← q - p + 1                          (number of elements in lower sub-array, including pivot)
         if i = k then
             return A[q]                        (pivot is the ith smallest element)
         else if i < k then
             return Select-by-Partition(A, p, q - 1, i)
         else
             return Select-by-Partition(A, q + 1, r, i - k)

     Note: the index of the ith order statistic changes in the upper sub-array (i becomes i - k).
     With favorable splits, T(n) = O(n). Why not O(n lg n) as in quicksort?
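
A runnable sketch of the pseudocode above, assuming a Lomuto-style `partition` that uses the last element as pivot (the slides do not show Partition itself, so that choice is an assumption). Indices p and r are 0-based here, while i stays a 1-based rank within a[p..r]. Unlike quicksort, only one of the two sub-arrays is ever visited.

```python
def partition(a, p, r):
    """Partition a[p..r] around pivot a[r]; return the pivot's final index."""
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def select_by_partition(a, p, r, i):
    """Return the i-th smallest element of a[p..r]; rearranges a in place."""
    if p == r:
        return a[p]
    q = partition(a, p, r)
    k = q - p + 1                      # size of lower part, pivot included
    if i == k:
        return a[q]
    if i < k:
        return select_by_partition(a, p, q - 1, i)
    return select_by_partition(a, q + 1, r, i - k)   # rank shifts by k


data = [9, 1, 8, 2, 7, 3]
print(select_by_partition(list(data), 0, len(data) - 1, 3))
```

The call passes a copy of the list because the partitioning step reorders its argument.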

  5. Selection algorithm with worst-case runtime = O(n)
     It is possible to design a deterministic selection algorithm that has a linear worst-case runtime.
     Making the pivot an input parameter of Partition lets us guarantee a good split whenever Partition is called.
     Processing done before calling Partition determines a good choice of pivot.

  6. Outline of recursive Select with worst-case runtime = O(n):
     Step 1: Divide the n-element sequence into subgroups of at most 5 elements (one group may have fewer than 5 elements). Cost = Θ(n).
     Step 2: Use insertion sort to find the median of each subgroup. Cost = constant times the number of subgroups = Θ(n).
     Step 3: Use Select to find the median of the medians. Cost = T(n/5).
     Step 4: Partition the input array with pivot = median of medians, and calculate the number of elements in the lower sub-array. Cost = Θ(n) + constant.
     Step 5: If the pivot is not the ith smallest element, use Select to find it in either the upper or the lower sub-array. Cost ≤ T(7n/10 + 6), the run time to select from the larger sub-array.
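
The five steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the textbook's in-place procedure: the input elements are assumed distinct (as on the Selection problem slide), the partition step builds new lists instead of rearranging the array, and Python's `sorted` stands in for insertion sort on each group of at most 5 elements. The name `select` is chosen here for illustration.

```python
def select(values, i):
    """Return the i-th smallest (1-based) of a list of distinct values."""
    if len(values) <= 5:
        return sorted(values)[i - 1]
    # Steps 1-2: groups of at most 5, then the (lower) median of each group.
    groups = [values[j:j + 5] for j in range(0, len(values), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the median of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x and count the lower side (pivot included).
    lower = [v for v in values if v < x]
    upper = [v for v in values if v > x]
    k = len(lower) + 1
    # Step 5: stop if x has rank i, otherwise recurse into one side only.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)


data = [(37 * j) % 101 for j in range(1, 101)]   # a permutation of 1..100
print(select(data, 50))
```

Building new lists costs extra memory but keeps each step of the outline visible; an in-place version would pass the pivot to the partition routine instead.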

  7. Diagram to help explain the cost of Step 5
     Dots represent elements of the input. Subgroups of 5 occupy columns.
     Arrows point from larger to smaller elements.
     Medians are white; x marks the median of medians.
     The shaded area shows elements known to be greater than x: 3 out of 5 are shaded in each subgroup whose median is greater than x, provided the subgroup is full and does not contain x.

  8. At least $3\left(\lceil \tfrac{1}{2} \lceil n/5 \rceil \rceil - 2\right) \ge 3n/10 - 6$ elements are larger than x.
     At most $n - (3n/10 - 6) = 7n/10 + 6$ elements are less than x.
     The worst case is described by $T(n) = T(n/5) + T(7n/10 + 6) + \Theta(n)$.
     Solve by the substitution method.
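
A sketch of the substitution step, following the usual textbook argument. The constant a bounding the linear term (so that the Θ(n) term is at most an) and the use of ⌈n/5⌉ ≤ n/5 + 1 for the first recursive call are assumptions not written on the slide. The inductive hypothesis is T(m) ≤ cm for all m < n.

```latex
\begin{align*}
T(n) &\le c\lceil n/5 \rceil + c\,(7n/10 + 6) + an \\
     &\le cn/5 + c + 7cn/10 + 6c + an \\
     &= 9cn/10 + 7c + an \\
     &= cn - (cn/10 - 7c - an) \\
     &\le cn \quad \text{whenever } cn/10 - 7c - an \ge 0 .
\end{align*}
% The condition is equivalent to c >= 10a * n / (n - 70) for n > 70.
% For n >= 140 we have n / (n - 70) <= 2, so c >= 20a suffices.
```

So T(n) = O(n), with inputs smaller than 140 handled as a constant-size base case.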

  9. CptS 450 Spring 2014 [All problems are from Cormen et al., 3rd Edition]
     Homework Assignment 8: due 4/9/14
     1. Ex. 9.3-1, p. 223
     2. Ex. 9.3-3, p. 223
     3. Ex. 9.3-5, p. 223
     On problems 2 and 3: write pseudocode (a variation of the code in the text), explain how the code works, and analyze its run time.

  10. Randomized-Select lets us analyze the runtime for the average case

      Randomized-Select(A, p, r, i)
          if p = r then return A[p]
          q ← Randomized-Partition(A, p, r)
          k ← q - p + 1
          if i = k then
              return A[q]                       (pivot is the ith smallest element)
          else if i < k then
              return Randomized-Select(A, p, q - 1, i)
          else
              return Randomized-Select(A, q + 1, r, i - k)

      As in Randomized-Quicksort, Randomized-Partition chooses a pivot at random from the array elements between p and r.
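
A runnable sketch of Randomized-Select, assuming the random pivot is swapped to the end of the range and then handled by a Lomuto-style partition (the slide does not spell out Randomized-Partition, so that detail is an assumption). Indices p and r are 0-based; i is a 1-based rank within a[p..r].

```python
import random


def randomized_partition(a, p, r):
    """Pick a uniformly random pivot in a[p..r], partition, return its index."""
    s = random.randint(p, r)           # randint includes both endpoints
    a[s], a[r] = a[r], a[s]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest element of a[p..r]; rearranges a in place."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1
    if i == k:
        return a[q]
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)


data = [12, 5, 20, 1, 17, 9, 3]
print(randomized_select(list(data), 0, len(data) - 1, 4))
```

The result does not depend on the random choices; only the running time does.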

  11. Upper bound on the expected value of T(n) for Randomized-Select
      The call to Randomized-Partition creates upper and lower sub-arrays.
      Include the pivot in the lower sub-array A[p..q].
      Define indicator random variables $X_k = I\{\text{sub-array } A[p..q] \text{ has exactly } k \text{ elements}\}$ for $1 \le k \le n$.
      All possible values of k are equally likely, so $E[X_k] = 1/n$.

  12. Assume that the desired element always falls in the larger partition.
      This assumption ensures an upper bound on E[T(n)]:
      $T(n) \le \sum_{k=1}^{n} X_k \, T(\max(k-1, n-k)) + O(n)$
      The sum contains only one nonzero term:
      T(n) = T(n-1) + O(n) when the lower sub-array has 1 element
      T(n) = T(n-2) + O(n) when the lower sub-array has 2 elements
      . . .
      T(n) = T(n-2) + O(n) when the lower sub-array has n-1 elements
      T(n) = T(n-1) + O(n) when the lower sub-array has n elements

  13. $E[T(n)] \le \sum_{k=1}^{n} E[X_k \, T(\max(k-1, n-k))] + O(n)$ (linearity of expectation)
      $E[T(n)] \le \sum_{k=1}^{n} E[X_k] \, E[T(\max(k-1, n-k))] + O(n)$ (independence of the random variables, Exercise 9.2-2)
      $E[T(n)] \le \frac{1}{n} \sum_{k=1}^{n} E[T(\max(k-1, n-k))] + O(n)$ (using $E[X_k] = 1/n$)

  14. $E[T(n)] \le \frac{1}{n} \sum_{k=1}^{n} E[T(\max(k-1, n-k))] + O(n)$
      If $k > n/2$, then $\max(k-1, n-k) = k-1$.
      If $k \le n/2$, then $\max(k-1, n-k) = n-k$.
      For even n, each term from T(n/2) to T(n-1) occurs exactly twice. A similar argument applies for odd n.
      $E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n)$ (using the repetition of the T's)
      $E[T(n)] \le \frac{2}{n} \left\{ \sum_{k=1}^{n-1} E[T(k)] - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} E[T(k)] \right\} + O(n)$ (set up to use the arithmetic sum)

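  15. Apply the substitution method: assume E[T(k)] = O(k).
      Then there exists c > 0 such that $E[T(k)] \le ck$.
      $E[T(n)] \le \frac{2c}{n} \left\{ \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k \right\} + dn$, with d > 0.
      Now use the arithmetic sum. After much algebra (text p. 219):
      $E[T(n)] \le cn - (cn/4 - c/2 - dn)$
      Find c and $n_0$.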
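
The "much algebra" can be written out as follows. This is a reconstruction along the lines of the textbook derivation, using the slide's constants c and d; the one inequality in the middle replaces ⌊n/2⌋ by the smaller quantity n/2 - 1.

```latex
\begin{align*}
E[T(n)] &\le \frac{2c}{n}\left( \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k \right) + dn \\
        &= \frac{2c}{n}\left( \frac{(n-1)n}{2} - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2} \right) + dn \\
        &\le \frac{2c}{n}\left( \frac{(n-1)n}{2} - \frac{(n/2 - 2)(n/2 - 1)}{2} \right) + dn \\
        &= \frac{2c}{n}\left( \frac{n^2 - n}{2} - \frac{n^2/4 - 3n/2 + 2}{2} \right) + dn \\
        &= \frac{c}{n}\left( \frac{3n^2}{4} + \frac{n}{2} - 2 \right) + dn \\
        &= c\left( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \right) + dn \\
        &\le \frac{3cn}{4} + \frac{c}{2} + dn \\
        &= cn - \left( \frac{cn}{4} - \frac{c}{2} - dn \right).
\end{align*}
```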

  16. Simplify (see text p. 219):
      $E[T(n)] \le cn - (cn/4 - c/2 - dn)$
      $E[T(n)] \le cn$ if $cn/4 - c/2 - dn \ge 0$, that is, if $n(c/4 - d) \ge c/2$.
      If c > 4d, a sufficiently large n does exist: $n \ge 2c/(c - 4d)$.
      If c = 8d, then $n \ge 4$.
