
Order Statistics



Presentation Transcript


  1. Order Statistics Many of the slides are from Prof. Plaisted’s resources at University of North Carolina at Chapel Hill

  2. Order Statistic • ith order statistic: ith smallest element of a set of n elements. • Minimum: first order statistic. • Maximum: nth order statistic. • Median: “half-way point” of the set. • Unique when n is odd – occurs at i = (n+1)/2. • Two medians when n is even. • Lower median, at i = n/2. • Upper median, at i = n/2 + 1. • For consistency, “median” will refer to the lower median.

  3. Selection Problem • Selection problem: • Input: A set A of n distinct numbers and a number i, with 1 ≤ i ≤ n. • Output: the element x ∈ A that is larger than exactly i – 1 other elements of A. • Can be solved in O(n lg n) time. How? • We will study faster linear-time algorithms. • For the special cases when i = 1 and i = n. • For the general problem.
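For reference, the O(n lg n) answer to the "How?" above is simply to sort and index. A minimal Python sketch (the function name is illustrative, not from the slides):

```python
def select_by_sorting(A, i):
    """Return the ith smallest element of A (1-indexed) in O(n lg n) time."""
    return sorted(A)[i - 1]

# Example: the 3rd smallest of {7, 1, 9, 4, 2} is 4.
assert select_by_sorting([7, 1, 9, 4, 2], 3) == 4
```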

  4. Minimum (Maximum)
  Minimum(A)
  1. min ← A[1]
  2. for i ← 2 to length[A]
  3.     do if min > A[i]
  4.         then min ← A[i]
  5. return min
  Maximum can be determined similarly. • T(n) = Θ(n). • No. of comparisons: n – 1. • Can we do better? Why not? • Minimum(A) has the worst-case optimal no. of comparisons: every element other than the minimum must lose at least one comparison, so n – 1 comparisons are necessary.

  5. Problem
  Minimum(A)
  1. min ← A[1]
  2. for i ← 2 to length[A]
  3.     do if min > A[i]
  4.         then min ← A[i]
  5. return min
  • Average for random input: How many times do we expect line 4 to be executed? • X = RV for no. of executions of line 4. • Xi = indicator RV for the event that line 4 is executed on the ith iteration. • X = Σi=2..n Xi. • E[Xi] = 1/i. How? • Hence, E[X] = Σi=2..n 1/i = Hn – 1 ≈ ln n = Θ(lg n).
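Line 4 executes on iteration i exactly when A[i] is smaller than all of A[1..i – 1], i.e., when A[i] is the minimum of the first i elements; for a random permutation of distinct values this has probability 1/i, which gives E[Xi] = 1/i. A small Python experiment (hypothetical, not from the slides) checks the resulting value E[X] = Hn – 1 empirically:

```python
import math
import random

def count_min_updates(A):
    """Run Minimum(A) and count how many times line 4 (the update) executes."""
    m, updates = A[0], 0
    for x in A[1:]:
        if m > x:       # line 3: the comparison
            m = x       # line 4: the update being counted
            updates += 1
    return updates

n, trials = 1000, 500
avg = sum(count_min_updates(random.sample(range(10 * n), n))
          for _ in range(trials)) / trials
h_n = sum(1.0 / i for i in range(1, n + 1))
print(f"average updates = {avg:.2f}, H_n - 1 = {h_n - 1:.2f}, ln n = {math.log(n):.2f}")
```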

  6. Simultaneous Minimum and Maximum • Some applications need to determine both the maximum and minimum of a set of elements. • Example: Graphics program trying to fit a set of points onto a rectangular display. • Independent determination of maximum and minimum requires 2n – 2 comparisons. • Can we reduce this number? • Yes.

  7. Simultaneous Minimum and Maximum • Maintain minimum and maximum elements seen so far. • Process elements in pairs. • Compare the smaller to the current minimum and the larger to the current maximum. • Update current minimum and maximum based on the outcomes. • No. of comparisons per pair = 3. How? • No. of pairs ≤ ⌊n/2⌋. • For odd n: initialize min and max to A[1]. Pair the remaining elements. So, no. of pairs = ⌊n/2⌋. • For even n: initialize min to the smaller of the first pair and max to the larger. So, remaining no. of pairs = (n – 2)/2 < ⌊n/2⌋.

  8. Simultaneous Minimum and Maximum • Total no. of comparisons, C ≤ 3⌊n/2⌋. • For odd n: C = 3⌊n/2⌋. • For even n: C = 3(n – 2)/2 + 1 (for the initial comparison) = 3n/2 – 2 < 3⌊n/2⌋ + 1.
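A Python sketch of the pairing strategy from the last two slides, with a comparison counter to check the 3⌊n/2⌋ bound (names are illustrative):

```python
def min_max_pairs(A):
    """Find (min, max) of A using about 3 * floor(n/2) comparisons."""
    n = len(A)
    comparisons = 0
    if n % 2:                        # odd n: both start at A[0]
        lo = hi = A[0]
        start = 1
    else:                            # even n: one comparison seeds min and max
        comparisons += 1
        lo, hi = (A[0], A[1]) if A[0] < A[1] else (A[1], A[0])
        start = 2
    for j in range(start, n, 2):     # process the remaining elements in pairs
        comparisons += 3             # 1 intra-pair + 1 vs. min + 1 vs. max
        small, big = (A[j], A[j + 1]) if A[j] < A[j + 1] else (A[j + 1], A[j])
        if small < lo:
            lo = small
        if big > hi:
            hi = big
    return lo, hi, comparisons

# Example: n = 6 gives 3n/2 - 2 = 7 comparisons.
print(min_max_pairs([5, 2, 9, 1, 7, 3]))  # (1, 9, 7)
```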

  9. General Selection Problem • Seems more difficult than Minimum or Maximum. • Yet, has solutions with the same asymptotic complexity as Minimum and Maximum. • We will study 2 algorithms for the general problem. • One with expected linear-time complexity. • A second, whose worst-case complexity is linear.

  10. Selection in Expected Linear Time • Modeled after randomized quicksort. • Exploits the abilities of Randomized-Partition (RP). • RP returns the index k in the sorted order of a randomly chosen element (pivot). • If the order statistic we are interested in, i, equals k, then we are done. • Else, reduce the problem size using its other ability. • RP rearranges the other elements around the random pivot. • If i < k, selection can be narrowed down to A[1..k – 1]. • Else, select the (i – k)th element from A[k+1..n]. (Assuming RP operates on A[1..n]. For A[p..r], change k appropriately.)

  11. Randomized Quicksort: review
  Rnd-Partition(A, p, r)
    i := Random(p, r);
    A[r] ↔ A[i];
    x, i := A[r], p – 1;
    for j := p to r – 1 do
      if A[j] ≤ x then
        i := i + 1;
        A[i] ↔ A[j]
      fi
    od;
    A[i + 1] ↔ A[r];
    return i + 1
  Quicksort(A, p, r)
    if p < r then
      q := Rnd-Partition(A, p, r);
      Quicksort(A, p, q – 1);
      Quicksort(A, q + 1, r)
    fi
  [Figure: A[p..r] partitioned around pivot 5 into A[p..q – 1] (elements ≤ 5) and A[q+1..r] (elements ≥ 5).]

  12. Randomized-Select
  Randomized-Select(A, p, r, i) // select ith order statistic
  1. if p = r
  2.   then return A[p]
  3. q ← Randomized-Partition(A, p, r)
  4. k ← q – p + 1
  5. if i = k
  6.   then return A[q]
  7. elseif i < k
  8.   then return Randomized-Select(A, p, q – 1, i)
  9. else return Randomized-Select(A, q + 1, r, i – k)
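A direct Python transcription of the two procedures above (0-indexed arrays; i remains the 1-indexed rank sought):

```python
import random

def randomized_partition(A, p, r):
    """Partition A[p..r] in place around a random pivot; return the pivot's index."""
    i = random.randint(p, r)
    A[r], A[i] = A[i], A[r]            # move the random pivot to the end
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]    # place the pivot in its final slot
    return i + 1

def randomized_select(A, p, r, i):
    """Return the ith smallest element of A[p..r] (i is 1-indexed)."""
    if p == r:
        return A[p]
    q = randomized_partition(A, p, r)
    k = q - p + 1                      # rank of the pivot within A[p..r]
    if i == k:
        return A[q]
    elif i < k:
        return randomized_select(A, p, q - 1, i)
    else:
        return randomized_select(A, q + 1, r, i - k)

A = [7, 1, 9, 4, 2, 8]
print(randomized_select(A, 0, len(A) - 1, 4))  # 4th smallest -> 7
```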

  13. Analysis • Worst-case complexity: • Θ(n²) – as we could get unlucky and always recurse on a subarray that is only one element smaller than the previous subarray. • Average-case complexity: • Θ(n) – intuition: because the pivot is chosen at random, we expect to get rid of half of the list each time we choose a random pivot q. • Why Θ(n) and not Θ(n lg n)? Unlike quicksort, we recurse on only one subarray, so the expected work is roughly n + n/2 + n/4 + … ≤ 2n, a geometric series rather than lg n levels of n work each.

  14. Average-case Analysis • Define indicator RVs Xk, for 1 ≤ k ≤ n. • Xk = I{subarray A[p..q] has exactly k elements}. • Because the random pivot is equally likely to be any element, Pr{subarray A[p..q] has exactly k elements} = 1/n for all k = 1..n. • Hence, E[Xk] = 1/n. (9.1) • Let T(n) be the RV for the time required by Randomized-Select (RS) on an input A[p..r] of n elements. • Determine an upper bound on E[T(n)].

  15. Average-case Analysis • A call to RS may • terminate immediately with the correct answer, • recurse on A[p..q – 1], or • recurse on A[q+1..r]. • To obtain an upper bound, assume that the ith smallest element that we want is always in the larger subarray. • For a given call of RS, Xk = 1 for exactly one value of k, and Xk = 0 for all other k; when Xk = 1, the two subarrays have sizes k – 1 and n – k. • RP takes O(n) time on a problem of size n. • Hence, the recurrence for T(n) is:
  T(n) ≤ Σk=1..n Xk · (T(max(k – 1, n – k)) + O(n))
       = Σk=1..n Xk · T(max(k – 1, n – k)) + O(n)

  16. Average-case Analysis Taking expectations of both sides:
  E[T(n)] ≤ E[Σk=1..n Xk · T(max(k – 1, n – k))] + O(n)
  = Σk=1..n E[Xk · T(max(k – 1, n – k))] + O(n)   (by linearity of expectation)
  = Σk=1..n E[Xk] · E[T(max(k – 1, n – k))] + O(n)   (by Eq. (C.23), since Xk is independent of T(max(k – 1, n – k)))
  = Σk=1..n (1/n) · E[T(max(k – 1, n – k))] + O(n)   (by Eq. (9.1))

  17. Average-case Analysis (Contd.) When the summation is expanded: • Each term T(max(k – 1, n – k)) equals some T(m) with ⌊n/2⌋ ≤ m ≤ n – 1. • If n is odd, T(n – 1) thru T(⌈n/2⌉) each occur twice and T(⌊n/2⌋) occurs once. • If n is even, T(n – 1) thru T(n/2) each occur twice. • Hence, E[T(n)] ≤ (2/n) Σk=⌊n/2⌋..n–1 E[T(k)] + O(n).

  18. Average-case Analysis (Contd.) • We solve the recurrence by substitution. • Guess E[T(n)] = O(n), i.e., E[T(n)] ≤ cn for some constant c and all sufficiently large n; let a be the constant in the O(n) term. • Substituting the guess into the recurrence:
  E[T(n)] ≤ (2/n) Σk=⌊n/2⌋..n–1 ck + an
  = (2c/n) (Σk=1..n–1 k – Σk=1..⌊n/2⌋–1 k) + an
  ≤ (2c/n) ((n – 1)n/2 – (n/2 – 2)(n/2 – 1)/2) + an
  = c(3n/4 + 1/2 – 2/n) + an
  ≤ 3cn/4 + c/2 + an
  = cn – (cn/4 – c/2 – an),
  which is at most cn provided cn/4 – c/2 – an ≥ 0, i.e., n ≥ 2c/(c – 4a) (choosing c > 4a). Thus, if we assume T(n) = O(1) for n < 2c/(c – 4a), we have E[T(n)] = O(n).

  19. Selection in Worst-Case Linear Time • Algorithm Select: • Like Randomized-Select, finds the desired element by recursively partitioning the input array. • Unlike Randomized-Select, is deterministic. • Uses a variant of the deterministic Partition routine. • Partition is told which element to use as the pivot. • Achieves linear-time complexity in the worst case by guaranteeing that the split is always “good” at each Partition. • How can a good split be guaranteed?

  20. Guaranteeing a Good Split • We will have a good split if we can ensure that the pivot is the median element or an element close to the median. • Hence, determining a reasonable pivot is the first step.

  21. Choosing a Pivot • Median-of-Medians: • Divide the n elements into ⌈n/5⌉ groups. • ⌊n/5⌋ groups contain 5 elements each; 1 group contains n mod 5 < 5 elements. • Determine the median of each of the groups. • Sort each group using Insertion Sort. Pick the median from the sorted group. • Recursively find the median x of the ⌈n/5⌉ medians. • Recurrence for the running time of median-of-medians: T(n) = O(n) + T(⌈n/5⌉) + …

  22. Algorithm Select • Determine the median-of-medians x (using the procedure on the previous slide). • Partition the input array around x using the variant of Partition. • Let k be the index of x that Partition returns. • If k = i, then return x. • Else if i < k, then apply Select recursively to A[1..k – 1] to find the ith smallest element. • Else (i > k), apply Select recursively to A[k+1..n] to find the (i – k)th smallest element. (Assumption: Select operates on A[1..n]. For subarrays A[p..r], suitably change k.)
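Combining this slide with the previous one, here is a self-contained Python sketch of Select, assuming distinct elements (as slide 24 does). For clarity, list comprehensions stand in for the in-place Partition variant, and group medians come from sorting each group of 5; this preserves the O(n) bound but not the in-place behavior:

```python
def select(A, i):
    """Return the ith smallest element of list A (i is 1-indexed).
    Worst-case O(n) via the median-of-medians pivot; assumes distinct elements."""
    if len(A) <= 5:
        return sorted(A)[i - 1]
    # Step 1: lower median of each group of 5 (the last group may be smaller).
    medians = [sorted(A[j:j + 5])[(len(A[j:j + 5]) - 1) // 2]
               for j in range(0, len(A), 5)]
    # Step 2: recursively find the median x of the ceil(n/5) medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 3: partition around x and recurse on one side, as on the slide.
    less = [a for a in A if a < x]
    greater = [a for a in A if a > x]
    k = len(less) + 1                 # rank of x in A
    if i == k:
        return x
    elif i < k:
        return select(less, i)
    else:
        return select(greater, i - k)

print(select([9, 1, 7, 3, 8, 2, 6, 5, 4], 5))  # median -> 5
```

The two recursive calls are on ⌈n/5⌉ elements and (by the bound on the next slides) at most 7n/10 + 6 elements, which is exactly the recurrence analyzed on slide 25.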

  23. Worst-case Split
  [Figure: the ⌊n/5⌋ groups of 5 elements each, plus the ⌈n/5⌉th group of n mod 5 elements, drawn as columns with arrows pointing from larger to smaller elements; the median-of-medians x in the center, the elements < x on one side and the elements > x on the other.]

  24. Worst-case Split • Assumption: Elements are distinct. • At least half of the ⌈n/5⌉ medians are greater than or equal to x. • Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are greater than x. • The last group (with fewer than 5 elements) and the group containing x may contribute fewer than 3 such elements; exclude these two groups. • Hence, the no. of elements > x is at least 3(⌈⌈n/5⌉/2⌉ – 2) ≥ 3n/10 – 6. • Analogously, the no. of elements < x is at least 3n/10 – 6. • Thus, in the worst case, Select is called recursively on at most n – (3n/10 – 6) = 7n/10 + 6 elements.

  25. Recurrence for worst-case running time • T(Select) ≤ T(Median-of-medians) + T(Partition) + T(recursive call to Select) • T(n) ≤ (O(n) + T(⌈n/5⌉)) + O(n) + T(7n/10 + 6) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n), where O(n) + T(⌈n/5⌉) is the cost of median-of-medians, the second O(n) is Partition, and T(7n/10 + 6) is the recursive call. • Assume T(n) = Θ(1) for n ≤ 140.

  26. Solving the recurrence • To show: T(n) = O(n), i.e., T(n) ≤ cn for suitable c and all n > 0. • Assume: T(n) ≤ cn for suitable c and all n ≤ 140. • Substituting the inductive hypothesis into the recurrence:
  T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
  ≤ cn/5 + c + 7cn/10 + 6c + an
  = 9cn/10 + 7c + an
  = cn + (–cn/10 + 7c + an)
  ≤ cn, if –cn/10 + 7c + an ≤ 0.
  • –cn/10 + 7c + an ≤ 0 ⟺ c ≥ 10a · n/(n – 70), when n > 70. • n/(n – 70) is a decreasing function of n. Verify. • For n ≥ 140, n/(n – 70) ≤ 2, so c ≥ 20a suffices. • Hence, c can be chosen for any n = n0 > 70, provided it can be assumed that T(n) = O(1) for n ≤ n0. • Thus, Select has linear-time complexity in the worst case.
