
Order Statistics

Order Statistics. The ith order statistic is the ith smallest element of a set of n elements. Minimum: first order statistic. Maximum: nth order statistic. Median: the “half-way point” of the set; unique when n is odd, where it occurs at i = (n + 1)/2.

jkeeney


Presentation Transcript


  1. Order Statistics Comp 550, Spring 2015

  2. Order Statistic
     • ith order statistic: the ith smallest element of a set of n elements.
     • Minimum: first order statistic.
     • Maximum: nth order statistic.
     • Median: “half-way point” of the set.
       • Unique when n is odd: occurs at i = (n + 1)/2.
       • Two medians when n is even:
         • Lower median, at i = n/2.
         • Upper median, at i = n/2 + 1.
     • For consistency, “median” will refer to the lower median.

  3. Selection Problem
     • Selection problem:
       • Input: a set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
       • Output: the element x ∈ A that is larger than exactly i – 1 other elements of A.
     • Can be solved in O(n lg n) time. How?
     • We will study linear-time algorithms:
       • For the special cases when i = 1 and i = n.
       • For the general problem.

  4. Minimum (Maximum)
     Minimum(A)
         min ← A[1]
         for i ← 2 to length[A]
             do if min > A[i]
                 then min ← A[i]
         return min
     Maximum can be determined similarly.
     • T(n) = Θ(n).
     • No. of comparisons: n – 1.
     • Can we do better? Why not?
     • Minimum(A) has the worst-case optimal number of comparisons.

  5. Selection Problem
     Minimum(A)
     1. min ← A[1]
     2. for i ← 2 to length[A]
     3.     do if min > A[i]
     4.         then min ← A[i]
     5. return min
     • Average for random input: how many times do we expect line 4 to be executed?
     • X = RV for the number of executions of line 4.
     • X_i = indicator RV for the event that line 4 is executed on the ith iteration.
     • X = Σ_{i=2..n} X_i
     • E[X_i] = 1/i. How? In a random order, A[i] is the smallest of the first i elements with probability 1/i.
     • Hence, E[X] = Σ_{i=2..n} 1/i = H_n – 1 = ln n + O(1) = Θ(lg n).
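The expectation argument on this slide can be checked exhaustively for small n. The sketch below is illustrative only: the function names are made up for this example, and it assumes distinct keys and that every ordering is equally likely.

```python
from fractions import Fraction
from itertools import permutations


def count_min_updates(a):
    """Run Minimum(A) and count how often line 4 (min <- A[i]) executes."""
    current_min = a[0]
    updates = 0
    for value in a[1:]:
        if current_min > value:
            current_min = value
            updates += 1
    return updates


def exact_expected_updates(n):
    """Average the update count over all n! orderings of n distinct keys."""
    perms = list(permutations(range(n)))
    total = sum(count_min_updates(p) for p in perms)
    return Fraction(total, len(perms))


def harmonic_minus_one(n):
    """1/2 + 1/3 + ... + 1/n, the value given by summing E[X_i] = 1/i."""
    return sum((Fraction(1, i) for i in range(2, n + 1)), Fraction(0))
```

For example, `exact_expected_updates(5)` and `harmonic_minus_one(5)` both evaluate to `Fraction(77, 60)`.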

  6. Simultaneous Minimum and Maximum
     • Bounding box: compute minimum and maximum simultaneously.
     • Separately, each requires n – 1 comparisons.
     • Together, can we use fewer than 2n – 2 comparisons?
     • Yes: process elements in pairs, for about 3n/2 comparisons.
       • For each pair, find the smaller and the larger with one comparison.
       • Compare the smaller to min and the larger to max.
       • This costs 3 comparisons for every 2 elements.
     • Key idea: structure your data.
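A minimal sketch of the pairing idea, with an explicit comparison counter so the 3-per-pair cost is visible. The name `min_max` and the returned counter are assumptions made for this illustration, not part of the slides.

```python
def min_max(a):
    """Return (minimum, maximum, comparisons) using the process-in-pairs scheme."""
    n = len(a)
    if n == 0:
        raise ValueError("min_max() needs at least one element")
    comparisons = 0
    if n % 2 == 1:
        # Odd length: the first element seeds both min and max for free.
        lo = hi = a[0]
        start = 1
    else:
        # Even length: one comparison orders the first pair.
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for j in range(start, n, 2):
        # One comparison inside the pair, then one against min and one against max.
        if a[j] < a[j + 1]:
            small, large = a[j], a[j + 1]
        else:
            small, large = a[j + 1], a[j]
        if small < lo:
            lo = small
        if large > hi:
            hi = large
        comparisons += 3
    return lo, hi, comparisons
```

On 8 elements this uses 1 + 3·3 = 10 comparisons instead of the 14 needed by two separate scans; in general the count is at most 3⌊n/2⌋.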

  7. Selection Problem
     • Selection problem:
       • Input: a set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
       • Output: the element x ∈ A that is larger than exactly i – 1 other elements of A.
     • Seems harder than min or max, but has the same asymptotic running time.
       • QuickMedian: expected linear time (randomized).
       • BFPRT73: linear time (deterministic).
     • Key idea: geometric series sum to linear.

  8. Randomized Quicksort: review
     Rnd-Partition(A, p, r)
         i := Random(p, r); A[r] ↔ A[i];
         x, i := A[r], p – 1;
         for j := p to r – 1 do
             if A[j] ≤ x then
                 i := i + 1; A[i] ↔ A[j]
             fi
         od;
         A[i + 1] ↔ A[r];
         return i + 1

     Quicksort(A, p, r)
         if p < r then
             q := Rnd-Partition(A, p, r);
             Quicksort(A, p, q – 1);
             Quicksort(A, q + 1, r)
         fi

     Figure: Partition rearranges A[p..r] around the pivot (5 in the illustration) into A[p..q – 1] (elements ≤ 5), the pivot, and A[q+1..r] (elements ≥ 5).
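The pseudocode on this slide translates almost line for line into Python. This is a sketch under two assumptions: indices are 0-based and inclusive, and `random.randint` stands in for the slide's Random(p, r).

```python
import random


def rnd_partition(a, p, r):
    """Partition a[p..r] (inclusive) around a random pivot; return the pivot's final index."""
    pivot_index = random.randint(p, r)
    a[r], a[pivot_index] = a[pivot_index], a[r]  # move the pivot to the end
    x = a[r]
    i = p - 1  # a[p..i] holds the elements <= x seen so far
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # place the pivot between the two sides
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place; with the defaults, sort the whole list."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = rnd_partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
```

Calling `quicksort(data)` sorts `data` in place; the pivot choice is random, but the sorted result does not depend on it.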

  9. Selection in Expected Linear Time
     • Key idea: Quicksort, but recur only on the side containing the ith element.
     • Exploit two abilities of Randomized-Partition (RP).
       • RP returns the index k in the sorted order of a randomly chosen element (pivot).
         • If the order statistic i = k, then we are done.
         • Else reduce the problem size using its other ability.
       • RP rearranges all other elements around the pivot.
         • If i < k, selection can be narrowed down to A[1..k – 1].
         • Else, select the (i – k)th element from A[k+1..n].

  10. Randomized-Select
     Randomized-Select(A, p, r, i)   // select the ith order statistic
         if p = r
             then return A[p]
         q ← Randomized-Partition(A, p, r)
         k ← q – p + 1
         if i = k
             then return A[q]
         elseif i < k
             then return Randomized-Select(A, p, q – 1, i)
         else return Randomized-Select(A, q + 1, r, i – k)
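A runnable sketch of Randomized-Select, assuming 0-based inclusive array bounds and a 1-based rank i as in the pseudocode. The partition helper is repeated so the block stands alone; the range check at the top is an addition for this example.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] (inclusive) around a random pivot; return the pivot's final index."""
    pivot_index = random.randint(p, r)
    a[r], a[pivot_index] = a[pivot_index], a[r]
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, i, p=0, r=None):
    """Return the i-th smallest (1-based) element of a[p..r]; rearranges a in place."""
    if r is None:
        r = len(a) - 1
    if not 1 <= i <= r - p + 1:
        raise IndexError("rank i is out of range")
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1  # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    if i < k:
        return randomized_select(a, i, p, q - 1)
    return randomized_select(a, i - k, q + 1, r)
```

Unlike Quicksort, only one side of each partition is ever visited, which is where the expected linear running time comes from.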

  11–17. Randomized-Select Example (M. C. Lin)
     • Goal: find the 3rd smallest element.
     • Slides 11 through 17 step through the example as figures, which this transcript does not capture.

  18. Analysis
     • Worst-case complexity:
       • Θ(n²): we could get unlucky and always recurse on a subarray that is only one element smaller than the previous subarray (T(n) = T(n – 1) + Θ(n)).
     • Average-case complexity:
       • Θ(n). Intuition: because the pivot is chosen at random, we expect to get rid of half of the list each time we choose a random pivot q.
     • Why Θ(n) and not Θ(n lg n)?

  19. Analysis
     • Average-case complexity, more intuition:
       • Suppose we get rid of 10% of the list each time we choose a random pivot q.
       • Then T(n) = T(9n/10) + Θ(n).

  20. Analysis
     • Average-case complexity, more intuition:
       • Suppose we get rid of 10% of the list each time we choose a random pivot q.
       • Then T(n) = T(9n/10) + Θ(n).
     • Let a ≥ 1 and b > 1, with T(n) = a T(n/b) + c n^k, n ≥ 0.
       1. If a > b^k, then T(n) = Θ(n^(log_b a)).
       2. If a = b^k, then T(n) = Θ(n^k lg n).
       3. If a < b^k, then T(n) = Θ(n^k).

  21. Analysis
     • Average-case complexity, more intuition:
       • Suppose we get rid of 10% of the list each time we choose a random pivot q.
       • Then T(n) = T(9n/10) + Θ(n).
       • T(n) = Θ(n), by case 3 with a = 1, b = 10/9, k = 1.
     • Let a ≥ 1 and b > 1, with T(n) = a T(n/b) + c n^k, n ≥ 0.
       1. If a > b^k, then T(n) = Θ(n^(log_b a)).
       2. If a = b^k, then T(n) = Θ(n^k lg n).
       3. If a < b^k, then T(n) = Θ(n^k).
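The same bound can be seen without the case analysis by unrolling the recurrence. The sketch below assumes the Θ(n) term is at most cn for some constant c and ignores the constant-size base case; it is one way to see why the "geometric series sum to linear" idea gives Θ(n) rather than Θ(n lg n).

```latex
\begin{align*}
T(n) &\le cn + T\!\left(\tfrac{9}{10}\,n\right) \\
     &\le cn + \tfrac{9}{10}\,cn + T\!\left(\left(\tfrac{9}{10}\right)^{2} n\right) \\
     &\le cn \sum_{j=0}^{\infty} \left(\tfrac{9}{10}\right)^{j}
      = \frac{cn}{1 - \tfrac{9}{10}}
      = 10\,cn = O(n).
\end{align*}
```

The work per level shrinks geometrically because only one subproblem survives. In Quicksort both sides are kept, so every level costs about cn and the lg n levels add up to Θ(n lg n).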

  22. Average-case Analysis
     • A call to Randomized-Select (RS) may
       • terminate immediately with the correct answer,
       • recurse on A[p..q – 1], or
       • recurse on A[q+1..r].
     • To obtain an upper bound, assume that the ith smallest element that we want is always in the larger subarray.
     • Randomized-Partition (RP) takes O(n) time on a problem of size n.
     • Hence, the recurrence for T(n) is: [equation not captured in this transcript]

  23. Solving the recurrence

  24. Average-case Analysis (Contd.)
     The summation is expanded: [equation not captured in this transcript]
     • If n is odd, T(n – 1) through T(⌈n/2⌉) occur twice and T(⌊n/2⌋) occurs once.
     • If n is even, T(n – 1) through T(n/2) occur twice.

  25. Average-case Analysis (Contd.)
     • We solve the recurrence by substitution.
     • Guess T(n) = O(n). [derivation not captured in this transcript]
     • Thus, if we assume T(n) = O(1) for n < 2c/(c – 4a), we have E[T(n)] = O(n).
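The equations for these slides did not survive in the transcript. The derivation below is a reconstruction following the standard textbook (CLRS) treatment, not text recovered from the slides; it is offered because it reproduces the threshold n ≥ 2c/(c – 4a) quoted above. It assumes the partitioning cost is at most an and takes the guess E[T(k)] ≤ ck for all k < n.

```latex
\begin{align*}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + an
         \le \frac{2c}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} k + an \\
        &=   \frac{2c}{n} \left( \frac{(n-1)\,n}{2}
             - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2} \right) + an \\
        &\le \frac{2c}{n} \left( \frac{(n-1)\,n}{2}
             - \frac{(n/2 - 2)(n/2 - 1)}{2} \right) + an \\
        &=   c \left( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \right) + an
         \le \frac{3cn}{4} + \frac{c}{2} + an \\
        &=   cn - \left( \frac{cn}{4} - \frac{c}{2} - an \right).
\end{align*}
```

The bracketed term is non-negative exactly when n(c/4 – a) ≥ c/2, that is, when c > 4a and n ≥ 2c/(c – 4a). Smaller n are covered by the assumption T(n) = O(1).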

  26. Selection in Worst-Case Linear Time
     • Algorithm Select:
       • Like RandomizedSelect, finds the desired element by recursively partitioning the input array.
       • Unlike RandomizedSelect, is deterministic.
         • Uses a variant of the deterministic Partition routine.
         • Partition is told which element to use as the pivot.
       • Achieves linear-time complexity in the worst case by guaranteeing that the split is always “good” at each Partition.
     • How can a good split be guaranteed?

  27. Choosing a Pivot
     • Median-of-Medians:
       • Divide the n elements into ⌈n/5⌉ groups.
         • ⌊n/5⌋ groups contain 5 elements each.
         • 1 group may contain n mod 5 < 5 elements.
       • Determine the median of each of the groups.
       • Recursively find the median x of the ⌈n/5⌉ medians.
     Figure: ⌊n/5⌋ groups of 5 elements each; the ⌈n/5⌉th group holds the remaining n mod 5 elements.

  28. Example Z. Guo

  29. Algorithm Select
     • Determine the median-of-medians x (as described on the Choosing a Pivot slide).
     • Partition the input array around x (Partition from Quicksort).
     • Let k be the index of x that Partition returns.
     • If k = i, then return x.
     • Else if i < k, apply Select recursively to A[1..k – 1] to find the ith smallest element.
     • Else if i > k, apply Select recursively to A[k+1..n] to find the (i – k)th smallest element.
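The steps above can be sketched as follows. This is an illustration, not the in-place algorithm from the slides: it partitions by building new lists instead of calling Partition, and it counts elements equal to x separately so that duplicate keys also work (the slides assume distinct elements). The function name `select` and the 1-based rank i are assumptions of this example.

```python
def select(a, i):
    """Return the i-th smallest (1-based) element of list a, using median-of-medians pivots."""
    if not 1 <= i <= len(a):
        raise IndexError("rank i is out of range")
    if len(a) <= 5:
        return sorted(a)[i - 1]

    # Split into groups of 5 (the last group may be shorter) and take each lower median.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]

    # Recursively find the lower median x of the group medians.
    x = select(medians, (len(medians) + 1) // 2)

    # Partition around x; copying into lists replaces the in-place Partition.
    smaller = [v for v in a if v < x]
    larger = [v for v in a if v > x]
    equal_count = len(a) - len(smaller) - len(larger)

    k = len(smaller)
    if i <= k:
        return select(smaller, i)
    if i <= k + equal_count:
        return x
    return select(larger, i - k - equal_count)
```

For example, `select([9, 1, 8, 2, 7, 3, 6, 4, 5], 5)` returns the median of the nine values, 5.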

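  30. Worst-case Split
     Figure: the groups drawn side by side (⌊n/5⌋ groups of 5 elements each, and the ⌈n/5⌉th group of n mod 5 elements), with arrows pointing from larger to smaller elements. The median-of-medians x is marked, together with the region of elements < x and the region of elements > x.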

  31. Worst-case Split
     • Assumption: elements are distinct. Why?
     • At least ⌊n/5⌋/2 groups have 3 of their 5 elements ≥ x.
       • Ignore the last group if it has fewer than 5 elements.
     • Hence, the number of elements ≥ x is at least 3(n – 4)/10, since ⌊n/5⌋ ≥ (n – 4)/5.
     • Likewise, the number of elements ≤ x is at least 3(n – 4)/10.
     • Thus, in the worst case, Select is called recursively on at most n – 3(n – 4)/10 = (7n + 12)/10 elements.

  32. Recurrence for worst-case running time
     • T(Select) ≤ T(Median-of-medians) + T(Partition) + T(recursive call to Select)
     • T(n) ≤ [O(n) + T(n/5)] + O(n) + T((7n + 12)/10) = T(n/5) + T(7n/10 + 1.2) + O(n)
       • The bracketed term is T(Median-of-medians), the middle O(n) is T(Partition), and the last term is T(recursive call).
     • Base: for n ≤ 24, assume we just use Insertionsort.
       • So T(n) ≤ 24n for all n ≤ 24.

  33. Solving the recurrence
     • Base: for all n ≤ 24, T(n) ≤ 24n.
     • For n > 24, T(n) ≤ an + T(n/5) + T(7n/10 + 1.2).
     • We want to find c > 0 so that T(n) ≤ cn for all n > 0.
     • Base implies c ≥ 24.
     • Inductive step, for n > 24:
         T(n) ≤ an + T(n/5) + T(7n/10 + 1.2)
              ≤ an + cn/5 + 7cn/10 + 1.2c          (inductive hypothesis)
              = cn – (cn/10 – an – 1.2c)
              = cn – ((c/20 – a)n + (n/20 – 1.2)c)
              ≤ cn, as long as c ≥ 20a (the second term is non-negative because n ≥ 24).
     • So c = max(24, 20a) works.
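The induction can be sanity-checked numerically. The sketch below makes several assumptions that are not on the slide: a = 1, the recurrence is taken with equality, and the subproblem sizes are rounded down to integers (n // 5 and (7n + 12) // 10). It illustrates the bound and does not replace the proof.

```python
from functools import lru_cache

A_COST = 1                      # assumed value of the constant a
C_BOUND = max(24, 20 * A_COST)  # the constant c = max(24, 20a) from the slide


@lru_cache(maxsize=None)
def t(n):
    """Evaluate T(n) = a*n + T(n // 5) + T((7n + 12) // 10), with T(n) = 24n for n <= 24."""
    if n <= 24:
        return 24 * n
    return A_COST * n + t(n // 5) + t((7 * n + 12) // 10)


def bound_holds(limit):
    """Check that T(n) <= c*n for every n from 1 to limit."""
    return all(t(n) <= C_BOUND * n for n in range(1, limit + 1))
```

With these assumptions, `t(25)` is 25 + T(5) + T(18) = 25 + 120 + 432 = 577, which is below 24 · 25 = 600.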

  34. Conclusions
     • We can find the ith largest element of an unordered list in Θ(n) worst-case time.
     • This lets us do Quicksort in worst-case Θ(n lg n) time, by using Select to pick the median as the pivot.
     • That constant, 20× the partition cost, was high; use RandomizedSelect (aka QuickMedian) in practice.
