
A look at Sorting


Presentation Transcript


  1. A look at Sorting

  2. Why Sorting? • It comes up all the time • It can bottleneck an app in terms of either time or space • There are a lot of sorting algorithms – sorting is a rich and interesting problem • We can prove a lower bound!

  3. Analyzing sorts • Running time • Space – is it in-place? • Stable: do equal elements keep their relative order? (e.g., when sorting (5,a), (3,x), (5,b) by the number, does (5,a) stay before (5,b)?)

  4. Insertion Sort • Main idea: Insert each element into its proper place in sorted order (sketch below). • A[0] is sorted by itself • Then consider A[1]: swap it with the first element if necessary • Then consider A[2]: put it into place among the first two • Etc. • After the ith pass, the first i elements are sorted, and they are the same first i elements from the unsorted set.
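
  A minimal sketch of the idea in Python (the function name and in-place style are my own, not from the slides):

  def insertion_sort(a):
      """Sort the list a in place; stable, O(n^2) worst case."""
      for i in range(1, len(a)):
          key = a[i]
          j = i - 1
          # Shift larger elements right to open a slot for key.
          while j >= 0 and a[j] > key:
              a[j + 1] = a[j]
              j -= 1
          a[j + 1] = key
      return a

  The strict comparison a[j] > key is what makes it stable: equal elements are never shifted past each other.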

  5. Insertion Sort • O(n²) sort • Stable • Especially bad if the elements are in reverse-sorted order (every element travels the full distance) • Already-sorted order? Just O(n) – one comparison per element

  6. Selection Sort • Main idea: find the smallest, then the next smallest, etc. (sketch below) • When you find the smallest, swap it with A[0] • After the ith pass, A[0] – A[i] holds the correct sorted elements.
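
  A matching Python sketch (again, naming is my own):

  def selection_sort(a):
      """Sort the list a in place; Theta(n^2) comparisons no matter what."""
      n = len(a)
      for i in range(n - 1):
          smallest = i
          # Find the minimum of the unsorted suffix a[i..n-1].
          for j in range(i + 1, n):
              if a[j] < a[smallest]:
                  smallest = j
          a[i], a[smallest] = a[smallest], a[i]  # swap it into position i
      return a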

  7. Selection Sort • O(n²) • Not stable as usually implemented: the swap can carry an element past an equal one • Does it perform better/worse for already-sorted input? Reverse-sorted input? Neither – it always does Θ(n²) comparisons

  8. Bubble Sort • Main idea: Go through each element, and if it’s out of order with its neighbor, swap them. • If you can go through entire array with no swaps, you’re done • After ith pass, at least last i positions are sorted and in final correct order.
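
  A Python sketch with the early-exit check (names are my own):

  def bubble_sort(a):
      """Sort the list a in place; stops early if a pass makes no swaps."""
      n = len(a)
      for i in range(n - 1):
          swapped = False
          # After pass i, the last i positions already hold their final values.
          for j in range(n - 1 - i):
              if a[j] > a[j + 1]:
                  a[j], a[j + 1] = a[j + 1], a[j]
                  swapped = True
          if not swapped:
              break  # no swaps: already sorted, so O(n) on sorted input
      return a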

  9. Bubble Sort • O(n²) • The worst-case time has a big constant • What if it's already sorted? O(n), thanks to the no-swaps check • "In short, the bubble sort seems to have nothing to recommend it, except a catchy name and the fact that it leads to some interesting theoretical problems." – Don Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching

  10. Merge Sort • Divide-and-conquer • If there are only 0 or 1 elements, it's done • Otherwise, recursively sort each half of the set • Then merge the sorted halves together (O(n))
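
  A compact Python sketch (it returns a new list rather than sorting in place, which is the simpler form to write):

  def merge_sort(a):
      """Return a sorted copy of a; O(n lg n) time, O(n) extra space."""
      if len(a) <= 1:
          return a
      mid = len(a) // 2
      left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
      # Merge the two sorted halves in O(n).
      out, i, j = [], 0, 0
      while i < len(left) and j < len(right):
          if left[i] <= right[j]:  # <= keeps the sort stable
              out.append(left[i])
              i += 1
          else:
              out.append(right[j])
              j += 1
      return out + left[i:] + right[j:]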

  11. Merge Sort • O(n lg n) • No real best or worst case – the splits are always even • Can be made in-place (though doing so is complicated)

  12. Heap Sort • Main idea: Create a heap out of the elements. • Inserting each element takes O(lg n) time • Delete the biggest and re-heapify (also O(lg n))
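
  A short sketch using Python's heapq module. Two departures from the slide: heapq is a min-heap (so we pop the smallest rather than the biggest), and heapify builds the whole heap in O(n) instead of inserting elements one at a time. It also copies the input, unlike true in-place heapsort.

  import heapq

  def heap_sort(a):
      """Sort via a binary min-heap: n pops at O(lg n) each."""
      heap = list(a)
      heapq.heapify(heap)  # build the heap in O(n)
      return [heapq.heappop(heap) for _ in range(len(heap))]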

  13. Heap Sort • O(n lg n) • In-place • Does quite a bit of shuffling of memory

  14. Quicksort • Main idea: • Find a Pivot element • Split array into elements less than pivot, equal to pivot, and greater than pivot, called partitioning • Recursively sort the pieces

  15. Divide and Conquer • 1. Pick a pivot element P • 2. Put everything < pivot on the left and everything > pivot on the right • 3. Sort the left and right • [Figure: an array split into a left part L, the pivot P, and a right part R]

  16. Quicksort Code
  Quicksort(A, p, r)
    if (p < r)
      q = Partition(A, p, r)
      Quicksort(A, p, q-1)
      Quicksort(A, q+1, r)

  17. Partition • Clearly, all the action takes place in the partition() function • It rearranges the subarray in place • End result: two subarrays, with all values in the first ≤ all values in the second • Returns the index of the "pivot" element separating the two subarrays • How do you suppose we implement this?

  18. In-Place Partitioning • Perform the partition using two indices, l and r, to split S into L, E, and G • Repeat until l and r cross: • Scan l to the right until you find an element > p • Scan r to the left until you find an element < p • Swap the elements at l and r • Example array (pivot p = 6): 3 2 5 1 0 7 3 5 9 2 7 9 8 9 7 9 6

  19. Partitioning (pivot = 6)
  3 2 5 1 0 7 3 5 9 2 7 9 8 9 7 9 6   ← l stops at the 7, r stops at the 2
  3 2 5 1 0 2 3 5 9 7 7 9 8 9 7 9 6   ← after swapping them
  Stop when the indices cross!
  3 2 5 1 0 2 3 5 6 7 7 9 8 9 7 9 9   ← pivot swapped into place

  20. Partition
  Partition(A, l, r)
    pivot = A[r]
    i = l - 1; j = r
    repeat
      repeat i = i + 1 until A[i] >= pivot
      repeat j = j - 1 until A[j] <= pivot
      if i < j then swap(A[i], A[j])   // skip the swap once i and j have crossed
    until i >= j
    swap(A[i], A[r])   // put the pivot in place
    return i           // the pivot's final index
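
  A runnable Python translation of this pseudocode (my own rendering, with a bounds guard added on the leftward scan):

  def partition(a, l, r):
      """Partition a[l..r] around the pivot a[r]; return the pivot's final index."""
      pivot = a[r]
      i, j = l - 1, r
      while True:
          i += 1
          while a[i] < pivot:            # scan right for an element >= pivot
              i += 1
          j -= 1
          while j > l and a[j] > pivot:  # scan left for an element <= pivot
              j -= 1
          if i >= j:                     # indices crossed: done scanning
              break
          a[i], a[j] = a[j], a[i]
      a[i], a[r] = a[r], a[i]            # put the pivot in place
      return i

  def quicksort(a, l=0, r=None):
      if r is None:
          r = len(a) - 1
      if l < r:
          q = partition(a, l, r)
          quicksort(a, l, q - 1)
          quicksort(a, q + 1, r)
      return a

  Running partition on the slide's example, [3,2,5,1,0,7,3,5,9,2,7,9,8,9,7,9,6], reproduces the trace above and returns index 8, where the 6 ends up.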

  21. Analyzing Partition • For an array of n items, what is the greatest number of items partition looks at? • It's easier to think about the algorithm than to stare at the code • Partition is linear: the two indices sweep toward each other, so each element is examined at most once

  22. Analyzing Quicksort • One call to Partition, then 2 recursive calls – how many elements are in the recursive calls? • That depends on how the pivot works out! • Best case: the pivot lands in the perfect middle • Worst case: the pivot is at an extreme

  23. Worst-case Running Time • The worst case for quick-sort occurs when the pivot is the unique minimum or maximum element • One partition has size n-1 and the other has size 0 • T(n) = T(n-1) + T(0) + Θ(n) …

  24. Analyzing Quicksort • In the worst case: T(1) = Θ(1), T(n) = T(n-1) + Θ(n) ⇒ T(n) = Θ(n²) • In the best case: T(1) = Θ(1), T(n) = 2T(n/2) + Θ(n) ⇒ T(n) = O(n lg n)
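
  Expanding the worst-case recurrence shows where the n² comes from (writing cn for the Θ(n) term):

  T(n) = T(n-1) + cn
       = T(n-2) + c(n-1) + cn
       = …
       = T(1) + c(2 + 3 + … + n)
       = Θ(1) + c(n(n+1)/2 − 1)
       = Θ(n²)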

  25. Analyzing Quicksort • What will be the worst case for the algorithm? • Partition is always unbalanced • What will be the best case for the algorithm? • Partition is perfectly balanced • Which is more likely? • The latter, by far, except... • Will any particular input elicit the worst case? • Yes: Already-sorted input

  26. Analyzing Quicksort: Average Case • Assuming random input, the average-case running time is much closer to O(n lg n) than O(n²) • First, a more intuitive explanation/example: • Suppose that partition() always produces a 9-to-1 split. This looks quite unbalanced! • The recurrence is thus: T(n) = T(9n/10) + T(n/10) + n • It still comes out to O(n lg n)
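
  To see why: every level of the recursion tree does at most n total work, and the slowest-shrinking branch multiplies the problem size by 9/10 at each level, so there are at most log_{10/9} n = O(lg n) levels. At most n work per level × O(lg n) levels = O(n lg n).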

  27. Analyzing Quicksort: Average Case • Intuitively, a real-life run of quicksort will produce a mix of "bad" and "good" splits • Randomly distributed among the recursion tree • Pretend for intuition that they alternate between best-case (n/2 : n/2) and worst-case (n-1 : 1) • What happens if we bad-split the root node, then good-split the resulting size (n-1) node?

  28. Analyzing Quicksort: Average Case • What happens if we bad-split the root node, then good-split the resulting size (n-1) node? • We fail English

  29. Analyzing Quicksort: Average Case • What actually happens: we end up with three subarrays, of sizes 1, (n-1)/2, and (n-1)/2 • Combined cost of the two splits = n + (n-1) = 2n-1 = O(n) • No worse than if we had good-split the root node!

  30. Analyzing Quicksort: Average Case • Intuitively, the O(n) cost of a bad split (or 2 or 3 bad splits) can be absorbed into the O(n) cost of each good split • Thus running time of alternating bad and good splits is still O(n lg n), with slightly higher constants • How can we be more rigorous?

  31. Analyzing Quicksort: Average Case • For simplicity, assume: • All inputs distinct (no repeats) • Slightly different partition() procedure • partition around a random element, which is not included in subarrays • all splits (0:n-1, 1:n-2, 2:n-3, … , n-1:0) equally likely • What is the probability of a particular split happening? • Answer: 1/n

  32. Analyzing Quicksort: Average Case • So partition generates splits (0:n-1, 1:n-2, 2:n-3, … , n-2:1, n-1:0), each with probability 1/n • If T(n) is the expected running time: T(n) = (1/n) Σ_{k=0..n-1} [T(k) + T(n-k-1)] + Θ(n) • What is each term under the summation for? • What is the Θ(n) term for?

  33. Analyzing Quicksort: Average Case • So, since every T(k) appears in the sum twice (once as T(k) and once as T(n-k-1)): T(n) = (2/n) Σ_{k=0..n-1} T(k) + Θ(n)

  34. Analyzing Quicksort: Average Case • We can solve this recurrence using the dreaded substitution method • Guess the answer (here: T(n) = O(n lg n)) • Assume that the inductive hypothesis holds for all values < n • Substitute it into the recurrence • Prove that it follows for n

  35. Improving Quicksort • The real liability of quicksort is that it runs in O(n²) on already-sorted input • The book discusses two solutions: • Randomize the input array, OR • Pick a random pivot element • How will these solve the problem? • By ensuring that no particular input can be chosen to make quicksort run in O(n²) time

  36. Quicksort • O(n lg n) on average • Worst case O(n²) • Fine-tuning (see the sketch below): • A random pivot point takes away the worst case • If the array portion is nearly sorted, call insertion sort • When the array portion becomes small, call insertion sort
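
  A Python sketch of two of these tweaks – a random pivot and an insertion-sort cutoff for small portions. It reuses the partition() sketch from above; the cutoff value of 16 is just a typical guess, not from the slides:

  import random

  CUTOFF = 16  # small-portion threshold; tune empirically

  def quicksort_tuned(a, l=0, r=None):
      if r is None:
          r = len(a) - 1
      if r - l + 1 <= CUTOFF:
          # Small portion: insertion sort wins on constant factors.
          for i in range(l + 1, r + 1):
              key, j = a[i], i - 1
              while j >= l and a[j] > key:
                  a[j + 1] = a[j]
                  j -= 1
              a[j + 1] = key
          return a
      # Random pivot: no fixed input can force the O(n^2) behavior.
      p = random.randint(l, r)
      a[p], a[r] = a[r], a[p]   # move the pivot to the end, then partition as before
      q = partition(a, l, r)
      quicksort_tuned(a, l, q - 1)
      quicksort_tuned(a, q + 1, r)
      return a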

  37. How Fast Can We Sort? • First, an observation: all of the sorting algorithms so far are comparison sorts • The only operation used to gain ordering information about a sequence is the pairwise comparison of two elements • Comparison sorts must do at least n comparisons (why?) • What do you think is the best possible comparison-sort running time?

  38. Decision Trees • Abstraction of any comparison sort. • Represents comparisons made by • a specific sorting algorithm • on inputs of a given size. • Abstracts away everything else: control and data movement. • We’re counting only comparisons. • Each node is a pair of elements being compared • Each edge is the result of the comparison (< or >=) • Leaf nodes are the sorted array
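
  For intuition, here is the standard decision tree for sorting three distinct elements a, b, c – six leaves, one per permutation (3! = 6):

  a ≤ b?
  ├─ yes: b ≤ c?
  │   ├─ yes: leaf ⟨a, b, c⟩
  │   └─ no:  a ≤ c?
  │       ├─ yes: leaf ⟨a, c, b⟩
  │       └─ no:  leaf ⟨c, a, b⟩
  └─ no:  a ≤ c?
      ├─ yes: leaf ⟨b, a, c⟩
      └─ no:  b ≤ c?
          ├─ yes: leaf ⟨b, c, a⟩
          └─ no:  leaf ⟨c, b, a⟩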

  39. Insertion Sort on 4 Elements as a Decision Tree • [Figure: the tree's root compares A[1] and A[2]; its children compare A[2] and A[3]; and so on]

  40. The Number of Leaves in a Decision Tree for Sorting • Lemma: A decision tree for sorting must have at least n! leaves. • Why: each of the n! input permutations must reach its own leaf – if two different permutations reached the same leaf, the tree would produce the same output for both, and at least one would be sorted incorrectly.

  41. Lower Bound For Comparison Sorting • Thm: Any decision tree that sorts n elements has height Ω(n lg n) • If we know this, then we know that comparison sorts are always Ω(n lg n) • Consider a decision tree on n elements • We must have at least n! leaves • The max # of leaves of a tree of height h is 2^h

  42. Lower Bound For Comparison Sorting • So we have: n! ≤ 2^h • Taking logarithms: lg(n!) ≤ h • Stirling's approximation tells us: n! > (n/e)^n • Thus: h ≥ lg(n!) > lg((n/e)^n) = n lg n − n lg e

  43. Lower Bound For Comparison Sorting • So we have: h ≥ n lg n − n lg e = Ω(n lg n) • Thus the minimum height of a decision tree is Ω(n lg n)

  44. Lower Bound For Comparison Sorts • Thus the time to comparison sort n elements is Ω(n lg n) • Corollary: Heapsort, Quicksort, and Mergesort are asymptotically optimal comparison sorts • How can we do better than Ω(n lg n)?
