Sorting 15-211 Fundamental Data Structures and Algorithms Peter Lee February 20, 2003
Announcements • Homework #4 is out • Get started today! • Reading: Chapter 8 • Quiz #2 available on Tuesday • Objects in calendar are closer than they appear.
Introduction to Sorting
Comparison-based sorting • We assume • Items are stored in an array. • Can be moved around in the array. • Can compare any two array elements. • Comparison has 3 possible outcomes: • < = >
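For concreteness, here is a minimal Java sketch (not from the original slides; class and variable names are illustrative) of the three-outcome comparison model, using the standard library's Integer.compare:

    // Sketch: a comparison has exactly three possible outcomes.
    public class CompareDemo {
        public static void main(String[] args) {
            int[] a = {24, 47};
            int c = Integer.compare(a[0], a[1]);
            if (c < 0)       System.out.println("a[0] < a[1]");
            else if (c == 0) System.out.println("a[0] == a[1]");
            else             System.out.println("a[0] > a[1]");
        }
    }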
Flips and inversions • An unsorted array: 24 47 13 99 105 222 • An inversion is any pair of elements that is out of order (e.g., 24 and 13); a flip is an inversion of two adjacent elements (e.g., 47 and 13).
Naïve sorting algorithms • Bubble sort: keep scanning for flips and fixing them, until none remain • What is the running time?
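A minimal Java sketch of bubble sort as just described (illustrative code, not the course's official implementation): keep making passes, swapping each flip, until a full pass finds none.

    public class BubbleSort {
        // Repeatedly scan for flips (adjacent inversions) and swap them,
        // until a complete pass makes no swaps. Worst case: O(n^2).
        public static void sort(int[] a) {
            boolean swapped = true;
            while (swapped) {
                swapped = false;
                for (int i = 0; i + 1 < a.length; i++) {
                    if (a[i] > a[i + 1]) {                        // a flip
                        int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                        swapped = true;
                    }
                }
            }
        }
    }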
Insertion sort • [Figure: snapshots of the array as each element is inserted into the growing sorted sublist.]
Insertion sort • for i = 2 to n do • insert a[i] in the proper place in a[1:i-1]
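In Java (0-indexed, so the loop starts at i = 1), the pseudocode above might look like this sketch:

    public class InsertionSort {
        // Insert a[i] into its proper place in the sorted sublist a[0..i-1].
        // Each shift fixes exactly one inversion, so the total running time
        // is O(n + #inversions).
        public static void sort(int[] a) {
            for (int i = 1; i < a.length; i++) {
                int x = a[i];
                int j = i - 1;
                while (j >= 0 && a[j] > x) {
                    a[j + 1] = a[j];   // shift larger elements right
                    j--;
                }
                a[j + 1] = x;
            }
        }
    }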
How fast is insertion sort? • Takes O(#inversions) steps, which is very fast if the array is nearly sorted to begin with • Example of a nearly sorted array: 3 2 1 6 5 4 9 8 7 …
How many inversions? • Consider the worst case: • n n-1 … 3 2 1 0 • In this case, there are • n + (n-1) + (n-2) + … + 1 • or n(n+1)/2 = O(n^2) inversions.
How many inversions? • What about the average case? • Consider: p = x_1 x_2 x_3 … x_(n-1) x_n • For any such p, let rev(p) be its reversal • Then each pair (x_i, x_j) is an inversion in exactly one of p and rev(p) • There are n(n-1)/2 pairs (x_i, x_j), hence the average number of inversions in a permutation is n(n-1)/4, which is still O(n^2).
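To make the counting concrete, here is a brute-force inversion counter (an illustrative sketch; itself O(n^2), but handy for checking small examples against the n(n-1)/4 average):

    public class Inversions {
        // Count pairs (i, j) with i < j and a[i] > a[j].
        public static long count(int[] a) {
            long inv = 0;
            for (int i = 0; i < a.length; i++)
                for (int j = i + 1; j < a.length; j++)
                    if (a[i] > a[j]) inv++;
            return inv;
        }
    }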
How long does it take to sort? • Can we do better than O(n^2)? • In the worst case? • In the average case?
Heapsort • Remember heaps: • buildHeap has O(n) worst-case running time. • deleteMin has O(log n) worst-case running time. • Heapsort: • Build heap. O(n) • DeleteMin until empty. O(n log n) • Total worst case: O(n log n)
Sorting in O(n log n) • Heapsort establishes the fact that sorting can be accomplished in O(n log n) worst-case running time.
Heapsort in practice • The average-case analysis for heapsort is somewhat complex • In practice, heapsort consistently tends to use nearly n log n comparisons • So, while the worst case is better than n^2, other algorithms sometimes work better in practice.
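The slides describe heapsort as buildHeap followed by repeated deleteMin; the sketch below is the common in-place variant of that idea, using a max-heap so the array ends up sorted ascending without extra space (illustrative Java, not the course's code):

    public class HeapSort {
        public static void sort(int[] a) {
            int n = a.length;
            for (int i = n / 2 - 1; i >= 0; i--)          // buildHeap: O(n)
                siftDown(a, i, n);
            for (int end = n - 1; end > 0; end--) {
                int t = a[0]; a[0] = a[end]; a[end] = t;  // deleteMax
                siftDown(a, 0, end);                      // O(log n) each
            }
        }

        // Restore the max-heap property for the subtree rooted at i,
        // considering only the first n elements of the array.
        private static void siftDown(int[] a, int i, int n) {
            while (2 * i + 1 < n) {
                int child = 2 * i + 1;
                if (child + 1 < n && a[child + 1] > a[child]) child++;
                if (a[i] >= a[child]) break;
                int t = a[i]; a[i] = a[child]; a[child] = t;
                i = child;
            }
        }
    }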
Shellsort • Shellsort, like insertion sort, is based on swapping inverted pairs • With a suitable gap sequence it achieves O(n^(4/3)) running time • [See your book for details.]
Shellsort • Example with gap sequence 3, 1 • [Figure: passes over the array; several inverted pairs are fixed in one exchange.]
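A sketch of Shellsort in Java (illustrative; the halving gap sequence below is chosen for simplicity and does not achieve the O(n^(4/3)) bound, which requires a more carefully chosen sequence):

    public class ShellSort {
        // Gapped insertion sort: each pass leaves the array gap-sorted,
        // and one long-distance exchange can fix several inverted pairs.
        public static void sort(int[] a) {
            for (int gap = a.length / 2; gap >= 1; gap /= 2) {
                for (int i = gap; i < a.length; i++) {
                    int x = a[i];
                    int j = i - gap;
                    while (j >= 0 && a[j] > x) {
                        a[j + gap] = a[j];
                        j -= gap;
                    }
                    a[j + gap] = x;
                }
            }
        }
    }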
Recursive sorting • Intuitively, divide the problem into pieces and then recombine the results. • If array is length 1, then done. • If array is length N>1, then split in half and sort each half. • Then combine the results. • An example of a divide-and-conquer algorithm.
Why divide-and-conquer works • Suppose the amount of work required to divide and recombine is linear, that is, O(n) • Suppose also that the amount of work needed to solve a problem directly grows faster than linearly in its size • Then each dividing step reduces the total work by more than a linear amount, while requiring only linear work to do so.
Divide-and-conquer is big • We will see several examples of divide-and-conquer in this course.
Analysis of recursive sorting • Suppose it takes time T(N) to sort N elements. • Suppose also it takes time N to combine the two sorted arrays. • Then: • T(1) = 1 • T(N) = 2T(N/2) + N, for N>1 • Solving for T gives the running time for the recursive sorting algorithm.
Remember recurrence relations? • Systems of equations such as • T(1) = 1 • T(N) = 2T(N/2) + N, for N>1 • are called recurrence relations (or sometimes recurrence equations).
A solution • A solution for • T(1) = 1 • T(N) = 2T(N/2) + N • is given by • T(N) = N log N + N • which is O(N log N) • How do we solve such equations?
Recurrence relations • There are several methods for solving recurrence relations. • It is also useful sometimes to check that a solution is valid. • This is done by induction.
Checking a solution • Base case: • T(1) = 1 log 1 + 1 = 1 • Inductive case: • Assume T(M) = M log M + M, for all M < N • T(N) = 2T(N/2) + N • = 2((N/2) log(N/2) + N/2) + N • = N(log N - log 2) + 2N • = N log N - N + 2N • = N log N + N
Logarithms • Some useful identities • x^A = B iff log_x B = A • log 1 = 0 • log_2 2 = 1 • log(AB) = log A + log B, if A, B > 0 • log(A/B) = log A - log B, if A, B > 0 • log(A^B) = B log A
Upper bounds for rec. relations • Divide-and-conquer algorithms are very useful in practice. • Furthermore, they all tend to generate similar recurrence relations. • As a result, approximate upper-bound solutions are well-known for recurrence relations derived from divide-and-conquer algorithms.
Divide-and-Conquer Theorem • Theorem: Let a, b, c ≥ 0. The recurrence relation • T(1) = b • T(N) = aT(N/c) + bN • for any N which is a power of c • has upper-bound solutions • T(N) = O(N) if a < c • T(N) = O(N log N) if a = c • T(N) = O(N^(log_c a)) if a > c • For recursive sorting: a = 2, b = 1, c = 2, so T(N) = O(N log N).
Upper-bounds • Corollary: • Dividing a problem into p pieces, each of size N/p, using only a linear amount of work, results in an O(N log N) algorithm.
Upper-bounds • Proof of this theorem later in the semester.
Exact solutions • Recall from earlier in the semester that it is sometimes possible to derive closed-form solutions to recurrence relations. • Several methods exist for doing this. • As an example, consider again our current equations: • T(1) = 1 • T(N) = 2T(N/2) + N, for N>1
Repeated substitution method • One technique is to use repeated substitution • T(N) = 2T(N/2) + N • 2T(N/2) = 2(2T(N/4) + N/2) • = 4T(N/4) + N • T(N) = 4T(N/4) + 2N • 4T(N/4) = 4(2T(N/8) + N/4) • = 8T(N/8) + N • T(N) = 8T(N/8) + 3N • … • T(N) = 2^k T(N/2^k) + kN
Repeated substitution, cont’d • We end up with • T(N) = 2^k T(N/2^k) + kN, for all k > 1 • Let’s use k = log N • Note that 2^(log N) = N • So: • T(N) = N T(1) + N log N • = N log N + N
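As a quick sanity check on the closed form (a sketch, not part of the original slides), we can evaluate the recurrence directly and compare it with N log N + N for powers of two:

    public class RecurrenceCheck {
        // T(1) = 1; T(N) = 2T(N/2) + N, for N a power of 2.
        static long T(long n) {
            return (n == 1) ? 1 : 2 * T(n / 2) + n;
        }

        public static void main(String[] args) {
            for (long n = 1; n <= 1024; n *= 2) {
                long log2 = 63 - Long.numberOfLeadingZeros(n); // log2(n) for powers of 2
                System.out.println("N=" + n + "  T(N)=" + T(n)
                                   + "  N log N + N=" + (n * log2 + n));
            }
        }
    }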
Other methods • There are also other methods for solving recurrence relations. • For example, the “telescoping method”…
Telescoping method • We start with • T(N) = 2T(N/2) + N • Divide both sides by N to get • T(N)/N = (2T(N/2) + N)/N • = T(N/2)/(N/2) + 1 • This is valid for any N that is a power of 2, so we can write the following:
Telescoping method, cont’d • Additional equations: • T(N)/N = T(N/2)/(N/2) + 1 • T(N/2)/(N/2) = T(N/4)/(N/4) + 1 • T(N/4)/(N/4) = T(N/8)/(N/8) + 1 • … • T(2)/2 = T(1)/1 + 1 • What happens when we sum all the left-hand and right-hand sides?
Telescoping method, cont’d • Summing all the left-hand sides and all the right-hand sides, each intermediate term T(N/2^k)/(N/2^k) appears once on each side and cancels • We are left with: • T(N)/N = T(1)/1 + log N • Multiplying both sides by N gives • T(N) = N log N + N
Mergesort • Mergesort is the most basic recursive sorting algorithm • Divide the array into halves A and B • Recursively mergesort each half • Combine A and B by repeatedly comparing the first remaining elements of A and B and moving the smaller one to the result array • Note: we should be careful to avoid creating lots of intermediate result arrays.
Mergesort • [Figure: the tree of subarrays produced by repeated halving.] • But we don’t actually want to create all of these arrays! (A sketch using one shared scratch array follows.)
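A sketch of mergesort in Java that allocates a single scratch array up front instead of a new result array per recursive call, in the spirit of the note above (illustrative names, not the course's official code):

    public class MergeSort {
        public static void sort(int[] a) {
            if (a.length > 1) mergeSort(a, new int[a.length], 0, a.length - 1);
        }

        // Sort a[lo..hi], using tmp as shared scratch space.
        private static void mergeSort(int[] a, int[] tmp, int lo, int hi) {
            if (lo >= hi) return;              // length 0 or 1: done
            int mid = (lo + hi) / 2;
            mergeSort(a, tmp, lo, mid);        // sort left half
            mergeSort(a, tmp, mid + 1, hi);    // sort right half
            merge(a, tmp, lo, mid, hi);        // combine in linear time
        }

        // Merge the sorted runs a[lo..mid] and a[mid+1..hi], smaller-first.
        private static void merge(int[] a, int[] tmp, int lo, int mid, int hi) {
            int i = lo, j = mid + 1, k = lo;
            while (i <= mid && j <= hi)
                tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
            while (i <= mid) tmp[k++] = a[i++];
            while (j <= hi)  tmp[k++] = a[j++];
            for (k = lo; k <= hi; k++) a[k] = tmp[k];
        }
    }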