# When is O(n lg n) Really O(n lg n)? A Comparison of the Quicksort and Heapsort Algorithms


Gerald Kruse

Juniata College

kruse@juniata.edu

Huntingdon, PA

### Outline

• Analyzing Sorting Algorithms

• Quicksort

• Heapsort

• Experimental Results

• Observations (this is a fun, open-ended student project)

## How Fast is my Sorting Algorithm? “A nice blend of Math and CS”

The Sorting Problem, from Cormen et al.

Input: A sequence of n numbers, $(a_1, a_2, \ldots, a_n)$

Output: A permutation (reordering) $(a_1', a_2', \ldots, a_n')$ of the input sequence such that $a_1' \le a_2' \le \ldots \le a_n'$

Note: This definition can be expanded to include sorting primitive data such as characters or strings, alpha-numeric data, and data records with key values.

Sorting algorithms are analyzed using many different metrics: expected run-time, memory usage, communication bandwidth, implementation complexity, …

Expected running time is given using “Big-O” notation

$O(g(n)) = \{\, f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } 0 \le f(n) \le c \cdot g(n) \ \ \forall\, n \ge n_0 \,\}$

While O-notation describes an asymptotic upper bound on a function, it is frequently used to describe asymptotically tight bounds.
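As a worked instance of this definition, consider the made-up function $f(n) = 3n \lg n + 5n$ (it is an illustration only and does not come from the talk). The constants $c = 8$ and $n_0 = 2$ witness that $f(n) \in O(n \lg n)$:

```latex
\begin{align*}
f(n) &= 3n \lg n + 5n \\
     &\le 3n \lg n + 5n \lg n && \text{since } \lg n \ge 1 \text{ for all } n \ge 2 \\
     &= 8\, n \lg n           && \text{so } 0 \le f(n) \le c \cdot n \lg n \text{ with } c = 8,\ n_0 = 2.
\end{align*}
```

The same $f(n)$ is also in $O(n^2)$, which is why O-notation alone gives only an upper bound and not necessarily a tight one.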

## Algorithm analysis also requires a model of the implementation technology to be used

The most commonly used model is RAM, the Random-Access Machine.

This should NOT be confused with Random-Access Memory.

Each instruction requires an equal amount of processing time

Memory hierarchy (cache, virtual memory) is NOT modeled

The RAM model is relatively straightforward and “usually an excellent predictor of performance on actual machines.”

## Quicksort

“Good” partitioning means the partitions are usually equally sized

After a partition, the element partitioned around will be in the correct position

There are n compares per level and about lg n levels, so the algorithm should run in time proportional to n lg n under the assumptions of the RAM model.
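A minimal sketch of the idea in Python follows. The slides do not say which partitioning scheme was used, so the choice here (pivot on the last element, often called Lomuto partitioning) and the names `partition` and `quicksort` are assumptions made for illustration.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo - 1                          # a[lo..i] holds the elements <= pivot
    for j in range(lo, hi):             # one compare per element: hi - lo compares
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]   # the pivot lands in its sorted position
    return i + 1


def quicksort(a, lo=0, hi=None):
    """Sort the list a in place by recursively partitioning it."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)         # elements left of the pivot
        quicksort(a, p + 1, hi)         # elements right of the pivot
```

Each call to `partition` scans its sub-array once from left to right, which is where the "n compares per level" count comes from.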

## Quicksort, continued

Pathological data leads to “bad” or unbalanced partitions and the worst-case for Quicksort

The element partitioned around will be in sorted position

This data will be sorted in $O(n^2)$ time, since there are still n compares per level, but now there are n - 1 levels.
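The effect can be made concrete by counting compares. The sketch below assumes a quicksort that always pivots on the last element of the sub-array (an assumption, since the slides do not fix a pivot rule), and the helper name `count_quicksort_compares` is hypothetical. It uses an explicit stack so that badly unbalanced inputs do not hit Python's recursion limit.

```python
def count_quicksort_compares(values):
    """Sort a copy of values with last-element-pivot quicksort; return the compare count."""
    a = list(values)
    compares = 0
    stack = [(0, len(a) - 1)]           # sub-arrays still waiting to be partitioned
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[hi]
        i = lo - 1
        for j in range(lo, hi):
            compares += 1               # one compare of an element against the pivot
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[hi] = a[hi], a[i + 1]
        stack.append((lo, i))           # left of the pivot
        stack.append((i + 2, hi))       # right of the pivot
    return compares
```

On already-sorted input of size 100 the pivot is always the maximum, every partition peels off a single element, and the function returns 99 + 98 + ... + 1 = 4950 compares, which is n(n - 1)/2. A shuffled input of the same size typically needs far fewer.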


### Heaps

A heap can be seen as a complete binary tree.

[Figure: the values 16, 14, 10, 8, 7, 9, 3, 2, 4, 1 drawn as a complete binary tree, filled level by level]

In practice, heaps are usually implemented as arrays.

A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
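The link between the tree and the array is plain index arithmetic. The sketch below uses Python's 0-based indexing, which is an adaptation: the slides index the array from 1, where the parent of node i is at i/2 (rounded down) and its children are at 2i and 2i + 1. The helper names are chosen for illustration.

```python
A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]   # the example heap, stored level by level


def parent(i):
    """Index of the parent of node i (0-based)."""
    return (i - 1) // 2


def left(i):
    """Index of the left child of node i (0-based)."""
    return 2 * i + 1


def right(i):
    """Index of the right child of node i (0-based)."""
    return 2 * i + 2


def children(a, i):
    """Values of the children of node i that actually exist in a."""
    return [a[c] for c in (left(i), right(i)) if c < len(a)]
```

For example, `children(A, 0)` gives `[14, 10]`, the two nodes directly below the root 16, and `children(A, 4)` gives `[1]` because node 7 has only a left child.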

### Heaps, continued

Heaps satisfy the heap property:

$A[\mathrm{Parent}(i)] \ge A[i]$ for all nodes $i > 1$

In other words, the value of a node is at most the value of its parent.
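A short check of this property might look like the following sketch. It uses 0-based Python indices, so the root is index 0 and the condition applies to every index from 1 upward; the function name `is_max_heap` is an assumption for illustration.

```python
def is_max_heap(a):
    """Return True if every non-root node is at most the value of its parent."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))
```

The example array `[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]` passes this check, while `[16, 14, 10, 8, 15]` fails it because 15 sits below its smaller parent 14.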

By the way, eBay uses a “heap-like” data structure to track bids.

## Heapsort

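    Heapsort(A)
    {
        BuildHeap(A);                  // arrange A into a max-heap
        for (i = length(A) downto 2)
        {
            Swap(A[1], A[i]);          // move the current maximum to its final slot
            heap_size(A) -= 1;         // shrink the heap to exclude that slot
            Heapify(A, 1);             // restore the heap property at the root
        }
    }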

When the heap property is violated at just one node (which has sub-trees which are valid heaps), Heapify “floats down” the parent node to fix the heap. Remembering the tree structure of the heap, each Heapify call takes O(lg n) time.

BuildHeap takes O(n) time and the loop makes n - 1 calls to Heapify, so Heapsort’s expected execution time is O(n lg n), just like Quicksort.
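The pseudocode above can be sketched in runnable Python as follows. This is an illustration under two assumptions: indices are 0-based rather than the 1-based indices of the pseudocode, and the heap size is passed to `heapify` as a parameter instead of being stored as an attribute of the array.

```python
def heapify(a, i, heap_size):
    """Float a[i] down until the subtree rooted at i is a max-heap again."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:                # both children are smaller: done
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest                     # continue one level further down


def build_heap(a):
    """Turn a into a max-heap by heapifying every internal node, bottom up."""
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i, len(a))


def heapsort(a):
    """Sort the list a in place."""
    build_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]     # move the current maximum to its final slot
        heapify(a, 0, end)              # the heap now occupies a[0..end-1]
```

Each pass through the loop in `heapify` moves one level down the tree, so a single call does at most about lg n swaps.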

## Timing Results

Implementation

Run on Windows and Unix-based machines, implemented in C, C++, and Java, and based on pseudocode from Cormen et al., Sedgewick, and Joyce et al.

Heapsort does not run in O(n lg n) time, even for the relatively small values of n tested

Quicksort does exhibit O(n lg n) behavior
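One way students might test such claims is to divide each measured time by n lg n and see whether the ratio levels off. The harness below is a hypothetical sketch of that idea, not the measurement code behind the talk; the function name `time_per_nlgn`, its parameters, and the use of random floating-point data are all assumptions.

```python
import math
import random
import time


def time_per_nlgn(sort_fn, sizes, seed=0):
    """Time sort_fn on random lists; return (n, seconds / (n lg n)) pairs.

    Every n in sizes must be at least 2 so that lg n is positive.
    """
    rng = random.Random(seed)
    results = []
    for n in sizes:
        data = [rng.random() for _ in range(n)]
        start = time.perf_counter()
        sort_fn(data)
        elapsed = time.perf_counter() - start
        results.append((n, elapsed / (n * math.log2(n))))
    return results
```

If the running time really grows like n lg n, the ratios should settle near a constant as n increases, while a steadily climbing ratio suggests some cost that the RAM model ignores. The implementations in the talk were written in C, C++, and Java; in Python, interpreter overhead may mask the effect, so treat any numbers from this harness with caution.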

Consider the memory access patterns:

• For very large n, we would expect a slowdown for ANY algorithm as the data no longer fits in memory.

• For the sizes of n run here, the partitions in Quicksort consist of elements which are contiguous in memory, while “floating down” a Heap requires accessing elements which are not close in memory.
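The difference in access patterns can be seen from the array indices alone. The sketch below assumes a 0-based array heap and, for simplicity, a node that always sinks to its left child; the helper names `sift_down_path` and `gaps` are hypothetical.

```python
def sift_down_path(n, i=0):
    """Indices visited when a node sinks from i to a leaf, always taking the left child."""
    path = []
    while i < n:
        path.append(i)
        i = 2 * i + 1                   # left child in a 0-based array heap
    return path


def gaps(indices):
    """Distances between consecutive indices in an access sequence."""
    return [b - a for a, b in zip(indices, indices[1:])]
```

For a heap of 1000 elements, `gaps(sift_down_path(1000))` is `[1, 2, 4, 8, 16, 32, 64, 128, 256]`: each step jumps twice as far as the one before. A partition pass that reads indices 0, 1, 2, 3, 4 has gaps `[1, 1, 1, 1]`, so consecutive reads stay next to each other in memory.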

## Observations

This is a fun exploration for students, appealing to those with an interest in mathematics or computer science.

## Bibliography

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, “Introduction to Algorithms, Second Edition,” Cambridge, MA/London, England: The MIT Press/McGraw-Hill, 2003.

N. Dale, C. Weems, and D. T. Joyce, “Object-Oriented Data Structures Using Java,” Boston, MA: Jones and Bartlett, 2002.

M. T. Goodrich and R. Tamassia, “Algorithm Design: Foundations, Analysis, and Internet Examples,” New York: Wiley, 2001.

D. E. Knuth, “The Art of Computer Programming, Volume 3: Sorting and Searching, Second Edition,” Redwood City, CA: Addison-Wesley-Longman, 1998.

C. C. McGeoch, “Analyzing algorithms by simulation: Variance reduction techniques and simulation speedups,” ACM Computing Surveys, vol. 24, no. 2, pp. 195-212, 1992.

C. C. McGeoch, D. Precup, and P. R. Cohen, “How to find the Big-Oh of your data set (and how not to),” Advances in Intelligent Data Analysis, vol. 1280 of Lecture Notes in Computer Science, pp. 41-52, Springer-Verlag, 1997.

R. Sedgewick, “Algorithms in C, Parts 1-4: Fundamentals, Data Structures, Sorting, Searching, Third Edition,” Boston, MA: Addison-Wesley, 1997.