
CSE 326: Data Structures: Sorting


Presentation Transcript


  1. CSE 326: Data Structures: Sorting Lecture 13: Wednesday, Feb 5, 2003

  2. Today • Finish extensible hash tables • Sorting • Will take several lectures • Read Chapter 7! • Except Shellsort (7.4)

  3. Hash Tables on Secondary Storage (Disks) Main differences: • One bucket = one block, hence it may hold multiple keys • Open hashing (separate chaining): use overflow blocks when needed • Closed hashing (open addressing) is never used

  4. Hash Table Example • Assume 1 bucket (block) stores 2 keys + pointers • h(e)=0 • h(b)=h(f)=1 • h(g)=2 • h(a)=h(c)=3 [Figure: four buckets, labeled 0–3]

  5. Searching in a Hash Table • Search for a: • Compute h(a)=3 • Read bucket 3 • 1 disk access [Figure: buckets 0–3]

  6. Insertion in Hash Table • Place in the right bucket, if there is space • E.g. h(d)=2 [Figure: buckets 0–3]

  7. Insertion in Hash Table • Create an overflow block, if there is no space • E.g. h(k)=1 • More overflow blocks may be needed [Figure: buckets 0–3, with an overflow block chained to bucket 1]

  8. Hash Table Performance • Excellent, if there are no overflow blocks • Degrades considerably when the number of keys exceeds the number of buckets (i.e., many overflow blocks).
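A minimal in-memory C sketch of the bucket layout described above (the struct and names are mine; a real implementation reads and writes disk blocks, and the capacity of 2 keys per block matches the running example):

    #include <stdlib.h>

    #define KEYS_PER_BLOCK 2      /* one block holds 2 keys, as in the example */

    /* One bucket = one disk block; overflow blocks are chained. */
    typedef struct Block {
        int           nkeys;
        int           keys[KEYS_PER_BLOCK];
        struct Block *overflow;   /* next overflow block, or NULL */
    } Block;

    /* Insert into a bucket chain, allocating overflow blocks as needed. */
    void bucket_insert(Block *b, int key) {
        while (b->nkeys == KEYS_PER_BLOCK) {
            if (b->overflow == NULL)
                b->overflow = calloc(1, sizeof(Block));
            b = b->overflow;
        }
        b->keys[b->nkeys++] = key;
    }

In the on-disk setting, each overflow link followed is one more disk I/O, which is why performance degrades as chains grow.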

  9. Extensible Hash Table • Allows the hash table to grow, to avoid performance degradation • Assume a hash function h that returns numbers in {0, …, 2^k – 1} • Start with n = 2^i << 2^k buckets; only look at the i most significant bits of the hash

  10. Extensible Hash Table • E.g. i=1, n=2^i=2, k=4 • Note: we only look at the first bit (0 or 1) [Figure: directory with entries 0 and 1, each pointing to a bucket of 4-bit keys]
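The directory lookup is a single shift; a sketch under the slide's conventions (the function name is mine, and h is assumed to return a k-bit value):

    /* With a k-bit hash and a directory of 2^i entries, the bucket
       index is the i most significant bits of the hash. */
    unsigned bucket_index(unsigned hash, int k, int i) {
        return hash >> (k - i);
    }
    /* Slide's example: k = 4, i = 1, so hash 1110 maps to entry 1. */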

  11. Insertion in Extensible Hash Table • Insert 1110 [Figure: 1110 goes to the bucket for directory entry 1]

  12. Insertion in Extensible Hash Table • Now insert 1010 • Need to extend the table, split blocks • i becomes 2 [Figure: the bucket for directory entry 1 is full]

  13. Insertion in Extensible Hash Table [Figure: directory doubled to entries 00, 01, 10, 11; the 0-bucket keeps local depth 1, the two split 1-buckets have local depth 2]

  14. Insertion in Extensible Hash Table • Now insert 0000, then 0101 • Need to split the block [Figure: directory entries 00, 01, 10, 11; the shared 0-bucket with local depth 1 overflows]

  15. Insertion in Extensible Hash Table • After splitting the block [Figure: directory entries 00, 01, 10, 11; all buckets now have local depth 2]
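The doubling-and-splitting steps walked through on the last few slides, as one in-memory C sketch (all names are mine; buckets store raw hash values rather than records for brevity, and the corner case where every key shares the same k-bit hash is ignored):

    #include <stdlib.h>

    #define BLOCK_KEYS 2              /* capacity from the running example */

    typedef struct {
        int      depth;               /* local depth: bits this bucket uses */
        int      nkeys;
        unsigned keys[BLOCK_KEYS];
    } Bucket;

    typedef struct {
        int      i, k;                /* global depth, hash width in bits */
        Bucket **dir;                 /* directory of 2^i bucket pointers */
    } ExtHash;

    static void insert(ExtHash *h, unsigned hash) {
        Bucket *b = h->dir[hash >> (h->k - h->i)];
        if (b->nkeys < BLOCK_KEYS) { b->keys[b->nkeys++] = hash; return; }

        if (b->depth == h->i) {       /* bucket full and already uses all i
                                         bits: double the directory, i++ */
            int n = 1 << h->i;
            Bucket **nd = malloc(2 * n * sizeof *nd);
            for (int j = 0; j < n; j++)
                nd[2*j] = nd[2*j + 1] = h->dir[j];
            free(h->dir);
            h->dir = nd;
            h->i++;
        }

        /* Split the full bucket on its next bit and redistribute its keys. */
        Bucket *b0 = calloc(1, sizeof *b0), *b1 = calloc(1, sizeof *b1);
        b0->depth = b1->depth = b->depth + 1;
        for (int j = 0; j < (1 << h->i); j++)      /* repoint directory slots */
            if (h->dir[j] == b)
                h->dir[j] = ((j >> (h->i - b0->depth)) & 1) ? b1 : b0;
        for (int j = 0; j < b->nkeys; j++) {
            unsigned x = b->keys[j];
            Bucket *t = ((x >> (h->k - b0->depth)) & 1) ? b1 : b0;
            t->keys[t->nkeys++] = x;
        }
        free(b);
        insert(h, hash);              /* retry; may trigger another split */
    }

Note how a plain insertion touches only a bucket or two, while a doubling rewrites every directory entry (compare the questions on the next slide).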

  16. Extensible Hash Table • How many buckets (blocks) do we need to touch after an insertion? • How many entries in the hash table do we need to touch after an insertion?

  17. Performance of Extensible Hash Tables • No overflow blocks: access is always O(1) • More precisely: exactly one disk I/O • BUT: • Extensions can be costly and disruptive • After an extension, the table may no longer fit in memory

  18. Sorting • Perhaps the most common operation in programs • The authoritative text: • D. Knuth, The Art of Computer Programming, Vol. 3

  19. Material to be Covered • Sorting by comparison: • Bubble Sort • Selection Sort • Merge Sort • QuickSort • Efficient list-based implementations • Formal analysis • Theoretical limitations on sorting by comparison • Sorting without comparing elements • Sorting and the memory hierarchy

  20. Bubble Sort Idea • We want A[1] ≤ A[2] ≤ … ≤ A[N] • Bubble sort idea: • If A[i-1] > A[i] then swap A[i-1] and A[i] • Do this for i = 1, …, N-1 • Repeat this until it's sorted

  21. Bubble Sort
  procedure BubbleSort (Array A, int N)
    repeat {
      isSorted = true;
      for (i = 1 to N-1) {
        if ( A[i-1] > A[i] ) {
          swap( A[i-1], A[i] );
          isSorted = false;
        }
      }
    } until isSorted

  22. Bubble Sort Improvements • After the 1st iteration: • largest element → A[N-1] • After the 2nd iteration: • second largest element → A[N-2] • Question: what is the max number of iterations, and hence the worst-case running time? • Improvement: stop the iterations earlier: • for (i=1 to N-1) • for (i=1 to N-2) • ... • for (i=1 to 1) • In fact we may be lucky and be able to decrease i more aggressively

  23. Bubble Sort
  procedure BubbleSort (Array A, int N)
    m = N;
    repeat {
      newM = 1;
      for (i = 1 to m-1) {
        if ( A[i-1] > A[i] ) {
          swap( A[i-1], A[i] );
          newM = i;   /* last swap position; everything beyond it is final */
        }
      }
      m = newM;
    } while m > 1
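A compilable C version of the improved pseudocode above (the function name and test values are mine):

    #include <stdio.h>

    /* Bubble sort with the last-swap optimization: after each pass,
       everything from the last swap position onward is final. */
    void bubble_sort(int a[], int n) {
        int m = n;
        while (m > 1) {
            int new_m = 1;
            for (int i = 1; i < m; i++) {
                if (a[i-1] > a[i]) {
                    int t = a[i-1]; a[i-1] = a[i]; a[i] = t;
                    new_m = i;
                }
            }
            m = new_m;
        }
    }

    int main(void) {
        int a[] = {5, 2, 6, 3, 7, 9, 8};
        bubble_sort(a, 7);
        for (int i = 0; i < 7; i++) printf("%d ", a[i]);  /* 2 3 5 6 7 8 9 */
        return 0;
    }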

  24. Bubble Sort • So the worst-case running time is T(n) = O(n^2) • Is the worst-case running time also Ω(n^2)? • You need to find a worst-case input of size n for which the running time is Ω(n^2).

  25. Selection Sort
  procedure SelectSort (Array A, int N)
    for (i = 0 to N-2) {
      /* find the minimum among A[i], ..., A[N-1] */
      /* place it in A[i] */
      m = i;
      for (j = i+1 to N-1)
        if ( A[m] > A[j] ) m = j;
      swap(A[i], A[m]);
    }
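The same procedure as directly compilable C (the function name is mine):

    /* Selection sort: repeatedly select the minimum of the unsorted
       suffix a[i..n-1] and swap it into position i. */
    void select_sort(int a[], int n) {
        for (int i = 0; i < n - 1; i++) {
            int m = i;
            for (int j = i + 1; j < n; j++)
                if (a[m] > a[j]) m = j;
            int t = a[i]; a[i] = a[m]; a[m] = t;
        }
    }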

  26. Selection Sort • Worst case running time: • T(n) = O( ?? ) • T(n) = Ω( ?? )

  27. Insertion Sort
  procedure InsertSort (Array A, int N)
    for (i = 1 to N-1) {
      /* A[0], A[1], ..., A[i-1] is sorted */
      /* now insert A[i] in the right place */
      x = A[i];
      for (j = i-1; j >= 0 && A[j] > x; j--)
        A[j+1] = A[j];
      A[j+1] = x;
    }
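And the corresponding compilable C (the function name is mine):

    /* Insertion sort: grow a sorted prefix a[0..i-1], inserting a[i]
       by shifting larger elements one slot to the right. */
    void insert_sort(int a[], int n) {
        for (int i = 1; i < n; i++) {
            int x = a[i];
            int j;
            for (j = i - 1; j >= 0 && a[j] > x; j--)
                a[j+1] = a[j];
            a[j+1] = x;               /* j+1 is the insertion point */
        }
    }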

  28. Insertion Sort • Worst case running time: • T(n) = O( ?? ) • T(n) = Ω( ?? )

  29. Merge Sort
  The Merge operation: given two sorted sequences
    A[0] ≤ A[1] ≤ ... ≤ A[m-1]
    B[0] ≤ B[1] ≤ ... ≤ B[n-1]
  construct another sorted sequence that is their union.
    Merge (A[0..m-1], B[0..n-1])
      i1 = 0, i2 = 0
      While i1 < m and i2 < n
        If A[i1] < B[i2]
          Next is A[i1]; i1++
        Else
          Next is B[i2]; i2++
        End If
      End While
      Copy whatever remains of A or B
  Analogy: merging cars by key [aggressiveness of driver]; the most aggressive goes first. Photo from http://www.nrma.com.au/inside-nrma/m-h-m/road-rage.html

  30. Merge Sort
  Function MergeSort (Array A[0..n-1])
    if n ≤ 1 return A
    return Merge( MergeSort(A[0..n/2-1]), MergeSort(A[n/2..n-1]) )
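A compilable array version of the above (names are mine); note the scratch buffer, which is exactly the 'in situ' problem slide 32 raises below:

    #include <stdlib.h>
    #include <string.h>

    /* Merge the sorted runs a[lo..mid-1] and a[mid..hi-1] via tmp. */
    static void merge(int a[], int tmp[], int lo, int mid, int hi) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi)
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];   /* copy leftovers */
        while (j < hi)  tmp[k++] = a[j++];
        memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
    }

    static void msort(int a[], int tmp[], int lo, int hi) {
        if (hi - lo <= 1) return;
        int mid = lo + (hi - lo) / 2;
        msort(a, tmp, lo, mid);
        msort(a, tmp, mid, hi);
        merge(a, tmp, lo, mid, hi);
    }

    void merge_sort(int a[], int n) {
        int *tmp = malloc(n * sizeof(int));  /* the scratch array */
        msort(a, tmp, 0, n);
        free(tmp);
    }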

  31. Merge Sort Running Time
  Any difference best / worst case?
    T(1) = b
    T(n) = 2T(n/2) + cn   for n > 1
  Unrolling the recurrence:
    T(n) = 2T(n/2) + cn
         = 4T(n/4) + cn + cn            (substitute)
         = 8T(n/8) + cn + cn + cn       (substitute)
         = 2^k T(n/2^k) + kcn           (inductive leap)
         = nT(1) + cn log n             (select k = log n)
         = Θ(n log n)                   (simplify)

  32. Merge Sort • Works great with lists or files • Problems with arrays: • We need a scratch array; we cannot sort 'in situ'

  33. Heap Sort • Recall: a heap is a tree where the min is at the root • A heap is stored in an array A[1], ..., A[n]

  34. Heap Sort • Start with an unsorted array A[1], ..., A[n] • Build a heap • How much time does it take? • Get the minimum, store it in the output array; repeat n times

  35. Heap Sort • But then we need an extra array! • How can we do it 'in situ'?

  36. Heap Sort • Input: unordered array A[1..N] • Build a max heap (largest element is A[1]) • For i = 1 to N-1: A[N-i+1] = Delete_Max()
  Example (N = 10):
    Input:                 7 50 22 15 4 40 20 10 35 25
    After Build_heap:      50 40 20 25 35 15 10 22 4 7
    After 1st Delete_Max:  40 35 20 25 7 15 10 22 4 | 50
    After 2nd Delete_Max:  35 25 20 22 7 15 10 4 | 40 50

  37. Properties of Heap Sort • Worst case time complexity O(n log n) • Build_heap O(n) • n Delete_Max’s for O(n log n) • In-place sort – only constant storage beyond the array is needed
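An in-place, 0-indexed C sketch of this scheme (the slides use 1-based A[1..N]; names are mine):

    /* Sift a[i] down within a[0..n-1] to restore the max-heap property. */
    static void sift_down(int a[], int n, int i) {
        for (;;) {
            int largest = i, l = 2*i + 1, r = 2*i + 2;
            if (l < n && a[l] > a[largest]) largest = l;
            if (r < n && a[r] > a[largest]) largest = r;
            if (largest == i) return;
            int t = a[i]; a[i] = a[largest]; a[largest] = t;
            i = largest;
        }
    }

    void heap_sort(int a[], int n) {
        /* Build_heap: bottom-up sift-downs, O(n) total. */
        for (int i = n/2 - 1; i >= 0; i--)
            sift_down(a, n, i);
        /* n-1 Delete_Max's: swap the max to the end, shrink the heap. */
        for (int i = n - 1; i > 0; i--) {
            int t = a[0]; a[0] = a[i]; a[i] = t;
            sift_down(a, i, 0);
        }
    }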

  38. QuickSort [Picture from PhotoDisc.com] • Pick a "pivot". • Divide the list into two lists: • One with elements less than or equal to the pivot value • One with elements greater than the pivot • Sort each sub-problem recursively • The answer is the concatenation of the two solutions

  39. QuickSort: Array-Based Version • Pick the pivot • Partition with two cursors, < and > [Figure: the element 2 goes to the less-than side]

  40. QuickSort Partition (cont'd) [Figure: 6 and 8 are swapped across the less-than/greater-than cursors; 3 and 5 end up on the less-than side, 9 on the greater-than side] Partition done.

  41. QuickSort Partition (cont'd) • Put the pivot into its final position. [Figure: example array 5 2 6 3 7 9 8] • Recursively sort each side: 2 3 5 6 7 8 9
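One way to write the two-cursor partition sketched in these pictures, as compilable C (the pivot choice and names are mine, and details may differ from the original figures):

    /* Hoare-style partition: scan from both ends, swapping misplaced
       pairs across the pivot, then recurse on each side. */
    void quick_sort(int a[], int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[(lo + hi) / 2];   /* illustrative pivot choice */
        int i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;   /* left cursor: skip "less-than"s */
            while (a[j] > pivot) j--;   /* right cursor: skip "greater-than"s */
            if (i <= j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        quick_sort(a, lo, j);
        quick_sort(a, i, hi);
    }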

  42. QuickSort Complexity • QuickSort is fast in practice, but has Θ(N^2) worst-case complexity • Friday we will see why
