
Data Structures and Algorithms



  1. Data Structures and Algorithms Dr. Ken Cosh Sorting

  2. Review • Recursion • Tail Recursion • Non-tail Recursion • Indirect Recursion • Nested Recursion • Excessive Recursion

  3. This week • Sorting Algorithms • The efficiency of many algorithms depends on the way data is stored. • Most data structures can be improved by ordering the data structure in some way. • This week we will investigate different approaches to sorting data.

  4. Sorting Criteria • The first step is to decide what to sort by; • numerical • alphabetical • ASCII codes

  5. Algorithm Measures • Best Case • Often when data already sorted • Worst Case • Data completely disorganised, or in reverse order. • Average Case • Random order • Some sorting algorithms are the same for all three cases, others can be tailored to suit certain cases.

  6. Sorting Algorithms • Sorting algorithms are generally composed of ‘comparisons’ and ‘movements’. • If the data being sorted is complicated (strings for example), then the comparison part can be more expensive. • If the data items are large, then the movement part can be more expensive.

  7. Elementary Sorting Algorithms • First let's investigate three elementary sorting algorithms; • Insertion Sort • Selection Sort • Bubble Sort

  8. Insertion Sort • Insertion sort starts with the first two elements, and compares them; • a[0] < a[1] ? • If necessary it will swap them. • Next the 3rd element is compared and positioned, and then the 4th etc.

  9. Insertion Sort
template<class T>
void insertionsort(T data[], int n) {
    for (int i = 1, j; i < n; i++) {
        T tmp = data[i];
        for (j = i; j > 0 && tmp < data[j-1]; j--)
            data[j] = data[j-1];
        data[j] = tmp;
    }
}

  10. Insertion Sort • The insertion method only sorts the part of the array that needs to be sorted – if the array is already sorted, less processing is required. • Best Case is when data is already in order: 1 comparison per position, n-1 in total, or O(n). • Worst Case is when data is in reverse order: every element must be moved all the way back, so comparisons are required across both the outer and the inner loop, O(n²). • How about the Average Case? The result should be somewhere between O(n) and O(n²), but which is it closer to?

  11. Insertion Sort – Average Case • The number of iterations of the inner loop depends on how far the number is away from where it should be; • it could loop 0 times if it is already in place, or i times if it needs to be inserted at the start. • Therefore on average it loops (0+1+2+…+i)/(i+1) = i/2 times. • This is the average per position, which needs to be multiplied by the number of positions (n), leaving O(n²) – sadly closer to the worst case than the best.

  12. Selection Sort • Selection Sort takes each value and moves it to its final place. • i.e. it first searches for the smallest value and moves it to the start, then the next smallest and moves that to position 2.

  13. Selection Sort
template<class T>
void selectionsort(T data[], int n) {
    for (int i = 0, j, least; i < n-1; i++) {
        for (j = i+1, least = i; j < n; j++)
            if (data[j] < data[least])
                least = j;
        swap(data[least], data[i]);
    }
}
• Once again this algorithm relies on a double loop, so O(n²)

  14. BubbleSort • Imagine a glass of beer… • The bubbles slowly rise to the surface. • Now imagine those bubbles are the data to be sorted, with the lowest values rising to the surface. • That’s how Bubblesort works!

  15. BubbleSort • Starting with the other end of the dataset, a[n-1], each value is compared with its predecessor, a[n-2], and if it is less it is swapped. • Next a[n-2] is compared with a[n-3], and if necessary swapped, and so on. • Therefore at the end of the first loop through the array, the smallest value is in position a[0], but many of the other values have been bubbled towards their final location. • The next loop through will move values closer still.

  16. BubbleSort
template<class T>
void bubblesort(T data[], int n) {
    for (int i = 0; i < n-1; i++)
        for (int j = n-1; j > i; --j)
            if (data[j] < data[j-1])
                swap(data[j], data[j-1]);
}

  17. BubbleSort • With Bubble Sort there are the same number of comparisons in every case (best, worst and average) • Sadly that number of comparisons is (n(n-1))/2, or O(n²)

  18. Cocktail Sort • A variation of the BubbleSort is called Cocktail Sort. • Here the bubbles move in both directions alternately, as though a cocktail were being shaken: first lower values are bubbled towards the start, then higher values are bubbled towards the end.

  19. Elementary Sorting Algorithms • Sorry… but O(n²) just isn’t good enough anymore!

  20. Efficient Sorting Algorithms • Shell Sort • Heap Sort • Quick Sort • Merge Sort • Radix Sort

  21. ShellSort • One problem with the basic sorting algorithms is that their runtime grows quadratically with the size of the array. • Therefore one way to tackle this problem is to perform the sort on smaller, limited subarrays of the array. • ShellSort (named after Donald Shell) breaks the array into subarrays, sorts them, and then sorts the whole array.

  22. ShellSort • Shellsort is actually slightly more complex than that, because decisions need to be made about the size of each subarray. • Normally the sort is performed on a decreasing number of subarrays, with the final sort being performed on the whole array (the subarrays growing in size each pass). • Note that the actual sort part can use any of the elementary sorting algorithms already discussed.

  23. Shell Sort Sample
• Original Data Set: 10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11
• 5-subarray Sort:
10, 4, 11 → 4, 10, 11
15, 19 → 15, 19
2, 5 → 2, 5
9, 8 → 8, 9
12, 0 → 0, 12
• After 5-subarray Sort: 4, 15, 2, 8, 0, 10, 19, 5, 9, 12, 11

  24. Shell Sort Sample cont.
4, 15, 2, 8, 0, 10, 19, 5, 9, 12, 11
• 3-subarray Sort:
4, 8, 19, 12 → 4, 8, 12, 19
15, 0, 5, 11 → 0, 5, 11, 15
2, 10, 9 → 2, 9, 10
• After 3-subarray Sort: 4, 0, 2, 8, 5, 9, 12, 11, 10, 19, 15
• Final Sort: 0, 2, 4, 5, 8, 9, 10, 11, 12, 15, 19

  25. Shell Sort • In the sample, I arbitrarily chose increments of 5, 3 and 1. • In reality that increment sequence isn’t ideal, but then it isn’t clear what increment is. • From empirical studies, a suggested increment sequence follows this pattern; • h1 = 1 • hi+1 = 3hi + 1 • until ht+2 ≥ n • What sequence will this give?

  26. Shell Sort • Shell Sort is shown to be better than O(n²), but how much better depends on the increments. • With Shell Sort, different elementary sorting algorithms can affect the efficiency of the ShellSort, but it’s not agreed how. • A favourite is surely the “Cocktail Shaker Sort”, which uses iterations of the bubble sort, finishing off with an insertion sort – as the cherry is inserted into the cocktail!

  27. Heap Sort • To investigate how Heap Sort works, we need to investigate a Heap further. • A heap is a binary tree with 2 additional properties; • First, the value of each node is not less than the values in its children (i.e. the highest value is the root). • Second, the tree is perfectly balanced, with the leaves of the last level filled from the left.

  28. Heap Sort • Heap Sort makes use of a heap, and is a variation of Selection Sort. • The weakness of Selection Sort was the number of comparisons required – because the heap keeps its largest value at the root, heap sort avoids rescanning the whole array for each selection. • First a heap is created, then the root is moved to the end of the dataset, before a new heap is created. • Therefore the complexity of the algorithm mirrors the complexity of creating a heap.

  29. Heap Sample • Remove the root node, and make a new heap from the 2 remaining heaps. • Heap in level order: 15 | 12, 6 | 11, 10, 2, 3 | 1, 8

  30. Heap Sort • Rather than moving the root to a new array, heapsort uses the same memory space and swaps the root value with the last leaf node (8). • 8 is then moved down the tree, swapping with the larger child until it reaches its new base (i.e. swap with 12 and then 11). • The new root can then be swapped with the new last leaf (1), • and the process repeated, which in the worst case is O(n lg n).

  31. QuickSort • Quicksort uses a recursive approach, which is similar to the approach we used when guessing a number between 1 and 10. • Quicksort shares with Shellsort the idea of breaking the problem down into subproblems that are easier to solve. • Quicksort divides the values into 2 subarrays – those lower than value ‘x’ and those higher than value ‘x’. • The 2 subarrays can then be sorted – guess what… using QuickSort.

  32. Quick Sort
{10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11}
{2, 0} 4 {10, 15, 9, 12, 19, 5, 8, 11}
{0, 2} 4 {10, 9, 5, 8, 11} 12 {15, 19}
0, 2, 4, {} 5 {10, 9, 8, 11} 12 {15, 19}
0, 2, 4, 5, {8} 9 {10, 11} 12, 15, 19
0, 2, 4, 5, 8, 9, 10, 11, 12, 15, 19

  33. QuickSort • The problem remains which value to choose as the pivot or bound value. • Obviously if we choose a value too near to either end of the dataset we will have an unbalanced split. • We could arbitrarily choose the first value, but some arrays are already partly sorted; therefore we could arbitrarily choose the middle value instead.

  34. QuickSort Pseudocode
quicksort(array[])
    if length(array) > 1
        choose bound;
        while there are elements in array
            include element in either subarray1 or subarray2;
        quicksort(subarray1);
        quicksort(subarray2);

  35. QuickSort • Best Case – when the array divides into even size parts; n + 2(n/2) + 4(n/4) + 8(n/8) + … + n(n/n) = O(n lg n) • Worst Case – when one subarray is always size 1; O(n²) • Average case is in between. • Ideally we avoid the worst case, but to do that we need to pick a good pivot value; • a better approach is to choose the median of the first, last and middle values.

  36. MergeSort • The problem with QuickSort is still picking a good pivot value. • MergeSort combats this by concentrating on merging two already sorted subarrays. • The merge process can be quite straightforward, assuming the two subarrays are already sorted. • We can sort the subarrays recursively using MergeSort!

  37. Merge Sort
mergesort(data)
    if data has at least 2 elements
        mergesort(left half);
        mergesort(right half);
        merge(both halves);

  38. merge
merge(array1, array2, array3)
    int i1 = 0, i2 = 0, i3 = 0;
    while both array2 and array3 have elements
        if array2[i2] < array3[i3]
            array1[i1++] = array2[i2++];
        else
            array1[i1++] = array3[i3++];
    add the remains of either array2 or array3 into array1;

  39. MergeSort
{10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11}
{10, 15, 2, 9, 12, 4} {19, 5, 8, 0, 11}
{10, 15, 2} {9, 12, 4} {19, 5, 8} {0, 11}
{10, 15} {2} {9, 12} {4} {19, 5} {8} {0, 11}
{2, 10, 15} {4, 9, 12} {5, 8, 19} {0, 11}
{2, 4, 9, 10, 12, 15} {0, 5, 8, 11, 19}
{0, 2, 4, 5, 8, 9, 10, 11, 12, 15, 19}

  40. MergeSort • The MergeSort algorithm is O(n lg n), but has a major drawback due to the space required to store the merged subarrays (plus the run-time stack for the recursion). • An iterative version, while visually more complex, is a little more space-efficient. • Also, using a linked list rather than an array can counter the space issue.

  41. Radix Sorting • Radix Sorting mirrors a real life approach to sorting. • Given a selection of books, we might first make a pile of all the ‘A’ author books, all the ‘B’ author books etc. Then we would sort each pile. • Or, if given a series of 10 digit numbers, we might first sort according to the first digit, and then according to the second digit etc.

  42. Comparing approaches • Take a look at the table on page 511 to see how the different sorting algorithms compare.
