Chapter 6

Chapter 6 Sorting

Sorting
• A file of size n is a sequence of n items. Each item in the file is called a record.
[Figure: original file and sorted file]

[Figure: original pointer table and sorted pointer table, each pointing to Record 1 through Record 5 of the file]
• It is easier to search for a particular element after sorting (e.g. binary search).

Types of sorting
• internal sorting: data stored in main memory (more than 20 algorithms)
• external sorting: data stored in auxiliary storage
• stable sorting: records with the same key keep the same relative order they had before sorting

Time and space efficiency

n          a = 0.01n^2       b = 10n       a+b              (a+b)/n^2
10         1                 100           101              1.01
50         25                500           525              0.21
100        100               1,000         1,100            0.11
500        2,500             5,000         7,500            0.03
1,000      10,000            10,000        20,000           0.02
5,000      250,000           50,000        300,000          0.01
10,000     1,000,000         100,000       1,100,000        0.01
50,000     25,000,000        500,000       25,500,000       0.01
100,000    100,000,000       1,000,000     101,000,000      0.01
500,000    2,500,000,000     5,000,000     2,505,000,000    0.01

O notation
f(n) is O(g(n)) if there exist positive integers a and b such that f(n) ≤ a·g(n) for all n ≥ b.
e.g. 4n^2 + 100n = O(n^2), because for n ≥ 100, 4n^2 + 100n ≤ 5n^2
     4n^2 + 100n = O(n^3), because for n ≥ 10, 4n^2 + 100n ≤ 2n^3
f(n) = c_1 n^k + c_2 n^(k-1) + … + c_k n + c_(k+1) = O(n^(k+j)), for any j ≥ 0
f(n) = c = O(1), where c is a constant
log_m n = log_m k · log_k n, for some constants m and k
log_m n = O(log_k n) = O(log n)

Time complexity
• polynomial order: O(n^k), for some constant k.
• exponential order: O(d^n), for some d > 1.
• NP-complete (intractable) problems: all known algorithms require exponential time.
• best sorting algorithm with comparisons: O(n log n)

n log_10 n and n^2

n           n log_10 n     n^2
1 × 10^1    1.0 × 10^1     1.0 × 10^2
5 × 10^1    8.5 × 10^1     2.5 × 10^3
1 × 10^2    2.0 × 10^2     1.0 × 10^4
5 × 10^2    1.3 × 10^3     2.5 × 10^5
1 × 10^3    3.0 × 10^3     1.0 × 10^6
5 × 10^3    1.8 × 10^4     2.5 × 10^7
1 × 10^4    4.0 × 10^4     1.0 × 10^8
5 × 10^4    2.3 × 10^5     2.5 × 10^9
1 × 10^5    5.0 × 10^5     1.0 × 10^10
5 × 10^5    2.8 × 10^6     2.5 × 10^11
1 × 10^6    6.0 × 10^6     1.0 × 10^12
5 × 10^6    3.3 × 10^7     2.5 × 10^13
1 × 10^7    7.0 × 10^7     1.0 × 10^14

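Bubble sort
• Compare each pair of adjacent items; if the pair is out of order, exchange the two items.
e.g. sort 8 2 9 5 6 from largest to smallest (decreasing, i.e. nonincreasing, order)
            8 2 9 5 6
pass 1:     8 2 9 5 6
            8 9 2 5 6
            8 9 5 2 6
            8 9 5 6 2
pass 2:     9 8 5 6 2
            9 8 5 6 2
            9 8 6 5 2
pass 3:     9 8 6 5 2
            9 8 6 5 2
Each row shows the file after one comparison of adjacent items.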

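Time complexity of bubble sort
• If no two adjacent items are exchanged during a pass, the file is already sorted.
• best case: the file is already in order before sorting; 1 pass is needed.
• worst case: n-1 passes are needed (n is the number of items).
• The maximum number of comparisons is:
  (n-1) + (n-2) + ... + 1 = n(n-1)/2 = O(n^2)
• Time complexity: O(n^2)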

#define TRUE  1
#define FALSE 0

void bubble(int x[], int n)
{   /* nonincreasing order */
    int hold, j, pass;
    int switched = TRUE;

    for (pass = 0; pass < n-1 && switched == TRUE; pass++) {
        /* outer loop controls the number of passes */
        switched = FALSE;       /* initially no interchanges have */
                                /* been made on this pass */
        for (j = 0; j < n-pass-1; j++)
            /* inner loop governs each individual pass */
            if (x[j] < x[j+1]) {        /* elements out of order */
                /* an interchange is necessary */
                switched = TRUE;
                hold = x[j];
                x[j] = x[j+1];
                x[j+1] = hold;
            } /* end if */
    } /* end for */
} /* end bubble */

Quicksort (partition exchange sort)
e.g. sort from smallest to largest (nondecreasing order)
[26  5 37  1 61 11 59 15 48 19]
[26  5 19  1 61 11 59 15 48 37]
[26  5 19  1 15 11 59 61 48 37]
[11  5 19  1 15] 26 [59 61 48 37]
[11  5  1 19 15] 26 [59 61 48 37]
[ 1  5] 11 [19 15] 26 [59 61 48 37]
  1  5  11  15 19  26 [59 61 48 37]
  1  5  11  15 19  26 [59 37 48 61]
  1  5  11  15 19  26 [48 37] 59 [61]
  1  5  11  15 19  26  37 48  59  61

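Quicksort
• Method: take the first item of each group as the pivot. Place the items smaller than the pivot on its left and the items larger than it on its right, then split the group into two parts around the pivot.
• worst case:
  • the pivot is the largest or the smallest item of its group every time
  • number of comparisons: (n-1) + (n-2) + ... + 1 = n(n-1)/2 = O(n^2)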

Best case of quicksort
• best case: every partition splits the group into two parts of roughly equal size.
[Figure: recursion tree with log_2 n levels; the work per level is n, (n/2) × 2 = n, (n/4) × 4 = n, ...]

Mathematical analysis of best case
• T(n): time needed to sort n items
• T(n) ≤ cn + 2T(n/2), for some constant c
       ≤ cn + 2(c·n/2 + 2T(n/4))
       ≤ 2cn + 4T(n/4)
       ...
       ≤ cn log_2 n + nT(1) = O(n log n)
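The steps elided by "..." in the expansion above can be written out as follows. This is a sketch under the simplifying assumption that n is a power of 2, so that the halving stops exactly at T(1) after i = log_2 n steps.

```latex
\begin{align*}
T(n) &\le cn + 2T(n/2) \\
     &\le cn + 2\bigl(c\,\tfrac{n}{2} + 2T(n/4)\bigr) = 2cn + 4T(n/4) \\
     &\le 2cn + 4\bigl(c\,\tfrac{n}{4} + 2T(n/8)\bigr) = 3cn + 8T(n/8) \\
     &\;\;\vdots \\
     &\le i\,cn + 2^{i}\,T(n/2^{i}) \\
     &= cn\log_2 n + n\,T(1) = O(n\log n) \qquad (i = \log_2 n)
\end{align*}
```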

void partition(int x[], int lb, int ub, int *pj)
{
    int a, down, temp, up;

    a = x[lb];      /* a is the element whose final position */
                    /* is sought */
    up = ub;
    down = lb;
    while (down < up) {
        while (x[down] <= a && down < ub)
            down++;             /* move up the array */
        while (x[up] > a)
            up--;               /* move down the array */
        if (down < up) {
            /* interchange x[down] and x[up] */
            temp = x[down];
            x[down] = x[up];
            x[up] = temp;
        } /* end if */
    } /* end while */
    x[lb] = x[up];
    x[up] = a;
    *pj = up;
} /* end partition */

main program:

void quick(int x[], int lb, int ub)
{   /* nondecreasing order */
    int j;

    if (lb >= ub)
        return;                 // array is sorted
    partition(x, lb, ub, &j);   // partition the elements of the
                                // subarray such that one of the
                                // elements (possibly x[lb]) is
                                // now at x[j] (j is an output
                                // parameter) and:
                                // 1. x[i] <= x[j] for lb <= i < j
                                // 2. x[i] >= x[j] for j < i <= ub
                                // x[j] is now at its final position
    quick(x, lb, j-1);          // recursively sort the subarray
                                // between positions lb and j-1
    quick(x, j+1, ub);          // recursively sort the subarray
                                // between positions j+1 and ub
} /* end quick */

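Selection sort
• Method: in each pass, find the largest (or smallest) item in the part that is not yet sorted and exchange it into its final position.
• Number of comparisons: (n-1) + (n-2) + ... + 1 = n(n-1)/2 = O(n^2)
• Time complexity: O(n^2)
e.g. sort from largest to smallest
            8 2 9 5 6
pass 1:     9 2 8 5 6
pass 2:     9 8 2 5 6
pass 3:     9 8 6 5 2
pass 4:     9 8 6 5 2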
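The slides give no code for selection sort, so the following is only a minimal sketch in the style of the other routines. The name `selectsort` and the variable `large` are chosen for illustration; the sort is into nonincreasing order to match the example above.

```c
/* Sort x[0..n-1] into nonincreasing order by selection. */
void selectsort(int x[], int n)
{
    int i, j, large, temp;

    for (i = 0; i < n - 1; i++) {
        /* find the index of the largest item in the unsorted part x[i..n-1] */
        large = i;
        for (j = i + 1; j < n; j++)
            if (x[j] > x[large])
                large = j;
        /* exchange it into position i */
        temp = x[i];
        x[i] = x[large];
        x[large] = temp;
    }
}
```

Pass i always makes n-1-i comparisons, whatever the input, which is where the total of n(n-1)/2 comes from.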

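Binary tree sort
e.g. input data: 8, 2, 9, 5, 6
Build a binary search tree:
[Figure: 8 is the root; 2 is its left child and 9 its right child; 5 is the right child of 2; 6 is the right child of 5]
inorder traversal: output: 2 5 6 8 9
• worst case:
  • input data: 2, 5, 6, 8, 9
    [Figure: the tree degenerates into a chain 2, 5, 6, 8, 9 of right children]
  • number of comparisons: 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n^2)
• best case:
  • time complexity: sum of i·2^i for i = 1, ..., d = O(n log n), where d ≈ log_2 n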
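The two steps above (build a binary search tree, then traverse it in inorder) might be sketched as follows. The names `treesort`, `tnode` and `inorder` are hypothetical, and two things are assumptions of this sketch rather than anything stated on the slides: all nodes come from a single `malloc` call, and equal keys go to the right subtree.

```c
#include <stdlib.h>

struct tnode {
    int key;
    struct tnode *left, *right;
};

/* Copy the keys into x[] by inorder traversal; *pos is the next free slot. */
static void inorder(const struct tnode *t, int x[], int *pos)
{
    if (t == NULL)
        return;
    inorder(t->left, x, pos);
    x[(*pos)++] = t->key;
    inorder(t->right, x, pos);
}

/* Sort x[0..n-1] into nondecreasing order; returns 0, or -1 if out of memory. */
int treesort(int x[], int n)
{
    struct tnode *pool, *root = NULL, **link;
    int i, pos = 0;

    if (n < 2)
        return 0;
    pool = malloc((size_t)n * sizeof *pool);    /* one node per item */
    if (pool == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        pool[i].key = x[i];
        pool[i].left = pool[i].right = NULL;
        /* walk down from the root to the empty link where x[i] belongs */
        link = &root;
        while (*link != NULL)
            link = (x[i] < (*link)->key) ? &(*link)->left : &(*link)->right;
        *link = &pool[i];
    }
    inorder(root, x, &pos);     /* writes the keys back in sorted order */
    free(pool);
    return 0;
}
```

On already sorted input the tree degenerates into a chain, so both the number of comparisons and the recursion depth of `inorder` grow with n.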

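Heapsort
e.g. input data: 25 57 48 37 12 92 86 33
Store the input data in an almost complete binary tree.
Step 1: Construct a heap
[Figure: the tree after each item is inserted, shown here level by level]
after 25:   25
after 57:   57 25
after 48:   57 25 48
after 37:   57 37 48 25
after 12:   57 37 48 25 12
after 92:   92 37 57 25 12 48
after 86:   92 37 86 25 12 48 57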

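[Figure: the completed heap with array indices: x[0]=92, x[1]=37, x[2]=86, x[3]=33, x[4]=12, x[5]=48, x[6]=57, x[7]=25]
Step 2: Adjust the heap.
[Figure: the tree after each adjustment, shown here as array contents; the items after the bar are in their final positions]
(a) x[7] = the largest value:         86 37 57 33 12 48 25 | 92
(b) x[6] = the second largest value:  57 37 48 33 12 25 | 86 92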

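(c) x[5] = the third largest value:   48 37 25 33 12 | 57 86 92
(d) x[4] = the fourth largest value:  37 33 25 12 | 48 57 86 92
(e) x[3] = the fifth largest value:   33 12 25 | 37 48 57 86 92
(f) x[2] = the sixth largest value:   25 12 | 33 37 48 57 86 92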

(g) x[1] = the seventh largest value: 12 | 25 33 37 48 57 86 92
Final: x[0]=12, x[1]=25, x[2]=33, x[3]=37, x[4]=48, x[5]=57, x[6]=86, x[7]=92
• Heapsort should be implemented with an array, not with a linked binary tree.
• time complexity: O(n log n) (in the worst case)
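An array-based version of the two steps might look like the sketch below. The function name `heapsort_sketch` is hypothetical (chosen to avoid clashing with the `heapsort` that some C libraries declare). The children of x[p] are assumed to be stored at x[2p+1] and x[2p+2], which matches the index layout x[0] through x[7] used in this example.

```c
/* Sort x[0..n-1] into nondecreasing order using a max-heap kept in x itself. */
void heapsort_sketch(int x[], int n)
{
    int i, child, parent, temp;

    /* Step 1: construct the heap by inserting x[1], x[2], ... one at a time,
       moving each new item up while it is larger than its parent */
    for (i = 1; i < n; i++) {
        temp = x[i];
        child = i;
        parent = (child - 1) / 2;
        while (child > 0 && x[parent] < temp) {
            x[child] = x[parent];
            child = parent;
            parent = (child - 1) / 2;
        }
        x[child] = temp;
    }

    /* Step 2: move the current maximum x[0] to x[i], then adjust x[0..i-1] */
    for (i = n - 1; i > 0; i--) {
        temp = x[i];            /* item displaced from the end of the heap */
        x[i] = x[0];            /* the maximum reaches its final position */
        parent = 0;
        child = 1;
        while (child < i) {     /* the heap now occupies x[0..i-1] */
            if (child + 1 < i && x[child + 1] > x[child])
                child++;        /* pick the larger of the two children */
            if (x[child] <= temp)
                break;
            x[parent] = x[child];
            parent = child;
            child = 2 * parent + 1;
        }
        x[parent] = temp;
    }
}
```

No pointers or extra storage are needed: the heap shrinks from the right while the sorted part grows in the freed positions.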

Insertion sort
e.g. sort from largest to smallest
            8 2 9 5 6
pass 1:     8 2 9 5 6
pass 2:     8 9 2 5 6
            9 8 2 5 6
pass 3:     9 8 5 2 6
            9 8 5 2 6
pass 4:     9 8 5 6 2
            9 8 6 5 2
            9 8 6 5 2

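Insertion sort
• Method: each time a new item is processed, it is moved until it has been inserted at its proper position.
• n-1 passes are needed.
• best case: the file is already in order before sorting; each pass needs only one comparison, for a total of n-1 comparisons.
• worst case: the file is in reverse order before sorting; the number of comparisons is:
  1 + 2 + ... + (n-1) = n(n-1)/2 = O(n^2)
• Time complexity: O(n^2)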

void insertsort(int x[], int n)
{   /* nonincreasing order */
    int i, k, y;

    /* initially x[0] may be thought of as a sorted */
    /* file of one element. After each repetition of */
    /* the following loop, the elements x[0] through */
    /* x[k] are in order */
    for (k = 1; k < n; k++) {
        /* Insert x[k] into the sorted file */
        y = x[k];
        /* Move down 1 position all elements smaller */
        /* than y */
        for (i = k-1; i >= 0 && y > x[i]; i--)
            x[i+1] = x[i];
        /* Insert y at proper position */
        x[i+1] = y;
    } /* end for */
} /* end insertsort */

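Shell sort (diminishing increment sort)
• Method: insertion sort compares two adjacent items and then decides whether to exchange them. Shell sort instead performs "compare and exchange" on two items that are a distance d apart. d may be any integer greater than 1, but in the last pass d must be 1.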

e.g. sort from largest to smallest (the divisions are integer divisions)
input:                                   21 11 09 02 16 31 26 01 27 05 13 19
pass 1: d1 = (12+1)/2 = 6                26 11 27 05 16 31 21 01 09 02 13 19
pass 2: d2 = (d1+1)/2 = (6+1)/2 = 3      26 16 31 21 13 27 05 11 19 02 01 09
pass 3: d3 = (d2+1)/2 = (3+1)/2 = 2      31 27 26 21 19 16 13 11 05 09 01 02
pass 4: d4 = (d3+1)/2 = (2+1)/2 = 1      31 27 26 21 19 16 13 11 09 05 02 01

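• Each pass performs several separate insertion sorts, one per group of items d apart. If d = 1 from the start, Shell sort is exactly the same as insertion sort.
• Knuth suggests: d_(i-1) = 3d_i + 1, i.e. d_i = (d_(i-1) - 1)/3, works best.
• time complexity: O(n (log n)^2) ~ O(n^(3/2))
• Suitable for sorting a few hundred items.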
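The rule d_(i-1) = 3d_i + 1 can be turned into a table of increments, largest first and ending in 1, of the kind an `incrmnts[]` argument expects. The helper below is a hypothetical sketch: the name `knuth_increments` and the choice to start from the largest increment smaller than n are assumptions, not something the slides specify.

```c
/* Fill incrmnts[] with ..., 40, 13, 4, 1 (largest first), using only
   increments smaller than n. Returns the number of increments stored,
   or -1 if the table of maxinc entries is too small to end with 1. */
int knuth_increments(int n, int incrmnts[], int maxinc)
{
    int h = 1, count = 0;

    /* grow h by h = 3h + 1 while the next value is still below n;
       the test is written as h <= (n-2)/3 to avoid integer overflow */
    while (h <= (n - 2) / 3)
        h = 3 * h + 1;

    /* walk back down: d_i = (d_(i-1) - 1) / 3, stopping after d = 1 */
    for (; h >= 1; h = (h - 1) / 3) {
        if (count == maxinc)
            return -1;
        incrmnts[count++] = h;
    }
    return count;
}
```

For n = 100 this produces 40, 13, 4, 1; the returned count can then be passed along with the table as the number of increments.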

void shellsort(int x[], int n, int incrmnts[], int numinc)
{   /* nonincreasing order */
    int incr, j, k, span, y;

    for (incr = 0; incr < numinc; incr++) {
        /* span is the size of the increment */
        span = incrmnts[incr];
        for (j = span; j < n; j++) {
            /* Insert element x[j] into its proper */
            /* position within its subfile */
            y = x[j];
            for (k = j-span; k >= 0 && y > x[k]; k -= span)
                x[k+span] = x[k];
            x[k+span] = y;
        } /* end for */
    } /* end for */
} /* end shellsort */

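Address calculation sort (sorting by hashing)
e.g. sort from smallest to largest
input data: 19 13 05 27 01 26 31 16 02 09 11 21
Divide the data into 10 subfiles. Each subfile is a linked list whose items are kept in ascending order.
[Figure: subfile 0: 01 02 05 09; subfile 1: 11 13 16 19; subfile 2: 21 26 27; subfile 3: 31; ...]
Suppose there are n items and m subfiles.
• best case: n/m ≈ 1 and the keys are uniformly distributed
  time complexity: O(n)
• worst case: n/m >> 1, or the keys are not uniformly distributed
  time complexity: O(n^2)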
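One way to sketch the method in C is shown below. Everything named here is hypothetical: `addrsort`, `anode`, `NSUB` and the `maxkey` parameter are illustrative, and the address calculation itself (key × m / (maxkey + 1)) is an assumed order-preserving hash for keys in the range 0..maxkey. With maxkey = 99 and 10 subfiles it reduces to the tens digit, which reproduces the subfiles in the example.

```c
#include <stdlib.h>

#define NSUB 10                 /* number of subfiles (m) */

struct anode {
    int key;
    struct anode *next;
};

/* Sort x[0..n-1] into nondecreasing order, assuming 0 <= x[i] <= maxkey.
   Returns 0 on success, or -1 if memory cannot be allocated. */
int addrsort(int x[], int n, int maxkey)
{
    struct anode *sub[NSUB] = { NULL };     /* heads of the subfile lists */
    struct anode *pool, *p, **link;
    int i, j, s;

    if (n < 2)
        return 0;
    pool = malloc((size_t)n * sizeof *pool);    /* one list node per item */
    if (pool == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        /* assumed hash: smaller keys never land in a later subfile */
        s = (int)((long long)x[i] * NSUB / ((long long)maxkey + 1));
        pool[i].key = x[i];
        /* insert into the subfile so that its list stays in ascending order */
        link = &sub[s];
        while (*link != NULL && (*link)->key <= x[i])
            link = &(*link)->next;
        pool[i].next = *link;
        *link = &pool[i];
    }

    /* concatenate the subfiles back into x */
    i = 0;
    for (j = 0; j < NSUB; j++)
        for (p = sub[j]; p != NULL; p = p->next)
            x[i++] = p->key;
    free(pool);
    return 0;
}
```

When the keys spread evenly, each list holds about n/m items and every insertion is cheap; if all keys fall into one subfile, the routine degenerates into an insertion sort on a single linked list.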

Two-way merge
• Merge two sorted sequences into a single one.
• e.g. [25 37 48 57] [12 33 86 92]  --merge-->  [12 25 33 37 48 57 86 92]
• If the two sorted lists have lengths m and n, the time complexity is O(m+n).
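A two-way merge might be sketched as follows. The name `merge2` is hypothetical, and the output array `c` is assumed to have room for m+n items and not to overlap either input.

```c
/* Merge sorted a[0..m-1] and sorted b[0..n-1] into c[0..m+n-1]
   (nondecreasing order). */
void merge2(const int a[], int m, const int b[], int n, int c[])
{
    int i = 0, j = 0, k = 0;

    /* repeatedly copy the smaller of the two front items */
    while (i < m && j < n)
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    /* one input is exhausted; copy what remains of the other */
    while (i < m)
        c[k++] = a[i++];
    while (j < n)
        c[k++] = b[j++];
}
```

Each step copies exactly one item, so the loops run m+n times in total.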

Merge sort
• e.g. (from smallest to largest)
            [25] [57] [48] [37] [12] [92] [86] [33]
  pass 1:   [25 57] [37 48] [12 92] [33 86]
  pass 2:   [25 37 48 57] [12 33 86 92]
  pass 3:   [12 25 33 37 48 57 86 92]
• log_2 n passes are needed.
• time complexity: O(n log n)
• It can be implemented by a recursive function.
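A recursive implementation could look like the sketch below. The names `mergesort_sketch` and `msort` are hypothetical, and the use of one auxiliary array of n items is an assumption of this sketch. The recursion performs the same two-way merges as the passes in the example, only in a different order.

```c
#include <stdlib.h>
#include <string.h>

/* Sort x[lb..ub] into nondecreasing order, using aux[lb..ub] as scratch space. */
static void msort(int x[], int aux[], int lb, int ub)
{
    int mid, i, j, k;

    if (lb >= ub)
        return;                         /* zero or one item: already sorted */
    mid = lb + (ub - lb) / 2;
    msort(x, aux, lb, mid);             /* sort the left half */
    msort(x, aux, mid + 1, ub);         /* sort the right half */

    /* two-way merge of x[lb..mid] and x[mid+1..ub] into aux[lb..ub] */
    i = lb;
    j = mid + 1;
    k = lb;
    while (i <= mid && j <= ub)
        aux[k++] = (x[i] <= x[j]) ? x[i++] : x[j++];
    while (i <= mid)
        aux[k++] = x[i++];
    while (j <= ub)
        aux[k++] = x[j++];
    memcpy(&x[lb], &aux[lb], (size_t)(ub - lb + 1) * sizeof x[0]);
}

/* Returns 0 on success, or -1 if the scratch array cannot be allocated. */
int mergesort_sketch(int x[], int n)
{
    int *aux;

    if (n < 2)
        return 0;
    aux = malloc((size_t)n * sizeof *aux);
    if (aux == NULL)
        return -1;
    msort(x, aux, 0, n - 1);
    free(aux);
    return 0;
}
```

Taking the left item on ties (`<=`) keeps records with equal keys in their original relative order, so this version is stable.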

Radix sort
e.g. sort from smallest to largest
input data:  19 13 05 27 01 26 31 16 02 09 11 21

pass 1 (ones digit):
0)
1) 01, 31, 11, 21
2) 02
3) 13
4)
5) 05
6) 26, 16
7) 27
8)
9) 19, 09
merge:       01 31 11 21 02 13 05 26 16 27 19 09

pass 2 (tens digit):
0) 01, 02, 05, 09
1) 11, 13, 16, 19
2) 21, 26, 27
3) 31
4)
5)
6)
7)
8)
9)
merge:       01 02 05 09 11 13 16 19 21 26 27 31

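Method:
• (1) No item is compared with any other item; each item only determines where it should be placed.
• (2) pass 1: start with the ones digit. If the ones digit is 1, put the item in bucket 1, and so on.
• (3) pass 2: process the tens digit.
• Advantage: fast. Time complexity: O(n log_p k)
  • k: the largest value in the input data
  • p: the radix (base)
  • log_p k: the number of digits
• Disadvantage: extra memory is needed (linked lists can reduce the required memory to a minimum, but they increase the running time).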
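The passes described above might be sketched in C as follows, for nonnegative integers and radix p = 10. The name `radixsort_sketch` is hypothetical. Instead of linked-list buckets, this sketch counts the size of each bucket and places the items into an auxiliary array; that is a swapped-in technique chosen for brevity, and it gives the same order after each merge.

```c
#include <stdlib.h>
#include <string.h>

#define RADIX 10                /* the base p */

/* Sort nonnegative x[0..n-1] into nondecreasing order.
   Returns 0 on success, or -1 if the auxiliary array cannot be allocated. */
int radixsort_sketch(int x[], int n)
{
    int *aux, i, max, count[RADIX];
    long long exp;              /* 1, 10, 100, ...: the digit being processed */

    if (n < 2)
        return 0;
    aux = malloc((size_t)n * sizeof *aux);
    if (aux == NULL)
        return -1;

    max = x[0];                 /* k: the largest value fixes the number of passes */
    for (i = 1; i < n; i++)
        if (x[i] > max)
            max = x[i];

    for (exp = 1; max / exp > 0; exp *= RADIX) {
        /* count how many items belong in each bucket for this digit */
        memset(count, 0, sizeof count);
        for (i = 0; i < n; i++)
            count[(x[i] / exp) % RADIX]++;
        /* prefix sums: count[d] becomes one past the last slot of bucket d */
        for (i = 1; i < RADIX; i++)
            count[i] += count[i - 1];
        /* fill from the right so items in the same bucket keep their order */
        for (i = n - 1; i >= 0; i--)
            aux[--count[(x[i] / exp) % RADIX]] = x[i];
        memcpy(x, aux, (size_t)n * sizeof x[0]);    /* the "merge" step */
    }
    free(aux);
    return 0;
}
```

The outer loop runs once per digit of the largest value, about log_p k times, and each pass touches every item a constant number of times.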
