
The Design & Analysis of Algorithms


Presentation Transcript


  1. The Design & Analysis of Algorithms. Lecture 1, 2010. By M. Sakalli.

  2. Your attendance is mandatory, 70%. • Your evaluation will be algorithmic. • Sources: • Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. • Algorithm Design by Jon Kleinberg and Eva Tardos, Pearson Education, Inc. (2006) (sample chapters). • Introduction to the Design and Analysis of Algorithms by A. Levitin: compact, with good enough examples. • Freedom of information, therefore: be internet-savvy. The final and ultimate resource is the internet, courses around the world, MIT in particular, and Wikipedia. • Kozen, The Design and Analysis of Algorithms.

  3. The emphasis of my triangle will be on game algorithms, graph (network) algorithms, and NP-completeness. In the coming weeks: Part I • Skip: more on asymptotic analysis; recurrences: substitution, master, and (generating-function) annihilation methods. • Skip: Lect. 9, MST/BST and randomized quicksort.. • Round-robin scheduling and round-robin tournaments, combinatorial games, impartial games, min-max, alpha-beta pruning. • MIT: red-black trees, amortized algorithms, competitive analysis of self-organizing lists, greedy MST (repeat Kruskal-Prim). • DP and optimality: event scheduling, longest-common subsequence, evaluation of some multiplication algorithms, matrix multiplication.

  4. In the coming weeks: Part II • Graph algorithms: review shortest-path algorithms, then all-pairs shortest paths, Bellman-Ford, LP, difference constraints. • Network flow: min-cut/max-flow, bipartite matching, stable marriage. • A (minimum) vertex cover of G is a (smallest) set S of vertices such that at least one end of every edge of G lies in S. • The maximum independent set problem is complementary to VC: an independent set is a subset of the vertices of G in which no two vertices are adjacent. • Maximum clique. • Determinism (FSM) vs. non-determinism, P, NP, NPC, reductions, NP-completeness, SAT, 3SAT ..

  5. An algorithm: a sequence of unambiguous, well-defined procedures (instructions) for solving a problem, for a given input size; execution must complete in a finite amount of time. Analysis means: • evaluating the costs, time and space, and managing the resources and the methods.. • A generic RAM model of computation in which instructions are executed consecutively, not concurrently or in parallel. • A computational model in terms of an abstract computer: a Turing machine. Abstraction: the problem relates input parameters to certain state parameters.. [Figure: input → algorithm (FSM, RAM, PRAM, uniform circuits) → output]

  6. Time efficiency: estimation in the asymptotic sense, called Big O, Θ, Ω: comparing two functions to determine a constant c and an n0. Machine independence: bandwidth, and the amount of hardware involved, # of the gates. • Space efficiency: memory. • Theoretically: • Prove its correctness.. • Efficiency: theoretical and empirical analysis. • Its optimality. • The methods applied: iterative, recursive, and parallel. • Desired scalability: various ranges of inputs and the size and dimension of the problem under consideration.

  7. Historical Perspective … • Muhammad ibn Musa al-Khwarizmi – 9th century mathematician http://www.ms.uky.edu/~carl/ma330/project2/al-khwa21.html • http://en.wikipedia.org/wiki/Analysis_of_algorithms. • …

  8. Euclid’s Algorithm. Problem definition: gcd(m, n) of two nonnegative, not-both-zero integers m and n, m > n. Examples: gcd(60,24) = 12, gcd(60,0) = 60, gcd(0,0) = ? Euclid’s algorithm is based on repeated application of the equality gcd(m, n) = gcd(n, m mod n) until the second number reaches 0. Example: gcd(60,24) = gcd(24,12) = gcd(12,0) = 12. The remainder sequence: r0 = m, r1 = n; r(i-1) = r(i)·q(i) + r(i+1), with 0 < r(i+1) < r(i) for 1 ≤ i < t; …; r(t-1) = r(t)·q(t) + 0.

  9. Asymptotic order of growth A way of comparing functions that ignores constant factors and small input sizes • O(g(n)): class of functions f(n) that grow no faster than g(n) • Θ(g(n)): class of functions f(n) that grow at the same rate as g(n) • Ω(g(n)): class of functions f(n) that grow at least as fast as g(n)

  10. O, Ω, Θ

  11. Establishing order of growth using the definition. Definition: f(n) is in O(g(n)) if the order of growth of f(n) ≤ the order of growth of g(n) (within a constant multiple), i.e., there exist a positive constant c and a non-negative integer n0 such that f(n) ≤ c·g(n) for every n ≥ n0. f(n) is o(g(n)) if f(n) ≤ (1/c)·g(n) for every n ≥ n0, and for every c, which means f grows strictly more slowly than any arbitrarily small constant multiple of g. f(n) is Ω(g(n)) if there is a constant c > 0 such that f(n) ≥ c·g(n) for n ≥ n0. f(n) is Θ(g(n)) if f(n) is both O(g(n)) and Ω(g(n)). Examples: • 10n is O(n2), with c ≥ 10, since 10n ≤ 10n2 for n ≥ 1; • or, for a smaller constant, c = 1: 10n ≤ n2 for n ≥ 10. • 5n+20 is O(n), with c ≥ 25 for all n ≥ 1, since 5n+20 ≤ 5n+20n = 25n; • or c = 10 for n ≥ 4.

  12. Emphasizing: the unit of efficiency analysis is the comparison. (Chapter 1, Kozen.) • A generic RAM model of computation in which instructions are executed consecutively, not concurrently or in parallel. Running-time analysis: count the # of primitive operations executed for every line of the code; asymptotic analysis, ignoring machine-dependent constants and looking at the computational growth of T(n) as n → ∞, where n is the input size that determines the number of iterations. Relative speed (on the same machine), absolute speed (between computers).. Parallel..

  13. Step 1: if (n == 0 or m == n), return m and stop; otherwise go to Step 2. Step 2: divide m by n and assign the value of the remainder to r; assign the value of n to m and the value of r to n; go to Step 1.
  while n ≠ 0 do
      r ← m mod n
      m ← n
      n ← r
  return m
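  A direct C rendering of the loop above; a minimal sketch, where the function name gcd and the unsigned type are choices made here, not part of the slide:

      /* Euclid's algorithm: gcd by repeated remainder. */
      unsigned int gcd(unsigned int m, unsigned int n) {
          while (n != 0) {
              unsigned int r = m % n;  /* r <- m mod n */
              m = n;                   /* m <- n */
              n = r;                   /* n <- r */
          }
          return m;
      }

  For example, gcd(60, 24) returns 12, matching the worked example on slide 8.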

  14. while n ≠ 0 do
      r ← m mod n
      m ← n
      n ← r
  return m
  The number of iterations i is bounded above by log_φ(n) + 1, where φ = (1 + sqrt(5))/2; i = O(log(max(m, n))), since r(i+1) < r(i-1)/2. The lower bound is Ω(log(max(m, n))); therefore the running time is Θ(log(max(m, n))).

  15. Proof of correctness of Euclid’s algorithm • Step 1: if n divides m, then gcd(m, n) = n. Since gcd(m, n) divides n, the gcd must be ≤ n: gcd(m, n) ≤ n. And since n divides both m and n, n must be ≤ the gcd of the pair {m, n}: n ≤ gcd(m, n). Together, gcd(m, n) = n. • Step 2: gcd(m, n) = gcd(n, m mod n). If m = n·b + r for integers b and r, then gcd(m, n) = gcd(n, r): every common divisor of m and n also divides r. • Proof: write m = c·p and n = c·q for a common divisor c; then r = m - n·b = c(p - q·b), so c divides r. In particular gcd(m, n) divides r, which yields gcd(m, n) ≤ gcd(n, r); the symmetric argument gives equality.

  16. Other methods for computing gcd(m, n). Consecutive integer checking algorithm: not a good way, since it tries every candidate one by one. Step 1: assign the value of min{m, n} to t. Step 2: divide m by t; if the remainder is 0, go to Step 3, otherwise go to Step 4. Step 3: divide n by t; if the remainder is 0, return t and stop; otherwise go to Step 4. Step 4: decrease t by 1 and go to Step 2. Brute force!! Exhaustive??.. Very slow, and it fails on zero inputs unless they are checked for separately.. O(min(m, n)) iterations, and Ω(min(m, n)) when gcd(m, n) = 1; each operation is Θ(1), so the overall complexity is Θ(min(m, n)).
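  A C sketch of this procedure, assuming m, n ≥ 1 as the slide's caveat about zero inputs requires; the name gcd_cic is illustrative:

      /* Consecutive integer checking: try t = min(m, n), min(m, n) - 1, ... */
      unsigned int gcd_cic(unsigned int m, unsigned int n) {
          unsigned int t = (m < n) ? m : n;    /* Step 1: t <- min(m, n) */
          while (m % t != 0 || n % t != 0)     /* Steps 2-3: t must divide both */
              t--;                             /* Step 4: decrease t */
          return t;                            /* terminates at t = 1 at the latest */
      }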

  17. Other methods for gcd(m, n) [cont.] Middle-school procedure: Step 1: find the prime factorization of m. Step 2: find the prime factorization of n. Step 3: find all the common prime factors. Step 4: compute the product of all the common prime factors and return it as gcd(m, n). Is this an algorithm?

  18. Sieve of Eratosthenes, the method applied. Input: integer n ≥ 2. Output: the list of primes less than or equal to n; sift out the numbers that are not.
  for p ← 2 to n do A[p] ← p
  for p ← 2 to ⌊√n⌋ do
      if A[p] ≠ 0          // p hasn’t been previously eliminated from the list
          j ← p*p
          while j ≤ n do
              A[j] ← 0     // mark element as eliminated
              j ← j + p
  Ex (n = 25): 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
  after sifting multiples of 2: 2 3 5 7 9 11 13 15 17 19 21 23 25
  after sifting multiples of 3: 2 3 5 7 11 13 17 19 23 25
  after sifting multiples of 5: 2 3 5 7 11 13 17 19 23
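  The same sieve in C; a minimal sketch in which the array allocation and the final printing loop are added here for illustration, and the √n bound is expressed as p*p <= n:

      #include <stdio.h>
      #include <stdlib.h>

      /* Sieve of Eratosthenes: print all primes <= n (n >= 2). */
      void sieve(int n) {
          int *A = malloc((n + 1) * sizeof *A);
          for (int p = 2; p <= n; p++)
              A[p] = p;                             /* candidate list */
          for (int p = 2; (long)p * p <= n; p++)    /* p up to sqrt(n) */
              if (A[p] != 0)                        /* p not yet eliminated */
                  for (long j = (long)p * p; j <= n; j += p)
                      A[j] = 0;                     /* mark multiples of p */
          for (int p = 2; p <= n; p++)
              if (A[p] != 0) printf("%d ", A[p]);
          free(A);
      }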

  19. Example of a computational problem: sorting • Statement of the problem: • Input: a sequence of n numbers <a1, a2, …, an> • Problem: reorder them as <a´1, a´2, …, a´n> in ascending or descending order. • Output desired: a´i ≤ a´j for i < j (ascending), or a´i ≥ a´j for i < j (descending). • Instance: the sequence <5, 3, 2, 8, 3> • Algorithms: • Selection sort • Insertion sort • Merge sort • (many others)

  20. Selection Sort • Input: array indexed from 0 to n-1: a[0], …, a[n-1] • Output: the array sorted in non-decreasing order, scanning the elements of the unsorted part from i to n-1.. • The smallest is swapped into the ith position.., and the boundary of the sorted part shifts toward n-1.. • The algorithm makes n-1 passes, and after i passes the first i+1 numbers in the array are sorted. Strategy: in pass i, swap the ith element with the smallest of the rest (a C sketch follows below). • Algorithm (swapping in place): for (i = 0; i < n-1; i++) swap a[i] with the smallest of a[i], .., a[n-1] • Comparisons: (n-1) + (n-2) + ... + 1 = n(n-1)/2 = Θ(n2) whether the input is sorted or unsorted, so the cost is independent of the input!!
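  A C sketch of the strategy just described; the name selection_sort is a choice made here:

      /* Selection sort: in pass i, swap a[i] with the minimum of a[i..n-1]. */
      void selection_sort(int a[], int n) {
          for (int i = 0; i < n - 1; i++) {
              int min = i;
              for (int j = i + 1; j < n; j++)   /* scan the unsorted part */
                  if (a[j] < a[min]) min = j;
              int tmp = a[i];                   /* swap the smallest into place */
              a[i] = a[min];
              a[min] = tmp;
          }
      }

  The comparison count is the same for every input, which is the input-independence the slide points out.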

  21. Insertion Sort • Input: array indexed from 0 to n-1: a[0], …, a[n-1] • Output: array a sorted in non-decreasing order, scanning the elements of the sorted part backward, i-1, i-2, .., 0 • The algorithm makes n-1 passes, and after i passes the first i+1 numbers in the array are sorted. • Strategy: in pass i, move the ith item left to its proper place. • Algorithm (insertion in place): for (i = 1; i < n; i++) insert a[i] among a[i-1], .., a[0] • Write the swapping part of this algorithm (one possible version follows below): in the worst case, 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n2); unlike selection sort, the cost depends on the input statistics.
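  One way to fill in the part the slide asks for; a sketch that shifts elements rather than swapping them pairwise, which is the usual refinement:

      /* Insertion sort: in pass i, move a[i] left to its proper place. */
      void insertion_sort(int a[], int n) {
          for (int i = 1; i < n; i++) {
              int key = a[i];
              int j = i - 1;
              while (j >= 0 && a[j] > key) {    /* shift larger elements right */
                  a[j + 1] = a[j];
                  j--;
              }
              a[j + 1] = key;                   /* insert into the sorted prefix */
          }
      }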

  22. Example:
  original:   34  8 64 51 32 21
  after i=1:   8 34 64 51 32 21
  after i=2:   8 34 64 51 32 21
  after i=3:   8 34 51 64 32 21
  after i=4:   8 32 34 51 64 21
  after i=5:   8 21 32 34 51 64
  In terms of the number of inversions: if sorted, no inversions; if reverse-sorted, n(n-1)/2 inversions; if randomly ordered, n(n-1)/4 inversions on average. The running time is proportional to n plus the number of inversions, hence Θ(n2) on average.

  23. Shell Sort • Shell sort runs very fast in place; invented by Don Shell, it was the first algorithm to break the n2 barrier (for suitable gap sequences its running time is o(n2)). • It’s like insertion sort, but it swaps non-adjacent elements. • Strategy: pick an increment sequence h1 < h2 < h3 ... < ht, where h1 = 1. After the phase using gap hk, a[i] <= a[i + hk] for all i. (Notice that in insertion sort, hk = 1 for all k.) Example with gaps 5, 3, 1:
  index:         0  1  2  3  4  5  6  7  8  9 10 11 12
  original:     81 94 11 96 12 35 17 95 28 58 41 75 15
  after 5-sort: 35 17 11 28 12 41 75 15 96 58 81 94 95
  after the 3-sort and then the 1-sort, everything is sorted, since the final phase behaves like insertion sort..
  • Shell sort is used in practice: simple to code, though its performance is often not better than some O(n log n) algorithms. A C sketch follows below.
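  A C sketch of Shell sort; the slide fixes only h1 = 1, so the halving sequence n/2, n/4, ..., 1 (Shell's original) is an assumption made here:

      /* Shell sort: gapped insertion sort for each gap in a decreasing sequence. */
      void shell_sort(int a[], int n) {
          for (int gap = n / 2; gap > 0; gap /= 2) {    /* h_t, ..., h_1 = 1 */
              for (int i = gap; i < n; i++) {           /* insertion sort with stride gap */
                  int key = a[i], j = i;
                  while (j >= gap && a[j - gap] > key) {
                      a[j] = a[j - gap];                /* shift within the gap-chain */
                      j -= gap;
                  }
                  a[j] = key;
              }
          }
      }

  After each gap phase, a[i] <= a[i + gap] holds for all i, which is the invariant stated on the slide.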

  24. Merge Sort. A classic example of divide-and-conquer. Divide A into left and right halves and recursively sort each. (When the subarray gets small in size, 2 or 4, use insertion sort.) Sort while merging the sorted left half and the sorted right half. That is, A → LA and RA → sorted(LA) and sorted(RA) → merging!! Comparing element against element: assume that a new array C is used for the output. Put pointers at the first elements of the L and R arrays. Copy the smaller element to the output array C, and advance the pointer of the smaller element. Repeat until one pointer reaches the end, then copy the rest of the other array over. Merge example: 1 2 13 24 26 + 3 15 27 38 = 1 2 3 13 15 24 26 27 38. The time to merge two lists is LINEAR in their total size, since every comparison advances one pointer. (A C sketch follows below.)
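  A C sketch of this scheme; for simplicity it recurses down to single elements instead of switching to insertion sort at size 2 or 4, and the caller supplies the auxiliary array c:

      /* Merge sorted runs a[lo..mid-1] and a[mid..hi-1] via c, then copy back. */
      void merge(int a[], int c[], int lo, int mid, int hi) {
          int i = lo, j = mid, k = lo;
          while (i < mid && j < hi)                      /* two-pointer merge */
              c[k++] = (a[i] <= a[j]) ? a[i++] : a[j++]; /* advance the smaller side */
          while (i < mid) c[k++] = a[i++];               /* copy the leftover run */
          while (j < hi)  c[k++] = a[j++];
          for (k = lo; k < hi; k++) a[k] = c[k];
      }

      void merge_sort(int a[], int c[], int lo, int hi) {
          if (hi - lo < 2) return;              /* base case: 0 or 1 elements */
          int mid = lo + (hi - lo) / 2;
          merge_sort(a, c, lo, mid);            /* sort left half */
          merge_sort(a, c, mid, hi);            /* sort right half */
          merge(a, c, lo, mid, hi);             /* merge while comparing */
      }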

  25. The runtime analysis through the recursion: substitution, master, annihilation.. T(n) = 1 if n = 1 (base case); T(n) = 2T(n/2) + n otherwise. Solve the recurrence by substitution (unraveling it): T(n) = 2T(n/2) + n = 2(2T(n/4) + n/2) + n = 2^2 T(n/2^2) + 2n = .... = 2^i T(n/2^i) + i·n. Setting n/2^i = 1 gives i = lg(n), so T(n) = n + n·lg(n) = n·lg(n) + n. Theoretically optimal, yet rarely used in practice: extra memory is needed in merging, and algorithms that are easier to code, with the same or better performance, exist (e.g. quicksort).
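  A quick numeric check of the closed form against the recurrence, for powers of two; purely illustrative:

      #include <stdio.h>

      /* T(1) = 1, T(n) = 2T(n/2) + n, evaluated directly. */
      long T(long n) { return (n == 1) ? 1 : 2 * T(n / 2) + n; }

      int main(void) {
          for (long n = 1, lg = 0; n <= 1024; n *= 2, lg++)
              printf("n=%4ld  T(n)=%6ld  n*lg(n)+n=%6ld\n", n, T(n), n * lg + n);
          return 0;   /* the two columns agree for every power of two */
      }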

  26. Quicksort. One of the fastest sorts in practice. The expected time E[T(n)] is O(n log n); the worst case is O(n2), but it can be made exponentially unlikely with little effort. General strategy:
  if |A| == 0 or 1, return A
  p ← any element of A                                      // pivot
  AL ← {x in A\{p} : x <= p}; AR ← {x in A\{p} : x >= p}    // partition into two disjoint groups
  return QS(AL) + p + QS(AR)
  Example: A = {13, 81, 65, 92, 43, 31, 57, 26, 75, 0}, pivot p = 65: AL ← {13, 0, 43, 31, 26, 57} and AR ← {81, 92, 75} …. Subarray sizes: roughly equal in merge sort, but determined by the pivot in quicksort. Next: strategies for choosing pivots, and the in-place partition..

  27. Pivoting choices: First or last array element: doesn’t matter if A is random, but not recommended if the array is (partially) sorted. Random pivoting: generally works very well; recommended. Median of 3: pick 3 random elements and choose their median, or take the median of the left, right, and middle elements. E.g. in {…, 8, 1, 4, 9, 0, 3, 5, 2, 6}, left = 8, right = 6, mid = 9, and the median of {8, 6, 9} is 8. In-place partitioning: swap the pivot with the last array item; set i = 0 and j = n-2.
  while i < j do
      advance i (as long as the element is <= pivot)
      retreat j (as long as the element is >= pivot)
      swap these elements    // i is pointing at an element > pivot and
                             // j is pointing at an element < pivot

  28. Example: after the pivot and last element swap (pivot 6 parked at the end):
  8 1 4 9 0 3 5 2 7 6    i stops at 8, j at 2; swap and advance i, j
  2 1 4 9 0 3 5 8 7 6    i stops at 9, j at 5; swap and advance i, j
  2 1 4 5 0 3 9 8 7 6    now j < i; stop
  Now swap the pivot element with the element at position i:
  2 1 4 5 0 3 6 8 7 9    done.
  Here we assumed that all elements were distinct. What if there are elements equal to the pivot? One suggested solution is to stop both i and j when an element equal to the pivot is encountered, swap, and continue.
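  A C sketch of this partition scheme and the resulting quicksort; it adopts the stop-on-equal policy just suggested, and assumes the pivot has already been parked in the last slot as in the example (how it gets there, last element, random, or median-of-3, is left to the caller):

      static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

      /* Partition a[lo..hi]; a[hi] holds the pivot. Returns the pivot's final index. */
      int partition(int a[], int lo, int hi) {
          int pivot = a[hi];
          int i = lo, j = hi - 1;
          for (;;) {
              while (i <= j && a[i] < pivot) i++;   /* stop at elements >= pivot */
              while (j >= i && a[j] > pivot) j--;   /* stop at elements <= pivot */
              if (i >= j) break;
              swap_int(&a[i], &a[j]);               /* put strays on the correct sides */
              i++; j--;
          }
          swap_int(&a[i], &a[hi]);                  /* place the pivot at position i */
          return i;
      }

      void quicksort(int a[], int lo, int hi) {
          if (lo >= hi) return;                     /* 0 or 1 elements */
          int p = partition(a, lo, hi);
          quicksort(a, lo, p - 1);
          quicksort(a, p + 1, hi);
      }

  On the slide's example, with pivot 6 already in the last slot, partition performs exactly the two swaps shown and returns index 6.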

  29. The methods: • Brute force, heuristics.. • Divide and conquer • Decrease and conquer • Transform and conquer • Greedy approach • Dynamic programming • Iterative improvement • Backtracking • Branch and bound • Randomized algorithms
