
Counting the bits: Analysis of Algorithms

This presentation explores the correctness and efficiency of software algorithms, with a particular focus on sorting. It discusses experimental and theoretical analysis, complexity bounds, and O-notation, shows how to prove O-theorems, and introduces related notations such as big-theta and little-o. Examples such as the Traveling Salesman Problem and Boolean Satisfiability illustrate the concepts.


Presentation Transcript


  1. Counting the bits: Analysis of Algorithms • Will it run on a larger problem? When will it fail?

  2. Software Concerns • Correctness • test • understand • prove • Maintainability • modify • enhance • Decomposability • cohesion and coupling • multi-programmer, multi-year • Efficiency • memory • cycles

  3. Sample Questions • Task: sorting n elements • How fast can you do it? • in worst case • on average • How much more memory is needed? • Data assumptions? • in memory or on disk? • nearly sorted? • duplicates retained? • .... • Different algorithms for different conditions

  4. How can we tell how well an algorithm works? 1. Experimentally, i.e., run the algorithm on data and report CPU time • depends on coding • depends on machine • depends on compiler - little generality + easy to do - expensive • run multiple times, report average performance • guess the relevant variables and vary them • What's relevant for sorting?
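
As a concrete illustration of the experimental approach, here is a minimal Java timing harness. The input sizes, the fixed seed, and the use of Arrays.sort as the algorithm under test are all choices made for this sketch, and JIT warm-up and machine load will still skew the numbers, which is exactly the slide's point about limited generality:

    import java.util.Arrays;
    import java.util.Random;

    public class SortTiming {
        public static void main(String[] args) {
            Random rng = new Random(42);                  // fixed seed: repeatable inputs
            for (int n = 1_000; n <= 1_000_000; n *= 10) {
                int[] a = rng.ints(n).toArray();
                long start = System.nanoTime();
                Arrays.sort(a);                           // the algorithm under test
                long elapsed = System.nanoTime() - start;
                System.out.printf("n = %,d  time = %.3f ms%n", n, elapsed / 1e6);
            }
        }
    }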

  5. 2. Theoretical Analysis • Give upper and lower bounds for complexity + demonstrates an understanding of the algorithm + shows the relevant variables + shows the importance of the variables + independent of machine, language, coding - analysis is sometimes difficult - usually only works for simple algorithms - often doesn't include important constants • The same questions asked for time can be asked for memory usage

  6. Language of Analysis • big-O notation • A function f(n) is O(g(n)) iff there are positive constants c and N such that f(n) < c*g(n) when n > N. • Intuitively, this says that f is eventually less than a constant times g. • Examples • 17n^2 + 1000n + 15 is O(n^2) (generalize) • 10n + n*log10(n) is O(n lg n), where lg = log2 • It turns out that (comparison-based) sorting is O(n log n). • What about finding the maximum in an array? • What about finding the k-th largest in an array?
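
For the first question, a single pass suffices, so finding the maximum is O(n); a minimal Java sketch (the method name is ours):

    // Finds the maximum of a non-empty array in one pass: O(n) time, O(1) extra space.
    static int max(int[] a) {
        int best = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > best) best = a[i];
        }
        return best;
    }

The k-th largest can be found in O(n log n) by sorting first; quickselect improves this to O(n) on average.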

  7. Squeezing the complexity: lower bounds • When an upper bound equals a lower bound, you know you have the best possible algorithm. • A function f(n) is big-omega(g(n)) iff there are positive constants c and N such that f(n) > c*g(n) for n > N. • Intuitively, this says f is eventually larger than a constant times g. • Example: n^3 - 1000n^2 is big-omega(n^3). • Theoreticians work on closing the gap between the upper and lower bounds. • It turns out that comparison-based sorting is big-omega(n log n), hence sorting is solved. • Caveat: this assumes no special properties of the data. • sometimes constants count • sometimes "eventually" is not good enough

  8. Useful Theorems • Let f and g be functions of n. • If f <= g, then O(f+g) = O(g). • O(constant*f) = O(f). • O(f*f) = [O(f)]^2. • O(polynomial of degree d) = O(n^d). • O(f/g) ... can be anything. • Style: write the simplest and smallest O-notation. • e.g., do not write O(n^2 + 3n), but O(n^2). • Do not write O(3*n^3), but O(n^3).

  9. Proving O theorems • To prove: if f1 and f2 are O(g(n)) then f1*f2 is O(g(n)*g(n)). • Proof: by your nose. • Ask: what do I need to do? • By definition, find a constant c and an integer N such that f1(n)*f2(n) < c*g(n)*g(n) for n > N • Ask: what do I know? • There are constants c1, c2, N1, N2 such that • f1(n) < c1*g(n) for n > N1 • f2(n) < c2*g(n) for n > N2 • Aren't c and N obvious now? • Can you do this?
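
One way to finish, spelling out the hint (and assuming, as usual in this setting, that f1 and f2 are nonnegative): take c = c1*c2 and N = max(N1, N2). Then for n > N,

    \[
    f_1(n) \cdot f_2(n) \;<\; c_1 g(n) \cdot c_2 g(n) \;=\; (c_1 c_2)\, g(n) \cdot g(n),
    \]

which is exactly the definition of f1*f2 being O(g(n)*g(n)).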

  10. Another Proof • To prove: if f1 and f2 are O(g(n)) then f1 + f2 is O(g(n)). • By your nose • Since f1 is O(g(n)), f1(n) < C1*g(n) for n > N1. • Since f2 is O(g(n)), f2(n) < C2*g(n) for n > N2. • Then f1(n) + f2(n) < (C1 + C2)*g(n) for n > max(N1, N2). • Hint: come back to these proofs in a few days and see if you can reproduce them without looking.

  11. More notation • A function f(n) is big-theta of g(n) iff it is both big-O and big-omega of g(n). • Intuitively, f and g have the same growth rate. • A function f(n) is little-o of g(n) iff it is big-O of g(n) but not big-omega of g(n) (for well-behaved functions, this says f(n)/g(n) -> 0). • TSP is big-O(2^n) (up to polynomial factors). Why? • No one knows if it is big-omega(2^n). • Intuitively, this means there is some doubt. • These results require proofs. One can also get estimates by running careful experiments. • But experiments can't prove anything.

  12. Traveling Salesman Problem • Given: n cities and the distances between them • Find: a tour of minimum length. • A tour is a path that visits every city • How many tours are possible? • n*(n-1)*...*1 = n! • n! >= 2*2*...*2*1 = 2^(n-1), since every factor except the last is at least 2 • So n! is omega of 2^n. (lower bound) • No one knows if there is a polynomial-time algorithm to find a minimum-length tour. • There are many very good heuristic algorithms.
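
A brute-force sketch that makes the factorial count concrete: fix city 0 as the start and try every permutation of the remaining cities. The distance-matrix representation and the recursive enumeration are illustrative choices for this sketch, not code from the text:

    // Exhaustive TSP: tries all (n-1)! tours starting at city 0. O(n!) time.
    static double best = Double.POSITIVE_INFINITY;

    static void tsp(double[][] dist, int[] tour, boolean[] used, int k, double len) {
        int n = dist.length;
        if (k == n) {                                    // complete tour: close the cycle
            best = Math.min(best, len + dist[tour[n - 1]][0]);
            return;
        }
        for (int city = 1; city < n; city++) {
            if (!used[city]) {
                used[city] = true;
                tour[k] = city;
                tsp(dist, tour, used, k + 1, len + dist[tour[k - 1]][city]);
                used[city] = false;
            }
        }
    }

Calling it with tour[0] = 0, used[0] = true, and k = 1 enumerates every tour once; even at a billion tours per second, n = 20 is already hopeless, which is why the heuristics matter.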

  13. Boolean Satisfiability • Suppose we have a boolean formula on n variables • e.g. (x1 or not x2 or x3) & (x2 or not x5 or x7) & (...) • This is a conjunct of disjuncts. • Does it have a solution? • i.e., an assignment of true or false to the variables so that the entire formula is true. • When the formula has at most 3 literals (a variable or its negation) in each clause, the problem is called 3-SAT. • Theorem: 3-SAT and general boolean satisfiability are equivalently difficult. • If we just checked every possibility, how many cases? • 2^n, so exhaustive search is O(2^n)
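
A sketch of the exhaustive check. The encoding is a choice made for this sketch: each clause is an array of literals, where +i means x_i and -i means not x_i:

    // Exhaustive SAT: tries all 2^n assignments. Literal +i means x_i, -i means NOT x_i (i >= 1).
    static boolean satisfiable(int[][] clauses, int n) {
        for (long mask = 0; mask < (1L << n); mask++) {   // each mask is one assignment
            boolean all = true;
            for (int[] clause : clauses) {
                boolean sat = false;
                for (int lit : clause) {
                    boolean value = (mask >> (Math.abs(lit) - 1) & 1) == 1;
                    if (lit > 0 ? value : !value) { sat = true; break; }
                }
                if (!sat) { all = false; break; }
            }
            if (all) return true;
        }
        return false;
    }

For example, (x1 or not x2) & (x2) becomes {{1, -2}, {2}} with n = 2, and the search finds the solution x1 = x2 = true.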

  14. Satisfiability, the value • No one knows if this can be done in polynomial time. • A Nobel-prize-level problem, but no Nobel is given in CS. • One million dollar prize (P vs. NP is one of the Clay Millennium Prize Problems). • Application: given 2 circuits, are they equivalent? • Suppose they each have n inputs. • Let the two circuits, as formulas, be f1 and f2. • Consider XOR(f1, f2). • This is a boolean formula on n variables. • If it has a solution S, then f1(S) != f2(S) and the two circuits are different; if it has no solution, the circuits are equivalent.
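
A sketch of the same check done exhaustively rather than with a SAT solver (practical only for small n; the functional-interface representation of a circuit is an illustrative choice):

    import java.util.function.Predicate;

    // Exhaustively checks whether two n-input circuits agree on all 2^n inputs.
    // A SAT solver would instead ask whether XOR(f1, f2) is satisfiable.
    static boolean equivalent(Predicate<boolean[]> f1, Predicate<boolean[]> f2, int n) {
        for (long mask = 0; mask < (1L << n); mask++) {
            boolean[] in = new boolean[n];
            for (int i = 0; i < n; i++) in[i] = ((mask >> i) & 1) == 1;
            if (f1.test(in) != f2.test(in)) return false;   // XOR(f1, f2) true: circuits differ
        }
        return true;
    }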

  15. A problem we can do: Linear Lookup/Search • Task: determine if a value is in an array (n items) • Algorithm
        for i = 1 to n
            if (a[i] == element) return true
        return false
• In the worst case, time O(n). • In the best case, O(1). • What about space? O(1)
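
The same algorithm as runnable Java:

    // Linear search: O(n) worst-case time, O(1) extra space.
    static boolean contains(int[] a, int element) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == element) return true;
        }
        return false;   // only after every element has been checked
    }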

  16. Space Analysis • If a program has bad time complexity, you can wait. • If a program has bad space complexity, the program bombs. • SPACE is critical. • Space utilization occurs in two places: • 1. Every time you call new. • Analysis is like time complexity. • 2. Hidden usage: function calls. • The depth of function calls is critical here. • Not a problem except when using recursion.

  17. Find most similar student records • Assume we have n records Record[i] • Each record has k grades • Algorithm Sketch
        for j = 1 ... n
            for h = j+1 ... n
                compute the similarity of records j and h
                if similarity better than best, save j, h and the score
        return the best j and h
• Time complexity: O(n^2) pairs (O(k*n^2) total work counting the k grades per comparison) • Space complexity: O(1).
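
A Java sketch of the nested-loop comparison. The slide leaves the similarity measure unspecified; squared distance between the grade vectors is an illustrative choice:

    // Finds the most similar pair among n records of k grades each.
    // O(n^2) pairs, O(k) work per pair, O(1) extra space.
    static int[] mostSimilar(double[][] records) {
        int bestJ = -1, bestH = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < records.length; j++) {
            for (int h = j + 1; h < records.length; h++) {
                double d = 0;
                for (int g = 0; g < records[j].length; g++) {
                    double diff = records[j][g] - records[h][g];
                    d += diff * diff;                 // squared distance: smaller = more similar
                }
                if (d < bestDist) { bestDist = d; bestJ = j; bestH = h; }
            }
        }
        return new int[] { bestJ, bestH };
    }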

  18. Divide and Conquer • General problem-solving method • Top-down division of labor • General Algorithm:
        Solution = Solve(Problem)
        Solve(Problem) =
            if Problem is easy, return basicSolve(Problem)
            else
                Pieces = split(Problem)
                PartSolutions = Solve(each of Pieces)
                return merge(PartSolutions)
• Examples: mergesort, quicksort, binary search
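
Mergesort is the textbook instance of this pattern; a compact Java sketch (split in half, solve each half, merge the sorted halves):

    import java.util.Arrays;

    // Mergesort as divide and conquer: split, recurse, merge. O(n log n) time.
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) return a;                               // easy problem: basicSolve
        int mid = a.length / 2;
        int[] left = mergeSort(Arrays.copyOfRange(a, 0, mid));     // split + solve pieces
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        int[] out = new int[a.length];                             // merge the part solutions
        for (int i = 0, j = 0, k = 0; k < out.length; k++) {
            if (j >= right.length || (i < left.length && left[i] <= right[j])) out[k] = left[i++];
            else out[k] = right[j++];
        }
        return out;
    }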

  19. General analysis of Divide and Conquer • Suppose splitting yields a subproblems • Suppose each subproblem has size N/b • Suppose the splitting/merging cost is O(N^k) • Then T(N) = a*T(N/b) + O(N^k), and:
        if a > b^k:  T(N) = O(N^(log_b a))
        if a = b^k:  T(N) = O(N^k * log N)
        if a < b^k:  T(N) = O(N^k)
• For example, in mergesort (where N = b^m): a = 2, b = 2, k = 1, so a = b^k. Therefore T(N) = O(N log N). • You should know how to use this theorem (you don't need to memorize the result).

  20. Binary Search • Binary search (assumes the array is ordered)
        BinarySearch(a, l, u):
            if (l > u) return false              // empty range: not found
            mid = (l + u) / 2
            if (element == a[mid]) return true
            else if (element < a[mid]) return BinarySearch(a, l, mid - 1)
            else return BinarySearch(a, mid + 1, u)
• Time Analysis • f(n) = c + f(n/2) • f(1) = c. • Hence f(n) ≈ c*lg(n), i.e., O(log n). • Expand it out to see this, with n a power of 2.
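
The same procedure as runnable Java:

    // Binary search on a sorted array: O(log n) time, O(log n) stack depth.
    static boolean binarySearch(int[] a, int element, int l, int u) {
        if (l > u) return false;                  // empty range: not found
        int mid = (l + u) / 2;
        if (element == a[mid]) return true;
        if (element < a[mid]) return binarySearch(a, element, l, mid - 1);
        return binarySearch(a, element, mid + 1, u);
    }

Call it as binarySearch(a, x, 0, a.length - 1). The recursion depth, O(log n), is exactly the hidden space usage the next slide asks about.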

  21. Binary Search • Space Analysis • Only new memory counts • New memory occurs when • constructors are called • recursion occurs • Back to code: • What is max depth of stack? • O(log n)

  22. Maximum contiguous subsequence problem • Given integers a[1]...a[n], find i, j such that the sum of a[k] from i to j is maximum. • Algorithmic Approaches • Exhaustive: dumb, but a start • optimize via problem constraints • Divide and Conquer • requires breaking the problem into pieces whose solutions can be merged • Dynamic Programming • requires merging subsolutions of smaller problems into the full solution • Be Clever: this is for the theoreticians • Transform a known problem's solution

  23. Exhaustive
        best = 0, begin = 0, end = 0
        for i = 1 ... n
            for j = i ... n
                temp = 0
                for k = i ... j
                    temp += a[k]
                if (temp > best)
                    best = temp; update begin, end
• Time complexity: O(N^3) • Space complexity: O(1) • see text for Java code
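
A runnable Java version of the cubic algorithm (method name ours; the empty subsequence counts as sum 0, matching the slide's best = 0):

    // Exhaustive MCSS: all O(n^2) subsequences, O(n) to sum each: O(n^3) total.
    static int mcssCubic(int[] a) {
        int best = 0;                             // empty subsequence has sum 0
        for (int i = 0; i < a.length; i++) {
            for (int j = i; j < a.length; j++) {
                int temp = 0;
                for (int k = i; k <= j; k++) temp += a[k];
                if (temp > best) best = temp;
            }
        }
        return best;
    }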

  24. Partition trick • To see that the triple loop is really big-omega(N^3), count only the iterations with i <= n/3 and j >= 2n/3:
        for i = 1 ... n/3          (at least n/3 choices of i)
            for j = 2n/3 ... n     (at least n/3 choices of j)
                for k = i ... j    (at least n/3 terms in each such sum)
                    temp += a[k]
• That is at least (n/3)*(n/3)*(n/3) = N^3/27 steps, so the exhaustive algorithm is big-omega(N^3) as well as O(N^3). • More painful analysis in the text.

  25. Optimize (exhaustive)
        best = 0, begin = 0, end = 0
        for i = 1 ... n
            temp = 0
            for j = i ... n
                temp += a[j]
                if (temp > best)
                    best = temp; update begin, end
• Time: O(n^2) • Space: O(1)
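
In Java, the only change from the cubic version is the running sum, which removes the inner loop:

    // Optimized exhaustive MCSS: temp holds the sum of a[i..j] incrementally. O(n^2) time.
    static int mcssQuadratic(int[] a) {
        int best = 0;
        for (int i = 0; i < a.length; i++) {
            int temp = 0;
            for (int j = i; j < a.length; j++) {
                temp += a[j];                     // sum of a[i..j], built incrementally
                if (temp > best) best = temp;
            }
        }
        return best;
    }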

  26. Dynamic Programming • General idea: use the results of smaller problems • Decomposition: finding the relationship between the problem and smaller ones • Idea: compute S[i][j] = a[i] + ... + a[j], keeping track of the best • Note: S[i][i] = a[i] (boundary condition) and S[i][k] = S[i][k-1] + a[k] • See the matrix in the text • Time: O(n^2) • Space: O(n^2)

  27. Fibonacci via DP • Definition of f(k): if (k < 2) return 1 else return f(k-1) + f(k-2) • Expand: f(4) = f(3) + f(2) = f(2)+f(1) + f(1)+f(0) = f(1)+f(0)+f(1) + f(1)+f(0) • Redundant computation: exponential! (see text) • The cost recurrence is T(N) = T(N-1) + T(N-2) + 2 • Dynamic programming: bottom-up instead of top-down. • Straightforward: use an array; f(k) depends on the two previous values. • Optimize: you don't really need the array, just the last 2 values.
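
The bottom-up version with the two-value optimization, as a short Java sketch (using the slide's convention f(0) = f(1) = 1):

    // Bottom-up Fibonacci: O(k) time, O(1) space (only the last two values kept).
    static long fib(int k) {
        long prev = 1, curr = 1;                  // f(0) = f(1) = 1
        for (int i = 2; i <= k; i++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return curr;
    }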

  28. Dynamic Programming (example 2) • Define Comb(k, n) = n! / (k! * (n-k)!) • Why doesn't this compute well? (the factorials overflow long before the answer does) • Goal: find a definition in terms of smaller k, n • Note: Comb(k, n) = Comb(k-1, n-1) + Comb(k, n-1) • OK, so dynamic programming will work, where S[k][n] is determined by the elements above and to the left • time analysis: O(n^2) • space analysis: O(n^2) • Why is this useful?
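
A Java sketch of the table (it is Pascal's triangle); every entry needs only an addition, which avoids the factorial overflow:

    // Comb(k, n) via the recurrence Comb(k, n) = Comb(k-1, n-1) + Comb(k, n-1).
    // O(k*n) time and space; no factorials, so no early overflow.
    static long comb(int k, int n) {
        long[][] c = new long[k + 1][n + 1];
        for (int j = 0; j <= n; j++) c[0][j] = 1;     // Comb(0, j) = 1
        for (int i = 1; i <= k; i++) {
            for (int j = i; j <= n; j++) {
                c[i][j] = c[i - 1][j - 1] + c[i][j - 1];   // above-left plus left
            }
        }
        return c[k][n];
    }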

  29. Divide and Conquer • Previous examples: MergeSort, QuickSort, BinarySearch • Divide the array in half, say left and right • Suppose we have solutions to left and right • How do we combine them? • From each half we need two solutions • interior solution (away from the boundary) • boundary solution (touching the middle) • The solution is the best of: interior left, interior right, sum of the two boundary solutions • See text for code.

  30. Clever • Look at the properties of the answer • Note: if a[i]+...+a[j] is a solution, then every prefix a[i]+...+a[p] (i <= p < j) must be positive; otherwise dropping it would give a better sum.
        best = 0, temp = 0
        for j = 1 ... n
            temp += a[j]
            if (temp > best) best = temp
            else if (temp < 0) temp = 0
• Time: O(n)
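
The same idea as runnable Java (this linear-time algorithm is commonly credited to Kadane):

    // Linear-time MCSS: drop any running prefix that goes negative. O(n) time, O(1) space.
    static int mcssLinear(int[] a) {
        int best = 0, temp = 0;
        for (int j = 0; j < a.length; j++) {
            temp += a[j];
            if (temp > best) best = temp;
            else if (temp < 0) temp = 0;          // a negative prefix can never help: restart
        }
        return best;
    }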

  31. Transform Problem • Problem: what is the minimum subsequence sum? • Replace each a[i] by -a[i]. Done. • Problem: assume each a[i] is positive. What is the maximum contiguous product? • Replace each a[i] by log(a[i]). • Note: x*y > u*v iff log(x*y) > log(u*v) • But log(x*y) = log(x) + log(y), so maximum product becomes maximum sum • Transforms are not always easy to find.
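
A Java sketch of the product transform, reusing the linear-time idea from slide 30 on the logs (assumes every a[i] > 0, as the slide requires; the empty subsequence corresponds to the empty product, 1):

    // Maximum contiguous product of positive numbers, via logs:
    // log turns products into sums, so the O(n) sum algorithm applies.
    static double maxProduct(double[] a) {
        double best = 0, temp = 0;                // running sums of logs; 0 = empty product
        for (double x : a) {
            temp += Math.log(x);                  // requires every a[i] > 0
            if (temp > best) best = temp;
            else if (temp < 0) temp = 0;
        }
        return Math.exp(best);                    // transform the answer back
    }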

  32. Concrete classes in Collections

  33. Summary • Memory and time complexity measures • Critical for some applications • Sometimes the constants count • Be aware of your computational expenses • Document non-O(1) memory or time costs • Consider problem decomposition, either dynamic programming or divide and conquer
