
Data Structures and Algorithms Analysis of Algorithms


Presentation Transcript


  1. Data Structures and Algorithms Analysis of Algorithms Richard Newman

  2. Players • Boss/Manager/Customer • Wants a cheap solution • Cheap = efficient • Programmer/developer • Wants to solve the problem, deliver system • Theoretician • Wants to understand • Student • Might play any or all of these roles some day

  3. Why Analyze Algorithms? • Predict performance • Compare algorithms • Provide guarantees • Understand theory • Practical reason: avoid poor performance! • Also – avoid logical/design errors

  4. Algorithmic Success Stories • DFT • Discrete Fourier Transform • Take N samples of waveform • Decompose into periodic components • Used in DVD, JPEG, MPEG, MRI, astrophysics, .... • Brute force: N^2 steps • FFT algorithm: N lg N steps

  5. Algorithmic Success Stories • N-Body Simulation • Simulate gravitational interactions among N bodies • Brute force: N^2 steps • Barnes-Hut algorithm: N lg N steps

  6. The Challenge • Will my algorithm be able to solve the problem for large practical inputs? • Time • Memory • Power • Knuth (1970s) – use the scientific method to understand performance

  7. Scientific Method • Observe feature of natural world • Hypothesize a model consistent with observations • Predict events using hypothesis • Test predictions experimentally • Iterate until hypothesis and observations agree

  8. Scientific Method Principles • Experiments must be reproducible • Hypotheses must be falsifiable

  9. Example: 3-Sum • Given N distinct integers, how many triples sum to exactly zero?
      % cat 8ints.txt
      8
      30 -40 -20 -10 40 0 10 5
      % ./ThreeSum 8ints.txt
      4

  10. 3-Sum Brute Force Algo
      for i = 0 to N-1
          for j = i+1 to N-1
              for k = j+1 to N-1
                  if a[i] + a[j] + a[k] == 0
                      count++
      return count

  11. Measuring Running Time
      Manually
      • Start stopwatch when starting program; stop it when program finishes
      • Can do this in a script (date)
      Internally
      • Use C library function time()
      • Can insert calls around code of interest
      • Avoid timing initialization, etc.

  12. Measuring Running Time Strategy • Run program on various input sizes • Measure time for each • Can do this in script also • Plot results • Tools: http://www.opensourcetesting.org/performance.php

  13. Measuring Running Time What do you think the time will be for input of size 16,000? Why?

  14. Data Analysis Standard Plot • Plot running time T(N) vs. input size N • Use linear scales for both

  15. Data Analysis Log-log Plot • If the points fall on a straight line, the slope gives the power:
      lg y = m lg x + b, so y = 2^b x^m

  16. Hypothesis, Prediction, Validation Hypothesis: running time 10^-10 N^3 Prediction: T(16,000) = 409.6 s Observation: T(16,000) = 410.8 s

  17. Doubling Hypothesis Quick way to estimate slope m in log-log plot Strategy: Double size of input each run • Run program on doubled input sizes • Measure time for each • Take ratio of times • If polynomial, should converge to power

  18. Doubling Hypothesis Hypothesis: running time 10^-10 N^3 Prediction: T(16,000) = 409.6 s Observation: T(16,000) = 410.8 s

  19. Doubling Hypothesis • Hypothesis: running time is about a N^b • With b = lg(ratio of running times) Caveat!!! • Cannot identify logarithmic factors • How to find a? • Take a large input, equate its time to the hypothesized time with b as estimated, then solve for a

  20. Experimental Algorithmics System Independent Effects • Algorithm • Input data

  21. Experimental Algorithmics System Dependent Effects • Hardware: CPU, memory, cache, ... • Software: compiler, interpreter, garbage collection, ... • System: OS, network, other processes

  22. Experimental Algorithmics Bad news • Hard to get precise measurements Good news • Easier than other physical sciences! • Can run huge number of experiments

  23. Mathematical Running Time Models Total running time = sum (cost x freq) • Need to analyze program to determine set of operations over which weighted sum is computed • Cost depends on machine, compiler • Frequency depends on algorithm, input data Donald Knuth 1974 Turing Award

  24. How to Estimate Constants? [table of primitive-operation timings omitted] *Running OS X on MacBook Pro, 2.2 GHz, 2 GB RAM

  25. Experimental Algorithmics Observation: most primitive operations take constant time • Warning: non-primitive operations often do not! How many instructions as f(input size)?
      int count = 0;
      for (int i = 0; i < N; ++i)
          if (a[i] == 0)
              count++;

  26. Experimental Algorithmics
      int count = 0;
      for (int i = 0; i < N; ++i)
          if (a[i] == 0)
              count++;

  27. Counting Frequency - Loops
      int count = 0;
      for (int i = 0; i < N; ++i)
          for (int j = i + 1; j < N; ++j)
              if (a[i] + a[j] == 0)
                  count++;
      How many additions in loop? (N-1) + (N-2) + ... + 2 + 1 = (1/2) N (N-1)
      Exact number of other operations? Tedious and difficult....

  28. Experimental Algorithmics Observation: tedious at best Still may have noise! Approach: Simplify! • Use some basic operation as a proxy, e.g., array accesses
      int count = 0;
      for (int i = 0; i < N; ++i)
          for (int j = i + 1; j < N; ++j)
              if (a[i] + a[j] == 0)
                  count++;

  29. Experimental Algorithmics Observation: lower order terms become less important as input size increases Still may be important for “small” inputs Approach: Simplify! Use ~ • Ignore lower order terms • N large, they are negligible • N small, who cares?

  30. Leading Term Approximation Examples
      Ex 1: 1/6 N^3 + 20 N + 16 ~ 1/6 N^3
      Ex 2: 1/6 N^3 + 100 N^(4/3) + 56 ~ 1/6 N^3
      Ex 3: 1/6 N^3 – 1/2 N^2 + 1/3 N ~ 1/6 N^3
      Discard lower order terms, e.g., at N = 1000: 166.67 million vs. 166.17 million

  31. Leading Term Approximation Technical definition: f(N) ~ g(N) means lim (N → ∞) f(N)/g(N) = 1

  32. Bottom Line
      int count = 0;
      for (int i = 0; i < N; ++i)
          for (int j = i + 1; j < N; ++j)
              if (a[i] + a[j] == 0)
                  count++;
      How many array accesses in loop? ~ N^2 Use cost model and ~ notation!

  33. Example - 3-Sum
      int count = 0;
      for (int i = 0; i < N; ++i)
          for (int j = i + 1; j < N; ++j)
              for (int k = j + 1; k < N; ++k)
                  if (a[i] + a[j] + a[k] == 0)
                      count++;
      How many array accesses in loop? The if statement executes N(N-1)(N-2)/3! ~ (1/6) N^3 times, giving ~ (1/2) N^3 array accesses (3 per statement). Use cost model and ~ notation!

  34. Estimating Discrete Sums Take Discrete Math (remember?) Telescoping series, inductive proof Approximate with an integral Doesn't always work! Use Maple or Wolfram Alpha

  35. Takeaway In principle, accurate mathematical models are possible. In practice: formulas can be complicated, advanced math might be needed, and models are subject to noise anyway. Exact models – leave them to the experts! We will use approximate models.

  36. Order-of-Growth Classes

  37. Order-of-Growth • Definition: If f(N) ~ c g(N) for some constant c > 0, then f(N) is O(g(N)) • Ignores leading coefficient • Ignores lower order terms • Brassard notation: O(g(N)) is the set of all functions with the same order • So the 3-Sum algorithm is order N^3 • Leading coefficient depends on hardware, compiler, etc.

  38. Order-of-Growth • Good News! • The following set of functions suffices to describe the order of growth of most algorithms: • 1, log N, N, N log N, N^2, N^3, 2^N, N!

  39. Order-of-Growth

  40. Binary Search • Goal: Given a sorted array and a key, find the index of the key in the array • Binary Search: Compare key against middle entry (of what is left) • Too small, go left • Too big, go right • Equal, found

  41. Binary Search Implementation • Trivial to implement? • First binary search published in 1946 • First bug-free version in 1962 • Bug in Java's Arrays.binarySearch() discovered in 2006! http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

  42. Binary Search – Math Analysis • Proposition: BS uses at most 1 + lg N key compares for a sorted array of size N • Defn: T(N) = max # key compares on a sorted array of size <= N • Recurrence: for N > 1, T(N) <= T(N/2) + 1; for N = 1, T(1) = 1

  43. Binary Search – Math Analysis
      • Recurrence: for N > 1, T(N) <= T(N/2) + 1; for N = 1, T(1) = 1
      • Pf Sketch: (Assume N a power of 2)
        T(N) <= T(N/2) + 1
             <= T(N/4) + 1 + 1
             <= T(N/8) + 1 + 1 + 1
             ...
             <= T(N/N) + 1 + 1 + ... + 1
             = 1 + lg N

  44. 3-Sum • Version 0: N^3 time, N space • Version 1: N^2 log N time, N space • Version 2: N^2 time, N space

  45. 3-Sum – N^2 log N Algorithm • Algorithm • Sort the N (distinct) integers • For each pair of numbers a[i] and a[j], • Binary Search for -(a[i] + a[j]) • Analysis: Order of growth is N^2 log N • Step 1: N^2 using insertion sort • Step 2: N^2 log N with binary search • Can achieve N^2 by modifying the BS step

  46. Comparing Programs • Hypothesis: Version 1 is significantly faster in practice than Version 0 [timing table comparing Version 0 and Version 1 omitted] Theory works well in practice!

  47. Memory • Bit: 0 or 1 (binary digit) • Byte: 8 bits (wasn't always that way) • Megabyte (MB): 1 million bytes (NIST and the network guys) or 2^20 bytes (everybody else) • Gigabyte (GB): 1 billion bytes (NIST and the network guys) or 2^30 bytes (everybody else)

  48. Memory • 64-bit machine: assume 8-byte pointers • Can address more memory • Pointers use more space • Some JVMs “compress” ordinary object pointers to 4 bytes to avoid this cost

  49. Typical Memory Usage [tables of memory usage for primitive types, 1-D arrays, and 2-D arrays omitted]

  50. Typical Java Memory Usage • Object overhead: 16 bytes • Object reference: 8 bytes • Padding: objects use a multiple of 8 bytes
      Ex: Date object
      public class Date {
          private int day;
          private int month;
          private int year;
          ...
      }
      object overhead: 16 bytes
      day:              4 bytes (int)
      month:            4 bytes (int)
      year:             4 bytes (int)
      padding:          4 bytes
      total:           32 bytes
