1 / 80

Advanced UNIX

Advanced UNIX. 240-491 Special Topics in Comp. Eng. 2 Semester 2, 2000-2001. Objectives introduce profiling based on execution times and line counts. 15. Profiling. Contents. 1. What is Profiling? 2. Profiling Primes Programs 3. Primes v.1 (p1.c) 4. Primes v.2 (p2.c)

louise
Download Presentation

Advanced UNIX

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Advanced UNIX 240-491 Special Topics in Comp. Eng. 2Semester 2, 2000-2001 • Objectives • introduce profiling based on execution times and line counts 15. Profiling

  2. Contents 1. What is Profiling? 2. Profiling Primes Programs 3. Primes v.1 (p1.c) 4. Primes v.2 (p2.c) 5. Primes v.3 (p3.c) 6. Primes v.4 (p4.c) continued

  3. 7. Primes v.5 (p5.c) 8. Care with Timings 9. Quick Timings 10. Function Call Trees

  4. 1. What is Profiling? • Profiling a program involves collecting numerical data about its execution • e.g. the total running time, the running time for each function, the number of times a function/statement is executed • This information can be used for speed optimisation, and for debugging.

  5. 2. Profiling Primes Programs • We will profile five versions of a primes program • it prints all the prime numbers between 2 and 70,000 • Two types of profiling are carried out: • time profiling of the functions • counting the number of times statements are executed

  6. 2.1. Time Profiling • Detailed timing information for a program (e.g. foo.c) is obtained in three steps: • $ gcc -pg -o foo foo.c • $ foo • $ gprof -b foo -b switches off the explanation of the results format continued

  7. gprof reports: • execution times for each function • info. on how functions call each other • this includes the number of times a function has been called

  8. gprof Information • man gprof • The Web page: "GNU gprof" • http://www.cs.tu-bs.de/softech/ info/gprof.html • explains some of the undocumented features of gprof, and gives examples of its use

  9. 2.2. Line Counting Profiling • Count the number of times lines in the program have been executed: • $ gcc -fprofile-arcs -ftest-coverage -o foo foo.c • $ foo • $ gcov foo continued

  10. gcov generates a modified source listing of foo.c stored in foo.c.gcov • the listing includes execution counts for the lines • Multiple calls to gcov foo, will cause the line executions to be counted again, and for foo.c.gcov to be updated.

  11. gcov Information • man gcov • very brief • The Web page: "gcov: a Test Coverage Program" • http://gcc.gnu.org/onlinedocs/ gcov_1.html

  12. 3. Primes v.1 (p1.c) • This program calculates the primes between 2 and 70,000 (MAXPRIME) by calling prime() for each integer. • The primes are printed in rows, 9 (NUMCOLS) primes per row.

  13. 3.1. p1.c • #include <stdio.h>#define NUMCOLS 9#define MAXPRIME 70000int prime (int n);int main (){ int i; int colCount = 0; :

  14. for (i = 2; i <= MAXPRIME; i++) if (prime (i)) { colCount++; if (colCount%NUMCOLS == 0) { printf ("%5d\n", i); colCount = 0; } else printf ("%5d ", i); } putchar('\n'); return 0;} continued

  15. int prime (int n)/* Is n a prime? Return 0 if yes, 1 otherwise */{ int i; for (i = 2; i < n; i++) if (n % i == 0) return 0; return 1;}

  16. 3.2. p1.c Timings • $ gcc -pg -o p1 p1.c$ p1 2 3 5 7 11 13 17 19 ... ... 69997 $ gprof -b p1Flat profile:Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ns/call ns/call name 100.00 27.59 27.59 69999 394148.49 394148.49 prime : total running time continued

  17. Call graphgranularity: each sample hit covers 4 byte(s) for 0.04% of 27.59 secondsindex % time self children called name 27.59 0.00 69999/69999 main [2][1] 100.0 27.59 0.00 69999 prime[1]----------------------------------------------- <spontaneous>[2] 100.0 0.00 27.59 main [2] 27.59 0.00 69999/69999 prime[1]-----------------------------------------------Index by function name [1] prime

  18. 3.3. p1.c Line Counts • $ gcc -fprofile-arcs -ftest-coverage -o p1 p1.c$ p1 > /dev/null$ gcov p1100.00% of 20 source lines executed in file p1.cCreating p1.c.gcov. continued

  19. $ cat p1.c.gcov #include <stdio.h> #define NUMCOLS 9 #define MAXPRIME 70000 int prime (int n); int main () 1 { int i; 1 int colCount = 0; :

  20. number of primes 70000 for (i = 2; i <= MAXPRIME; i++) 69999 if (prime (i)) {6935 colCount++; 6935 if (colCount%NUMCOLS == 0) { 770 printf ("%5d\n", i); 770 colCount = 0; 770 } else 6165 printf ("%5d ", i); 69999 } 1 putchar('\n'); 1 return 0; 1 } continued

  21. The expensive operations in prime() are the loop and factor test int prime (int n) 69999 { int i;229394196 for (i = 2; i < n; i++)229387261 if (n % i == 0)63064 return 0;6935 return 1; 69999 } no. of non-primes (63064) + no. of primes (6935)= total range (69999).

  22. 4. Primes v.2 (p2.c) • The analysis of p1.c shows it's "hot spots" are the loop and if-test inside prime(). • speeding these up would be good • Mathematical theory says that if n has a factor, then it will occur between 2 and n0.5 • we can use this to reduce the loop range to 2..root(n)

  23. 4.1. p2.c • #include <stdio.h>#include <math.h>#define NUMCOLS 9#define MAXPRIME 70000int prime (int n);int root(int n);int main (){ int i; int colCount = 0; :

  24. for (i = 2; i <= MAXPRIME; i++) if (prime (i)) { colCount++; if (colCount%NUMCOLS == 0) { printf ("%5d\n", i); colCount = 0; } else printf ("%5d ", i); } putchar('\n'); return 0;} continued

  25. int prime (int n){ int i; for (i = 2; i <= root(n); i++) if (n % i == 0) return 0; return 1;}int root(int n){ return (int) sqrt( (float)n );}

  26. 4.2. p2.c Timings • $ gcc -pg -o p2 p2.c -lm$ p2 2 3 5 7 11 13 17 19 ... 69997 $ gprof -b p2Flat profile:Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ns/call ns/call name 50.00 0.28 0.281682490 166.42 166.42 root 50.00 0.560.2869999 4000.06 8000.11 prime : total running time continued

  27. Call graphgranularity: each sample hit covers 4 byte(s) for 1.79% of 0.56 secondsindex % time self children called name 0.28 0.28 69999/69999 main [2][1] 100.0 0.28 0.28 69999 prime[1] 0.28 0.00 1682490/1682490 root [3]----------------------------------------------- <spontaneous>[2] 100.0 0.00 0.56 main [2] 0.28 0.28 69999/69999 prime[1]----------------------------------------------- 0.28 0.00 1682490/1682490 prime[1][3] 50.0 0.28 0.00 1682490 root [3]-----------------------------------------------Index by function name [1] prime [3] root

  28. Some Observations • p2.c is verymuch faster than p1.c: • 0.56 secs compared to 27.59 secs (~50 times) • The reason is that prime() in p2.c takes 0.28 secs compared to 27.59 secs in p1.c • to see why, it helps to know the line counts inside prime()

  29. 4.3. p2.c Line Counts • $ gcc -fprofile-arcs -ftest-coverage -o p2 p2.c -lm$ p2 > /dev/null$ gcov p2100.00% of 22 source lines executed in file p2.cCreating p2.c.gcov. continued

  30. $ cat p2.c.gcov #include <stdio.h> #include <math.h> #define NUMCOLS 9 #define MAXPRIME 70000 int prime (int n); int root(int n); int main () 1 { int i; 1 int colCount = 0; :

  31. same number of primes as in p1.c 70000 for (i = 2; i <= MAXPRIME; i++) 69999 if (prime (i)) {6935 colCount++; 6935 if (colCount%NUMCOLS == 0) { 770 printf ("%5d\n", i); 770 colCount = 0; 770 } else 6165 printf ("%5d ", i); 69999 } 1 putchar('\n'); 1 return 0; 1 } continued

  32. same numbers of non-primes and primes as in p1.c int prime (int n) 69999 { int i;1682490 for (i = 2; i <= root(n); i++)1675555 if (n % i == 0)63064 return 0;6935 return 1; 69999 } int root(int n) {1682490 return (int) sqrt( (float)n ); 1682490 }

  33. Some Observations • The loop and if-test in p2.c's prime() are executed many times less than in p1.c • p2.c loop: 1,682,490 if-test: 1,675,555 • p1.c loop: 229,394,196 if-test: 229,387,261 • The calls to root() are very expensive: • 50% of the total execution time (0.28 secs) • 1,682,490 calls to sqrt() inside root()

  34. 5. Primes v.3 (p3.c) • This version is very similar to p2.c, but the call to root() inside prime() has been moved outside of the loop.

  35. 5.1. p3.c • #include <stdio.h>#include <math.h>#define NUMCOLS 9#define MAXPRIME 70000int prime (int n);int root(int n);int main (){ int i; int colCount = 0; :

  36. for (i = 2; i <= MAXPRIME; i++) if (prime (i)) { colCount++; if (colCount%NUMCOLS == 0) { printf ("%5d\n", i); colCount = 0; } else printf ("%5d ", i); } putchar('\n'); return 0;} continued

  37. int prime (int n){ int i, bound;bound = root(n); for (i = 2; i <= bound; i++) if (n % i == 0) return 0; return 1;}int root(int n){ return (int) sqrt( (float)n );}

  38. 5.2. p3.c Timings • $ gcc -pg -o p3 p3.c -lm$ p3 2 3 5 7 11 13 17 19 ... 69997 $ gprof -b p3Flat profile:Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ns/call ns/call name 100.00 0.150.1569999 2142.89 2142.89 prime 0.00 0.15 0.0069999 0.00 0.00 root continued

  39. Call graphgranularity: each sample hit covers 4 byte(s) for 6.67% of 0.15 secondsindex % time self children called name 0.15 0.00 69999/69999 main [2][1] 100.0 0.15 0.00 69999 prime[1] 0.00 0.00 69999/69999 root [3]----------------------------------------------- <spontaneous>[2] 100.0 0.00 0.15 main [2] 0.15 0.00 69999/69999 prime[1]----------------------------------------------- 0.00 0.00 69999/69999 prime[1][3] 0.0 0.00 0.00 69999 root [3]-----------------------------------------------Index by function name [1] prime [3] root

  40. Some Observations • p3.c is faster than p2.c: • 0.15 secs compared to 0.56 secs (~ 4 times) • The speed-up is due to root(): • p3.c root() time: 0.15 secs, no. calls: 69,999 • p2.c root() time: 0.28 secs, no. calls: 1,682,490

  41. 5.3. p3.c Line Counts • $ gcc -fprofile-arcs -ftest-coverage -o p3 p3.c -lm$ p3 > /dev/null$ gcov p3100.00% of 23 source lines executed in file p3.cCreating p3.c.gcov. continued

  42. $ cat p3.c.gcov #include <stdio.h> #include <math.h> #define NUMCOLS 9 #define MAXPRIME 70000 int prime (int n); int root(int n); int main () 1 { int i; 1 int colCount = 0; :

  43. same number of primes as before 70000 for (i = 2; i <= MAXPRIME; i++) 69999 if (prime (i)) {6935 colCount++; 6935 if (colCount%NUMCOLS == 0) { 770 printf ("%5d\n", i); 770 colCount = 0; 770 } else 6165 printf ("%5d ", i); 69999 } 1 putchar('\n'); 1 return 0; 1 } continued

  44. same numbers of non-primes and primes as before int prime (int n) 69999 { int i, bound; 69999 bound = root(n); 1682490 for (i = 2; i <= bound; i++) 1675555 if (n % i == 0)63064 return 0;6935 return 1; 69999 } int root(int n) {69999 return (int) sqrt( (float)n ); 69999 } sqrt() called much less

  45. 6. Primes v.4 (p4.c) • This version makes two modifications to the code in p3.c: • add divisibility tests for 2, 3, 5 into prime(), to filter out numbers before the expensive loop • in the process we add some bugs! • have the for-loop start at 7, and increment in steps of 2

  46. 6.1. p4.c • #include <stdio.h>#include <math.h>#define NUMCOLS 9#define MAXPRIME 70000int prime (int n);int root(int n);int main (){ int i; int colCount = 0; :

  47. for (i = 2; i <= MAXPRIME; i++) if (prime (i)) { colCount++; if (colCount%NUMCOLS == 0) { printf ("%5d\n", i); colCount = 0; } else printf ("%5d ", i); } putchar('\n'); return 0;} continued

  48. int prime (int n){ int i, bound;if (n%2 == 0) return 0; if (n%3 == 0) return 0; if (n%5 == 0) return 0; bound = root(n); for (i = 7; i <= bound; i = i+2) if (n % i == 0) return 0; return 1;}int root(int n){ return (int) sqrt( (float)n ); }

  49. 6.2. p4.c Timings where are 2, 3, and 5? • $ gcc -pg -o p4 p4.c -lm$ p4 7 11 13 17 19 23 29 31 37 41 43 47 ... 69997 $ gprof -b p4Flat profile:Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ns/call ns/call name 92.31 0.12 0.12 69999 1714.31 1714.31 prime 7.69 0.130.01 main 0.00 0.13 0.0018665 0.00 0.00 root continued

  50. Call graphgranularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 secondsindex % time self children called name <spontaneous>[1] 100.0 0.01 0.12 main [1] 0.12 0.00 69999/69999 prime [2]----------------------------------------------- 0.12 0.00 69999/69999 main [1][2] 92.3 0.12 0.00 69999 prime [2] 0.00 0.00 18665/18665 root [3]----------------------------------------------- 0.00 0.00 18665/18665 prime [2][3] 0.0 0.00 0.00 18665 root [3]-----------------------------------------------Index by function name [1] main [2] prime [3] root

More Related