Code Optimization

Code Optimization. Winter 2013, COMP 2130 Intro Computer Systems, Computing Science, Thompson Rivers University. Your vision? Seek with all your heart? Course objectives: the better your knowledge of computer systems, the better your programming.



Presentation Transcript


  1. Code Optimization. Winter 2013, COMP 2130 Intro Computer Systems, Computing Science, Thompson Rivers University

  2. Course Objectives
     • The better your knowledge of computer systems, the better your programming.

  3. Course Contents
     • Introduction to computer systems: B&O 1
     • Introduction to C programming: K&R 1 – 4
     • Data representations: B&O 2.1 – 2.4
     • C: advanced topics: K&R 5.1 – 5.10, 6 – 7
     • Introduction to IA32 (Intel Architecture 32): B&O 3.1 – 3.8, 3.13
     • Compiling, linking, loading, and executing: B&O 7 (except 7.12)
     • Dynamic memory management (heap): B&O 9.1–2, 9.3–4, 9.9.1–2, 9.9.4–5, 9.11
     • Code optimization: B&O 5.1 – 5.6, 5.13
     • Memory hierarchy, locality, caching: B&O 5.12, 6.1 – 6.3, 6.4.1 – 6.4.2, 6.5, 6.6.2 – 6.6.3, 6.7
     • Virtual memory (if time permits): B&O 9.4 – 9.5

  4. Unit Learning Objectives
     • List the two optimization blockers.
     • Give examples of the two optimization blockers.
     • Apply optimization techniques.

  5. Unit Contents

  6. Introduction
     • The primary objective in writing a program is to make it work correctly under all possible conditions.
     • Making a program run fast is also an important consideration.
     • [Q] How do you write an efficient program?
       – Choose appropriate algorithms and data structures.
       – Write source code that the compiler can effectively optimize into efficient executable code.
     • For the second part, it is important to understand the capabilities and limitations of optimizing compilers.
     • However, programmers must make a trade-off between how easy a program is to implement and maintain and how fast it runs.

  7. • Modern compilers employ sophisticated forms of analysis and optimization.
     • Even the best compilers, however, can be thwarted by optimization blockers: aspects of the program's behavior that depend strongly on the execution environment.
     • Optimization blockers can confuse programmers as well, leading to logic errors.
     • Programmers must assist the compiler by writing code that can be optimized readily.

  8. 5.1 Limitations of Optimizing Compilers
     • Higher gcc optimization levels (e.g., -O1, -O2) can improve program performance.
     • But they may increase program size, and they make the program more difficult to debug with standard debugging tools.

  9. • Compilers must be careful to apply only safe optimizations to a program.
     • Example: memory aliasing
         void twiddle1(int *xp, int *yp)
         {
             *xp += *yp;
             *xp += *yp;
         }
       [Q] Can twiddle1 be replaced by twiddle2?
         void twiddle2(int *xp, int *yp)
         {
             *xp += 2 * *yp;
         }
       [Q] What if xp == yp, i.e., both pointers refer to the same variable?
       – In twiddle1, *xp is quadrupled (x → 2x → 4x), but
       – in twiddle2, *xp is only tripled (x → 3x),
       so the compiler cannot replace twiddle1 with twiddle2.
     • [Q] Is it good programming style to pass pointers and manipulate them? How would you improve twiddle1()?
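
A minimal sketch for experimenting with the aliasing question above. aliased_result is a hypothetical helper that passes the same variable as both xp and yp; twiddle3 is a hypothetical "read *yp once" rewrite. It agrees with twiddle1 when the pointers differ but behaves like twiddle2 when they alias, which is exactly why a compiler may not make that change on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Two additions through memory: *yp is re-read after *xp was updated. */
void twiddle1(int *xp, int *yp)
{
    *xp += *yp;
    *xp += *yp;
}

/* One addition of 2 * *yp: *yp is read only once. */
void twiddle2(int *xp, int *yp)
{
    *xp += 2 * *yp;
}

/* Hypothetical rewrite: copy *yp into a local first.
   Without aliasing it matches twiddle1; with xp == yp it matches twiddle2. */
void twiddle3(int *xp, int *yp)
{
    int y = *yp;
    *xp += y;
    *xp += y;
}

/* Hypothetical helper: applies f with both arguments pointing at x. */
int aliased_result(void (*f)(int *, int *), int x)
{
    assert(f != NULL);
    f(&x, &x);          /* xp == yp: both point at the local x */
    return x;
}
```

For x = 5 and xp == yp, twiddle1 leaves 20 and twiddle2 leaves 15; with two distinct variables all three functions compute x + 2y.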

  10. • Example:
          x = 1000; y = 3000;
          *q = y;
          *p = x;
          t1 = *q;
      • [Q] What value will t1 have?
        – 1000 if p and q point to the same location, 3000 otherwise.
        – This is not easy even for us to follow: definitely not good programming style.
      • The compiler therefore cannot replace the last statement with t1 = y;.
      • Optimization blockers:
        – Memory aliasing through pointers
        – …
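
A small sketch that runs the slide's statements with p and q supplied by the caller, so both possible answers can be observed (the function name read_back is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns what t1 holds after the slide's statements.
   If p and q point to the same int, the store through p overwrites the
   value just stored through q. */
int read_back(int *p, int *q)
{
    int x = 1000, y = 3000, t1;

    assert(p != NULL && q != NULL);
    *q = y;         /* store 3000 through q */
    *p = x;         /* store 1000 through p (may hit the same location) */
    t1 = *q;        /* must reload: cannot assume t1 == y */
    return t1;
}
```

Passing two different ints yields 3000; passing the same int twice yields 1000.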

  11. • Example: side effects
          int f();
          int func1() { return f() + f() + f() + f(); }
          int func2() { return 4 * f(); }
      • [Q] Can you see any problem?
      • [Q] What if f is defined as follows?
          int counter = 0;
          int f() { return counter++; }
      • [Q] What will func1() and func2() return?
      • [Q] Is this good programming style? How would you improve it?
      • Optimization blockers:
        – Memory aliasing through pointers
        – Functions with side effects
        – …
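
A sketch of the side-effect example with the counter-based f, assuming counter starts at 0. C leaves the order in which the four calls in func1 are evaluated unspecified, but they return 0, 1, 2 and 3 in some order, so the sum is the same either way:

```c
#include <assert.h>

int counter = 0;

/* f has a side effect: every call changes the global counter. */
int f(void)
{
    return counter++;
}

int func1(void)
{
    return f() + f() + f() + f();   /* four calls: 0 + 1 + 2 + 3 in some order */
}

int func2(void)
{
    return 4 * f();                 /* one call: 4 * 0 */
}
```

Starting from counter == 0, func1() returns 6 and advances counter by 4, while func2() returns 0 and advances it by 1, so the two are not interchangeable.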

  12. 5.2 Expressing Program Performance
      • Cycles per element (CPE): the number of clock cycles needed per element processed.
      • Measuring in cycles expresses performance in terms of how many instructions (cycles) are executed, not the number of C lines, and not how fast the clock runs.
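
One way to read CPE, following the usual convention of fitting the total cycle count against the number of elements (the symbols T, n, c_0, n_1 and n_2 are introduced here only for illustration): the slope of the fitted line is the CPE, and the intercept c_0 absorbs fixed overhead such as loop setup.

```latex
T(n) \approx c_0 + \mathrm{CPE}\cdot n,
\qquad
\mathrm{CPE} \approx \frac{T(n_2) - T(n_1)}{n_2 - n_1}
```

For large n the CPE term dominates, so comparing two versions of a loop by their CPE compares them independently of the fixed overhead.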

  13. • Example: loop unrolling
          void psum1(float a[], float p[], long int n)
          {
              long int i;
              p[0] = a[0];
              for (i = 1; i < n; i++)
                  p[i] = p[i-1] + a[i];
          }
          void psum2(float a[], float p[], long int n)
          {
              long int i;
              float mid_val;
              p[0] = a[0];
              for (i = 1; i < n-1; i += 2) {
                  mid_val = p[i-1] + a[i];
                  p[i] = mid_val;
                  p[i+1] = mid_val + a[i+1];
              }
              if (i < n)
                  p[i] = p[i-1] + a[i];
          }
      • [Q] Which one do you think runs faster?
      • [Q] Can you simply count the number of operations that access main memory?
        – psum1: 3 × (n − 1) (read p[i-1] and a[i], write p[i], in each of n − 1 iterations)
        – psum2: 5 × (n − 1)/2 (read p[i-1], a[i] and a[i+1], write p[i] and p[i+1], in each of about (n − 1)/2 iterations)
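
A sketch for checking that the unrolled psum2 computes the same prefix sums as psum1 for both odd and even n, including the leftover element handled after the loop. psums_agree is a hypothetical helper; it uses small integer values so that the float sums are exact and can be compared with ==.

```c
#include <assert.h>

/* Prefix sums, one element per iteration. */
void psum1(float a[], float p[], long int n)
{
    long int i;
    p[0] = a[0];
    for (i = 1; i < n; i++)
        p[i] = p[i-1] + a[i];
}

/* Same prefix sums, unrolled by two. */
void psum2(float a[], float p[], long int n)
{
    long int i;
    p[0] = a[0];
    for (i = 1; i < n-1; i += 2) {
        float mid_val = p[i-1] + a[i];
        p[i]   = mid_val;
        p[i+1] = mid_val + a[i+1];
    }
    if (i < n)                  /* one element left over (happens when n is even) */
        p[i] = p[i-1] + a[i];
}

/* Hypothetical checker: 1 if both versions agree for length n (1..64). */
int psums_agree(long int n)
{
    float a[64], p1[64], p2[64];
    long int i;

    assert(n >= 1 && n <= 64);
    for (i = 0; i < n; i++)
        a[i] = (float)(i % 7 + 1);   /* small integers: sums stay exact */
    psum1(a, p1, n);
    psum2(a, p2, n);
    for (i = 0; i < n; i++)
        if (p1[i] != p2[i])
            return 0;
    return 1;
}
```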

  14. • Loop unrolling can
        – reduce the number of memory accesses and the loop overhead (index updates and tests);
        – possibly let independent operations execute in parallel within a single core (instruction-level parallelism), rather than across multiple cores.
      • [Q] How do these apply to the previous example?

  15. 5.3 Program Example
          typedef struct {        // vector abstract data type
              long int len;
              data_t *data;       // vector values
          } vec_rec, *vec_ptr;
          #define IDENT 0
          #define OP +
          void combine1(vec_ptr v, data_t *dest)
          {
              long int i;
              *dest = IDENT;
              for (i = 0; i < vec_length(v); i++) {   // vec_length hides len
                  data_t val;
                  get_vec_element(v, i, &val);        // copies element i into val
                  *dest = *dest OP val;               // accumulates into *dest
              }
          }
      • [Q] Can you write vec_length() and get_vec_element()?
      • [Q] Compilers can optimize the above code well. Can you optimize it?
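
One possible answer to the exercise, as a sketch. It assumes data_t is long (the slides leave it open); new_vec and free_vec are hypothetical helpers added so a vector can be built and released, while get_vec_element follows the bounds-checked version that appears in the 5.5 slide.

```c
#include <assert.h>
#include <stdlib.h>

typedef long data_t;        /* assumption: the slides leave data_t open */

#define IDENT 0
#define OP +

typedef struct {            /* vector abstract data type */
    long int len;
    data_t *data;           /* vector values */
} vec_rec, *vec_ptr;

/* Hypothetical constructor: zero-filled vector of length len, or NULL. */
vec_ptr new_vec(long int len)
{
    vec_ptr v;

    assert(len >= 0);
    v = malloc(sizeof(vec_rec));
    if (v == NULL)
        return NULL;
    v->len = len;
    v->data = NULL;
    if (len > 0) {
        v->data = calloc((size_t) len, sizeof(data_t));
        if (v->data == NULL) {
            free(v);
            return NULL;
        }
    }
    return v;
}

/* Hypothetical destructor. */
void free_vec(vec_ptr v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}

long int vec_length(vec_ptr v)
{
    return v->len;
}

/* Copies element index into *dest; returns 0 if index is out of range. */
int get_vec_element(vec_ptr v, long int index, data_t *dest)
{
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

void combine1(vec_ptr v, data_t *dest)
{
    long int i;
    *dest = IDENT;
    for (i = 0; i < vec_length(v); i++) {
        data_t val;
        get_vec_element(v, i, &val);
        *dest = *dest OP val;
    }
}
```

For example, a 5-element vector holding 1 through 5 gives *dest == 15 with IDENT 0 and OP +.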

  16. 5.4 Eliminating Loop Inefficiencies
      • Code motion: identify a computation that is performed multiple times (e.g., within a loop) but whose result does not change, and move it so that it is evaluated only once (e.g., before the loop).
      • Example:
          void combine1(vec_ptr v, data_t *dest)
          {
              long int i;
              *dest = IDENT;
              for (i = 0; i < vec_length(v); i++) {
                  data_t val;
                  get_vec_element(v, i, &val);   // copies element i into val
                  *dest = *dest OP val;          // accumulates into *dest
              }
          }
      • Does vec_length() have a side effect? Or is the length of the vector changed in the loop?
        – No.
      • Then what can we do?

  17. • From the previous example:
          void combine2(vec_ptr v, data_t *dest)
          {
              long int i;
              long int length = vec_length(v);   // computed once
              *dest = IDENT;
              for (i = 0; i < length; i++) {
                  data_t val;
                  get_vec_element(v, i, &val);   // copies element i into val
                  *dest = *dest OP val;          // accumulates into *dest
              }
          }
      • [Q] Can we remove &val?
      • Example: Any problem? How can you improve it?
          void lower1(char *s)
          {
              int i;
              for (i = 0; i < strlen(s); i++)
                  if (s[i] >= 'A' && s[i] <= 'Z')
                      s[i] -= 'A' - 'a';
          }
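
A sketch of the usual fix for lower1: compute the length once, before the loop (lower2 is a hypothetical name). The compiler generally cannot do this itself, because the loop writes into s and it would have to prove that strlen(s) never changes. lower1 also does work proportional to the square of the string length, since every strlen call scans the whole string.

```c
#include <assert.h>
#include <string.h>

/* Quadratic: strlen(s) is re-evaluated on every loop test. */
void lower1(char *s)
{
    size_t i;
    for (i = 0; i < strlen(s); i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= 'A' - 'a';
}

/* Linear: the length is computed once, before the loop (code motion). */
void lower2(char *s)
{
    size_t i;
    size_t len = strlen(s);
    for (i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= 'A' - 'a';
}
```

Both functions produce the same string; for a string of length n, lower1 calls strlen n + 1 times while lower2 calls it once.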

  19. • Example:
          void set_row(double *a, double *b, long i, long n)
          {
              long j;
              for (j = 0; j < n; j++)
                  a[n*i + j] = b[j];
          }
      • How can you improve it?
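
A sketch of code motion applied to set_row: the product n*i does not change inside the loop, so it can be computed once (set_row2 is a hypothetical name). Because n and i are plain parameters that the loop never modifies, this is a transformation optimizing compilers can often perform themselves.

```c
#include <assert.h>

/* Original: n*i appears in the index expression of every iteration. */
void set_row(double *a, double *b, long i, long n)
{
    long j;
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];
}

/* Code motion: the loop-invariant product n*i is computed once. */
void set_row2(double *a, double *b, long i, long n)
{
    long j;
    long ni = n * i;
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
}
```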

  21. 5.5 Reducing Procedure Calls
      • From the previous example: any problem?
          typedef struct {        // vector abstract data type
              long int len;
              data_t *data;       // vector values
          } vec_rec, *vec_ptr;
          int get_vec_element(vec_ptr v, long int index, data_t *dest)
          {
              if (index < 0 || index >= v->len)
                  return 0;
              *dest = v->data[index];
              return 1;
          }
          void combine2(vec_ptr v, data_t *dest)
          {
              long int i;
              long int length = vec_length(v);
              *dest = IDENT;
              for (i = 0; i < length; i++) {
                  data_t val;
                  get_vec_element(v, i, &val);   // copies element i into val
                  *dest = *dest OP val;          // accumulates into *dest
              }
          }

  22. • From the previous example:
          typedef struct {        // vector abstract data type
              long int len;
              data_t *data;       // vector values
          } vec_rec, *vec_ptr;
          int get_vec_element(vec_ptr v, long int index, data_t *dest)
          {
              if (index < 0 || index >= v->len)
                  return 0;
              *dest = v->data[index];
              return 1;
          }
          void combine3(vec_ptr v, data_t *dest)
          {
              long int i;
              long int length = vec_length(v);
              data_t *data = get_vec_start(v);   // v->data
              *dest = IDENT;
              for (i = 0; i < length; i++)
                  *dest = *dest OP data[i];      // accumulates into *dest
          }
      • [Q] Can you write get_vec_start()?
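
One way to write get_vec_start(), sketched under the assumption that data_t is long and that IDENT and OP are 0 and +, as on the 5.3 slide. It simply exposes the underlying array, trading the bounds checking of get_vec_element for a loop with no function calls:

```c
#include <assert.h>

typedef long data_t;        /* assumption: the slides leave data_t open */
#define IDENT 0
#define OP +

typedef struct {            /* vector abstract data type */
    long int len;
    data_t *data;           /* vector values */
} vec_rec, *vec_ptr;

long int vec_length(vec_ptr v)
{
    return v->len;
}

/* Returns a pointer to the first element, exposing the array to the caller. */
data_t *get_vec_start(vec_ptr v)
{
    return v->data;
}

/* No function call and no bounds check inside the loop. */
void combine3(vec_ptr v, data_t *dest)
{
    long int i;
    long int length = vec_length(v);
    data_t *data = get_vec_start(v);

    *dest = IDENT;
    for (i = 0; i < length; i++)
        *dest = *dest OP data[i];
}
```

The price is abstraction: callers of get_vec_start can now index past the end of the vector, so the bounds checking becomes the caller's responsibility.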

  23. 5.6 Eliminating Unneeded Memory References
      • From the previous example: any problem?
          void combine3(vec_ptr v, data_t *dest)
          {
              long int i;
              long int length = vec_length(v);
              data_t *data = get_vec_start(v);   // v->data
              *dest = IDENT;
              for (i = 0; i < length; i++)
                  *dest = *dest OP data[i];      // accumulates into *dest
          }
      • IA32 code for the statement in the for loop
        (data_t = int, OP = *; i in %edx, data in %ecx, dest in %ebx):
          movl  (%ebx), %eax            # read *dest
          imull (%ecx,%edx,4), %eax     # multiply by data[i]
          movl  %eax, (%ebx)            # write *dest

  25. • From the previous example:
          void combine4(vec_ptr v, data_t *dest)
          {
              long int i;
              long int length = vec_length(v);
              data_t *data = get_vec_start(v);   // v->data
              data_t acc = IDENT;                // can be kept in a register
              for (i = 0; i < length; i++)
                  acc = acc OP data[i];          // accumulates in acc
              *dest = acc;                       // one write at the end
          }
      • IA32 code for the statement in the for loop
        (data_t = int, OP = *; i in %edx, data in %ecx, acc in %eax):
          imull (%ecx,%edx,4), %eax     # multiply acc by data[i]
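
A sketch showing why the compiler cannot turn combine3 into combine4 by itself: dest might point into the vector. It assumes data_t is int with OP = *, as in the assembly comments (so IDENT must be 1 rather than the 0 defined on the 5.3 slide), and it reads v->len and v->data directly instead of calling vec_length and get_vec_start, to keep the block short.

```c
#include <assert.h>

typedef int data_t;         /* matches the slide's assembly: data_t = int */
#define IDENT 1             /* identity element for OP = * */
#define OP *

typedef struct {
    long int len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Accumulates in *dest: one read and one write of memory per element. */
void combine3(vec_ptr v, data_t *dest)
{
    long int i;
    long int length = v->len;
    data_t *data = v->data;

    *dest = IDENT;
    for (i = 0; i < length; i++)
        *dest = *dest OP data[i];
}

/* Accumulates in a local variable; *dest is written once at the end. */
void combine4(vec_ptr v, data_t *dest)
{
    long int i;
    long int length = v->len;
    data_t *data = v->data;
    data_t acc = IDENT;

    for (i = 0; i < length; i++)
        acc = acc OP data[i];
    *dest = acc;
}
```

With v holding {2, 3, 5} and dest pointing at the last element, combine3 leaves 36 there (its last step multiplies by the partial product it has just stored into data[2]), whereas combine4 leaves 30. Without such aliasing the two agree.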

  26. 5.13 Performance Improvement Techniques
      • High-level design
        – Appropriate algorithms and data structures
      • Basic coding principles
        – Elimination of loop inefficiencies
        – Elimination of excessive function calls
        – Elimination of unnecessary memory references: introduce temporary variables to hold intermediate results.
        – Elimination of pointers if possible
        – …
      • Low-level optimizations
        – Unroll loops to reduce overhead and to enable further optimizations.
        – Find ways to increase instruction-level parallelism.

  27. • Unroll loops to reduce overhead and to enable further optimizations.
      • Find ways to increase instruction-level parallelism.
          // before: one accumulator, one element per iteration
          for (i = 0; i < length; i++)
              acc = acc OP data[i];
          *dest = acc;
          // after: two elements per iteration, two accumulators
          data_t acc0 = IDENT;
          data_t acc1 = IDENT;
          long int limit = length - 1;
          for (i = 0; i < limit; i += 2) {   // combine two elements
              acc0 = acc0 OP data[i];        // two independent statements at a time
              acc1 = acc1 OP data[i+1];
          }
          for (; i < length; i++)            // finish any remaining elements
              acc1 = acc1 OP data[i];
          *dest = acc0 OP acc1;
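
A self-contained sketch of the two-accumulator loop, written as a function over a plain array (combine_unroll2 is a hypothetical name) and assuming data_t is long with OP = +. The two accumulators form independent dependency chains, which is what lets the processor overlap the operations.

```c
#include <assert.h>

typedef long data_t;        /* assumption; see the note on floating point below */
#define IDENT 0
#define OP +

/* Two elements per iteration, two independent accumulators. */
data_t combine_unroll2(const data_t *data, long int length)
{
    long int i;
    long int limit = length - 1;
    data_t acc0 = IDENT;    /* even-indexed elements */
    data_t acc1 = IDENT;    /* odd-indexed elements */

    for (i = 0; i < limit; i += 2) {
        acc0 = acc0 OP data[i];
        acc1 = acc1 OP data[i+1];
    }
    for (; i < length; i++) /* at most one element left */
        acc0 = acc0 OP data[i];
    return acc0 OP acc1;
}
```

For integer data this gives exactly the same result as the simple loop. For a floating-point data_t the result can differ slightly, because splitting the work between acc0 and acc1 changes the order of the operations and floating-point addition is not associative; that is one reason compilers do not apply this transformation to floating-point code by default.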

  28. • Example: Convert the following code to use 4-way loop unrolling:
          for (i = 0; i < length; i++)
              sum = sum + udata[i] * vdata[i];
          *dest = sum;
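
One possible solution to the exercise, as a sketch assuming data_t is long (inner_unroll4 is a hypothetical name). It processes four elements per iteration and finishes the last 0 to 3 elements in a second loop:

```c
#include <assert.h>

typedef long data_t;        /* assumption: the exercise leaves the type open */

/* Inner product with 4-way loop unrolling and a single accumulator. */
void inner_unroll4(const data_t *udata, const data_t *vdata,
                   long int length, data_t *dest)
{
    long int i;
    long int limit = length - 3;
    data_t sum = 0;

    for (i = 0; i < limit; i += 4)      /* four products per iteration */
        sum = sum + udata[i]   * vdata[i]
                  + udata[i+1] * vdata[i+1]
                  + udata[i+2] * vdata[i+2]
                  + udata[i+3] * vdata[i+3];
    for (; i < length; i++)             /* 0 to 3 leftover elements */
        sum = sum + udata[i] * vdata[i];
    *dest = sum;
}
```

Because every addition still goes through sum in one chain, this version mainly cuts loop overhead; grouping the four products before adding them to sum, or using several accumulators as on slide 27, can additionally shorten the critical path.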

  29. • Example: Improve the following code by using a word of data type unsigned long to pack four copies of c (on IA32, unsigned long is 4 bytes):
          void *basic_memset(void *s, int c, int n)
          {
              int cnt = 0;
              unsigned char *schar = s;
              while (cnt < n) {
                  *schar = (unsigned char) c;
                  schar++;
                  cnt++;
              }
              return s;
          }

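  30.     // Named packed_memset so it does not clash with the standard memset.
          // Assumes a 4-byte unsigned int and that unaligned word stores are
          // allowed (true on IA32).
          void *packed_memset(void *s, int c, int n)
          {
              int cnt = 0;
              int length = n / 4;              // number of whole words
              unsigned ic;
              unsigned char *schar = s;
              unsigned int *si = s;
              c = c & 0xff;
              ic = ((unsigned) c << 24) | ((unsigned) c << 16)
                 | ((unsigned) c << 8) | (unsigned) c;   // four copies of c
              while (cnt < length) {           // four bytes per iteration
                  *si = ic;
                  si++;
                  cnt++;
              }
              cnt = length * 4;
              schar += length * 4;
              while (cnt < n) {                // remaining 0 to 3 bytes
                  *schar = (unsigned char) c;
                  schar++;
                  cnt++;
              }
              return s;
          }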
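
The slide's version packs four copies of c into a 32-bit word, which matches IA32, where unsigned long is 4 bytes. On x86-64 Linux unsigned long is 8 bytes, so here is a sketch that packs sizeof(unsigned long) copies instead (word_memset is a hypothetical name, and n is a size_t here). It multiplies the byte by ~0UL / 0xff, which is 0x01010101 for a 4-byte word and 0x0101010101010101 for an 8-byte word, and uses memcpy for each word store to sidestep alignment and strict-aliasing concerns; mainstream compilers typically turn a fixed-size memcpy like this into a single store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sets n bytes at s to (unsigned char) c, one unsigned long
   (sizeof(unsigned long) copies of c) per iteration where possible. */
void *word_memset(void *s, int c, size_t n)
{
    const size_t K = sizeof(unsigned long);    /* bytes per word */
    unsigned char byte = (unsigned char) c;
    unsigned char *schar = s;
    unsigned long word = (~0UL / 0xff) * byte; /* K copies of byte */

    assert(s != NULL || n == 0);
    while (n >= K) {                 /* K bytes per iteration */
        memcpy(schar, &word, K);     /* word-sized store, any alignment */
        schar += K;
        n -= K;
    }
    while (n > 0) {                  /* remaining 0 .. K-1 bytes */
        *schar++ = byte;
        n--;
    }
    return s;
}
```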

  31. Carnegie Mellon
      Reduction in Strength
      • Replace a costly operation with a simpler one.
      • Shift and add instead of multiply or divide: 16*x --> x << 4
        – Utility is machine dependent: it depends on the cost of the multiply or divide instruction.
        – On Intel Nehalem, integer multiply requires 3 CPU cycles.
      • Recognize sequences of products:
          // before
          for (i = 0; i < n; i++)
              for (j = 0; j < n; j++)
                  a[n*i + j] = b[j];
          // after
          int ni = 0;
          for (i = 0; i < n; i++) {
              for (j = 0; j < n; j++)
                  a[ni + j] = b[j];
              ni += n;
          }
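
A sketch of the two loop versions wrapped in functions (fill_rows_mul, fill_rows_add and times16 are hypothetical names), which makes it easy to check that replacing n*i with a running sum ni gives identical results:

```c
#include <assert.h>

/* One multiplication n*i per inner iteration. */
void fill_rows_mul(double *a, const double *b, long n)
{
    long i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[n*i + j] = b[j];
}

/* Strength reduction: the sequence n*0, n*1, n*2, ... is produced
   by repeated addition instead of multiplication. */
void fill_rows_add(double *a, const double *b, long n)
{
    long i, j;
    long ni = 0;                 /* invariant: ni == n*i */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
        ni += n;
    }
}

/* Shift instead of multiplying by a power of two
   (unsigned, so there is no signed-overflow concern). */
unsigned times16(unsigned x)
{
    return x << 4;               /* same as 16 * x */
}
```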

  32. Share Common Subexpressions
      • Reuse portions of expressions.
      • Compilers are often not very sophisticated in exploiting arithmetic properties.
          /* Sum neighbors of i,j */
          up    = val[(i-1)*n + j  ];
          down  = val[(i+1)*n + j  ];
          left  = val[i*n     + j-1];
          right = val[i*n     + j+1];
          sum = up + down + left + right;
        3 multiplications: i*n, (i-1)*n, (i+1)*n
          leaq   1(%rsi), %rax      # i+1
          leaq   -1(%rsi), %r8      # i-1
          imulq  %rcx, %rsi         # i*n
          imulq  %rcx, %rax         # (i+1)*n
          imulq  %rcx, %r8          # (i-1)*n
          addq   %rdx, %rsi         # i*n+j
          addq   %rdx, %rax         # (i+1)*n+j
          addq   %rdx, %r8          # (i-1)*n+j
      • Rewritten to share the subexpression i*n + j:
          long inj = i*n + j;
          up    = val[inj - n];
          down  = val[inj + n];
          left  = val[inj - 1];
          right = val[inj + 1];
          sum = up + down + left + right;
        1 multiplication: i*n
          imulq  %rcx, %rsi         # i*n
          addq   %rdx, %rsi         # i*n+j
          movq   %rsi, %rax         # i*n+j
          subq   %rcx, %rax         # i*n+j-n
          leaq   (%rsi,%rcx), %rcx  # i*n+j+n
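
A sketch that wraps the two neighbor-sum versions in functions so they can be compared (sum_neighbors1 and sum_neighbors2 are hypothetical names; the grid is assumed to hold long values in row-major order with n columns, and (i, j) is assumed to lie away from the border so all four neighbors exist):

```c
#include <assert.h>

/* Three multiplications: (i-1)*n, (i+1)*n and i*n. */
long sum_neighbors1(const long *val, long i, long j, long n)
{
    long up    = val[(i-1)*n + j];
    long down  = val[(i+1)*n + j];
    long left  = val[i*n + j-1];
    long right = val[i*n + j+1];
    return up + down + left + right;
}

/* One multiplication: the shared index i*n + j is computed once. */
long sum_neighbors2(const long *val, long i, long j, long n)
{
    long inj   = i*n + j;
    long up    = val[inj - n];
    long down  = val[inj + n];
    long left  = val[inj - 1];
    long right = val[inj + 1];
    return up + down + left + right;
}
```

On a 4 x 4 grid whose cells hold 0 through 15, the point (1, 2) has neighbors 2, 10, 5 and 7, so both functions return 24.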
