1 / 41

Program design and analysis

Program design and analysis. Compilation flow. Basic statement translation. Basic optimizations. Interpreters and just-in-time compilers. Compilation. Compilation strategy (Wirth): compilation = translation + optimization Compiler determines quality of code: use of CPU resources;

dennyj
Download Presentation

Program design and analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Program design and analysis • Compilation flow. • Basic statement translation. • Basic optimizations. • Interpreters and just-in-time compilers. Overheads for Computers as Components

  2. Compilation • Compilation strategy (Wirth): • compilation = translation + optimization • Compiler determines quality of code: • use of CPU resources; • memory access scheduling; • code size. Overheads for Computers as Components

  3. Basic compilation phases HLL parsing, symbol table machine-independent optimizations machine-dependent optimizations assembly Overheads for Computers as Components

  4. Statement translation and optimization • Source code is translated into intermediate form such as CDFG. • CDFG is transformed/optimized. • CDFG is translated into instructions with optimization decisions. • Instructions are further optimized. Overheads for Computers as Components

  5. a*b + 5*(c-d) Arithmetic expressions b a c d * - expression 5 * + DFG Overheads for Computers as Components

  6. lis r4,a@ha lwz r1,a@l[r4] lis r4,b@ha lwz r2,b@hl[r4] mullw r3,r1,r2 Arithmetic expressions, cont’d. b a c d 1 2 * - 5 3 lis r4,c@ha lwz r1,c@l[r4] lis r4,d@ha lwz r5,d@l[r4] sub r6,r4,r5 * 4 + mullwi r7,r6,5 add r8,r7,r3 DFG code Overheads for Computers as Components

  7. if (a+b > 0) x = 5; else x = 7; Control code generation a+b>0 x=5 x=7 Overheads for Computers as Components

  8. lis r5,a@ha lwz r1,a@l[r5] lis r5,b@ha lwz r2,b@l add r3,r1,r2 ble label3 Control code generation, cont’d. 1 2 a+b>0 x=5 3 li r3,5 lis r5,x@ha stw r3,x@l[r5] b stmtent x=7 label3 li r3,7 lis r5,x@ha stw r3,x@l[r5] stmtent ... Overheads for Computers as Components

  9. Procedure linkage • Need code to: • call and return; • pass parameters and results. • Parameters and returns are passed on stack. • Procedures with few parameters may use registers. Overheads for Computers as Components

  10. 5 accessed relative to SP Procedure stacks growth proc1 proc1(int a) { proc2(5); } FP frame pointer proc2 SP stack pointer Overheads for Computers as Components

  11. ARM procedure linkage • APCS (ARM Procedure Call Standard): • r0-r3 pass parameters into procedure. Extra parameters are put on stack frame. • r0 holds return value. • r4-r7 hold register values. • r11 is frame pointer, r13 is stack pointer. • r10 holds limiting address on stack size to check for stack overflows. Overheads for Computers as Components

  12. Data structures • Different types of data structures use different data layouts. • Some offsets into data structure can be computed at compile time, others must be computed at run time. Overheads for Computers as Components

  13. One-dimensional arrays • C array name points to 0th element: a[0] a = *(a + 1) a[1] a[2] Overheads for Computers as Components

  14. M N ... Two-dimensional arrays • Column-major layout: a[0,0] a[0,1] ... a[1,0] = a[i*M+j] a[1,1] Overheads for Computers as Components

  15. 4 bytes *(aptr+4) Structures • Fields within structures are static offsets: aptr field1 struct { int field1; char field2; } mystruct; struct mystruct a, *aptr = &a; field2 Overheads for Computers as Components

  16. Expression simplification • Constant folding: • 8+1 = 9 • Algebraic: • a*b + a*c = a*(b+c) • Strength reduction: • a*2 = a<<1 Overheads for Computers as Components

  17. Dead code: #define DEBUG 0 if (DEBUG) dbg(p1); Can be eliminated by analysis of control flow, constant folding. Dead code elimination 0 0 1 dbg(p1); Overheads for Computers as Components

  18. Procedure inlining • Eliminates procedure linkage overhead: int foo(a,b,c) { return a + b - c;} z = foo(w,x,y); ð z = w + x - y; Overheads for Computers as Components

  19. Loop transformations • Goals: • reduce loop overhead; • increase opportunities for pipelining and parallelism; • improve memory system performance. Overheads for Computers as Components

  20. Loop unrolling • Reduces loop overhead, enables some other optimizations. for (i=0; i<4; i++) a[i] = b[i] * c[i]; ð for (i=0; i<2; i++) { a[i*2] = b[i*2] * c[i*2]; a[i*2+1] = b[i*2+1] * c[i*2+1]; } Overheads for Computers as Components

  21. Loop fusion and distribution • Fusion combines two loops into 1: for (i=0; i<N; i++) a[i] = b[i] * 5; for (j=0; j<N; j++) w[j] = c[j] * d[j]; ð for (i=0; i<N; i++) { a[i] = b[i] * 5; w[i] = c[i] * d[i]; } • Distribution breaks one loop into two. • Changes optimizations within loop body. Overheads for Computers as Components

  22. Array/cache problems • Prefetching. • Cache conflicts within a single array. • Cache conflicts between arrays. Overheads for Computers as Components

  23. 1-D array and cache • Layout of 1-D array in cache: A[12] A[13] A[14] A[15] Line 3 A[8] A[9] A[10] A[11] Line 2 A[4] A[5] A[6] A[7] Line 1 A[0] A[1] A[2] A[3] Line 0 X += A[I] Overheads for Computers as Components

  24. A[12] A[13] A[14] A[15] A[8] A[9] A[10] A[11] B[12] B[13] B[14] B[15] B[8] B[9] B[10] B[11] Two 1-D arrays and cache • Arrays don’t immediately conflict: B[4] B[5] B[6] B[7] Line 3 B[0] B[1] B[2] B[3] Line 2 A[4] A[5] A[6] A[7] Line 1 A[0] A[1] A[2] A[3] Line 0 X += A[I] * B[I] Overheads for Computers as Components

  25. B[0] B[1] B[2] B[3] Conflicting 1-D arrays • Arrays immediately conflict in cache: A[12] A[13] A[14] A[15] Line 3 A[8] A[8] A[10] A[11] Line 2 A[4] A[5] A[6] A[7] Line 1 A[0] A[1] A[2] A[3] Line 0 X += A[I] * B[I] Overheads for Computers as Components

  26. 2-D array and cache • Cache behavior depends on row/column access pattern: A[1,4] A[1,5] A[1,6] A[1,7] Line 3 A[1,0] A[1,1] A[1,2] A[1,3] Line 2 A[0, 4] A[0, 5] A[0, 6] A[0, 7] Line 1 A[0,0] A[0,1] A[0, 2] A[0, 3] Line 0 Overheads for Computers as Components

  27. Array/cache solutions • Move origin of the array in memory. • Pad the array. • Change the loop to access array elements in a different order. Overheads for Computers as Components

  28. Loop tiling • Breaks one loop into a nest of loops. • Changes order of accesses within array. • Changes cache behavior. Overheads for Computers as Components

  29. for (i=0; i<N; i++) for (j=0; j<N; j++) c[i] = a[i,j]*b[i]; for (i=0; i<N; i+=2) for (j=0; j<N; j+=2) for (ii=0; ii<min(i+2,n); ii++) for (jj=0; jj<min(j+2,N); jj++) c[ii] = a[ii,jj]*b[ii]; Loop tiling example Overheads for Computers as Components

  30. Array padding • Add array elements to change mapping into cache: a[0,0] a[0,1] a[0,2] a[0,0] a[0,1] a[0,2] a[0,3] a[1,0] a[1,1] a[1,2] a[1,0] a[1,1] a[1,2] a[1,3] before after Overheads for Computers as Components

  31. Register allocation • Goals: • choose register to hold each variable; • determine lifespan of varible in the register. • Basic case: within basic block. Overheads for Computers as Components

  32. w = a + b; x = c + w; y = c + d; Register lifetime graph t=1 a t=2 b c t=3 d w x y time 1 2 3 Overheads for Computers as Components

  33. Instruction scheduling • Non-pipelined machines do not need instruction scheduling: any order of instructions that satisfies data dependencies runs equally fast. • In pipelined machines, execution time of one instruction depends on the nearby instructions: opcode, operands. Overheads for Computers as Components

  34. A reservation table relates instructions/time to CPU resources. Time/instr A B instr1 X instr2 X X instr3 X instr4 X Reservation table Overheads for Computers as Components

  35. Software pipelining • Schedules instructions across loop iterations. • Reduces instruction latency in iteration i by inserting instructions from iteration i+1. Overheads for Computers as Components

  36. Software pipelining in SHARC • Example: for (i=0; i<N; i++) sum += a[i]*b[i]; • Combine three iterations: • Fetch array elements a, b for iteration i. • Multiply a,b for iteration i-1. • Compute dot product for iteration i-2. Overheads for Computers as Components

  37. Software pipelining in SHARC, cont’d /* first iteration performed outside loop */ ai=a[0]; bi=b[0]; p=ai*bi; /* initiate loads used in second iteration; remaining loads will be performed inside the loop */ for (i=2; i<N-2; i++) { ai=a[i]; bi=b[i]; /* fetch for next cycle’s multiply */ p = ai*bi; /* multiply for next iteration’s sum */ sum += p; /* make sum using p from last iteration */ } sum += p; p=ai*bi; sum +=p; Overheads for Computers as Components

  38. Software pipelining timing ai=a[i]; bi=b[i]; p = ai*bi; ai=a[i]; bi=b[i]; time sum += p; p = ai*bi; ai=a[i]; bi=b[i]; pipe sum += p; p = ai*bi; sum += p; iteration i-2 iteration i-1 iteration i Overheads for Computers as Components

  39. Instruction selection • May be several ways to implement an operation or sequence of operations. • Represent operations as graphs, match possible instruction sequences onto graph. + + * + * * MUL ADD expression templates MADD Overheads for Computers as Components

  40. Using your compiler • Understand various optimization levels (-O1, -O2, etc.) • Look at mixed compiler/assembler output. • Modifying compiler output requires care: • correctness; • loss of hand-tweaked code. Overheads for Computers as Components

  41. Interpreters and JIT compilers • Interpreter: translates and executes program statements on-the-fly. • JIT compiler: compiles small sections of code into instructions during program execution. • Eliminates some translation overhead. • Often requires more memory. Overheads for Computers as Components

More Related