What simplifications could a compiler, or you, do without sacrificing fast execution?

palmer

Presentation Transcript


  1. William Sandqvist william@kth.se

  2. What simplifications could a compiler, or you, do without sacrificing fast execution?

  3. 5-7 Code optimization: two functions f and g

         #define MAX 10
         int a[MAX], b[MAX], c[MAX], x[MAX], y[MAX];
         int i, j, r, s;
         . . .
         int f(int a, int b)
         {
             int z;
             z = 2 * a - b;
             return z;
         }
         int g(int a, int b, int c)
         {
             int z;
             z = a * c - c * b;
             return z;
         }

     What code optimizations can the compiler do? -O, -O0, -O1, -O2, -O3, -Os? With -O0 you have to do all optimizations yourself.

  4. Optimization flags (GCC)
     -O0       No optimization. Default setting!
     -O, -O1   Optimize: reduce code size and execution time, without optimizations that take a long compile time
     -O2       Optimize more for speed: nearly all optimizations that do not trade code size for speed
     -O3       All -O2 optimizations, plus more intensive loop optimizations
     -Os       Optimize for size

  5. Two for loops

         . . .
         for(i = 0; i <= MAX - 1; i++)
         {
             x[i] = f(a[i], b[i]);
         }
         s = 2 * r;
         for(j = 0; j <= MAX - 1; j++)
         {
             y[j] = s * g(a[j], b[j], c[j]);
         }

     What can be done? We want shorter execution time without increasing the code size!

  6. Loop integration (loop fusion)
     The two loops have the same range (0 to MAX - 1) and no data dependency between them (x is used only in loop 1, y only in loop 2). The loops can therefore be merged into one, which saves loop overhead: only one loop counter, i, is needed. s = 2 * r can be computed before the merged loop, since the first loop uses neither s nor r.

         s = 2 * r;
         for(i = 0; i <= MAX - 1; i++)
         {
             x[i] = f(a[i], b[i]);
             y[i] = s * g(a[i], b[i], c[i]);
         }

  7. Precalculation at compile time (constant folding)
     The defined constant MAX is used as MAX - 1 in the loop. MAX - 1 can be precalculated at compile time as 10 - 1 = 9!

         s = 2 * r;
         for(i = 0; i <= 9; i++)
         {
             x[i] = f(a[i], b[i]);
             y[i] = s * g(a[i], b[i], c[i]);
         }

  8. Algebraic simplification
     Rewriting function g saves one multiplication: a * c - c * b needs mul, mul, sub, while c * (a - b) needs only sub, mul.

         int g(int a, int b, int c)
         {
             int z;
             z = c * (a - b);
             return z;
         }

  9. Inlining of functions
     Both functions f and g are short, so their code can be inserted directly in the loop.

         int a[10], b[10], c[10], x[10], y[10];
         int i, r, s;
         s = 2 * r;
         for(i = 0; i <= 9; i++)
         {
             x[i] = 2 * a[i] - b[i];
             y[i] = s * ((a[i] - b[i]) * c[i]);
         }

     Loop unrolling would give shorter execution time, but it would also increase the code size, so it can't be used in this case.
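A quick way to convince yourself that slides 6 to 9 do not change the result is to run the original two loops and the final fused, inlined loop on the same data and compare x and y. The sketch below does that; the helper names `original_version`, `optimized_version` and `versions_agree` are hypothetical, made up for this illustration.

```c
/* Illustrative sketch: compares the original code of exercise 5-7 with the
   fused, constant-folded, simplified and inlined version.
   All function names here are hypothetical. */
#include <assert.h>
#include <string.h>

#define MAX 10

/* The two functions from the exercise. */
static int f(int a, int b) { return 2 * a - b; }
static int g(int a, int b, int c) { return a * c - c * b; }

/* Original code: two separate loops calling f and g. */
static void original_version(const int a[], const int b[], const int c[],
                             int r, int x[], int y[])
{
    int i, j, s;
    for (i = 0; i <= MAX - 1; i++)
        x[i] = f(a[i], b[i]);
    s = 2 * r;
    for (j = 0; j <= MAX - 1; j++)
        y[j] = s * g(a[j], b[j], c[j]);
}

/* Optimized code: one fused loop, MAX - 1 folded to 9,
   g rewritten as c * (a - b), and f and g inlined. */
static void optimized_version(const int a[], const int b[], const int c[],
                              int r, int x[], int y[])
{
    int i, s = 2 * r;
    for (i = 0; i <= 9; i++) {
        x[i] = 2 * a[i] - b[i];
        y[i] = s * ((a[i] - b[i]) * c[i]);
    }
}

/* Returns 1 if both versions produce identical x and y, else 0.
   Inputs must be small enough that no int arithmetic overflows. */
int versions_agree(const int a[], const int b[], const int c[], int r)
{
    int x1[MAX], y1[MAX], x2[MAX], y2[MAX];
    assert(MAX - 1 == 9); /* the folded constant must still match MAX */
    original_version(a, b, c, r, x1, y1);
    optimized_version(a, b, c, r, x2, y2);
    return memcmp(x1, x2, sizeof x1) == 0 && memcmp(y1, y2, sizeof y1) == 0;
}
```

With any small input values `versions_agree` should return 1. Small values matter because signed overflow in `int` arithmetic is undefined behaviour in C.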

  11. 5-2 Register lifetime
      A processor has this instruction type: op R1, R2, R3, where all three registers must be different. Code to run:

          u = c + d;   (1)
          v = a - b;   (2)
          w = a - u;   (3)
          x = v + e;   (4)

      How many registers are needed?

  12. Register lifetime graph

          u = c + d;   (1)
          v = a - b;   (2)
          w = a - u;   (3)
          x = v + e;   (4)

      Four registers are needed!

  13. Data flow graph
      A data flow graph can detect data dependencies.

          u = c + d;   (1)
          v = a - b;   (2)
          w = a - u;   (3)
          x = v + e;   (4)

      • (1) must be executed before (3), since (3) uses u.
      • (2) must be executed before (4), since (4) uses v.
      (2) and (3) can change execution order!
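The dependency check behind the data flow graph can be written down directly. In this minimal sketch, assuming each statement has the three-address form dst = src1 op src2 with single-letter variables, two statements must keep their order if the later one reads the earlier one's result, overwrites one of its inputs, or writes the same variable. The type `Stmt`, the table `prog` and the functions `depends` and `can_swap_adjacent` are hypothetical names for this illustration.

```c
/* Illustrative sketch; all names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* A three-address statement: dst = src1 op src2 (single-letter variables). */
typedef struct {
    char dst, src1, src2;
} Stmt;

static bool reads(Stmt s, char v) { return s.src1 == v || s.src2 == v; }

/* True if 'later' must stay after 'earlier' because of a data hazard. */
bool depends(Stmt earlier, Stmt later)
{
    return reads(later, earlier.dst)   /* RAW: later uses the result      */
        || reads(earlier, later.dst)   /* WAR: later overwrites an input  */
        || earlier.dst == later.dst;   /* WAW: both write the same variable */
}

/* Adjacent statements i and i + 1 (0-based) may swap if independent. */
bool can_swap_adjacent(const Stmt p[], int n, int i)
{
    assert(i >= 0 && i + 1 < n);
    return !depends(p[i], p[i + 1]);
}

/* The four statements from the exercise, (1) to (4). */
const Stmt prog[4] = {
    {'u', 'c', 'd'},  /* (1) u = c + d */
    {'v', 'a', 'b'},  /* (2) v = a - b */
    {'w', 'a', 'u'},  /* (3) w = a - u */
    {'x', 'v', 'e'},  /* (4) x = v + e */
};
```

Here `depends(prog[0], prog[2])` and `depends(prog[1], prog[3])` are true, while `depends(prog[1], prog[2])` is false: both (2) and (3) only read a, so they may be swapped.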

  14. New register lifetime graph
      New instruction order:

          u = c + d;   (1)
          w = a - u;   (2')
          v = a - b;   (3')
          x = v + e;   (4)

      Now only 3 registers are needed, a saving of 25%.
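The register counts of both lifetime graphs can be checked with a simple model. This sketch assumes that a variable occupies a register from the first statement that mentions it to the last one, inclusive, and that all values live in the same statement need distinct registers (matching the rule that R1, R2 and R3 must differ). Under that assumption it reproduces the 4 and 3 registers above; `max_live`, `original_order` and `new_order` are hypothetical names.

```c
/* Illustrative sketch of a register-lifetime count; names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

typedef struct { char dst, src1, src2; } Stmt;  /* dst = src1 op src2 */

static bool uses(Stmt s, char v) { return s.dst == v || s.src1 == v || s.src2 == v; }

/* Assumed model: variable v is live from the first to the last statement
   that mentions it (inclusive). Returns the largest number of variables
   live in any single statement, i.e. the registers needed. */
int max_live(const Stmt p[], int n)
{
    int first[26], last[26], best = 0;
    assert(n > 0);
    for (int v = 0; v < 26; v++) {           /* variables 'a' .. 'z' */
        first[v] = -1;
        last[v] = -1;
        for (int k = 0; k < n; k++) {
            if (uses(p[k], (char)('a' + v))) {
                if (first[v] < 0) first[v] = k;
                last[v] = k;
            }
        }
    }
    for (int k = 0; k < n; k++) {            /* count live values per statement */
        int live = 0;
        for (int v = 0; v < 26; v++)
            if (first[v] >= 0 && first[v] <= k && k <= last[v])
                live++;
        if (live > best) best = live;
    }
    return best;
}

const Stmt original_order[4] = {
    {'u', 'c', 'd'}, {'v', 'a', 'b'}, {'w', 'a', 'u'}, {'x', 'v', 'e'} };
const Stmt new_order[4] = {
    {'u', 'c', 'd'}, {'w', 'a', 'u'}, {'v', 'a', 'b'}, {'x', 'v', 'e'} };
```

In the original order statements (2) and (3) each have four live values (u, a, b, v and u, a, v, w), while in the new order every statement has exactly three.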

  16. 5-8 CDFG
      • Control and data flow graph (CDFG)
      • Multiplication takes 3 cycles; all other instructions take 1 cycle.
      Best/worst execution time?

          y = 0;
          if(mode == 1)
          {
              for(i = 0; i < 5; i++)
              {
                  y += a[i] * b[i];   /* T = 3 + 1 = 4 */
              }
          }

      mode = 0: $T_{best} = 1 + 1 = 2$ (y = 0 and the if test)
      mode = 1: $T_{worst} = 1 + 1 + 1 + (5 + 1) + 5 \cdot 4 + 5 = 34$ (y = 0, the if test, i = 0, six loop tests, five loop bodies of 4 cycles, five i++)

  17. Multiply-accumulate operation
      c) With a MAC unit, R1 = R1 + R2 * R3 executes in one cycle, so the loop body takes T = 1 cycle:

          y += a[i] * b[i];   /* one cycle */

      $T_{worst} = 1 + 1 + 1 + (5 + 1) + 5 \cdot 1 + 5 = 19$
      $19/34 \approx 0.56$: with a MAC unit, the worst-case execution time is 56% of that on the ordinary processor.
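Both worst-case numbers follow one formula in which only the cost of the loop body changes: 4 cycles (3-cycle multiply plus 1-cycle add) or 1 cycle with a MAC unit. A small sketch, assuming the term-by-term reading of 1 + 1 + 1 + (5 + 1) + 5 · 4 + 5 given in the comments; `worst_case_cycles` and `best_case_cycles` are hypothetical helper names.

```c
/* Illustrative cycle model for the CDFG exercise; names are hypothetical. */
#include <assert.h>

/* Every instruction takes 1 cycle except the loop body,
   which costs body_cycles (4 without MAC, 1 with MAC). */
int worst_case_cycles(int iterations, int body_cycles)
{
    assert(iterations >= 0 && body_cycles > 0);
    return 1                          /* y = 0                            */
         + 1                          /* if (mode == 1)                   */
         + 1                          /* i = 0                            */
         + (iterations + 1)           /* loop tests; the last one fails   */
         + iterations * body_cycles   /* y += a[i] * b[i]                 */
         + iterations;                /* i++                              */
}

/* mode != 1: only y = 0 and the if test execute. */
int best_case_cycles(void) { return 1 + 1; }
```

`worst_case_cycles(5, 4)` gives 34 and `worst_case_cycles(5, 1)` gives 19, so the MAC ratio is 19/34 ≈ 0.56.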

  19. Processes on a CPU

  20. Scheduling states of a process

  21. Priority-driven scheduling
      • Each process has a fixed priority.
      • The ready process with the highest priority executes.
      • A process executes until completion or until it is preempted by a higher-priority process.

  22. Examples of sampling frequencies and execution periods
      Figure: an RTOS serving a GPS sensor (20 Hz), a speed sensor (1 kHz), a joystick (500 Hz) and an actuator servo (2000 Hz).
      Process periods:
      GPS = 1/20 s = 50 ms
      Speed = 1/1000 s = 1 ms
      Joystick = 1/500 s = 2 ms
      Servo = 1/2000 s = 0.5 ms
      Tasks will often run periodically with different process periods.

  23. Task triplet
      P(max execution time, period, deadline), where deadline ≤ period.
      RMS: deadline = period (a simplification).

  24. 6-2 Processor utilization and feasible scheduling
      Task triplet: P(execution time, period, deadline), with deadline = period.
      P1(3, 9, 9)   P2(1, 2, 2)   P3(1, 6, 6)
      Timeline = least common multiple of the process periods 9, 2, 6: $9 = 3 \cdot 3$, $2 = 2$, $6 = 2 \cdot 3$, so $\mathrm{lcm} = 3 \cdot 3 \cdot 2 = 18$.
      CPU utilization: $U = \frac{3}{9} + \frac{1}{2} + \frac{1}{6} = \frac{6 + 9 + 3}{18} = 1$, i.e. 100%?
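Both numbers on this slide can be computed exactly with integers: the timeline (hyperperiod) is the least common multiple of the periods, and the work demanded in one hyperperiod divided by its length is the utilization. The `Task` struct, the helpers `hyperperiod` and `work_per_hyperperiod`, and the table `exercise_6_2` are hypothetical names chosen for this sketch.

```c
/* Illustrative sketch; all names here are hypothetical. */
#include <assert.h>

typedef struct { int c, t, d; } Task;  /* P(execution time, period, deadline) */

static int gcd(int a, int b)
{
    while (b != 0) { int r = a % b; a = b; b = r; }
    return a;
}

/* Hyperperiod: least common multiple of all periods. */
int hyperperiod(const Task ts[], int n)
{
    int h = 1;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        h = h / gcd(h, ts[i].t) * ts[i].t;
    return h;
}

/* Execution time demanded in one hyperperiod; U = work / hyperperiod. */
int work_per_hyperperiod(const Task ts[], int n)
{
    int h = hyperperiod(ts, n), w = 0;
    for (int i = 0; i < n; i++)
        w += ts[i].c * (h / ts[i].t);   /* jobs of task i times their cost */
    return w;
}

const Task exercise_6_2[3] = { {3, 9, 9}, {1, 2, 2}, {1, 6, 6} };
```

For P1(3, 9, 9), P2(1, 2, 2), P3(1, 6, 6) both functions return 18, so U = 18/18 = 1. Comparing integers avoids floating-point rounding when checking whether U is exactly 1.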

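  25. Rate monotonic scheduling
      RMS: the process with the shortest period is assigned the highest priority, and so on.
      RMS guarantee: a feasible schedule exists if
      $U = \sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right)$
      where $C_i$ is the execution time and $T_i$ the period of process $i$.
      For n = 3: $U \le 3\left(2^{1/3} - 1\right) \approx 0.78$. (Limit as $n \to \infty$: $U \le \ln 2 \approx 69\%$.)
      In this case U = 1, so there is no guarantee!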
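The RMS bound $n(2^{1/n} - 1)$ can be evaluated numerically. This sketch computes $2^{1/n}$ by bisection so that it does not depend on linking the math library; with `pow` from `<math.h>` it would be a single call. `root_of_two` and `rms_bound` are hypothetical names.

```c
/* Illustrative sketch of the RMS utilization bound; names are hypothetical. */
#include <assert.h>

/* 2^(1/n) by bisection on [1, 2]: find x with x^n = 2. */
static double root_of_two(int n)
{
    double lo = 1.0, hi = 2.0;
    for (int it = 0; it < 60; it++) {
        double mid = 0.5 * (lo + hi), p = 1.0;
        for (int k = 0; k < n; k++)
            p *= mid;                 /* p = mid^n */
        if (p < 2.0) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* n processes are guaranteed schedulable under RMS if
   U <= n * (2^(1/n) - 1). The test is sufficient, not necessary. */
double rms_bound(int n)
{
    assert(n >= 1);
    return n * (root_of_two(n) - 1.0);
}
```

`rms_bound(3)` is about 0.7798 and `rms_bound(1000)` is about 0.6934, approaching ln 2. Since U = 1 exceeds the bound for n = 3, the test gives no guarantee; on its own it does not prove that RMS fails.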

  26. RMS schedule (figure)
      Priorities: P2 > P3 > P1 (2 < 6 < 9).
      P1 misses its deadline: at t = 9 it still has 1 of its 3 time units left. No feasible schedule with RMS!

  27. Earliest deadline first scheduling
      EDF guarantee: a feasible schedule exists if $U \le 1$.
      In this case U = 1, so EDF will produce a feasible schedule.
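The RMS and EDF outcomes for exercise 6-2 can be reproduced with a unit-time simulation of preemptive scheduling. This is a minimal sketch under simplifying assumptions: integer execution times, all processes released at t = 0, at most 8 processes, ties broken by the lower task index, and no semaphores. `simulate`, `Policy`, `Task` and `exercise_6_2` are hypothetical names.

```c
/* Illustrative preemptive scheduling simulator; all names are hypothetical. */
#include <assert.h>
#include <limits.h>

typedef struct { int c, t, d; } Task;  /* P(execution time, period, deadline) */
typedef enum { POLICY_RMS, POLICY_EDF } Policy;

/* Simulates [0, horizon) in unit time steps. Returns the time of the first
   deadline miss, or -1 if none; *missed receives the missing task index. */
int simulate(const Task ts[], int n, int horizon, Policy pol, int *missed)
{
    int rem[8] = {0}, dl[8] = {0};   /* remaining work, absolute deadline */
    assert(n > 0 && n <= 8);
    for (int t = 0; t <= horizon; t++) {
        for (int i = 0; i < n; i++) {
            if (rem[i] > 0 && t >= dl[i]) {   /* unfinished at its deadline */
                *missed = i;
                return t;
            }
        }
        if (t == horizon)
            break;
        for (int i = 0; i < n; i++) {
            if (t % ts[i].t == 0) {           /* new job released */
                rem[i] = ts[i].c;
                dl[i] = t + ts[i].d;
            }
        }
        int run = -1, best = INT_MAX;
        for (int i = 0; i < n; i++) {
            /* RMS: shortest period first. EDF: earliest absolute deadline. */
            int key = (pol == POLICY_RMS) ? ts[i].t : dl[i];
            if (rem[i] > 0 && key < best) {
                best = key;
                run = i;
            }
        }
        if (run >= 0)
            rem[run]--;                       /* execute one time unit */
    }
    return -1;
}

const Task exercise_6_2[3] = { {3, 9, 9}, {1, 2, 2}, {1, 6, 6} };
```

With `POLICY_RMS` over the 18-unit timeline it reports a miss by P1 (index 0) at t = 9; with `POLICY_EDF` it reports none. Run on P1(1, 3, 3), P2(1, 4, 4), P3(2, 6, 6) with a 12-unit timeline and `POLICY_RMS`, it also reports no miss, which corresponds to the case without critical sections in exercise 6-3.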

  29. 6-3 Scheduling and semaphores
      P(execution time, period, deadline): P1(1, 3, 3)   P2(1, 4, 4)   P3(2, 6, 6)
      Timeline = least common multiple of the periods 3, 4, 6: $3 = 3$, $4 = 2 \cdot 2$, $6 = 2 \cdot 3$, so $\mathrm{lcm} = 3 \cdot 2 \cdot 2 = 12$.
      RMS priorities: P1 > P2 > P3 (3 < 4 < 6).
      Sem1 is a binary semaphore. accessSem1() and releaseSem1() take 0 time.

  30. RMS with no critical sections

  31. RMS with critical sections