CSE 522 WCET Analysis

CSE 522 WCET Analysis. Computer Science & Engineering Department, Arizona State University, Tempe, AZ 85287. Dr. Yann-Hang Lee, yhlee@asu.edu, (480) 727-7507. Some of the slides were based on the lecture by G. Fainekos (ASU). Execution Time – WCET & BCET.

Presentation Transcript


  1. CSE 522 WCET Analysis
     • Computer Science & Engineering Department, Arizona State University, Tempe, AZ 85287
     • Dr. Yann-Hang Lee, yhlee@asu.edu, (480) 727-7507
     • Some of the slides were based on the lecture by G. Fainekos (ASU)

  2. Execution Time – WCET & BCET
     • (Figure from R. Wilhelm et al., ACM Trans. Embed. Comput. Syst., 2007.)

  3. The WCET Problem
     • Given
       • the code for a software task
       • the platform (OS + hardware) that it will run on
     • Determine the WCET of the task.
     • Why is this problem important?
       • The WCET is central to the design of real-time computing systems
     • Can the WCET always be found?
       • In general, not a decidability problem, but a complexity problem
     • Compute bounds for the execution times of instructions and basic blocks, then determine a longest path in the basic-block graph of the program.

  4. Components of Execution Time Analysis
     • Program path (control flow) analysis
       • Want to find the longest path through the program
       • Identify feasible paths through the program
       • Find loop bounds
       • Identify dependencies among different code fragments
     • Processor behavior analysis
       • For small code fragments (basic blocks), generate bounds on run-times on the platform
       • Model details of the architecture, including cache behavior, pipeline stalls, branch prediction, etc.
     • Outputs of both analyses feed into each other

  5. Program Path Analysis: Overall Approach (1)
     • Construct the Control-Flow Graph (CFG) for the task
       • Nodes represent the basic blocks of the task
       • Basic block: a sequence of consecutive program statements with no possibility of branching
       • We have a single entry node and a single exit node
       • Edges represent flow of control (jumps, branches, calls, …)
     • The problem is to identify the longest path in the CFG
       • Note: the CFG can have loops, so we need to infer loop bounds and unroll the loops
       • This gives us a directed acyclic graph (DAG). How do we find the longest path in this DAG?
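The closing question on this slide has a standard answer once loops are unrolled: visit the DAG in topological order and keep, for each block, the cost of the most expensive path reaching it. The following is a minimal sketch of that idea, not the method used in the course; the function name `longest_path`, the block names, and the per-block costs are all hypothetical, and the sketch assumes the exit node is reachable from the entry node.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_path(cost, succ, entry, exit_):
    """Return (total_cost, path) of the most expensive entry-to-exit path in a DAG.

    cost: dict mapping each basic block to its (hypothetical) worst-case cost
    succ: dict mapping a block to the list of its successor blocks
    """
    # TopologicalSorter expects a node -> predecessors mapping.
    preds = {n: set() for n in cost}
    for u, vs in succ.items():
        for v in vs:
            preds[v].add(u)

    best = {n: float("-inf") for n in cost}  # best known cost of a path ending at n
    parent = {}                              # predecessor on that best path
    best[entry] = cost[entry]

    for u in TopologicalSorter(preds).static_order():
        if best[u] == float("-inf"):
            continue  # block not reachable from the entry node
        for v in succ.get(u, ()):
            candidate = best[u] + cost[v]
            if candidate > best[v]:
                best[v] = candidate
                parent[v] = u

    # Walk the parent links back from the exit node to recover the path.
    path = [exit_]
    while path[-1] != entry:
        path.append(parent[path[-1]])
    return best[exit_], path[::-1]


# Hypothetical diamond-shaped CFG: B1 branches to B2 or B3, both join at B4.
example_cost = {"B1": 2, "B2": 5, "B3": 3, "B4": 1}
example_succ = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"]}
print(longest_path(example_cost, example_succ, "B1", "B4"))  # (8, ['B1', 'B2', 'B4'])
```

Explicit unrolling makes the graph grow with the loop bounds, which is one reason the following slides switch to counting block executions instead of enumerating paths.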

  6. Program Path Analysis: Overall Approach (2)
     • In a CFG
       • B_i = basic block i
       • x_i = number of times block B_i is executed
       • d_j = number of times edge j is executed
       • c_i = worst-case running time of block B_i
     • Objective: find the maximum of $\sum_i c_i x_i$
     • How to get x_i?
       • Structural constraints
       • Functionality constraints
       • Loop bounds -- need to be known

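  7. CFG Example (example due to Y.T. Li and S. Malik)
     • Code: N = 10; q = 0; while (q < N) q++; q = r;
     • Basic blocks and their execution counts
       • B1: N = 10; q = 0; (count x1)
       • B2: while (q < N) (count x2)
       • B3: q++; (count x3)
       • B4: q = r; (count x4)
     • Edge counts d1 … d6 label the control-flow edges in the figure
     • Want to maximize $\sum_i c_i x_i$ subject to the constraints
       • d1 = 1
       • x1 = d1 = d2
       • x2 = d2 + d4 = d3 + d5
       • x3 = d3 = d4 = 10
       • x4 = d5 = d6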
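For this small example the constraints pin down every count, so the objective can be evaluated by direct substitution without an ILP solver. The sketch below does exactly that; the function name `solve_example` and the block costs are hypothetical values chosen only for illustration, since the slides do not give any c_i.

```python
def solve_example(costs, loop_bound=10):
    """Derive the block counts of the CFG example from its constraints.

    costs: dict with a (hypothetical) worst-case cost c_i for B1..B4.
    """
    d1 = 1                          # the task is entered once
    x1 = d2 = d1                    # x1 = d1 = d2
    x3 = d3 = d4 = loop_bound       # x3 = d3 = d4 = 10 (loop bound)
    x2 = d2 + d4                    # inflow of B2
    d5 = x2 - d3                    # outflow of B2: x2 = d3 + d5
    x4 = d6 = d5                    # x4 = d5 = d6
    counts = {"B1": x1, "B2": x2, "B3": x3, "B4": x4}
    wcet = sum(costs[b] * counts[b] for b in counts)  # objective: sum of c_i * x_i
    return counts, wcet


# Hypothetical per-block costs in cycles.
example_costs = {"B1": 2, "B2": 3, "B3": 1, "B4": 1}
counts, wcet = solve_example(example_costs)
print(counts, wcet)  # {'B1': 1, 'B2': 11, 'B3': 10, 'B4': 1} 46
```

The loop test B2 runs once more than the loop body B3, which is the kind of detail the structural constraints capture automatically.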

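  8. CFG – Another example
     • Code: /* k >= 0 */ s = k; while (k < 10) { if (ok) j++; else { j = 0; ok = true; } k++; } r = j;
     • Basic blocks and their execution counts
       • B1: s = k; (count x1)
       • B2: while (k < 10) (count x2)
       • B3: if (ok) (count x3)
       • B4: j++; (count x4)
       • B5: j = 0; ok = true; (count x5)
       • B6: k++; (count x6)
       • B7: r = j; (count x7)
     • Edge counts d1 … d10 label the control-flow edges in the figure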

  9. Functionality Constraints
     • Code, with each label x_i marking the statement whose execution count is x_i

             check_data()
             {
       x1:     int i, morecheck, wrongone;
       x2:     morecheck = 1; i = 0; wrongone = -1;
       x3:     while (morecheck) {
       x4:       if (data[i] < 0) {
       x5:         wrongone = i; morecheck = 0;
                 }
                 else
       x6:         if (++i >= 10)
       x7:           morecheck = 0;
               }
       x8:     if (wrongone >= 0)
       x9:       return 0;
               else
       x10:      return 1;
             }

     • Constraints
       • x2 ≤ x4
       • x4 ≤ 10 x2
       • (x5 = 0 & x7 = 1) | (x5 = 1 & x7 = 0)
       • x5 = x9
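One way to build confidence in functionality constraints like these is to instrument the routine and check them on concrete runs. The sketch below is a Python transcription of `check_data()` written for this purpose only; the counter list `x`, the helper `constraints_hold`, and the random inputs are assumptions of the sketch, and passing runs illustrate the constraints rather than prove them.

```python
import random


def check_data(data, x):
    """Transcription of the slide's check_data(); x[i] counts statement x_i."""
    x[1] += 1                      # x1: declarations
    x[2] += 1                      # x2: initialisation
    morecheck, i, wrongone = 1, 0, -1
    while True:
        x[3] += 1                  # x3: loop test (also counted when it fails)
        if not morecheck:
            break
        x[4] += 1                  # x4: if (data[i] < 0)
        if data[i] < 0:
            x[5] += 1              # x5: record the offending index and stop
            wrongone, morecheck = i, 0
        else:
            x[6] += 1              # x6: if (++i >= 10)
            i += 1
            if i >= 10:
                x[7] += 1          # x7: all ten entries checked
                morecheck = 0
    x[8] += 1                      # x8: if (wrongone >= 0)
    if wrongone >= 0:
        x[9] += 1                  # x9: return 0
        return 0
    x[10] += 1                     # x10: return 1
    return 1


def constraints_hold(x):
    """Check the four constraints from the slide for one call of check_data()."""
    loop_bounds = x[2] <= x[4] <= 10 * x[2]
    exclusive_exit = (x[5] == 0 and x[7] == 1) or (x[5] == 1 and x[7] == 0)
    return loop_bounds and exclusive_exit and x[5] == x[9]


# Try a batch of random 10-element inputs (fixed seed for repeatability).
rng = random.Random(0)
for _ in range(200):
    counters = [0] * 11            # index 0 unused so x[i] matches x_i
    check_data([rng.randint(-2, 20) for _ in range(10)], counters)
    assert constraints_hold(counters)
```

The disjunctive constraint is the interesting one: it says the loop ends either through x5 or through x7 but never both, which a purely structural analysis cannot see.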

  10. Micro-architectural Modeling -- Cache
     • Modify the cost function (cache hits and misses have different costs)
     • Add linear constraints to describe the relationship between cache hits and misses
     • Basic idea
       • Basic blocks are assumed to be smaller than the entire cache
       • Subdivide instruction counts (x_i) into counts of cache hits (x_i^hit) and misses (x_i^miss)
       • A line-block (or l-block) is a contiguous sequence of code within the same basic block that is mapped to the same cache line in the instruction cache
       • The instructions of an l-block either all hit or all miss
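The l-block definition can be made concrete by cutting a basic block's address range at cache-line boundaries. The sketch below assumes a direct-mapped instruction cache; the function names, the 16-byte line size, the 4-set cache, and the addresses are hypothetical parameters chosen for illustration, not values from the slides.

```python
def split_into_l_blocks(start, size, line_size=16, num_sets=4):
    """Split the byte range [start, start + size) of one basic block into l-blocks.

    Each l-block is the part of the basic block that falls into one memory line,
    and therefore into one cache line of a direct-mapped cache.
    """
    blocks = []
    addr, end = start, start + size
    while addr < end:
        line = addr // line_size                      # memory line holding addr
        line_end = min((line + 1) * line_size, end)   # stop at line or block end
        blocks.append({
            "first": addr,
            "last": line_end - 1,
            "line": line,
            "set": line % num_sets,                   # direct-mapped placement
        })
        addr = line_end
    return blocks


def conflicts(a, b):
    """Two l-blocks conflict if they map to the same set but lie in different memory lines."""
    return a["set"] == b["set"] and a["line"] != b["line"]


# A hypothetical 40-byte basic block starting at address 8 spans three lines.
b1 = split_into_l_blocks(8, 40)
print([(blk["first"], blk["last"], blk["set"]) for blk in b1])
# [(8, 15, 0), (16, 31, 1), (32, 47, 2)]
```

A second hypothetical block at address 64 would map to set 0 again and conflict with the first l-block above, which is the situation the cache constraints on the following slides have to model.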

  11. Basic Blocks to Line Blocks (Direct-mapped cache)
     • Figure: basic blocks B1, B2, and B3 are split into l-blocks B1.1, B1.2, B1.3; B2.1, B2.2; and B3.1, B3.2; each l-block is colored by the cache set (0 to 3) it maps to
     • Cache constraints
       • No conflicting l-blocks: only the first execution has a miss
       • Two nonconflicting l-blocks are mapped to the same cache line
       • Conflicting l-blocks: hits and misses are affected by the execution sequence

  12. Cache Conflict Graph
     • Built for every cache set containing two or more conflicting l-blocks
     • Nodes: a start node, an end node, and a node Bk.l for every l-block in the cache set
     • There is an edge from Bk.l to Bm.n if control can pass between them without passing through any other l-block of the same cache set
     • p(i.j, u.v): the number of times that control passes through the edge from Bi.j to Bu.v
     • Figure: a graph with nodes start, Bk.l, Bm.n, and end, and edges p(s,k.l), p(s,m.n), p(s,e), p(k.l,k.l), p(k.l,m.n), p(m.n,k.l), p(m.n,m.n), p(k.l,e), p(m.n,e)
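The edge counts become useful once they are tied back to the block counts. A hedged sketch of how this is commonly written, assuming ordinary flow conservation on the graph and a cache that is empty when the task starts; the superscripts hit and miss follow the split of x_i introduced on the cache slide, and the exact form should be checked against Li, Malik, and Wolfe:

```latex
\begin{align*}
% Flow conservation: every execution of B_{k.l} enters and leaves its node once.
% Incoming sums range over s and the l-block nodes, outgoing sums over the l-block nodes and e.
x_{k.l} &= \sum_{u.v} p(u.v,\,k.l) \;=\; \sum_{u.v} p(k.l,\,u.v) \\
% Control leaves the start node exactly once (possibly straight to the end node e).
\sum_{u.v} p(s,\,u.v) &= 1 \\
% Assumption: cold cache. B_{k.l} hits only when the previous l-block
% executed in this cache set was B_{k.l} itself.
x^{hit}_{k.l} &= p(k.l,\,k.l), \qquad x^{miss}_{k.l} = x_{k.l} - x^{hit}_{k.l}
\end{align*}
```

All of these relations are linear in the unknowns, so they can be added to the same ILP that maximizes the total cost.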

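  13. Cache Constraints Example (1)
     • Figure: the CFG of the earlier example (s = k; while (k < 10) { if (ok) j++; else { j = 0; ok = true; } k++; } r = j;) drawn next to the cache, with each basic block forming a single l-block: B1.1, B2.1, B3.1, B4.1, B5.1, B6.1, B7.1; execution counts x1 … x7 and edge counts d1 … d10 are as before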

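  14. Cache Constraints Example (2)
     • Figure: two cache conflict graphs, each with a start node S and an end node E
       • Graph for B1.1 and B6.1, with edges p(s,1.1), p(1.1,6.1), p(6.1,6.1), p(1.1,e), p(6.1,e)
       • Graph for B4.1 and B5.1, with edges p(s,4.1), p(s,5.1), p(4.1,4.1), p(4.1,5.1), p(5.1,4.1), p(5.1,5.1), p(4.1,e), p(5.1,e), p(s,e)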

  15. Progress During the Past 10 Years
     • The explosion of penalties has been compensated by a reduction of uncertainties!
     • Figure: cache-miss penalty (plotted values 4, 25, 60, 200) and over-estimation (plotted values 10%, 15%, 20-30%, 25%, 30-50%) for the results of Lim et al., Thesing et al., and Souyris et al. (years 1995, 2002, 2005)

  16. Open Problems
     • Architectures are getting much more complex.
       • Can we create processor behavior models without the pain?
       • Can we change the architecture to make timing analysis easier?
     • Small changes to code and/or architecture require completely redoing the WCET computation
       • Use robust techniques that learn about processor/platform behavior
     • Need more reliable ways to measure execution time
     • References:
       • Li, Malik, and Wolfe, “Cache Modeling for Real-Time Software: Beyond Direct Mapped Instruction Caches”
       • Wilhelm, “Determining Bounds on Execution Times,” Handbook on Embedded Systems, CRC Press, 2005
