
Timing Analysis for Modern Architectures


Presentation Transcript


  1. Timing Analysis for Modern Architectures Sang Lyul Min Dept. of Computer Engineering Seoul National University

  2. Overview • Intra-task analysis (WCET analysis) • Cache memory • Pipelined execution • Inter-task analysis • Cache memory • Experiments • Conclusions and Future Work

  3. Intra-task Analysis • Why is WCET analysis important? • A safe and tight WCET (worst-case execution time) estimate is a prerequisite for correct and accurate schedulability analysis

  4. Schedulability Analysis: Examples • Utilization bound-based approach • Response time-based approach

  5. Good Old Days • No cache memory • No pipelined execution • Fixed instruction execution times (simple table look-up)

  6. Timing Schema • S: S1; S2, with T(S) = T(S1) + T(S2) • S: if (exp) then S1 else S2, with T(S) = T(exp) + max(T(S1), T(S2)) • S: while (exp) S1, with T(S) = (N + 1) * T(exp) + N * T(S1) for a known loop bound of N iterations
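The per-construct bounds above can be sketched as straightforward bottom-up functions. This is a minimal illustration; the function names and the 10-iteration loop example are invented here, not taken from the slides.

```python
# Minimal sketch of the classical timing schema: each construct's WCET bound is
# computed bottom-up from the (fixed) execution times of its parts.

def wcet_seq(t_s1, t_s2):
    # S: S1; S2
    return t_s1 + t_s2

def wcet_if(t_exp, t_s1, t_s2):
    # S: if (exp) then S1 else S2
    return t_exp + max(t_s1, t_s2)

def wcet_while(t_exp, t_s1, n_max):
    # S: while (exp) S1, given a known loop bound n_max
    return (n_max + 1) * t_exp + n_max * t_s1

# e.g. a loop of at most 10 iterations, 2-cycle test, 15-cycle body
print(wcet_while(2, 15, 10))  # -> 172
```

With fixed instruction times this recursion alone gives a safe bound; the rest of the talk is about why it stops working once caches and pipelines enter the picture.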

  7. So what is the problem?

  8. Pipelined Execution
  [Pipeline timing diagram: the sequence div.s $f2, $f4, $f6 / lw $8, 4($sp) / nop / mul.s $f8, $f10, $f12 / addiu $9, $8, 4 flowing through stages IF, RD, FRD, ALU, FALU, MD, FMUL, FDIV, MEM, FMEM, WB, FWB, FFWB over 21 cycles]

  9. The Problem
  [Pipeline timing diagrams: the same instruction sequence occupies the stages IF, RD, ALU, MD, DIV, MEM, WB for a different number of cycles depending on the pipeline state left by the preceding code, so a block's execution time cannot be determined in isolation]

  10. Our Approach • Define a PA (Path Abstraction) structure that encodes • elements whose timings are affected • elements that affect others' timings • Define a concatenation (⊕) op on PAs, the counterpart of the + op • Define a pruning op on PAs, the counterpart of the max op

  11. Instruction Cache Modeling
  [Diagram: contents of cache blocks 0 and 1 as the reference sequence b2, b3, b2, b4 is processed; with unknown initial contents ('?'), the first references to b2 and b3 are classified hit/miss, the second reference to b2 is a hit, and b4 is a miss]
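The classification idea on this slide can be sketched for a direct-mapped cache whose initial contents are unknown. The mapping of b2 and b4 to cache block 0 and b3 to cache block 1 follows the slide; the function itself is an illustrative sketch, not the authors' analysis.

```python
# Sketch: hit/miss classification for a direct-mapped instruction cache with
# unknown initial contents. A reference to a cache block not yet touched on
# this path cannot be classified and is reported as "hit/miss".

def classify(refs, block_of):
    """refs: memory blocks in reference order; block_of: memory block -> cache block."""
    cache = {}  # cache block -> memory block last loaded on this path
    result = []
    for mb in refs:
        cb = block_of[mb]
        if cb not in cache:
            result.append("hit/miss")  # initial contents unknown
        elif cache[cb] == mb:
            result.append("hit")
        else:
            result.append("miss")
        cache[cb] = mb
    return result

print(classify(["b2", "b3", "b2", "b4"], {"b2": 0, "b3": 1, "b4": 0}))
# -> ['hit/miss', 'hit/miss', 'hit', 'miss']
```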

  12. PA Structure for Instruction Cache
  [Diagram: a PA recording the first_reference and last_reference memory blocks per cache block (b4, b2, b3 shown) together with the path's execution time, texecution = 38 cycles]

  13. Example: Concatenation and Pruning
  [Diagram: concatenating candidate path PAs (with first/last reference blocks among b1, b4, b5, b6, b7, b8) produces paths of 48, 68, 78, 102, and 126 cycles; paths that cannot become the worst case in any context are pruned]

  14. Pipelined Execution Modeling
  [Pipeline reservation table for div.s $f2, $f4, $f6 / lw $8, 4($sp) / nop / mul.s $f8, $f10, $f12 / addiu $9, $8, 4 across stages IF, RD, FRD, ALU, FALU, MD, FMUL, FDIV, MEM, FMEM, WB, FWB, FFWB over 21 cycles]

  15. PA Structure for Pipelined Execution
  [Diagram: a PA recording the head and tail portions of the pipeline reservation table; tmax = 21 cycles]

  16. PA Structure for Pipelined Execution
  [Diagram: head and tail reservation-table fragments of a concatenated path; texecution = 38 cycles]

  17. Example: Concatenation and Pruning
  [Pipeline reservation tables for the candidate paths of S1 (tmax = 17 cycles) and S2 (tmax = 22 and 14 cycles) before concatenation]

  18. Example: Concatenation and Pruning
  [Pipeline reservation tables: concatenating the 17-cycle path of S1 with the 22-cycle path of S2 yields a combined table with tmax = 37 cycles, less than the sum of the two because the tables overlap]

  19. Example: Concatenation and Pruning
  [Pipeline reservation tables: pruning eliminates a path with tmax = 26 cycles because 26 cycles < 37 cycles - (5 cycles (head) + 5 cycles (tail)), i.e., it cannot become the worst case in any surrounding context; the paths with tmax = 17 and 14 cycles remain]

  20. Combined PA Structure
  [Diagram: a combined PA holding both the instruction cache information (first_reference/last_reference per cache block: b4, b2, b3) and the pipeline information (head and tail reservation tables); texecution = 38 cycles]

  21. Extended Timing Schema • S: S1; S2, with W(S) = W(S1) ⊕ W(S2) • S: if (exp) then S1 else S2, with W(S) = (W(exp) ⊕ W(S1)) ∪ (W(exp) ⊕ W(S2)) • S: while (exp) S1, with W(S) obtained by ⊕-concatenating (W(exp) ⊕ W(S1)) over the bounded number of iterations and then appending W(exp), where W(S) is the set of PAs of the feasible paths of S and pruning is applied after each ⊕

  22. Comparison with Original Timing Schema
      Original Timing Schema: timing element = WCET bound; path concatenation = +; path elimination = max
      Extended Timing Schema: timing element = Path Abstraction; path concatenation = ⊕; path elimination = pruning
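The ⊕/pruning side of this comparison can be illustrated with a toy model in which a path abstraction is reduced to a pair (interface, cycles): two paths with the same interface (the state visible to neighbouring code) are interchangeable as far as the context is concerned, so only the worst one needs to be kept. Everything here is a deliberately simplified stand-in for the real PA structure.

```python
# Toy sketch of set-based bound computation in the extended timing schema.
# A "PA" is (interface, cycles); the real structure carries cache reference
# and pipeline reservation information instead of an opaque interface tag.

def concat(pas1, pas2):
    # ⊕: combine every candidate path of S1 with every candidate path of S2
    return [((i1, i2), c1 + c2) for (i1, c1) in pas1 for (i2, c2) in pas2]

def prune(pas):
    # keep, per interface, only the worst-case (max-cycle) path
    worst = {}
    for iface, cycles in pas:
        worst[iface] = max(worst.get(iface, 0), cycles)
    return sorted(worst.items())

s1 = [("a", 48), ("b", 30)]   # two feasible paths of S1
s2 = [("a", 20), ("a", 54)]   # two feasible paths of S2 with the same interface
print(prune(concat(s1, s2)))  # 4 combined paths collapse to 2 after pruning
```

The point of pruning is exactly the max column of the table: it bounds the growth of the path set at each concatenation without sacrificing safety.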

  23. Inter-task Analysis
  [Gantt chart: a schedule of tasks t1, t2, t3 over time 0 to 20, with t1 executing as pieces t1,1 through t1,5, t2 as t2,1 and t2,2, and t3 as t3,1, illustrating preemptions among the tasks]

  24. Two Step Approach 1. Local (per-task) analysis for estimating # of useful cache blocks at each execution point 2. Global analysis for calculating the cache-related preemption delay based on the linear programming technique

  25. Local Analysis (1) • A cache block is useful if it contains a memory block that may be re-referenced before being replaced. • The # of useful cache blocks at an execution point gives an upper bound on the cache-related preemption cost at that point.
  [Diagram: a sequence of memory references and misses along an execution path, with the useful cache blocks (holding m0, m5, m6, m3) identified at point P]

  26. Local Analysis (2) • Definitions • RMB_p(c): set of memory blocks that may reside in cache block c at point p • LMB_p(c): set of memory blocks that may be the first reference to cache block c after point p • A useful cache block at point p is defined as a cache block whose RMBs and LMBs have at least one common memory block.
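The usefulness test above (RMB and LMB sets sharing at least one memory block) can be sketched directly as a set intersection per cache block. The set contents below are invented for illustration.

```python
# Sketch of the usefulness test: cache block c is useful at point p iff
# RMB_p(c) and LMB_p(c) have a non-empty intersection.

def useful_blocks(rmb, lmb):
    """rmb/lmb: cache block -> set of memory blocks at a given point p."""
    return {c for c in rmb if rmb[c] & lmb.get(c, set())}

rmb = {0: {"m0", "m5"}, 1: {"m6"}, 2: {"m3"}}
lmb = {0: {"m5"}, 1: {"m2"}, 2: {"m3", "m7"}}
print(sorted(useful_blocks(rmb, lmb)))  # -> [0, 2]
```

The size of this set, times the cost of one cache block reload, bounds the preemption cost at that point, which is what the global analysis consumes.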

  27. Local Analysis (3)
  [Diagram: RMB and LMB sets at an execution point, with the cache blocks whose sets intersect marked useful]

  28. Local Analysis of Each Task • Preemption Cost Table
      Task   Largest Preemption Cost
      t1     f1
      t2     f2
      t3     f3
      ...    ...
      tn     fn

  29. Global Analysis (1) • Augmented Response Time Equation • Iterative solving
  [Equation: the standard response time equation R_i = C_i + Σ_{j ∈ hp(i)} ⌈R_i / T_j⌉ C_j, augmented with a term for the cache-related preemption delay and solved iteratively]
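A simplified form of the iterative solution can be sketched as follows. Here each preemption by a higher-priority task j is charged a coarse per-preemption cache-refill bound g[j], whereas the paper's global analysis bounds the total delay more tightly via linear programming; all task parameters below are invented.

```python
import math

# Simplified augmented response-time iteration: R_i is the least fixed point of
#   R = C[i] + sum over higher-priority j of ceil(R / T[j]) * (C[j] + g[j])
# where g[j] is a (coarse) bound on the cache-related delay per preemption.

def response_time(i, C, T, g, max_iter=1000):
    """Tasks indexed by priority (0 = highest). Returns R_i, or None if R_i exceeds T[i]."""
    r = C[i]
    for _ in range(max_iter):
        r_new = C[i] + sum(math.ceil(r / T[j]) * (C[j] + g[j]) for j in range(i))
        if r_new == r:
            return r       # fixed point reached
        if r_new > T[i]:
            return None    # deadline (taken as the period) exceeded
        r = r_new
    return None

C = [1, 2, 4]; T = [4, 10, 20]; g = [1, 1, 0]
print(response_time(2, C, T, g))  # -> 20
```

Because g only enters multiplied by the preemption count ⌈R/T_j⌉, overestimating the per-preemption cost inflates every iteration, which is precisely the pessimism the LP-based global analysis attacks.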

  30. Global Analysis (2)
  [Linear program: maximize the total cache-related preemption delay subject to constraints bounding the number of preemptions by each higher-priority task]

  31. Limitations • Not all useful cache blocks are replaced. • Some preemptions are not feasible.
  [Diagram: a schedule of t1, t2, t3 over time 0 to 120 together with the cache and main memory layout, illustrating both sources of pessimism in the computed R3]

  32. Enhanced Approach • Uses two new features 1. Scenario-sensitive preemption cost 2. Additional constraints from task phasing

  33. Experiments • Task set with 4 tasks (FFT, LUD, LMS, FIR) • Three different cache mappings of the tasks (cache mapping 1, 2, and 3)

  34. Experimental Results (1)

  35. Experimental Results (2)

  36. Conclusions • Intra-task Analysis • Extended Timing Schema • PA (Path Abstraction) •  and pruning operations • Inter-task Analysis • Data Flow Analysis • Response Time Equation • Linear Programming Technique

  37. Future Work • Data Cache Analysis • WCET Analysis for Advanced Architectures (Superscalar and VLIW) • I/O (DMA) Timing Analysis

  38. Related Papers (http://archi.snu.ac.kr/symin/) • S.-S. Lim et al. "An Accurate Worst Case Timing Analysis for RISC Processors," IEEE Transactions on Software Engineering, 21(7):593-604, July 1995. • C.-G. Lee et al. "Analysis of Cache-related Preemption Delay in Fixed-priority Preemptive Scheduling," IEEE Transactions on Computers, 47(6):700-713, June 1998.
