National Sun Yat-sen University Embedded System Laboratory

Source-Level Timing Annotation for Fast and Accurate TLM Computation Model Generation • Kai-Li Lin, Chen-Kang Lo, Ren-Song Tsay • 2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC) • Presenter: Zong-Ze Huang • Cite count: 27

Abstract • This paper proposes a source-level timing annotation method for generating accurate transaction level models of software computation modules. While the Transaction Level Modeling (TLM) approach is now widely adopted to improve system modeling and simulation speed, timing estimation accuracy is often compromised. To obtain reliable and accurate estimation results at system level, we propose a timing annotation method for accurate TLM computation model generation that considers processor architectures with pipeline and cache structures, which are challenging but critical to accurate timing estimation. • The experiments show that our results are within 2% of cycle-accurate results and that the approach is three orders of magnitude faster than conventional ISS approaches.

What is the Problem • Current methods are not accurate • Proposed method • Timing annotation method for an accurate TLM computation model • Proper partitioning of the source program. • Critical timing factors must be considered.

Related work • Abstract pipeline model to track pipeline status [2] • Dynamic simulation approaches that capture execution time at run-time [8], [9], [15] • Finding the Worst-Case Execution Time [4-7] • TLM computation model [1] • Source-level timing annotation • Takes each line of the source program as an estimation unit [13] • Timing-annotated task to estimate execution time, which is mostly the real case • Conventional ISS simulation methods are too slow • Applies a statistical model to estimate basic block execution time [14] • Analyzes optimized target assembly code [16] • A specific algorithm • Branch prediction and cache access analysis • This paper: Source-Level Timing Annotation for Fast and Accurate TLM Computation Model Generation

Bad estimation unit chosen • A larger estimation unit is used • Reasons for inaccuracy • It contains several execution paths. • Different input data and array sizes lead to different execution paths. • Each source program line is taken as an estimation unit with a fixed delay time • Reasons for inaccuracy • Pipelining may reduce execution time due to parallel pipeline execution. • Compiler optimization.

Ideal estimation unit • An ideal estimation unit should satisfy the following: • It must not include multiple execution paths. • Compiler optimization must be considered. • The hardware pipeline architecture between these lines must be considered. • The boundary effect must be considered. • Cache access overhead must be considered.

Timing Annotation Flow • Consists of two major parts: • 1. Basic Block Analysis • The source program is partitioned into basic blocks. • A CFG is generated to represent the structure of the program. • 2. Timing Estimation • The target cross-compiler translates the source program into target binary code. • The cycle count of each basic block is calculated. • Branch prediction is considered and a correction number is calculated for each edge between two basic blocks. • Cache access time is adjusted.

Basic block • Definition • A sequence of consecutive statements in which control enters at the beginning and leaves at the end without halt or possibility of branching except at the end. • The concept of the basic block is used in compiler optimization. • Compiler optimization cannot break the program structure.
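The definition above can be illustrated with the classic leader-based partitioning from compiler textbooks. This is a minimal sketch, not the paper's implementation: the `(opcode, target)` instruction format, the opcode names `jmp` and `br`, and the function name `find_basic_blocks` are all hypothetical.

```python
def find_basic_blocks(instrs):
    """Partition a list of instructions into basic blocks.

    Each instruction is a hypothetical (opcode, target) tuple, where
    target is the index of the jump destination, or None.
    Returns a list of blocks, each a list of instruction indices.
    """
    if not instrs:
        return []
    leaders = {0}  # the first instruction always starts a block
    for i, (op, target) in enumerate(instrs):
        if op in ("jmp", "br"):
            if target is not None:
                leaders.add(target)   # a jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)    # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# Illustrative program: indices are instruction positions.
program = [
    ("add", None),  # 0
    ("br", 4),      # 1: conditional branch to 4
    ("sub", None),  # 2
    ("jmp", 5),     # 3: unconditional jump to 5
    ("mul", None),  # 4
    ("ret", None),  # 5
]
blocks = find_basic_blocks(program)
```

Control can only enter each resulting block at its first instruction and leave at its last, which is the property the estimation unit relies on.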

Control Flow Graph (CFG) • The source program is analyzed and represented by a Control Flow Graph. • Each node of the graph is a basic block. • Each directed edge implies an execution order between its source and destination basic blocks. • Program execution time is estimated by summing up the execution time of each basic block. • First, the execution time of each basic block is calculated statically. • Later, each basic block's execution time is corrected dynamically according to run-time information.
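The two-step estimate (static per-block cycles, then run-time correction) can be sketched as follows. The block names, cycle counts, and correction value are illustrative assumptions, and `estimate_cycles` is a hypothetical helper rather than code from the paper.

```python
def estimate_cycles(path, block_cycles, edge_correction):
    """Estimate total cycles for one executed path through the CFG.

    path            -- basic blocks in the order they ran
    block_cycles    -- statically computed cycles per block
    edge_correction -- per-edge adjustment applied at run time
    """
    # Static part: sum the precomputed cycle count of every block on the path.
    total = sum(block_cycles[b] for b in path)
    # Dynamic part: add the correction of every edge actually taken.
    for src, dst in zip(path, path[1:]):
        total += edge_correction.get((src, dst), 0)
    return total


# Illustrative values only.
block_cycles = {"B1": 10, "B2": 9, "B3": 6}
edge_correction = {("B1", "B2"): -4}
total = estimate_cycles(["B1", "B2", "B3"], block_cycles, edge_correction)
```

Here the static sum is 25 cycles and the single corrected edge brings the estimate down to 21.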

Proposal of this paper • Source program with timing annotation and control flow graph with cycle counts

Basic Block Cycle Calculation • Follow the abstract pipeline model to track pipeline status. • After the cycle count is calculated, each basic block with its annotated cycle count is mapped to the corresponding CFG node. • [Figure: pipeline diagram for Basic Blocks 1 to 3; axes: cycle time, pipeline stage, instruction index]
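As a rough illustration of why a pipeline makes a block's cycle count differ from a simple per-instruction sum, here is a minimal sketch assuming an ideal in-order pipeline in which one instruction issues per cycle unless it stalls. This is a textbook simplification, not the abstract pipeline model of [2]; the function name, the five-stage default, and the stall values are assumptions.

```python
def pipeline_cycles(stall_cycles, num_stages=5):
    """Cycle count of a block on an idealized in-order pipeline.

    stall_cycles -- one entry per instruction: extra cycles that
                    instruction stalls (assumed known in advance)
    num_stages   -- pipeline depth (assumed, not from the paper)
    """
    n = len(stall_cycles)
    if n == 0:
        return 0
    # The first instruction needs num_stages cycles to drain; each later
    # instruction completes one cycle after its predecessor, plus stalls.
    return num_stages + (n - 1) + sum(stall_cycles)


# Four instructions, the third one stalls for one cycle.
cycles = pipeline_cycles([0, 0, 1, 0])
```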

Boundary Effect Correction • Focuses on branch prediction uncertainty and pipeline execution between two basic blocks. • Example: Basic Block 1 + Basic Block 2 calculated separately = 19 cycles; with successful prediction = 15 cycles. • The correction factor on this edge is 15 - 19 = -4.
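The arithmetic on this slide can be sketched as a small table-building step. Only the totals 19 and 15 come from the slide; the split of 19 into 10 and 9 cycles, the block names, and the helper `build_corrections` are assumptions for illustration.

```python
def build_corrections(block_cycles, pair_cycles):
    """Compute a correction factor for each CFG edge.

    block_cycles -- cycles of each block when calculated in isolation
    pair_cycles  -- cycles when the two blocks of an edge run
                    back to back through the pipeline
    """
    corrections = {}
    for (src, dst), combined in pair_cycles.items():
        separate = block_cycles[src] + block_cycles[dst]
        # Negative values mean the pipeline overlaps the two blocks.
        corrections[(src, dst)] = combined - separate
    return corrections


block_cycles = {"B1": 10, "B2": 9}       # assumed split; sums to 19
pair_cycles = {("B1", "B2"): 15}         # successful prediction
corrections = build_corrections(block_cycles, pair_cycles)
```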

Cache Access Adjusting • I-Cache • Statically analyze which I-Cache blocks are used by which basic blocks. • Cache information is updated after each I-Cache block access. • When a cache miss occurs, a miss penalty is returned and added to the block's execution time. • D-Cache • An average cycle number obtained from statistical analysis is used to model the delay of each data access.
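The cache adjustment can be sketched as below, assuming a direct-mapped I-Cache and a fixed miss penalty. The cache organization, the penalty of 20 cycles, the average of 1.5 cycles per data access, and the names `ICacheModel` and `block_time` are all assumptions, since the slide does not give them.

```python
class ICacheModel:
    """Direct-mapped instruction cache (organization is an assumption)."""

    def __init__(self, num_lines, miss_penalty):
        self.num_lines = num_lines
        self.miss_penalty = miss_penalty
        self.lines = [None] * num_lines  # cached block id per line

    def access(self, block_id):
        """Return the penalty of this access and update cache state."""
        index = block_id % self.num_lines
        if self.lines[index] == block_id:
            return 0                      # hit: no extra cycles
        self.lines[index] = block_id      # miss: fill the line
        return self.miss_penalty


def block_time(base_cycles, icache_blocks, n_data_accesses, icache, dcache_avg):
    """Execution time of one basic block including cache effects."""
    # I-Cache: look up every cache block the basic block statically uses.
    penalty = sum(icache.access(b) for b in icache_blocks)
    # D-Cache: charge an average delay per data access.
    return base_cycles + penalty + n_data_accesses * dcache_avg


icache = ICacheModel(num_lines=4, miss_penalty=20)
first = block_time(10, [0, 1], 2, icache, dcache_avg=1.5)   # cold cache
second = block_time(10, [0, 1], 2, icache, dcache_avg=1.5)  # warm cache
```

The first execution pays two miss penalties, while the second hits in the cache and only pays the base cycles plus the data access average.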

Annotation Algorithm • Timing Annotation in Basic Blocks (TAB) • For each node, calculates the basic block execution time and the I-Cache blocks used in this node • Calculates the boundary effect correction factors • TAB complexity is O(|E|+|V|)
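The linear complexity follows from visiting each node and each edge once. The sketch below is not the paper's TAB pseudocode; it only shows such a single pass, with the adjacency-list CFG, the lookup tables, and the `annotate` function as hypothetical stand-ins.

```python
def annotate(cfg, block_cycles_fn, icache_blocks_fn, correction_fn):
    """Annotate every node and edge of a CFG in one pass.

    cfg -- adjacency list: node -> list of successor nodes
    The three callables are hypothetical stand-ins for the analyses.
    """
    node_info = {}
    edge_info = {}
    for node, successors in cfg.items():      # each node visited once
        node_info[node] = (block_cycles_fn(node), icache_blocks_fn(node))
        for succ in successors:               # each edge visited once
            edge_info[(node, succ)] = correction_fn(node, succ)
    return node_info, edge_info


# Illustrative inputs only.
cfg = {"B1": ["B2", "B3"], "B2": ["B3"], "B3": []}
cycles = {"B1": 10, "B2": 9, "B3": 6}
cache_blocks = {"B1": [0], "B2": [0, 1], "B3": [1]}
corr = {("B1", "B2"): -4}

node_info, edge_info = annotate(
    cfg, cycles.get, cache_blocks.get, lambda a, b: corr.get((a, b), 0)
)
```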

Before the Experiment • Accuracy • Overhead • Compare with other methods

Simulation speed comparison • The experimental results show that the overhead introduced by TAB is negligible.

Accuracy comparison • The result from CA-ISS is used as the golden reference. • TAB has an error rate of less than 2%.

Conclusion • Proposed and implemented a timing annotation approach for accurate and efficient TLM computation model generation. • Experimental results show that the approach is accurate and does not seriously affect simulation speed. • My Comment • This paper helped me understand the source-level annotation flow. • It also showed how many critical factors must be considered in source-level annotation.
