Time-Predictable Execution of Embedded Software on Multi-core Platforms

Time-Predictable Execution of Embedded Software on Multi-core Platforms. Sudipta Chattopadhyay, under the guidance of A/P Abhik Roychoudhury. Embedded Systems. Real-time Constraints. Hard real-time. Embedded system. Soft real-time. Timing Analysis.

Presentation Transcript


  1. Time-Predictable Execution of Embedded Software on Multi-core Platforms Sudipta Chattopadhyay under the guidance of A/P Abhik Roychoudhury

  2. Embedded Systems

  3. Real-time Constraints Hard real-time Embedded system Soft real-time

  4. Timing Analysis • Hard real-time systems require absolute timing guarantees • System-level analysis • Single-task analysis • Worst-case execution time (WCET) analysis • An upper bound on execution time for all possible inputs • A sound over-approximation is obtained by static analysis

  5. WCET Analysis WCET of basic blocks Infeasible path constraints Program WCET bound Micro-architectural modeling Loop bound Control flow graph constraints Path analysis

  6. Architecture Core 1 Core n L1 cache L1 cache Shared bus Resource sharing Shared L2 cache Memory

  7. Overview Instr. accesses Data accesses Shared cache + shared bus A multi-core WCET tool Shared cache Core 1 Core n L1 instruction cache L1 data cache Unified cache Processor L1 cache L1 cache L2 unified cache Dissertation work (Time-predictable execution in multi-core) Shared bus Resource sharing Bus Shared L2 cache Conflicts with different instruction and data memory blocks Main Memory Cache related preemption delay analysis Shared scratchpad allocation Coherence miss modeling Memory

  8. Micro-architectural Modeling branch predictor shared cache cache pipeline shared bus Single Core Multi Core

  9. Comparison

  10. Imprecision in Abstract Interpretation p1 p2 young a b young b x Cache state = C2 Cache state = C1 Abstract cache set Abstract cache set Joined Cache state = C3 Joined cache state b Path p1 or path p2? Joined cache state loses information about path p1 and p2
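The information loss at the join on slide 10 can be sketched as follows. This is a minimal illustration, assuming the usual LRU must-analysis join (keep only blocks present in both states, with the larger age bound). The dictionary encoding, the age numbering (0 = youngest), and the function name `must_join` are assumptions made for this sketch, not taken from the slides.

```python
def must_join(c1, c2):
    """Join two abstract must-cache sets (block -> upper bound on LRU age).

    Only blocks guaranteed to be cached on BOTH incoming paths survive,
    and each survivor keeps the older (larger) of its two age bounds.
    """
    return {block: max(c1[block], c2[block]) for block in c1.keys() & c2.keys()}


# Age 0 is the youngest position in the cache set (assumed encoding).
C1 = {"a": 0, "b": 1}  # abstract cache set after path p1
C2 = {"b": 0, "x": 1}  # abstract cache set after path p2

C3 = must_join(C1, C2)
print(C3)  # {'b': 1}
```

After the join, the state no longer records that `a` is cached along p1 or that `x` is cached along p2, which is the imprecision the slide points out.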

  11. Model Checking alone ? • A path sensitive search • Path sensitive search is expensive – path explosion • Worse, combined with possible cache states p1 p2 Cache state = C2 Cache state = C1

  12. Model Checking alone ? • A path-sensitive search • Path sensitive search is expensive – path explosion • Worse, combined with possible cache states Abstract LRU cache set p1 p2 a b young young b x b young a young x b Abstract LRU cache set Abstract LRU cache set State Explosion

  13. Cache analysis WCET of basic blocks All checked Cache analysis by abstract interpretation Pipeline analysis Analysis outcome Infeasible path constraints IPET Program Refine by model checker Branch predictor modeling Loop bound Timeout Micro-architectural modeling constraints Refinement by the model checker can be terminated at any point Model checker refinement steps are inherently parallel Path analysis Each model checker refinement step checks a lightweight assertion property

  14. Refinement (Inter-core) m start Conflicting task Task x < y m1 m1 Infeasible x == y m2 m2 young ≠m m ≠m m exit cache Cache hit Cache miss Spurious

  15. Refinement (Inter-core) start m Conflicting task Task x < y C_m++ m1 Increment conflict m1 Verified Infeasible x == y m2 C_m++ m2 Increment conflict young m m m exit cache assert (C_m <= 1) A Cache Hit

  16. Refinement (Why it works?) m x < y Increment conflict C_m++ m’ Conflict to m m’ Path 2 x == y m Does not affect the value of C_m assert (C_m <= 0) m Cache miss Property
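The refinement idea on slides 14 to 16 can be sketched as below: a conflict counter `C_m` is incremented at each conflicting access, and an assertion on the counter is checked over all paths. The slides use a model checker for this step; the sketch substitutes exhaustive enumeration over a small input domain, and the function names and the domain are hypothetical.

```python
from itertools import product


def conflicts_to_m(x, y):
    """Count the conflicts to block m on the path taken for inputs (x, y)."""
    c_m = 0
    if x < y:
        c_m += 1  # access to conflicting block m1: C_m++
    if x == y:
        c_m += 1  # access to conflicting block m2: C_m++
    return c_m


def assertion_holds(bound, domain=range(-4, 5)):
    """Check `assert (C_m <= bound)` on every input pair in the domain."""
    return all(conflicts_to_m(x, y) <= bound for x, y in product(domain, repeat=2))


print(assertion_holds(1))  # True
print(assertion_holds(0))  # False
```

Because `x < y` and `x == y` cannot both hold, the path that accesses both m1 and m2 is infeasible, so `C_m <= 1` holds on every feasible path even though a path-insensitive analysis would count two conflicts.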

  17. Experimental Setup (Chronos Toolkit) GCC simplescalar C source Binary code CFG Micro architectural modeling Flow constraints cache pipeline Branch prediction ILP WCET CBMC Micro-architectural constraints C bounded model checking

  18. Experimental Result

  19. Experimental Result WCET Direct-mapped, 256 bytes L1 cache L1 cache Average time = 70 secs Shared L2 cache 4-way associative, 8 KB

  20. Extension Using Symbolic Execution unknown x < y Conflicting task x < y x ≥ y x < y C_m++ x = y x = y m1 Increment conflict m1 NO x == y m2 constraint solver C_m++ Increment conflict m2 x < y ∧ x = y satisfied assert (C_m <= 1) assert (C_m <= 1) abort

  21. Extension Using KLEE GCC simplescalar C source Binary code CFG Micro architectural modeling Flow constraints cache pipeline Branch prediction ILP WCET CBMC/KLEE Micro-architectural constraints

  22. A Generic Framework • Three different architectural/application settings High priority Low priority Task in Core 1 Task in Core 2 Cache conflict Cache conflict Cache conflict L1 cache cache cache L1 cache Intra task (WCET in single core) Inter task (Cache Related Preemption Delay analysis) Shared L2 cache Inter core (WCET in multi-core)

  23. Micro-architectural Modeling branch predictor shared cache cache pipeline shared bus Single Core Multi Core

  24. Task-level interference T1 T3 Tasks T2 T2 Core 1 Core n T1 L1 cache L1 cache Shared bus T2 Timeline Shared L2 cache T3 T1 T3 Task interference graph

  25. Shared Cache + TDMA Shared Bus Task graphs Time Division Multiple Access (TDMA) T1 T3 T1 T3 Core 1 Core 2 Core 1 slot T2 T4 L1 cache L1 cache Shared bus Core 2 slot Bus access L2 miss due to T2 T4 Shared L2 cache T2 Disjoint lifetime Core 1 slot WAIT Bus access T4 T1 T2 Core 2 slot T3 T4

  26. Overview of the framework L1 cache analysis L1 cache analysis Task interference monotonically decreases Filter Filter L2 cache analysis L2 cache analysis WCRT computation Bus aware analysis L2 conflict analysis Initial interference Yes Interference changes ? Estimated WCRT No

  27. Evaluation (2-core) One core runs statemate; the other core runs the program under evaluation

  28. Evaluation (4-core) The four cores run either (edn, adpcm, compress, statemate) or (matmult, fir, jfdctint, statemate)

  29. Micro-architectural Modeling branch predictor shared cache Interactions cache pipeline shared bus Single Core Multi Core

  30. Timing Anomaly (shared Cache) hit miss miss miss hit hit miss hit miss hit miss hit miss hit miss hit May not be the worst case path

  31. Baseline Abstraction – Timing Interval • Representing each pipeline stage as a timing interval End = Start + cache miss latency interval start [1,3] finish [3,7] [4,10] latency EX WB R1 := R2 + 5 IF ID CM Structural dependency CM IF ID EX WB EX WB CM IF ID R5 := R1 * R7 IF ID EX WB CM Contention IF ID EX WB CM R3 := R5 * 5 A fixed-point analysis derives the timing of each stage as an interval

  32. TDMA Shared Bus Analysis • Time Division Multiple Access (TDMA) • Offset abstraction Core 0 Core 1 Core 0 Core 1 Core 0 Core 1 Core 0 Core 1 delay = 0 offset delay offset round round T’ (core 0) T (core 1)
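The offset abstraction on slide 32 relates the position of a bus request within the TDMA round to the delay it suffers. A minimal sketch of that relation follows, assuming equal-length slots assigned to cores in index order and an access that must complete inside the requesting core's own slot. These assumptions, the function name `tdma_delay`, and its parameters are illustrative and not taken from the slides.

```python
def tdma_delay(offset, core, n_cores, slot_len, access_len):
    """Cycles a bus request issued at `offset` must wait before it can start.

    `offset` is the request time relative to the start of a TDMA round;
    core `i` is assumed to own the slot [i * slot_len, (i + 1) * slot_len).
    """
    if access_len > slot_len:
        raise ValueError("access must fit inside a single slot")
    round_len = n_cores * slot_len
    offset %= round_len
    slot_start = core * slot_len
    slot_end = slot_start + slot_len
    # The request falls in the core's own slot and finishes before it ends.
    if slot_start <= offset and offset + access_len <= slot_end:
        return 0
    # Otherwise wait for the start of the core's next slot.
    return (slot_start - offset) % round_len


print(tdma_delay(3, core=0, n_cores=2, slot_len=10, access_len=5))  # 0
print(tdma_delay(8, core=0, n_cores=2, slot_len=10, access_len=5))  # 12
print(tdma_delay(3, core=1, n_cores=2, slot_len=10, access_len=5))  # 7
```

Since the delay depends only on the offset within the round, an analysis can track a set or interval of possible offsets instead of absolute time.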

  33. Loop Construct EX WB previous iteration IF ID CM CM IF ID EX WB EX WB CM current iteration IF ID IF ID EX WB CM How do we define bus context? Property: If the bus offsets of the cross-iteration edges do not change, WCET of the loop iteration cannot change

  34. Loop Construct Ci = bus context of the loop body at the i-th iteration C1 C2 C3 Bus context flow graph C4 C5 C5 C3 Property: If Ci = Cj, then Ci+k = Cj+k for any k > 0

  35. Loop Construct WCET of basic blocks Bus context flow graph C1 Infeasible path constraints C2 Program loop bound ILP solver Micro-architectural modeling C3 Loop bound Control flow graph Compute WCET for each bus context C4 E(C1) = number of times context C1 is executed Generate linear constraints: E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound E(C1) ≥ E(C2) constraints ILP = Integer Linear Programming Path analysis
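The linear constraints on slide 35 can be tried out on a toy instance. The sketch below maximizes the total loop time subject to the two constraints shown on the slide, using brute-force enumeration in place of an ILP solver. The per-context WCET values and the loop bound are made-up numbers, and a full formulation would presumably carry further flow constraints derived from the bus context flow graph.

```python
from itertools import product


def loop_wcet_bound(wcet, loop_bound):
    """Maximize sum(wcet[i] * E[i]) over integer execution counts E.

    Constraints (from the slide):
      E(C1) + E(C2) + E(C3) + E(C4) <= loop bound
      E(C1) >= E(C2)
    """
    best_value, best_counts = 0, (0,) * len(wcet)
    for counts in product(range(loop_bound + 1), repeat=len(wcet)):
        if sum(counts) > loop_bound:
            continue  # violates the loop bound constraint
        if counts[0] < counts[1]:
            continue  # violates E(C1) >= E(C2)
        value = sum(w * e for w, e in zip(wcet, counts))
        if value > best_value:
            best_value, best_counts = value, counts
    return best_value, best_counts


# Hypothetical WCET (in cycles) of the loop body under contexts C1..C4.
print(loop_wcet_bound(wcet=(10, 16, 12, 9), loop_bound=6))  # (78, (3, 3, 0, 0))
```

Without the constraint E(C1) ≥ E(C2), the maximizer would charge all six iterations to the most expensive context C2 (96 cycles); the constraint tightens the bound to 78.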

  36. Branch prediction + Cache Cache conflict Cache content m Branch location JOIN m Maximum number of speculated instructions m’ Cache content Unclear cache access

  37. Experimental Setup (Chronos Toolkit) GCC simplescalar C source Binary code CFG Micro architectural modeling Flow constraints Private cache pipeline Branch prediction ILP WCET Shared cache Shared bus Micro-architectural constraints

  38. Evaluation (cache + pipeline) Core 1 Imprecision of shared cache analysis Core 1 Core 2 Core 2 Horizontally partition Vertically partition jfdctint statemate

  39. Evaluation (Cache + pipeline + Speculation) Imprecision of modeling speculation

  40. Evaluation (Bus + pipeline) Imprecision of shared bus analysis Imprecision of path analysis

  41. Recap PE-0 PE-1 PE-N …… c Shared cache + shared bus A multi-core WCET tool Shared cache Low priority task High priority task Task Core 1 Cache conflict Core n Unified cache SPM-0 SPM-1 SPM-N Core 1 Core n L1 data cache L1 data cache Fast on-chip communication media Coherence miss traffic Dissertation work (Time-predictable execution in multi-core) External Memory Interface Stale data items Shared bus L1 cache L1 cache Shared L2 cache Shared off-chip data bus Cache related preemption delay analysis Shared L2 cache Shared scratchpad allocation Coherence miss modeling Off-chip memory Memory

  42. Perspective Time-predictable execution in single-core Resource sharing (cache and bus) Data sharing (cache coherence) Time-predictable execution in multi-core Testing Static analysis Customized hardware Shared cache Shared bus Cache coherence Shared scratchpad ARM Cortex A9 MPCore Samsung Exynos Nvidia Tegra II (smart phones) Time Division Multiple Access Æthereal Network-on-chip Sony PSP IBM Cell

  43. Perspective Functionality Verification Quantitative Verification Concrete domain Concrete domain Abstract domain in Abstract Interpretation (AI) Abstraction Anytime Verification of Quantitative properties SLAM (Microsoft) BLAST (UC Berkeley) Property AI May be spurious MAGIC (CMU) Verifier Spurious counter example Generate Quantitative property Refinement Path-sensitive Verification Verified Abstraction refinement

  44. Future Work Static performance analysis + testing Symbolic Execution x < y Performance testing x < y x ≥ y x < y x = y x = y Mobile devices m1 x == y Energy analysis of software x < y ∧ x ≠ y m2 Input abort Battery life Energy-aware software testing (Quantitative property e.g. cache conflict) assert (C_m <= 1)

  45. Thank You My sincere thanks to all the Examiners and especially the anonymous Examiner 1 for his comment on symbolic execution