
Parallel Programming and Timing Analysis on Embedded Multicores

Presentation Transcript


  1. Parallel Programming and Timing Analysis on Embedded Multicores. Eugene Yip, The University of Auckland. Supervisors: Dr. Partha Roop, Dr. Morteza Biglari-Abhari. Advisor: Dr. Alain Girault.

  2. Outline • Introduction • ForeC language • Timing analysis • Results • Conclusions

  3. Introduction • Safety-critical systems: • Perform specific tasks. • Must behave correctly at all times. • Must comply with strict safety standards. [IEC 61508, DO-178] • Time-predictability is useful in real-time designs. [Paolieri et al 2011] Towards Functional-Safe Timing-Dependable Real-Time Architectures.

  4. Introduction • Safety-critical systems: • Shift from single-core to multicore processors. • Better power and execution performance. [Figure: Core 0 … Core n connected by a shared system bus to shared resources.] [Blake et al 2009] A Survey of Multicore Processors. [Cullmann et al 2010] Predictability Considerations in the Design of Multi-Core Embedded Systems.

  5. Introduction • Parallel programming: • From supercomputers to mainstream computers. • Threaded programming model. • Frameworks designed for systems without resource constraints or safety concerns. • Aimed at improving average-case performance (FLOPS), not time-predictability.

  6. Introduction • Parallel programming: • Programmer is responsible for shared resources. • Concurrency errors: • Deadlock • Race condition • Atomicity violation • Order violation • Non-deterministic thread interleaving (see the sketch below). • Determinism is essential for understanding and debugging. [McDowell et al 1989] Debugging Concurrent Programs.
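
A minimal C sketch of a race condition with POSIX threads (names and counts are illustrative): two threads increment a shared counter without synchronisation, so the final value depends on the interleaving and varies run to run.

    #include <pthread.h>
    #include <stdio.h>

    static int counter = 0;   /* shared, unprotected */

    static void *work(void *arg) {
        for (int i = 0; i < 100000; i++) {
            counter++;        /* read-modify-write: not atomic */
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, work, NULL);
        pthread_create(&b, NULL, work, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* Expected 200000, but lost updates make the result non-deterministic. */
        printf("counter = %d\n", counter);
        return 0;
    }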

  7. Introduction • Synchronous languages • Deterministic concurrency. • Based on the synchrony hypothesis. • Threads execute in lock-step to a global clock. • Concurrency is logical; typically compiled away. [Figure: inputs sampled and outputs emitted at each global tick, ticks 1–4.] [Benveniste et al 2003] The Synchronous Languages 12 Years Later.
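
A minimal sketch of this execution model as a C-style driver loop (the function names are assumptions, not a real API): each iteration is one global tick that samples inputs, computes the reaction, and emits outputs; the synchrony hypothesis assumes the reaction finishes before the next tick.

    /* One global tick per loop iteration (hypothetical driver loop). */
    while (1) {
        sample_inputs();       /* read the environment at tick start */
        react();               /* all threads run one local tick     */
        emit_outputs();        /* write results at tick end          */
        wait_for_next_tick();  /* align with the global clock        */
    }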

  8. Introduction • Synchronous languages • The global tick is defined by the timing requirements of the system. • Must validate: max(Reaction time) < min(Time between each tick). [Figure: each reaction time must fit within the physical time between ticks 1–4.] [Benveniste et al 2003] The Synchronous Languages 12 Years Later.

  9. Introduction • Synchronous languages • Esterel • Lustre • Signal • Synchronous extensions to C: • PRET-C • Reactive C with shared variables. • Synchronous C (SC – see Michael’s talk) • Esterel C Language • Concurrent threads are scheduled sequentially in a cooperative manner; atomic execution of threads ensures thread-safe access to shared variables. [Roop et al 2009] Tight WCRT Analysis of Synchronous C Programs. [Boussinot 1993] Reactive Shared Variables Based Systems. [Hanxleden et al 2009] SyncCharts in C - A Proposal for Light-Weight, Deterministic Concurrency. [Lavagno et al 1999] ECL: A Specification Environment for System-Level Design.

  10. Introduction • Synchronous languages • Esterel • Lustre • Signal • Synchronous extensions to C: • PRET-C • ReactiveC with shared variables. • Synchronous C (SC – see Michael’s talk) • Esterel C Language Writes to shared variables are delayed to the end of the global tick. At the end of the global tick, the writes are combined and assigned to the shared variable. Associative and commutative “combine function”. [Roop et al 2009] Tight WCRT Analysis of Synchronous C Programs. [Boussinot 1993] Reactive Shared Variables Based Systems. [Hanxleden et al 2009] SyncCharts in C - A Proposal for Light-Weight, Deterministic Concurrency. [Lavagno et al 1999] ECL: A Specification Environment for System-Level Design.

  11. Outline • Introduction • ForeC language • Timing analysis • Results • Conclusions

  12. ForeC language “Foresee” • Deterministic parallel programming of embedded multicores. • C with a minimal set of synchronous constructs for deterministic parallelism. • Fork/Join parallelism (explicit). • Shared memory model. • Deterministic thread communication using shared variables.

  13. ForeC language • Constructs: • par(t1, …, tn) • Fork threads t1 to tn to execute in parallel, in any order. • Parent thread is suspended, until all child threads terminate. • thread t1(...) {b} • Thread definition. • pause • Synchronisation barrier. • When a thread pauses, it completes a local tick. • When all threads pause, the program completes a global tick.
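
A minimal ForeC sketch of these constructs (the thread names and bodies are illustrative assumptions): main forks two threads; each pauses once, so the program spans two global ticks.

    thread sampler(void) {
        /* work for local tick 1 */
        pause;   /* complete this thread's local tick */
        /* work for local tick 2 */
    }

    thread filter(void) {
        pause;
    }

    void main(void) {
        par(sampler(), filter());   /* fork; main suspends until both terminate */
    }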

  14. ForeC language • Constructs: • abort {b} when (c) • Preempts the body b when the condition c is true. The condition is checked before executing the body. • weak abort {b} when (c) • Preempts the body when the body reaches a pause and the condition c is true. The condition is checked before executing the body.
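
A hedged sketch of abort (the input variable name is an assumption): the body runs tick by tick and is preempted once the condition holds, checked before the body executes in each tick.

    input int stop;   /* sampled from the environment at each global tick */

    void main(void) {
        abort {
            while (1) {
                /* work performed every tick */
                pause;
            }
        } when (stop == 1);   /* strong abort: checked before the body runs */
    }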

  15. ForeC language • Variable type qualifiers: • input • Variable gets its value from the environment. • output • Variable emits its value to the environment.
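
A small sketch of the qualifiers (the variable names and computation are assumptions): input variables are refreshed from the environment at each global tick, and output variables are emitted back to it.

    input int sensor;      /* read from the environment every global tick  */
    output int actuator;   /* written to the environment every global tick */

    void main(void) {
        while (1) {
            actuator = sensor * 2;   /* illustrative computation */
            pause;
        }
    }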

  16. ForeC language • Variable type qualifiers: • shared • Variable which may be accessed by multiple threads. • At the start of a thread’s local tick, it creates local copies of shared variables that it accesses. • During the thread’s local tick, it modifies its local copy (isolation). • At the end of the global tick, copies that have been modified are combined using a commutative and associative function (combine function). • The combined result is committed back to the original shared variable.

  17. ForeC language

    shared int x = 0;

    void main(void) {
      x = 1;
      par(t0(), t1());
      x = x - 1;
    }

    thread t0(void) {
      x = 10;
      x = x + 1;
      pause;
      x = x + 1;
    }

    thread t1(void) {
      x = x * 2;
      pause;
      x = x * 2;
    }

  18. ForeC language • The same program, shown alongside its Concurrent Control-Flow Graph (CCFG).

  19. ForeC language • Sequential control-flow along a single path. • Parallel control-flow along branches from a fork node. • Global tick ends when all threads pause or terminate.

  20. ForeC language • State of the shared variables: global x = 0.

  21. ForeC language • State of the shared variables: global x = 0. • Thread main creates a local copy of x.

  22. ForeC language • State of the shared variables: global x = 0; main’s copy = 0. • Thread main has created its local copy of x.

  23. ForeC language • State of the shared variables: global x = 0; main’s copy = 1 (after x = 1).

  24. ForeC language • State of the shared variables: global x = 0; main’s copy = 1. • Threads t0 and t1 take over main’s copy of the shared variable x.

  25. ForeC language • State of the shared variables: global x = 0; t0’s copy = 1, t1’s copy = 1. • Threads t0 and t1 have taken over main’s copy of the shared variable x.

  26. ForeC language • State of the shared variables: global x = 0; t0’s copy = 10 (after x = 10), t1’s copy = 1.

  27. ForeC language • State of the shared variables: global x = 0; t0’s copy = 11 (after x = x + 1), t1’s copy = 1.

  28. ForeC language • State of the shared variables: global x = 0; t0’s copy = 11, t1’s copy = 2 (after x = x * 2).

  29. ForeC language • State of the shared variables: global x = 0; t0’s copy = 11, t1’s copy = 2. • The end of the global tick is reached. • Combine the copies of x together using a (programmer-defined) associative and commutative function; a sketch follows below. • Assume the combine function for x implements summation.
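
A hedged C sketch of such a combine function (the concrete ForeC syntax for declaring combine functions is not shown on these slides, so this form is an assumption):

    /* Merges two thread-local copies of a shared variable.
       Must be associative and commutative so that the order
       of combining does not affect the result. */
    int combine(int copy1, int copy2) {
        return copy1 + copy2;   /* summation, as assumed on this slide */
    }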

  30. ForeC language • State of the shared variables: global x = 0; t0’s copy = 11, t1’s copy = 2. • Assign the combined value (11 + 2 = 13) back to x.

  31. ForeC language • State of the shared variables: global x = 13. • The combined value has been assigned back to x.

  32. ForeC language • State of the shared variables: global x = 13; t0’s copy = 13, t1’s copy = 13. • Next global tick. • Active threads create a copy of x.

  33. ForeC language • State of the shared variables: global x = 13; t0’s copy = 14 (after x = x + 1), t1’s copy = 13.

  34. ForeC language • State of the shared variables: global x = 13; t0’s copy = 14, t1’s copy = 26 (after x = x * 2).

  35. ForeC language • State of the shared variables: global x = 13; t0’s copy = 14, t1’s copy = 26. • Threads t0 and t1 terminate and join back to the parent thread main. • Local copies of x are combined into a single copy and given back to the parent thread main.

  36. ForeC language • State of the shared variables: global x = 13; main’s copy = 40 (14 + 26 from the joined threads).

  37. ForeC language • State of the shared variables: global x = 13; main’s copy = 39 (after x = x - 1).

  38. ForeC language • State of the shared variables: global x = 39. • Main’s final copy is committed back to the shared variable at the end of the global tick.

  39. ForeC language • Shared variables. • Threads modify local copies of shared variables. • Isolates thread execution behaviour. • Order/interleaving of thread execution has no impact on the final result. • Prevents concurrency errors. • Associative and commutative combine functions. • Order of combining doesn’t matter.

  40. Scheduling • Light-weight static scheduling. • Takes advantage of multicore performance while delivering time-predictability. • Thread allocation and scheduling order on each core are decided at compile time by the programmer. • Cooperative (non-preemptive) scheduling. • Fork/join semantics and the notion of a global tick are preserved via synchronisation; see the sketch below.
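
A minimal C sketch of what one core's statically scheduled tick might look like (the helper names and structure are assumptions, not the ForeC compiler's actual output): each allocated thread runs cooperatively to its next pause, then the cores synchronise to end the global tick.

    /* Hypothetical per-thread entry point: runs the thread up to
       its next pause and returns (cooperative, non-preemptive). */
    typedef void (*thread_slice)(void);

    static void t0_slice(void) { /* one local tick of t0 */ }
    static void t1_slice(void) { /* one local tick of t1 */ }

    /* Static, compile-time allocation of threads to this core. */
    static thread_slice core0_schedule[] = { t0_slice, t1_slice };

    void run_one_global_tick(void) {
        for (int i = 0; i < 2; i++)
            core0_schedule[i]();   /* run each thread to its pause */
        /* A barrier with the other cores would go here to end the
           global tick (the synchronisation primitive is assumed). */
    }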

  41. Scheduling • One core performs housekeeping tasks at the end of the global tick: • Combining shared variables. • Emitting outputs. • Sampling inputs and starting the next global tick.

  42. Outline • Introduction • ForeC language • Timing analysis • Results • Conclusions

  43. Timing analysis • Compute the program’s worst-case reaction time (WCRT). • The global tick is defined by the timing requirements of the system. • Must validate: max(Reaction time) < min(Time between each tick). [Figure: each reaction time must fit within the physical time between ticks 1–4.]

  44. Timing analysis Existing approaches for synchronous programs. • Integer Linear Programming (ILP) • Max-Plus • Model Checking

  45. Timing analysis Existing approaches for synchronous programs. • Integer Linear Programming (ILP) • Execution time of the program described as a set of integer equations. • Solving ILP is known to be NP-hard. • Max-Plus • Model Checking [Ju et al 2010] Timing Analysis of Esterel Programs on General-Purpose Multiprocessors.

  46. Timing analysis Existing approaches for synchronous programs. • Integer Linear Programming (ILP) • Max-Plus • Compute the WCRT of each thread. • Using the thread WCRTs, the WCRT of the program is computed. • Assumes there is a global tick where all threads execute their worst-case. • Model Checking
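
A hedged illustration of a Max-Plus style bound (all numbers invented; the per-core sum / cross-core max structure is an assumption for a multicore setting): sum the WCRTs of the threads allocated to each core, since they run sequentially and cooperatively, then take the maximum across cores, assuming every thread hits its worst case in the same global tick.

    #include <stdio.h>

    int main(void) {
        int core0[] = { 40, 70 };   /* illustrative thread WCRTs (cycles) */
        int core1[] = { 55 };
        int sum0 = 0, sum1 = 0;
        for (int i = 0; i < 2; i++) sum0 += core0[i];
        for (int i = 0; i < 1; i++) sum1 += core1[i];
        int wcrt = (sum0 > sum1) ? sum0 : sum1;   /* max across cores */
        printf("WCRT bound = %d cycles\n", wcrt); /* prints 110 */
        return 0;
    }

The pessimism comes from the alignment assumption: in a real execution, the threads' worst cases may never coincide in one global tick.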

  47. Timing analysis Existing approaches for synchronous programs. • Integer Linear Programming (ILP) • Max-Plus • Model Checking • Eliminates false paths by explicit path exploration (reachability over the program’s CFG). • Binary search: check whether the WCRT is less than “x” (see the sketch below). • State-space explosion problem. • Trades off analysis time for precision. • Provides an execution trace for the WCRT.
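
A hedged C sketch of that binary search (the checker is a stand-in: a real model checker would answer the query “is the reaction time always below x?”):

    #include <stdio.h>

    /* Stand-in for a model-checker query: returns 1 if every
       reaction completes in fewer than 'bound' cycles. Here we
       pretend the true WCRT is 137 cycles (a dummy value). */
    static int check_wcrt_below(int bound) { return 137 < bound; }

    /* Narrow the bound until it is tight: the WCRT stays in [lo, hi). */
    static int find_wcrt(int lo, int hi) {
        while (lo + 1 < hi) {
            int mid = lo + (hi - lo) / 2;
            if (check_wcrt_below(mid))
                hi = mid;   /* WCRT < mid: tighten the upper bound   */
            else
                lo = mid;   /* some path reaches mid: raise the lower bound */
        }
        return lo;          /* the tightest bound: the WCRT itself */
    }

    int main(void) {
        printf("WCRT = %d cycles\n", find_wcrt(0, 1000000));   /* prints 137 */
        return 0;
    }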

  48. Timing analysis • Our approach uses Reachability: • Same benefits as model checking, but a binary search for the WCRT is not required. • To handle state-space explosion: • Reduce the program’s CCFG before analysis. [Pipeline: Program binary (annotated) → Reconstruct the program’s CCFG → Find the global ticks (Reachability) → WCRT]

  49. Timing analysis • Programs will execute on the following multicore: [Figure: Core 0 … Core n, each with private instruction and data memories, connected by a TDMA shared bus to a global memory.]

  50. Timing analysis • Computing the execution time must account for: • Overlapping of thread execution times due to parallelism and inter-core synchronisations. • Scheduling overheads. • Variable delay in accessing the shared bus (a worst-case sketch follows below).
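
A hedged sketch of bounding the variable bus delay under TDMA arbitration (the slot and access lengths are invented, and the real analysis is more involved, e.g. accesses must fit within a slot): in the worst case, a core just misses its slot and waits almost a full TDMA round before its access is served.

    #include <stdio.h>

    /* Worst-case delay for one shared-bus access under TDMA:
       wait out (almost) a full round of slots, then perform
       the access in this core's own slot. */
    static int worst_case_bus_delay(int num_cores, int slot_len, int access_len) {
        int round = num_cores * slot_len;   /* one full TDMA round */
        return (round - 1) + access_len;    /* worst wait + the access itself */
    }

    int main(void) {
        /* e.g. 4 cores, 10-cycle slots, 5-cycle accesses */
        printf("%d cycles\n", worst_case_bus_delay(4, 10, 5));   /* prints 44 */
        return 0;
    }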
