
Design & Co-design of Embedded Systems






Presentation Transcript


  1. Design & Co-design of Embedded Systems HW/SW Partitioning Algorithms + Process Scheduling Maziar Goudarzi

  2. Today’s Program • Introduction • Preliminaries • Hardware/Software Partitioning • Distributed System Co-Synthesis (next session) Reference: Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2 in Hardware/Software Co-Design: Principles and Practice, Eds: J. Staunstrup and W. Wolf, Kluwer Academic Publishers, 1997. Design & Co-design of Embedded Systems

  3. Topics • Introduction • A Classification • Examples • Vulcan • Cosyma Design & Co-design of Embedded Systems

  4. Introduction to HW/SW Partitioning • The first variety of co-synthesis applications • Definition • A HW/SW partitioning algorithm implements a specification on some sort of multiprocessor architecture • Usually • Multiprocessor architecture = one CPU + some ASICs on CPU bus Design & Co-design of Embedded Systems

  5. Introduction to HW/SW Partitioning (cont’d) • Terminology • Allocation • Synthesis methods that design the multiprocessor topology along with the PEs and SW architecture • Scheduling • The process of assigning PE (CPU and/or ASIC) time to processes for execution Design & Co-design of Embedded Systems

  6. Introduction to HW/SW Partitioning (cont’d) • In most partitioning algorithms • Type of CPU is fixed and given • ASICs must be synthesized • What function to implement on each ASIC? • What characteristics should the implementation have? • These are single-rate synthesis problems • CDFG is the starting model Design & Co-design of Embedded Systems

  7. HW/SW Partitioning (cont’d) • Normal use of architectural components • CPU performs less computationally-intensive functions • ASICs used to accelerate core functions • Where to use? • High-performance applications • No CPU is fast enough for the operations • Low-cost applications • ASIC accelerators allow use of a much smaller, cheaper CPU Design & Co-design of Embedded Systems

  8. A Classification • Criterion: Optimization Strategy • Trade-off between Performance and Cost • Primal Approach • Performance is the primary goal • First, all functionality in ASICs. Progressively move more to CPU to reduce cost. • Dual Approach • Cost is the primary goal • First, all functions in the CPU. Move operations to the ASIC to meet the performance goal. Design & Co-design of Embedded Systems

  9. A Classification (cont’d) • Classification due to optimization strategy (cont’d) • Example co-synthesis systems • Vulcan (Stanford): Primal strategy • Cosyma (Braunschweig, Germany): Dual strategy Design & Co-design of Embedded Systems

  10. Co-Synthesis Algorithms:HW/SW Partitioning HW/SW Partitioning Examples: Vulcan

  11. Partitioning Examples:Vulcan • Gupta, De Micheli, Stanford University • Primal approach 1. All-HW initial implementation. 2. Iteratively move functionality to CPU to reduce cost. • System specification language • HardwareC • Is compiled into a flow graph Design & Co-design of Embedded Systems

  12. Partitioning Examples:Vulcan (cont’d) [Figure: two HardwareC fragments and their flow graphs. “x=a; y=b;” compiles to a nop node forking to nodes x=a and y=b; “if (c>d) x=e; else y=f;” compiles to a cond node whose edges, guarded by c>d and c<=d, lead to x=e and y=f.] Design & Co-design of Embedded Systems

  13. Partitioning Examples:Vulcan (cont’d) • Flow Graph Definition • A variation of a (single-rate) task graph • Nodes • Represent operations • Typically low-level operations: mult, add • Edges • Represent data dependencies • Each contains a Boolean condition under which the edge is traversed Design & Co-design of Embedded Systems

  14. Partitioning Examples:Vulcan (cont’d) • Flow Graph • is executed repeatedly at some rate • can have initiation-time constraints for each node: t(vi) + lij ≤ t(vj) ≤ t(vi) + uij • can have rate constraints on each node: mi ≤ Ri ≤ Mi Design & Co-design of Embedded Systems
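These initiation-time constraints can be checked mechanically. A minimal sketch (node names and data layout are assumed for illustration, not taken from Vulcan): each edge (vi, vj) carries lower and upper bounds (lij, uij) on the offset between the initiation times of its endpoints.

```python
# Illustrative check of flow-graph initiation-time constraints:
# each edge (i, j) with bounds (l, u) requires
#   t(v_i) + l <= t(v_j) <= t(v_i) + u

def timing_feasible(start_times, edges):
    """start_times: {node: t}, edges: {(i, j): (l, u)}."""
    for (i, j), (l, u) in edges.items():
        if not (start_times[i] + l <= start_times[j] <= start_times[i] + u):
            return False
    return True

# Example: v2 must start between 2 and 5 time units after v1.
print(timing_feasible({"v1": 0, "v2": 3}, {("v1", "v2"): (2, 5)}))  # True
print(timing_feasible({"v1": 0, "v2": 6}, {("v1", "v2"): (2, 5)}))  # False
```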

  15. Partitioning Examples:Vulcan (cont’d) • Vulcan Co-synthesis Algorithm • Partitioning quantum is a thread • Algorithm divides the flow graph into threads and allocates them • Thread boundaries are determined by 1. (always) a non-deterministic delay element, such as a wait on an external variable 2. (by choice) other points of the flow graph • Target architecture • CPU + Co-processor (multiple ASICs) Design & Co-design of Embedded Systems

  16. Partitioning Examples:Vulcan (cont’d) • Vulcan Co-synthesis algorithm (cont’d) • Allocation • Primal approach • Scheduling • is done by a scheduler on the target CPU • is generated as part of the synthesis process • schedules all threads (both HW and SW threads) • cannot be static, because some threads have non-deterministic initiation times Design & Co-design of Embedded Systems

  17. Partitioning Examples:Vulcan (cont’d) • Vulcan Co-synthesis algorithm (cont’d) • Cost estimation • SW implementation • Code size • relatively straightforward • Data size • the biggest challenge; Vulcan expends some effort to find bounds for each thread • HW implementation • ? Design & Co-design of Embedded Systems

  18. Partitioning Examples:Vulcan (cont’d) • Vulcan Co-synthesis algorithm (cont’d) • Performance estimation • Both SW- and HW-implementation • From flow-graph, and basic execution times for the operators Design & Co-design of Embedded Systems

  19. Partitioning Examples:Vulcan (cont’d) • Algorithm Details • Partitioning goal • Map each thread to one of two partitions • CPU Set: FS • Co-processor set: FH • Required execution-rate must be met, and total cost minimized Design & Co-design of Embedded Systems

  20. Partitioning Examples:Vulcan (cont’d) • Algorithm Details (cont’d) • Algorithm steps 1. Put all threads in FH set 2. Iteratively do 2.1. Move some operations to FS. 2.1.1. Select a group of operations to move to FS. 2.1.2. Check performance feasibility, by computing worst-case delay through flow-graph given the new thread times 2.1.3. Do the move, if feasible 2.2. Incrementally update the new cost-function to reflect the new partition Design & Co-design of Embedded Systems
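The steps above amount to a greedy loop. A hedged sketch, not Vulcan's actual implementation: the feasibility check (worst-case delay through the flow graph) and the cost function are passed in as stand-ins.

```python
# Sketch of Vulcan's primal partitioning loop (helper names assumed):
# start all-HW, greedily move threads to software when the move remains
# feasible and lowers the cost function; no backtracking.

def vulcan_partition(threads, feasible, cost):
    """feasible(FH, FS): worst-case delay still meets rate constraints.
    cost(FH, FS): value of the cost function to be minimized."""
    FH, FS = set(threads), set()            # step 1: all threads in FH
    for t in list(threads):                 # step 2: try moving each to FS
        trial_FH, trial_FS = FH - {t}, FS | {t}
        if feasible(trial_FH, trial_FS) and cost(trial_FH, trial_FS) < cost(FH, FS):
            FH, FS = trial_FH, trial_FS     # commit; threads never move back
    return FH, FS

# Toy example: pretend thread "a" must stay in hardware for feasibility,
# and hardware costs 10x as much as software.
FH, FS = vulcan_partition(["a", "b"],
                          feasible=lambda FH, FS: "a" in FH,
                          cost=lambda FH, FS: 10 * len(FH) + len(FS))
print(FH, FS)  # {'a'} {'b'}
```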

  21. Partitioning Examples:Vulcan (cont’d) • Algorithm Details (cont’d) • Vulcan cost function f(w) = c1·Sh(FH) − c2·Ss(FS) + c3·B − c4·P + c5·|m| • ci: weight constants • Sh(), Ss(): size functions • B: bus utilization (<1) • P: processor utilization (<1) • |m|: total number of variables to be transferred between the CPU and the co-processor Design & Co-design of Embedded Systems
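The cost function translates directly into code. A sketch with placeholder values; Vulcan's actual weights and size estimators are not given here.

```python
# Sketch of Vulcan's cost function as stated on the slide. The weights and
# the size/utilization inputs below are illustrative placeholders.

def vulcan_cost(c, Sh_FH, Ss_FS, B, P, m):
    """c: weights (c1..c5); Sh_FH: HW size of FH; Ss_FS: SW size of FS;
    B: bus utilization (<1); P: processor utilization (<1);
    m: number of variables crossing the CPU/co-processor boundary."""
    c1, c2, c3, c4, c5 = c
    return c1 * Sh_FH - c2 * Ss_FS + c3 * B - c4 * P + c5 * m

print(vulcan_cost((1, 1, 1, 1, 1), Sh_FH=100, Ss_FS=40, B=0.3, P=0.8, m=5))
```

Note the signs: growing the SW partition and keeping the processor busy reduce f, while HW area, bus traffic, and cross-boundary variables increase it, which is what the primal strategy minimizes.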

  22. Partitioning Examples:Vulcan (cont’d) • Algorithm Details (cont’d) • Complementary notes • A heuristic to minimize communication • Once a thread is moved to FS, its immediate successors are placed on the list for evaluation in the next iteration • No backtracking • Once a thread is assigned to FS, it remains there • Experimental results • Produces implementations considerably faster than all-SW designs, yet much cheaper than all-HW designs Design & Co-design of Embedded Systems

  23. Co-Synthesis Algorithms:HW/SW Partitioning HW/SW Partitioning Examples: Cosyma

  24. Partitioning Examples:Cosyma • Rolf Ernst, et al: Technical University of Braunschweig, Germany • Dual approach 1. All-SW initial implementation. 2. Iteratively move basic blocks to the ASIC accelerator to meet performance objective. • System specification language • Cx • Is compiled into an ESG (Extended Syntax Graph) • ESG is much like a CDFG Design & Co-design of Embedded Systems

  25. Partitioning Examples:Cosyma (cont’d) • Cosyma Co-synthesis Algorithm • Partitioning quantum is a Basic Block • A Basic Block is a branch-free block of program • Target Architecture • CPU + accelerator ASIC(s) • Scheduling • Allocation • Cost Estimation • Performance Estimation • Algorithm Details Design & Co-design of Embedded Systems

  26. Partitioning Examples:Cosyma (cont’d) • Cosyma Co-synthesis Algorithm (cont’d) • Performance Estimation • SW implementation • Done by examining the object code generated by a compiler for the basic block • HW implementation • Assumes one operator per clock cycle • Creates a list schedule for the DFG of the basic block • Depth of the list gives the number of clock cycles required • Communication • Done by data-flow analysis of the adjacent basic blocks • With shared memory, communication time is proportional to the number of variables to be accessed Design & Co-design of Embedded Systems
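The HW estimate (one operator per clock cycle, with the list-schedule depth giving the cycle count) can be sketched as a longest-path computation over the basic block's DFG. The graph representation below is assumed, not Cosyma's.

```python
# Hedged sketch of Cosyma-style HW cycle estimation for a basic block:
# every operation takes one cycle, so the depth of the schedule over the
# block's DFG (longest dependence chain) gives the cycle count.

def hw_cycles(dfg):
    """dfg: {op: [predecessor ops]}; returns schedule depth in cycles."""
    level = {}
    def depth(op):
        if op not in level:
            preds = dfg[op]
            level[op] = 1 + (max(depth(p) for p in preds) if preds else 0)
        return level[op]
    return max(depth(op) for op in dfg)

# y = (a+b) * (c+d): two independent adds, then one multiply -> 2 cycles.
print(hw_cycles({"add1": [], "add2": [], "mul": ["add1", "add2"]}))  # 2
```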

  27. Partitioning Examples:Cosyma (cont’d) • Algorithm Steps • Change in execution time caused by moving basic block b from CPU to ASIC: Δc(b) = w · (tHW(b) − tSW(b) + tcom(Z) − tcom(Z ∪ {b})) × It(b) • w: constant weight • tHW(b), tSW(b): HW and SW execution times of basic block b • tcom(Z): estimated communication time between CPU and the accelerator ASIC, given the set Z of basic blocks implemented on the ASIC • It(b): total number of times that b is executed Design & Co-design of Embedded Systems
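A direct transcription of the cost-change formula, with an assumed linear communication model standing in for Cosyma's data-flow analysis:

```python
# Illustrative computation of Cosyma's cost change for moving basic block b
# from CPU to ASIC. The communication model t_com below is an assumed
# stand-in, not Cosyma's actual estimator.

def delta_c(b, Z, w, t_hw, t_sw, t_com, it):
    """Z: set of blocks already on the ASIC; it[b]: execution count of b.
    A negative result means the move is expected to save time."""
    return w * (t_hw[b] - t_sw[b] + t_com(Z) - t_com(Z | {b})) * it[b]

t_com = lambda Z: 2 * len(Z)   # assumed: cost linear in blocks on the ASIC
print(delta_c("b1", set(), w=1.0,
              t_hw={"b1": 3}, t_sw={"b1": 20},
              t_com=t_com, it={"b1": 100}))  # -1900.0
```

Blocks that are much faster in hardware, executed often, and cheap to communicate with give the most negative Δc, so they are the best candidates to move.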

  28. Partitioning Examples:Cosyma (cont’d) • Experimental Results • By moving only basic blocks to HW • Typical speedup of only 2x • Reason: limited intra-basic-block parallelism • Cure: implement several control-flow optimizations to increase parallelism within a basic block, and hence in the ASIC • Examples: loop pipelining, speculative branch execution with multiple branch prediction, operator pipelining • Result: • Speedups: 2.7 to 9.7 • CPU times: 35 to 304 seconds on a typical workstation Design & Co-design of Embedded Systems

  29. Summary • What’s co-synthesis • Various keywords used in classification of co-synthesis algorithms • HW/SW Partitioning: One broad category of co-synthesis algorithms • Criteria by which a co-synthesis algorithm is categorized Design & Co-design of Embedded Systems

  30. Processes and operating systems • Scheduling policies: • RMS; • EDF. • Scheduling modeling assumptions. • Interprocess communication. • Power management. Reference (and slides): Wayne Wolf, “Computers as Components: Principles of Embedded Computing System Design,” Chapter 6 (Processes and Operating Systems), MKP, 2001. Overheads for Computers as Components

  31. Metrics • How do we evaluate a scheduling policy: • Ability to satisfy all deadlines. • CPU utilization---percentage of time devoted to useful work. • Scheduling overhead---time required to make scheduling decision. Overheads for Computers as Components

  32. Rate monotonic scheduling • RMS (Liu and Layland): widely-used, analyzable scheduling policy. • Analysis is known as Rate Monotonic Analysis (RMA). Overheads for Computers as Components

  33. RMA model • All processes run on a single CPU. • Zero context-switch time. • No data dependencies between processes. • Process execution time is constant. • Deadline is at end of period. • Highest-priority ready process runs. Overheads for Computers as Components

  34. Process parameters • Ti is the computation time of process i; ti is the period of process i. [Figure: one period ti of process Pi, within which its computation time Ti fits.] Overheads for Computers as Components

  35. Rate-monotonic analysis • Response time: time required to finish process. • Critical instant: scheduling state that gives worst response time. • Critical instant occurs when all higher-priority processes are ready to execute. Overheads for Computers as Components

  36. Critical instant [Figure: critical instant of P4: the higher-priority interfering processes P1, P2, and P3 all become ready at the same instant as P4, producing P4’s worst-case response time.] Overheads for Computers as Components

  37. RMS priorities • Optimal (fixed) priority assignment: • shortest-period process gets highest priority; • priority inversely proportional to period; • break ties arbitrarily. • No fixed-priority scheme does better. Overheads for Computers as Components
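The assignment rule above is a simple sort. A sketch (the process/period representation is assumed for illustration):

```python
# Rate-monotonic priority assignment: shorter period -> higher priority.

def rms_priorities(periods):
    """periods: {process: period}; returns {process: priority}, 0 = highest."""
    order = sorted(periods, key=lambda p: periods[p])  # ties broken arbitrarily
    return {proc: rank for rank, proc in enumerate(order)}

print(rms_priorities({"P1": 4, "P2": 6, "P3": 12}))  # {'P1': 0, 'P2': 1, 'P3': 2}
```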

  38. RMS example [Figure: RMS schedule on a timeline from 0 to 10: P1, with the shorter period, runs at the start of each of its periods; P2 runs in the time remaining within its longer period.] Overheads for Computers as Components

  39. RMS CPU utilization • Utilization for n processes is U = Σi Ti / ti • As the number of tasks approaches infinity, the guaranteed utilization bound n(2^(1/n) − 1) approaches ln 2 ≈ 69%. Overheads for Computers as Components
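The 69% figure is the limit of Liu and Layland's least upper bound n(2^(1/n) − 1) as n grows. A sketch of the bound and the resulting sufficient (not necessary) schedulability test:

```python
# RMS utilization bound n*(2^(1/n) - 1); approaches ln 2 ~ 0.693 as n grows.

def rms_bound(n):
    return n * (2 ** (1 / n) - 1)

def rms_schedulable(tasks):
    """tasks: [(computation_time, period)]; sufficient utilization test."""
    u = sum(T / p for T, p in tasks)
    return u <= rms_bound(len(tasks))

print(round(rms_bound(2), 3))                       # 0.828
print(round(rms_bound(1000), 3))                    # 0.693
print(rms_schedulable([(1, 4), (2, 6), (1, 12)]))   # True: U ~ 0.667 <= 0.780
```

A task set can exceed the bound and still be schedulable; the test is only a quick guarantee.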

  40. RMS CPU utilization, cont’d. • RMS cannot guarantee 100% CPU utilization, even with zero context-switch overhead. • Idle cycles must be kept available to handle the worst-case scenario. • However, when the utilization bound is met, RMS guarantees that all processes will always meet their deadlines. Overheads for Computers as Components

  41. RMS implementation • Efficient implementation: • scan processes; • choose highest-priority active process. Overheads for Computers as Components

  42. Earliest-deadline-first scheduling • EDF: dynamic priority scheduling scheme. • The process closest to its deadline has the highest priority. • Requires recalculating process priorities at every timer interrupt. Overheads for Computers as Components

  43. EDF example [Figure: EDF schedule of processes P1 and P2: at each point, the process closer to its deadline runs.] Overheads for Computers as Components

  44. EDF analysis • EDF can use 100% of CPU. • But EDF may miss a deadline. Overheads for Computers as Components

  45. EDF implementation • On each timer interrupt: • compute time to deadline; • choose process closest to deadline. • Generally considered too expensive to use in practice. Overheads for Computers as Components
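The two steps on this slide amount to taking a minimum over the ready processes. A minimal sketch (the process/deadline representation is assumed):

```python
# EDF dispatch sketch: at each timer interrupt, pick the ready process
# with the nearest absolute deadline.

def edf_pick(now, ready):
    """ready: {process: absolute_deadline}; returns the process to run."""
    return min(ready, key=lambda p: ready[p] - now)

print(edf_pick(0, {"P1": 10, "P2": 7}))  # P2: closer deadline
```

Even this toy version shows the cost concern: the minimum must be recomputed on every interrupt, whereas fixed-priority schemes like RMS decide priorities once.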

  46. Fixing scheduling problems • What if your set of processes is unschedulable? • Change deadlines in requirements. • Reduce execution times of processes. • Get a faster CPU. Overheads for Computers as Components

  47. Priority inversion • Priority inversion: a low-priority process keeps a high-priority process from running. • Improper use of system resources can cause scheduling problems: • Low-priority process grabs I/O device. • High-priority process needs the I/O device, but can’t get it until the low-priority process is done. • Can cause deadlock. Overheads for Computers as Components

  48. Solving priority inversion • Give priorities to system resources. • Have process inherit the priority of a resource that it requests. • Low-priority process inherits priority of device if higher. Overheads for Computers as Components
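A sketch of the inheritance rule: a process's effective priority is the maximum of its own priority and the priorities of the resources it holds. The numeric convention (higher number = higher priority) and the representation are assumed for illustration.

```python
# Priority-inheritance sketch: holding a high-priority resource raises a
# low-priority process so it cannot block a high-priority waiter indefinitely.

def effective_priority(own, held_resources, resource_prio):
    """own: process's base priority; held_resources: resources it holds;
    resource_prio: {resource: priority}. Higher number = higher priority."""
    return max([own] + [resource_prio[r] for r in held_resources])

# A priority-1 process holding the disk (priority 5) runs at priority 5.
print(effective_priority(1, ["disk"], {"disk": 5}))  # 5
print(effective_priority(1, [], {"disk": 5}))        # 1
```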

  49. Data dependencies • Data dependencies allow us to improve utilization. • They restrict the combinations of processes that can run simultaneously. [Figure: dependency edge from P1 to P2; P1 and P2 can’t run simultaneously.] Overheads for Computers as Components

  50. Context-switching time • Non-zero context switch time can push limits of a tight schedule. • Hard to calculate effects---depends on order of context switches. • In practice, OS context switch overhead is small. Overheads for Computers as Components
