
A Tool for Partitioning and Pipelined Scheduling of Hardware-Software Systems

Karam S Chatha and Ranga Vemuri, Department of ECECS, University of Cincinnati, {kchatha,ranga}@ececs.uc.edu


Presentation Transcript


  1. A Tool for Partitioning and Pipelined Scheduling of Hardware-Software Systems
     Karam S Chatha and Ranga Vemuri, Department of ECECS, University of Cincinnati, {kchatha,ranga}@ececs.uc.edu

  2. Organization of Talk
     • Introduction
     • Overview of Tool
     • Codesign partitioner
     • Pipelined Scheduler
     • Results
     • Conclusion

  3. Introduction
     • Motivation: The throughput of a loop-oriented HW-SW application can be maximized by obtaining a pipelined implementation.
     • Objective: To obtain a pipelined implementation of the application on the codesign architecture such that:
       - The throughput constraint is satisfied
       - The HW area constraint is satisfied
       - The number of pipeline stages is minimized
       - The increase in memory requirement is minimized

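  4. Introduction: Architecture and Task Graph
     [Figure: target architecture and example task graph]
     • Architecture: SW processor, HW co-processor, shared memory (for SW-HW, HW-SW & HW-HW communication) and local memory (for SW-SW communication).
     • Task graph: tasks A, B, C and D, with 10 data items per dependence.
     • Task annotations (S and H are the execution times in SW and HW; HW resources in parentheses):
       - S = 225 ns, H = 175 ns (8 +, …)
       - S = 200 ns, H = 100 ns (4 *, 8 -, …)
       - S = 400 ns, H = 150 ns (4 *, 8 +, …)
       - S = 100 ns, H = 400 ns (3 *, 3 +, …)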

  5. Some Definitions
     • A pipelined design is characterized by its initiation interval.
     • The initiation interval (II) is the time difference between the start of two consecutive iterations of the steady state.
     • Given a partitioned task graph, there exists a theoretical lower bound on the II of its pipelined schedule, called the Minimum Initiation Interval (MII). For a directed acyclic task graph the MII is given by:
         MII = max (Sum_hw, Sum_sw)
       where Sum_hw is the sum of execution times of tasks bound to HW and Sum_sw is the sum of execution times of tasks bound to SW.
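The MII formula above can be sketched in a few lines of Python. This is only an illustration: the task names `t1` to `t4` and the HW/SW binding are hypothetical, and the execution times are simply the four S/H pairs listed on the task-graph slide, without claiming which task they belong to.

```python
# Execution times in ns per task as (SW time, HW time).
# Task names and the binding below are illustrative assumptions.
TIMES = {"t1": (225, 175), "t2": (200, 100), "t3": (400, 150), "t4": (100, 400)}


def mii(times, binding):
    """MII = max(Sum_hw, Sum_sw) for a partitioned, acyclic task graph."""
    sum_sw = sum(sw for task, (sw, hw) in times.items() if binding[task] == "SW")
    sum_hw = sum(hw for task, (sw, hw) in times.items() if binding[task] == "HW")
    return max(sum_hw, sum_sw)


binding = {"t1": "SW", "t2": "HW", "t3": "HW", "t4": "SW"}
print(mii(TIMES, binding))  # Sum_sw = 325, Sum_hw = 250, so this prints 325
```

Like the formula on the slide, the sketch counts only task execution times and ignores communication.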

  6. HW-SW Codesign
     [Flowchart: tool flow]
     • Inputs: throughput and area constraints, task graph, architecture.
     • Partition design: satisfy the throughput and area constraints.
     • Constraints satisfied? NO: unable to design with the given constraints. YES: calculate MII and set II = MII.
     • Obtain a pipelined schedule which executes in II time: satisfy the throughput constraint, minimize the number of pipeline stages and minimize the increase in memory requirements.
     • Schedule found? YES: output the successful design. NO: increase II.
     • II > constraint? NO: try again to obtain a pipelined schedule with the increased II. YES: the throughput constraint can no longer be met with this II.
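The tool flow on this slide might be sketched as the loop below. The callbacks `partition`, `compute_mii` and `schedule`, and the `ii_step` parameter, are hypothetical stand-ins for the tool's components. Returning `None` once II exceeds the constraint is an assumption; the slide does not spell out what the tool does on that branch.

```python
def codesign(partition, compute_mii, schedule, ii_constraint, ii_step=1):
    """Sketch of the flow: partition, set II = MII, then schedule with growing II."""
    design = partition()
    if design is None:
        return None  # unable to design with the given constraints
    ii = compute_mii(design)  # start at the theoretical lower bound
    while ii <= ii_constraint:
        sched = schedule(design, ii)
        if sched is not None:
            return design, sched, ii  # successful design
        ii += ii_step  # no schedule found: increase II and retry
    return None  # assumption: give up once II exceeds the constraint


# Stub components: scheduling only succeeds once II reaches 5.
result = codesign(
    partition=lambda: "P",
    compute_mii=lambda design: 3,
    schedule=lambda design, ii: f"S@{ii}" if ii >= 5 else None,
    ii_constraint=10,
)
print(result)
```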

  7. HW-SW Partitioner
     • Branch and bound algorithm.
     • The initial solution tries to minimize MII:
       - The suitability of a task to be assigned to HW is given by: [formula not preserved in the transcript]
       - Sort tasks in descending order of their suitabilities.
       - Assign tasks to HW and SW alternately from the front and back of the sorted list so that Sum_hw and Sum_sw remain balanced.
     • We also apply heuristics to effectively limit the search space of the algorithm.
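A minimal sketch of the initial-solution heuristic follows. The suitability measure used here (SW time divided by HW time) is an assumption made only for illustration, since the slide's own formula is not in the transcript. The rule for choosing which end of the list to take from (assign to whichever side currently has the smaller sum) is likewise one plausible reading of "remain balanced". Task names are hypothetical.

```python
from collections import deque

# (SW time, HW time) in ns; task names are illustrative.
TIMES = {"t1": (225, 175), "t2": (200, 100), "t3": (400, 150), "t4": (100, 400)}


def initial_partition(times, suitability):
    """Bind tasks to HW from the front and to SW from the back of the sorted list."""
    order = deque(sorted(times, key=suitability, reverse=True))
    binding, sum_hw, sum_sw = {}, 0, 0
    while order:
        if sum_hw <= sum_sw:
            task = order.popleft()  # most HW-suitable remaining task
            binding[task] = "HW"
            sum_hw += times[task][1]
        else:
            task = order.pop()  # least HW-suitable remaining task
            binding[task] = "SW"
            sum_sw += times[task][0]
    return binding, sum_hw, sum_sw


def speedup(task):
    # Hypothetical suitability: how much faster the task runs in HW.
    sw, hw = TIMES[task]
    return sw / hw


binding, sum_hw, sum_sw = initial_partition(TIMES, speedup)
print(binding, max(sum_hw, sum_sw))
```

With these numbers the sketch binds `t2` and `t3` to HW and `t1` and `t4` to SW, giving Sum_hw = 250 ns, Sum_sw = 325 ns and an MII of 325 ns. A branch and bound search would then start from this solution.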

  8. HW-SW Partitioner: Area Estimation
     • Resources required by tasks are divided into two types:
       1. Shared - adders, subtractors, multipliers, dividers
       2. Unshared - interconnect and controller
     • Shared resource area is estimated by taking the union of the shared resources required by all the HW tasks.
     • Unshared resource area is estimated by adding the area associated with the unshared resources of all the HW tasks.
     • Total area is estimated by taking the sum of the area requirements of shared and unshared resources.
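The area estimate described above could look like the sketch below. Reading "union" as the per-type maximum count over all HW tasks is an assumption, and the unit areas and unshared areas are made-up numbers; only the resource counts (4 *, 8 -) and (4 *, 8 +) echo the task-graph slide.

```python
from collections import Counter


def estimate_area(hw_tasks, unit_area):
    """Total area = area of the union of shared resources + sum of unshared areas."""
    shared = Counter()
    for task in hw_tasks:
        # Counter union keeps, per resource type, the maximum count seen so far.
        shared |= Counter(task["shared"])
    shared_area = sum(unit_area[res] * count for res, count in shared.items())
    unshared_area = sum(task["unshared_area"] for task in hw_tasks)
    return shared_area + unshared_area


# Hypothetical unit areas and unshared (interconnect + controller) areas.
unit_area = {"+": 10, "-": 10, "*": 100}
hw_tasks = [
    {"shared": {"*": 4, "-": 8}, "unshared_area": 50},
    {"shared": {"*": 4, "+": 8}, "unshared_area": 70},
]
print(estimate_area(hw_tasks, unit_area))  # 4*100 + 8*10 + 8*10 + 50 + 70 = 680
```

Because the four multipliers are shared between the two tasks, they are counted once rather than twice.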

  9. Pipelined Scheduling
     [Flowchart: scheduling loop]
     • Try to obtain a task schedule which executes in II time (use list scheduling).
     • Schedule found? Yes: success. No: select a dependency to retime (use RECOD Step 1).
     • Dependency found? Yes: apply the retiming transformation (use RECOD Step 2) and try to schedule again. No: failure.
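The scheduling loop can be sketched with the three steps passed in as callbacks. The names `list_schedule`, `select_dependency` and `retime` are hypothetical placeholders for list scheduling and the two RECOD steps, not the tool's actual interfaces.

```python
def pipelined_schedule(graph, ii, list_schedule, select_dependency, retime):
    """Retry list scheduling, retiming one dependency after each failed attempt."""
    while True:
        sched = list_schedule(graph, ii)
        if sched is not None:
            return sched  # success
        dep = select_dependency(graph)  # RECOD Step 1
        if dep is None:
            return None  # failure: nothing left to retime
        graph = retime(graph, dep)  # RECOD Step 2


# Stubs: the "graph" is just a count of retimings applied so far.
sched = pipelined_schedule(
    graph=0,
    ii=325,
    list_schedule=lambda g, ii: f"sched@{g}" if g >= 2 else None,
    select_dependency=lambda g: "dep" if g < 5 else None,
    retime=lambda g, dep: g + 1,
)
print(sched)  # succeeds after two retimings
```

The loop terminates only if `select_dependency` eventually runs out of candidates, so a real implementation needs that property.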

  10. RECOD Step 1: Select a dependency to retime
     • Selection criteria:
       1. Dependency is an intra loop dependency (ILD).
       2. Dependency between tasks bound to heterogeneous processors.
       3. Dependency whose predecessor task belongs to the longer constraining path.
       4. Dependency representing the least number of data items transferred.
     [Figure: example task graph with tasks A to I bound to SW or HW; edges are labelled with dependence distances d = 0 or d = 1, and two edges are labelled Var = 20 and Var = 10]
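One way to read the four criteria is as a filter followed by a priority order, as in the sketch below. That reading is an assumption: the slide does not say whether the criteria are strict requirements or tie-breakers. The `Dep` fields, including `pred_path_len` for the length of the predecessor's constraining path, are hypothetical, as are the example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    src: str
    dst: str
    distance: int       # d = 0 marks an intra loop dependency
    src_proc: str       # "HW" or "SW"
    dst_proc: str
    pred_path_len: int  # hypothetical: length of predecessor's constraining path
    data_items: int


def select_dependency(deps):
    """Pick an ILD, preferring HW/SW crossings, longer paths, then fewer data items."""
    candidates = [d for d in deps if d.distance == 0]  # criterion 1
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda d: (
            d.src_proc == d.dst_proc,  # criterion 2: heterogeneous first
            -d.pred_path_len,          # criterion 3: longer path first
            d.data_items,              # criterion 4: fewest data items
        ),
    )


deps = [
    Dep("A", "B", 0, "SW", "HW", 300, 20),
    Dep("A", "C", 0, "SW", "SW", 500, 10),
    Dep("B", "F", 0, "HW", "SW", 300, 10),
    Dep("D", "H", 1, "HW", "SW", 900, 5),
]
chosen = select_dependency(deps)
print(chosen.src, chosen.dst)  # B F
```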

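  11. RECOD Step 2: Partition to minimize increase in memory requirements
     • Cost function for the partitioner: [formula not preserved in the transcript]
     [Figure: retiming transformation on the example task graph with tasks A to I; a cutset divides the tasks into Set P, Set R and Set S]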

  12. JPEG Case Study
     • We specified the JPEG image compression algorithm as a task graph with 12 tasks.
     • We then obtained pipelined codesign implementations by specifying different constraints on the II and HW area.

  13. Execution Time
     • We evaluated the runtime of the tool by invoking it for 50 random task graphs and searching for optimal HW-SW partitions.

  14. Percentage deviation of initial solution from final
     • We calculated the percentage deviation in initiation interval of the initial partition from the final partition.
     • The average percentage deviation was 8.4%.

  15. Conclusion
     • The tool can optimize the throughput, area, pipeline stages and memory requirements of a pipelined HW-SW system.
     • The tool can obtain solutions for task graphs with up to 30 nodes within a short period of time.
     • Although it assumes a single SW processor and a single HW coprocessor, the technique can be extended to multiple processor architectures.
     • The limitation of the tool is its inability to handle large task graphs (> 30 nodes) in a reasonable amount of time.
     • A time-out option with the branch and bound partitioner can overcome this limitation.
