Partitioning in Hardware/Software Co-Design


rpatrick


Presentation Transcript


  1. Partitioning in Hardware/Software Co-Design

  2. Introduction • Overview Of A Partitioner • Issues • Nature of Application • Target Architectures • Interplay Of Granularity and Estimation • Closeness Metrics • Cost Function • Partitioning Tools • Cosyma • Lycos • Case Study

  3. Overview Of a Partitioner

  4. Closer Look At Partitioner

  5. Issues Involved during Partitioning Process • Nature of Application • Target Architectures • Interplay Of Granularity and Estimation • Closeness Metrics • Cost Function

  6. Nature Of Application • Computation-oriented systems • Workstations, PCs or scientific parallel computers • Control-dominated systems • React to external events • Data-dominated systems • Complex transformation or transportation of data • E.g. DSP or router • Mixed systems • E.g. mobile phone or motor control

  7. Architecture for Control-Dominated Systems • Each FSM mapped to a process • Small variable set – FSM state • Short program segments – FSM transitions • Explosion of states and transitions – issue of code size • Shared memory architecture • Optimizations – bit manipulations, few operations per state transition • E.g. 8051, Motorola MC68332, Siemens 80C166

  8. Architecture for Data-Oriented Systems • Emphasis on high throughput rather than short latency deadlines • Large data variables – memory optimization • Periodic behaviour of system parts • Static schedule • Transformations for high concurrency, such as loop unrolling • Specialized control, data path and interconnect function units • Address sequences and operations known a priori – memory and address unit specialization • E.g. DSP applications – ADSP21060, TMS320C80

  9. Mixed Systems • Interconnected data-dominated and control-dominated functions • Approaches • Heterogeneous systems – independently controlled, communicating specialized components • Computation applications without specific specialization potential • E.g. printer or scanner controller • Tailoring of less specialized systems to an application domain – e.g. minimize power consumption or cost for a required level of performance • E.g. ARM family, Motorola ColdFire family

  10. Modern Embedded Architectures • Highly multiplexed data path processors • ASIPs • Optimized for the speed, performance and power characteristics of the application; can be reused, which reduces cost • VLIW processors • Network of horizontally programmable execution units • Commercial programmable DSPs (Harvard architecture) • Separate program and data memories • Instruction set tuned to multiply-accumulate operations

  11. Granularity Level • Coarse Grain Partitioning • Task / process or function level • Fine Grain Partitioning • Operator, statement or basic block level • Going lower still, to the assembly language level, is not useful because it depends on processor details

  12. Fine Grain Granularity • Becomes more important as processor performance and system software grow. • Less obvious, more difficult and time consuming, and can have high overheads. • Communication time overhead. • Communication area overhead – may require buffers or memories. • Interlocks. • Changes the efficiency of compiler optimizations, pipelines and the utilization of concurrent units.

  13. Coarse Grain Granularity • Limits parallelism • Reduces time and error during estimations • Better suited for manual partitioning

  14. Closeness Metrics • Measure the likelihood that two pieces of the specification are mapped onto the same system component. • Metrics. • Connectivity. • Measures the number of wires shared between two behaviours. • Communication. • Measures the amount of data transferred between two behaviours. • Constrained Communication. • Measures the communication metric between those behaviours that have given performance constraints.

  15. Common accessors. • Grouping behaviours (or variables) that are accessed, via subroutine calls and variable reads/writes, by many of the same behaviours reduces inter-component communication. • Sequential Execution. • If two behaviours are defined sequentially in the specification, mapping them onto the same processor does not affect performance. • Hardware Sharing. • Measures the amount of hardware that two behaviours can share. • Balanced Size. • Achieves a final partition of groups that are roughly balanced in hardware size. Otherwise the above metrics lead to a single group.
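As a rough illustration of how a closeness metric can drive grouping, the sketch below merges the two closest groups repeatedly, using only the communication metric (amount of data transferred). The behaviour names, the `traffic` table and the function names are hypothetical, and a real partitioner would combine several of the metrics above, including balanced size.

```python
from itertools import combinations

def communication_closeness(traffic, a, b):
    """Data volume exchanged between two groups of behaviours (frozensets)."""
    return sum(bits for (x, y), bits in traffic.items()
               if (x in a and y in b) or (x in b and y in a))

def cluster(behaviours, traffic, n_groups):
    """Merge the closest pair of groups until only n_groups remain."""
    groups = [frozenset([name]) for name in behaviours]
    while len(groups) > n_groups:
        a, b = max(combinations(groups, 2),
                   key=lambda pair: communication_closeness(traffic, *pair))
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return groups

# Hypothetical behaviours and data volumes (in bits) between them.
traffic = {("fir", "fft"): 4096, ("fft", "ctrl"): 16, ("ctrl", "ui"): 64}
groups = cluster(["fir", "fft", "ctrl", "ui"], traffic, 2)
```

With this input the heavily communicating `fir` and `fft` end up in one group and `ctrl` and `ui` in the other. Without a size term, asking for a single group would always succeed, which is the degenerate case the balanced size metric is meant to prevent.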

  16. Structural/Functional Partitioning • Functional Partitioning. • Partitions a functional specification into smaller sub-specifications and synthesizes structure for each. • Isolates a function to one part. • Reduces I/O. • Prevents critical path from crossing parts, thus reducing clock period. • Yields simpler hardware, reducing clock period. • Complete control over I/O, allowing tradeoff with performance. • Reduces synthesis tool times and memory usage.

  17. Structural Partitioning. • A structure is synthesized for the entire specification and then partitioned. • Size and delay can be estimated quickly and accurately. • It cannot satisfy both size and I/O constraints. • Placement and routing can be done more efficiently. • Not suitable for large systems.

  18. Partitioning Algorithms • Random Mapping • Multistage Clustering • Hierarchical Clustering • Group Migration • Ratio Cut • Simulated Annealing • Genetic Evolution • ILP Formulation
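To make one of these algorithms concrete, here is a minimal sketch of a single group migration pass in the Kernighan-Lin style: every object is moved once, the cheapest move is taken even when it makes the cost worse, and the best partition seen along the way is kept. The graph, the `"hw"`/`"sw"` labels and the example cost function (cut weight plus a balance penalty) are assumptions made for illustration only.

```python
def flip(side):
    return "hw" if side == "sw" else "sw"

def group_migration_pass(partition, cost):
    """Move each object exactly once; return the best partition seen."""
    current = dict(partition)
    best, best_cost = dict(current), cost(current)
    unlocked = set(current)
    while unlocked:
        def cost_if_moved(obj):
            return cost({**current, obj: flip(current[obj])})
        # Take the cheapest move even if it increases the cost: accepting a
        # temporarily worse partition is what lets the pass escape local minima.
        obj = min(sorted(unlocked), key=cost_if_moved)
        current[obj] = flip(current[obj])
        unlocked.remove(obj)  # each object moves at most once per pass
        if cost(current) < best_cost:
            best, best_cost = dict(current), cost(current)
    return best, best_cost

# Hypothetical communication graph: edge weights are data volumes.
EDGES = {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 10}

def example_cost(p):
    cut = sum(w for (x, y), w in EDGES.items() if p[x] != p[y])
    n_hw = sum(1 for side in p.values() if side == "hw")
    imbalance = abs(2 * n_hw - len(p))  # |#hw - #sw|
    return cut + 100 * imbalance

start = {"a": "sw", "b": "hw", "c": "sw", "d": "hw"}
best, best_cost = group_migration_pass(start, example_cost)
```

In this example the first move has to unbalance the partition, so the cost rises before the second move reaches the partition that cuts only the light `b`-`c` edge. A purely greedy improvement step would have stopped at the starting partition.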

  19. Cosyma • Target Architecture • A standard RISC processor core • A fast RAM for program and data with single clock cycle access time • An automatically generated application-specific coprocessor • Peripheral units must be inserted by the designer. • Processor and coprocessor communicate via shared memory in mutual exclusion.

  20. Granularity • Partitioning works at the basic block level. • Since communication between basic blocks of a process is implicit, partitioning requires communication analysis. • Simulation on an RT-level model of the target processor provides profiling and software timing information.

  21. Hardware/Software Partitioning • Inputs to partitioning are the ESG with profiling (or control flow analysis) information, the CDR-file, and synthesis directives, which include channel mapping directives, partitioning directives and component selection. • Starts with an all-software solution and tries to extract hardware components iteratively until all timing constraints are met. • The partitioning goals are • Meet real-time constraints • Minimize hardware costs • Minimize the CAD system response time

  22. Algorithm & Cost function • It uses simulated annealing, a stochastic optimization algorithm. • The cost function gives the total (estimated) cost of a single basic block b, assuming it is moved from software to hardware (the formula itself is not preserved in this transcript).

  23. Continued… • t_sw(b) is estimated with a local source code timing estimation based on simulation data. • t_hw(b) is estimated with a list scheduler. • t_com(Z ∪ b) is estimated by data flow analysis.
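The following is a generic simulated annealing sketch over basic blocks, starting from the all-software solution as described above. It is not Cosyma's actual cost function: the toy cost (hardware area plus a penalty for missing a deadline), the block names, all numbers and the cooling schedule are assumptions chosen only to show the mechanics of accepting or rejecting a move.

```python
import math
import random

def anneal(blocks, cost, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    in_hw = frozenset()                # start from the all-software solution
    current_cost = cost(in_hw)
    best, best_cost = in_hw, current_cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        b = rng.choice(blocks)
        candidate = in_hw ^ {b}        # toggle b between software and hardware
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            in_hw, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = in_hw, current_cost
    return best, best_cost

# Hypothetical per-block estimates (time units and area units).
T_SW = {"b0": 50, "b1": 30, "b2": 5}
T_HW = {"b0": 5, "b1": 10, "b2": 4}
AREA = {"b0": 20, "b1": 15, "b2": 10}
T_COM = {("b0", "b1"): 8, ("b1", "b2"): 2}  # paid when an edge crosses HW/SW
DEADLINE = 40

def toy_cost(in_hw):
    time = sum(T_HW[b] if b in in_hw else T_SW[b] for b in T_SW)
    time += sum(t for (x, y), t in T_COM.items() if (x in in_hw) != (y in in_hw))
    area = sum(AREA[b] for b in in_hw)
    return area + 10 * max(0, time - DEADLINE)

best, best_cost = anneal(list(T_SW), toy_cost)
```

The random generator is seeded so that repeated runs give the same result. The cooling schedule and step count usually need tuning for a real design, where each cost evaluation relies on estimators like the ones listed above.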

  24. LYCOS • Supports easy inclusion of new design tools, algorithms and design methods. • It is built as a suite of tools centered around an implementation-independent model of computation called Quenya, based upon communicating CDFGs.

  25. LYCOS Partitioning Tool • Input specification is in the form of a CDFG. • Granularity is chosen by the user interactively. • Different processor architectures whose technology files are present can be selected. • Dedicated hardware units are selected by loading the hardware library file, which contains area, delay, latency, provided operations, storage capabilities etc.

  26. Software execution time is estimated using CDFG and selected processor technology file. • Hardware execution time is estimated using a dynamic list based scheduling algorithm. • Partitioning is done using any of the selected algorithms. • Allows better design space exploration.
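A minimal sketch of how list scheduling can turn a data flow graph into a hardware execution time estimate is given below. It is a simplified, generic version rather than the LYCOS implementation: the operation table, the unit types, the priority rule (longest path to a sink) and the assumption of non-pipelined units are all illustrative choices.

```python
from functools import lru_cache

def list_schedule(ops, preds, units):
    """ops: name -> (unit_type, latency); preds: name -> set of predecessors;
    units: unit_type -> number of units. Returns (start_times, total_length)."""
    succs = {o: [s for s in ops if o in preds.get(s, ())] for o in ops}

    @lru_cache(maxsize=None)
    def path_length(o):
        # Longest latency path from o to any sink, used as the priority.
        return ops[o][1] + max((path_length(s) for s in succs[o]), default=0)

    start, finish = {}, {}
    time = 0
    while len(start) < len(ops):
        # Count units still occupied by operations that have not finished.
        in_use = {u: 0 for u in units}
        for o in start:
            if finish[o] > time:
                in_use[ops[o][0]] += 1
        ready = [o for o in ops if o not in start
                 and all(p in finish and finish[p] <= time
                         for p in preds.get(o, ()))]
        # Highest priority first; ties are broken by name for determinism.
        for o in sorted(ready, key=lambda o: (-path_length(o), o)):
            unit = ops[o][0]
            if in_use[unit] < units[unit]:
                start[o], finish[o] = time, time + ops[o][1]
                in_use[unit] += 1
        time += 1
    return start, max(finish.values())

# Hypothetical graph for z = ((a * b) + (c * d)) * e.
ops = {"m1": ("mul", 2), "m2": ("mul", 2), "add": ("alu", 1), "m3": ("mul", 2)}
preds = {"add": {"m1", "m2"}, "m3": {"add"}}

start_one, length_one = list_schedule(ops, preds, {"mul": 1, "alu": 1})
start_two, length_two = list_schedule(ops, preds, {"mul": 2, "alu": 1})
```

With one multiplier the two independent multiplications are serialized and the estimate is 7 time steps; with two multipliers it drops to 5. Rerunning the scheduler for different hardware allocations in this way is one simple form of the design space exploration mentioned above.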
