
Design & Co-design of Embedded Systems



Distributed System Co-synthesis (2)

Maziar Goudarzi

- Introduction
- Preliminaries
- Hardware/Software Partitioning
- Distributed System Co-Synthesis (part 2)

References:

Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2, Hardware/Software Co-Design: Principles and Practice, Eds: J. Staunstrup, W. Wolf, Kluwer Academic Publishers, 1997.

W. Wolf, “An architectural co-synthesis algorithm for distributed, embedded computing systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 2, pp. 218-229, 1997.


- Introduction
- An Integer Linear Programming Model
- A Heuristic Algorithm
- On ordinary task graphs
- On an Object-Oriented model


Co-Synthesis Algorithms: Distributed System Co-Synthesis

Wolf’s Heuristic Algorithm on Ordinary Task Graphs

- As ever, topics of importance:
- System Specification Language/Model
- Target Architecture
- Functionality (Allocation/Scheduling) Quantum
- Allocation Strategy
- Scheduling Strategy
- Cost Estimation
- Performance Estimation
- Algorithm Details


- Wolf’s Heuristic Algorithm
- System Specification Language/Model
- Algorithm input: single-rate task graph

- Target Architecture
- Heterogeneous multiprocessor architecture

- Allocation
- Primal approach: Performance is the major objective

- Scheduling
- ?

- Functionality Quantum
- Processes in a single-rate task graph


- Wolf’s Heuristic Algorithm (cont’d)
- Performance Estimation
- Component Technology Library
- Run-time of each process on each available PE is supposed to be known

- Cost Estimation
- Component Technology Library
- Total Cost = Σ_i Cost(PE_i) + Σ_j Cost(Device_j) + Σ_k Cost(Comm. Channel_k)

- Algorithm Details
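
The total-cost sum on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the lecture or from Wolf's implementation: the `Design` class, its field names, and the numeric costs are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Design:
    """A candidate architecture; each list holds costs taken from a
    (hypothetical) component technology library."""
    pe_costs: List[float] = field(default_factory=list)
    device_costs: List[float] = field(default_factory=list)
    channel_costs: List[float] = field(default_factory=list)


def total_cost(d: Design) -> float:
    # Sum over PEs + sum over devices + sum over communication channels.
    return sum(d.pe_costs) + sum(d.device_costs) + sum(d.channel_costs)


design = Design(pe_costs=[50.0, 20.0], device_costs=[5.0], channel_costs=[3.0, 3.0])
print(total_cost(design))  # 81.0
```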


- Four major steps in co-design
- Partitioning: dividing the spec. into smaller parts (e.g. processes)
- Allocation: assigning each process to a multiprocessor node (PE)
- Scheduling: serializing processes assigned to each PE
- Mapping: selecting a particular component for each PE

- Problem: These steps (especially allocation, scheduling, and mapping) have a circular relationship
- Solution: Break the loop


- Wolf:
- Give an initial allocation
- Refine it to reduce cost

- Order of satisfying design criteria:
- Satisfy all deadlines
- Minimize PE cost
- Minimize comm. port cost
- Minimize device cost
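
One way to read the ordering above is as a lexicographic comparison: deadlines first, then PE cost, then communication port cost, then device cost. The sketch below assumes that strict reading, which the slide does not state explicitly. The `Candidate` type and the sample numbers are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    deadlines_met: bool
    pe_cost: float
    port_cost: float
    device_cost: float


def rank_key(c: Candidate):
    # False sorts before True, so designs that meet all deadlines come first;
    # ties are then broken by PE cost, port cost, and device cost, in that order.
    return (not c.deadlines_met, c.pe_cost, c.port_cost, c.device_cost)


candidates = [
    Candidate("A", True, 70.0, 6.0, 5.0),
    Candidate("B", False, 40.0, 2.0, 5.0),  # cheapest, but misses a deadline
    Candidate("C", True, 70.0, 4.0, 9.0),
]
best = min(candidates, key=rank_key)
print(best.name)  # C
```

Candidate C wins over A because both meet the deadlines with the same PE cost, and C has the lower port cost, even though its device cost is higher.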


- First ignore communication costs. Later, take them into account
- Steps:
1. Create an initial feasible solution, and perform an initial scheduling on it.
- Initial feasible solution: assign each process to a separate PE
2. Reallocate processes to PEs to minimize total PE cost.
- Possibly eliminate PEs from the initial feasible solution
3. Reallocate processes again to minimize the amount of communication required between PEs.
4. Allocate communication channels.
5. Allocate I/O devices (internal or external to PEs).
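
Step 1 can be sketched as follows. The slide only says that each process gets its own PE; choosing the PE type on which the process runs fastest, and using an as-soon-as-possible schedule that ignores communication delay, are assumptions of this sketch. `RUN_TIME` and `EDGES` are a made-up technology library and task graph.

```python
# Hypothetical library: run time of each process on each available PE type.
RUN_TIME = {
    "p1": {"cpu_fast": 2.0, "cpu_slow": 5.0},
    "p2": {"cpu_fast": 3.0, "cpu_slow": 4.0},
    "p3": {"cpu_fast": 1.0, "cpu_slow": 6.0},
}
# Task graph edges as (producer, consumer) pairs.
EDGES = [("p1", "p3"), ("p2", "p3")]


def initial_allocation(run_time):
    """One PE per process; pick the type with the shortest run time (assumption)."""
    return {p: min(times, key=times.get) for p, times in run_time.items()}


def asap_schedule(alloc, run_time, edges):
    """Finish time of each process when it starts as soon as all of its
    predecessors have finished. Communication delay is ignored."""
    preds = {p: [] for p in alloc}
    for src, dst in edges:
        preds[dst].append(src)
    finish = {}

    def finish_time(p):
        if p not in finish:
            start = max((finish_time(q) for q in preds[p]), default=0.0)
            finish[p] = start + run_time[p][alloc[p]]
        return finish[p]

    for p in alloc:
        finish_time(p)
    return finish


alloc = initial_allocation(RUN_TIME)
finish = asap_schedule(alloc, RUN_TIME, EDGES)
print(alloc)   # every process on its own "cpu_fast" PE
print(finish)  # {'p1': 2.0, 'p2': 3.0, 'p3': 4.0}
```

Because no two processes share a PE at this point, the schedule is determined by the task-graph dependencies alone.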


- The most important step: 2. Initial reallocation
- Reason: PE cost is the dominant hardware cost

- Initial reallocation
1. PE cost reduction:

1.1 Scan the PEs, starting with the least-utilized PE.

1.2 Try to reallocate that PE’s processes to other existing PEs.

1.3 If no process is left on the PE, eliminate it; otherwise, replace the PE with a suitable lower-cost one.

2. Pair-wise merge

Merge a pair of PEs into a single, more powerful one.

3. Load balancing


- Initial reallocation (cont’d)
- “PE cost reduction” phase tries to reallocate multiple processes at a time
- The above 3 phases are repeated as far as possible
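
The "PE cost reduction" phase (steps 1.1 to 1.3) can be sketched as below. This is a simplified illustration under stated assumptions, not Wolf's implementation: feasibility is approximated by requiring the summed run time on each PE to stay within a single `DEADLINE` (a stand-in for real scheduling), processes are moved first-fit, and the library `PE_TYPES` is invented for the example.

```python
DEADLINE = 8.0  # assumed common deadline; replaces a real schedulability test

# Hypothetical component technology library: cost and per-process run time.
PE_TYPES = {
    "fast": {"cost": 50.0, "time": {"a": 2.0, "b": 3.0, "c": 4.0}},
    "slow": {"cost": 20.0, "time": {"a": 4.0, "b": 6.0, "c": 8.0}},
}


def load(pe_type, procs):
    return sum(PE_TYPES[pe_type]["time"][p] for p in procs)


def pe_cost_reduction(pes):
    """pes: list of {"type": str, "procs": list}; the dicts are modified in place."""
    remaining = list(pes)
    # 1.1 Scan the PEs, starting with the least-utilized one.
    for pe in sorted(pes, key=lambda q: load(q["type"], q["procs"])):
        others = [q for q in remaining if q is not pe]
        # 1.2 Try to move each of its processes to another existing PE.
        for proc in list(pe["procs"]):
            for target in others:
                if load(target["type"], target["procs"] + [proc]) <= DEADLINE:
                    pe["procs"].remove(proc)
                    target["procs"].append(proc)
                    break
        if not pe["procs"]:
            remaining = others  # 1.3 The PE is empty: eliminate it.
        else:
            # 1.3 Otherwise switch to the cheapest type that still fits.
            feasible = [t for t in PE_TYPES if load(t, pe["procs"]) <= DEADLINE]
            pe["type"] = min(feasible, key=lambda t: PE_TYPES[t]["cost"])
    return remaining


def total_pe_cost(pes):
    return sum(PE_TYPES[pe["type"]]["cost"] for pe in pes)


# Initial feasible solution: one "fast" PE per process (total PE cost 150).
pes = [{"type": "fast", "procs": [p]} for p in ("a", "b", "c")]
result = pe_cost_reduction(pes)
print(result)                 # one "slow" PE running a, one "fast" PE running c and b
print(total_pe_cost(result))  # 70.0
```

In this run one PE is eliminated and another is replaced by a cheaper type, so both outcomes of step 1.3 appear. Pair-wise merge and load balancing are not modeled here.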


- Finds optimal solutions to most of ILP-solved examples
- Finds near-optimal solutions for the remaining examples
- Showed good results on larger examples
- Requires very little run-time
- Due to multiple-move strategy during PE cost minimization phase


Co-Synthesis Algorithms: Distributed System Co-Synthesis

Wolf’s Heuristic Algorithm for Object-Oriented Models

- Target
- Co-synthesis of a Distributed-System out of an Object-Oriented Specification

- Significance
- OO is a promising approach in designing embedded systems at ESL

Reference:

W. Wolf, “Object-Oriented Co-Synthesis of Distributed Embedded Systems,” ACM Transactions on Design Automation of Electronic Systems, pp. 301-314, 1996.


- Again, our eight topics
- System Specification Language/Model
- Target Architecture
- Functionality (Allocation/Scheduling) Quantum
- Allocation Strategy
- Scheduling Strategy
- Cost Estimation
- Performance Estimation
- Algorithm Details


[Figure: example object-oriented specification with three objects. Object O1: method m1 (variables v1, v2). Object O2: method m4 (variables v10, v20). Object O3: methods m2 (variables v2, v3) and m3 (variables v8, v9).]

- System Specification Model/Language
- An Object-Oriented Specification as input
- Method dataflow graph as model


- Target Architecture
- Distributed System
- An arbitrary-topology network of PEs

- Functionality Quantum
- Methods of Objects in an OO Specification
- As far as possible, keeps together all methods of an object
- Partitioning is done during algorithm execution


- Cost and Performance Estimation
- Pre-specified
- A technology description of available components is input to the algorithm

- Allocation, Scheduling, and Algorithm Details
- Much like Wolf’s previous heuristic algorithm
- Includes modifications in order to:
- handle large sets of methods
- consider effects of splitting objects across PEs


- Allocation, Scheduling, and Algorithm Details
1. Initial allocation and scheduling.

Allocate processes to PEs such that all tasks are placed on PEs fast enough to ensure that all deadlines are met, keeping objects together as much as possible.

2. Minimize PE cost.

Reallocate processes to PEs to minimize PE cost, splitting objects when necessary.

3. Minimize communication.

Reallocate processes again to minimize inter-PE communication, taking into account traffic generated by splitting objects across PEs.

4. Allocate channels.

Allocate communication channels.

5. Allocate devices.

Allocate devices either as on-chip devices or as external devices on communication channels.
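
Step 3 mentions traffic generated by splitting objects across PEs. The sketch below shows one way to detect it, under the assumption that such traffic arises when methods of the same object, placed on different PEs, access the same data member. The object, its method names, and its data members are hypothetical.

```python
from collections import defaultdict

# Hypothetical object: each method maps to the data members it accesses.
METHOD_VARS = {
    "read_level": {"level"},
    "update": {"level", "valve"},
}


def split_traffic(method_vars, alloc):
    """Return the data members that are accessed from more than one PE.

    alloc maps each method name to the PE it is allocated to.
    """
    pes_per_var = defaultdict(set)
    for method, variables in method_vars.items():
        for var in variables:
            pes_per_var[var].add(alloc[method])
    return sorted(var for var, pes in pes_per_var.items() if len(pes) > 1)


together = {"read_level": "PE1", "update": "PE1"}
split = {"read_level": "PE1", "update": "PE2"}
print(split_traffic(METHOD_VARS, together))  # []
print(split_traffic(METHOD_VARS, split))     # ['level']
```

Keeping the object on one PE produces no extra traffic, while splitting it makes the shared member `level` a source of inter-PE communication.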


- Step 1 (initial allocation)
- One PE per object

- Step 2 (minimize PE cost)
- oo_balance_load()
- Tries to redistribute methods to better balance the system load

- PE_replacement()
- Uses a cheaper PE without disturbing the allocation

- oo_pairwise_merge()
- Tries to eliminate a PE by moving its methods to other PEs
- Step 2 is done repeatedly
- Methods are re-scheduled after each new allocation
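
The repeat-and-reschedule structure of step 2 can be sketched as a small driver loop. Everything here is a hypothetical stand-in: `merge_smallest` plays the role of a move such as `oo_pairwise_merge()`, `toy_schedule` replaces the real scheduler, and accepting a move only when it is schedulable and strictly cheaper is an assumption of this sketch (a pure load-balancing move would need a different acceptance test).

```python
from collections import Counter


def minimize_pe_cost(alloc, moves, schedule, cost):
    """Apply the moves repeatedly until none of them yields an improvement."""
    improved = True
    while improved:
        improved = False
        for move in moves:
            candidate = move(alloc)
            if candidate is None:
                continue
            # Re-schedule after each new allocation; keep it only if it is
            # schedulable and cheaper than the current allocation.
            if schedule(candidate) and cost(candidate) < cost(alloc):
                alloc = candidate
                improved = True
    return alloc


def merge_smallest(alloc):
    """Toy move: empty the PE with the fewest methods into the next-smallest PE."""
    counts = Counter(alloc.values())
    pes = sorted(counts, key=lambda pe: (counts[pe], pe))
    if len(pes) < 2:
        return None
    src, dst = pes[0], pes[1]
    return {m: (dst if pe == src else pe) for m, pe in alloc.items()}


def toy_schedule(alloc):
    # Stand-in feasibility test: at most 3 methods fit on any PE.
    return max(Counter(alloc.values()).values()) <= 3


def toy_cost(alloc):
    return len(set(alloc.values()))  # every PE costs 1 in this toy model


start = {"m1": "PE1", "m2": "PE2", "m3": "PE3", "m4": "PE3"}
final = minimize_pe_cost(start, [merge_smallest], toy_schedule, toy_cost)
print(final)  # {'m1': 'PE2', 'm2': 'PE2', 'm3': 'PE3', 'm4': 'PE3'}
```

The loop terminates because every accepted move strictly lowers the cost. In the example the second merge is rejected, since it would put four methods on one PE.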


Note: This operation may cause “hidden communication”.


- Experimental Results
- Algorithm implemented in C++
- Using NIH class library
- 8600 lines of code
- Executed on SGI Indigo workstation

- Algorithm applied to examples from software engineering books on OO design
- Example: #objects/methods, CPU time
- cfuge: 2/3, 0.05
- dye: 3/15, 2.0
- juice: 3/4, 0.05
- train: 5/6, 0.05

- Reason for the highest CPU time (dye):
- It has the most methods, and scheduling is required in each inner loop of step 2
- This implementation had a simple, inefficient scheduler


- Main contribution
- OO specification is an important aid to automatic partitioning
- The specification is naturally divided into two levels of granularity
- The system is composed of objects
- Objects are composed of data members and methods

- The heuristic:
- Preserve the specification’s partitioning as much as possible


- Distributed System Co-Synthesis
- A heuristic approach
- Non-OO algorithm
- Customization to OO specifications
- Heuristic: First minimize the PE cost since it is the dominant factor
