IL2200 - High Level Synthesis

Ahmed Hemani, www.it.kth.se/~hemani

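High Level Synthesis

Inputs:
• Algorithm:
    WHILE G < K LOOP
      F := E*(A+B);
      G := (A+B)*(C+D);
    END LOOP;
• Library: functional units such as +, -, * and <
• Constraints: area, time (clock period, number of clock steps), power

Outputs:
• Controller: PLA and latches
• Datapath

[Figure: datapath with inputs A, B, C, D, E and K, registers X and Y, an adder, a multiplier and a comparator, producing F and G]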

Control & Data Flow Graph

    WHILE G < K LOOP
      F := E*(A+B);
      G := (A+B)*(C+D);
    END LOOP;

[Figure: CDFG of the loop with inputs A, B, C, D, E and K, two + nodes, two * nodes, one < node, and outputs F and G]

• Set of operations: V
• Data dependencies: D ⊆ V × V
• Control dependencies: C ⊆ V × V
• Nodes and edges have placeholders for synthesized information
• Compiler-like optimisation at source code level
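The graph definition above can be written down as plain sets. The following is a minimal sketch, not the representation used by any particular tool: the node names (add1, add2, mul1, mul2, cmp) are hypothetical labels, and the assignment of mul1 to F and mul2 to G is an assumption.

```python
# Set of operations V, keyed by a hypothetical label.
ops = {
    "add1": "+",  # A + B
    "add2": "+",  # C + D
    "mul1": "*",  # F := E * (A + B)      (assumed labelling)
    "mul2": "*",  # G := (A+B) * (C+D)    (assumed labelling)
    "cmp": "<",   # loop test G < K
}

# Data dependencies D, a subset of V x V, as (producer, consumer) pairs.
data_deps = {
    ("add1", "mul1"),
    ("add1", "mul2"),
    ("add2", "mul2"),
    ("mul2", "cmp"),  # loop-carried: G feeds the test of the next iteration
}

# Control dependencies C, a subset of V x V: the loop test guards the body.
control_deps = {("cmp", op) for op in ops if op != "cmp"}


def predecessors(op):
    """Operations whose results `op` consumes."""
    return sorted(src for (src, dst) in data_deps if dst == op)


print(predecessors("mul2"))
```

The placeholders for synthesized information mentioned above would be extra fields on these nodes and edges, for example the control step or the unit an operation is bound to.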

Reuse
The cornerstone of the algorithmic synthesis optimisation strategy.
• The same hardware resource serves several operations
• Operations that reuse a resource are never executed at the same time:
  - they are in mutually exclusive control branches, or
  - they are assigned to different states in a state machine
• Operations with potentially different functional requirements can be assigned to the same resource type

Allocate type and amount of resource
• Spread across the entire synthesis process
• The algorithm specifies the functional requirement
• Many units in the library satisfy the requirement
• Constraints guide the selection
• Judiciously maximize the reuse potential
• Adds information like delay, area and power

For operations +1 and +2 to reuse the same adder, it is essential to allocate an adder that can serve both of them.

[Figure: CDFG annotated with operand widths of 16 and 32 bits; operations +1 and +2 are mapped to add32, and *1 and *2 to mult32]
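The requirement that a shared unit must serve every operation mapped to it can be sketched as picking the widest requirement per operation type. The operand widths below are hypothetical values chosen for illustration; only the unit names add32 and mult32 come from the slide.

```python
# Hypothetical operand widths in bits (assumed, not taken from the slide).
op_width = {"add1": 16, "add2": 32, "mul1": 32, "mul2": 32}
op_kind = {"add1": "add", "add2": "add", "mul1": "mult", "mul2": "mult"}


def allocate(op_kind, op_width):
    """Pick one unit per operation type, wide enough for all its operations."""
    widest = {}
    for op, kind in op_kind.items():
        widest[kind] = max(widest.get(kind, 0), op_width[op])
    return {kind: f"{kind}{width}" for kind, width in widest.items()}


units = allocate(op_kind, op_width)
print(units)
```

A real allocator would also weigh the delay, area and power figures stored in the library; this sketch looks at width only.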

Schedule operations
• The algorithm specifies only the relative order
• The soul of algorithmic synthesis
• Tightly coupled to allocation
• Time constrained scheduling and area constrained scheduling
• Area-time trade-offs

[Figure: alternative schedules of +1, +2, *1 and *2 over two, three and four control steps, using different numbers of adders and multipliers]
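The area-time trade-off can be illustrated with a small resource-constrained list scheduler. This is a simplified sketch rather than the scheduler of any specific tool: it assumes every operation takes exactly one control step, that results are only usable in a later step (no chaining), and that the dependency graph is acyclic. The names add1, add2, mul1 and mul2 are hypothetical labels for +1, +2, *1 and *2.

```python
def list_schedule(ops, deps, limits):
    """Assign each operation a control step.

    ops:    name -> resource type
    deps:   set of (producer, consumer) pairs
    limits: resource type -> units available per control step
    """
    step_of = {}
    step = 0
    while len(step_of) < len(ops):
        step += 1
        used = {}
        for op in sorted(ops):  # fixed order keeps the result deterministic
            if op in step_of:
                continue
            preds = [p for (p, c) in deps if c == op]
            # Ready only if every producer finished in an earlier step.
            ready = all(p in step_of and step_of[p] < step for p in preds)
            kind = ops[op]
            if ready and used.get(kind, 0) < limits[kind]:
                step_of[op] = step
                used[kind] = used.get(kind, 0) + 1
    return step_of


body = {"add1": "+", "add2": "+", "mul1": "*", "mul2": "*"}
deps = {("add1", "mul1"), ("add1", "mul2"), ("add2", "mul2")}

fast = list_schedule(body, deps, {"+": 2, "*": 2})   # more area, fewer steps
small = list_schedule(body, deps, {"+": 1, "*": 1})  # less area, more steps
print(max(fast.values()), max(small.values()))
```

With two adders and two multipliers the loop body fits in two control steps; with one of each it needs three, because +2 has to wait for the single adder.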

Storage Synthesis
• The algorithm does not specify the registers
• New registers are architected to hold the values that cross clock step boundaries
• Registers are necessary to reuse resources
• Optimized using lifetime analysis
• Strongly influenced by scheduling
• Algorithmic synthesis generates too many registers

[Figure: three-step schedule of +1, +2, *1 and *2 with inputs A, B, C, D and E, registers X and Y holding the intermediate values, and outputs F and G]

Interconnect Synthesis
• Interconnect elements like multiplexers and busses implement the control flow
• Interconnect elements are also instrumental in implementing reuse
• For every reused resource an interconnect element is architected

[Figure: three-step schedule with registers X and Y, next to two datapaths built from one adder and one multiplier: the initial datapath and the datapath after operand interchange]

Register Merging
• Registers can be reused by doing lifetime analysis of the values they hold
• The lifetimes of registers Y and G do not overlap

[Figure: three-step schedule with registers X, Y and G, next to the datapath after operand interchange and the datapath after register merging, in which Y and G share one register]
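The lifetime check behind register merging can be sketched as an interval test. The lifetimes below are assumptions derived from a three-step schedule (X = A+B written at the end of step 1, Y = C+D at the end of step 2, G at the end of step 3); the slides do not give them explicitly, and keeping F and G alive until a hypothetical step 4 is also an assumption.

```python
from itertools import combinations

# value -> (born, dies): written at the end of step `born`,
# last read during step `dies`. All numbers are assumed for illustration.
lifetimes = {"X": (1, 3), "Y": (2, 3), "F": (2, 4), "G": (3, 4)}


def can_share(a, b):
    """Two values can share a register if one dies no later than the other is born."""
    (born_a, dies_a), (born_b, dies_b) = lifetimes[a], lifetimes[b]
    return dies_a <= born_b or dies_b <= born_a


mergeable = [(a, b) for a, b in combinations(lifetimes, 2) if can_share(a, b)]
print(mergeable)
```

Under these assumptions Y is read for the last time in step 3, the same step at whose end G is written, so the two can occupy one register, which matches the merge shown on the slide.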

[Figure: area plotted against the number of control steps (2, 3, 4), broken down into ports, busses, registers, adders (+) and multipliers (*). Two schedules are annotated: 2 ports, 3 registers, 5 busses, 1 adder, 2 multipliers; and 3 ports, 4 registers, 6 busses, 1 adder, 1 multiplier]

Binding

FIR Basics: HW Implementation Perspective
• Two vectors of size k:
  - x, the samples vector
  - h, the impulse response of the filter, also known as the coefficients
• The x vector is also known as the delay line because it preserves the previous k-1 samples (the delayed samples)
• A new sample x(0) is taken every sample period, marked by the sample clock
• When a new sample arrives, the previous samples are shifted so that the oldest sample x(k-1) is shifted out
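The shifting behaviour of the delay line can be sketched in a few lines. The function name shift_in is hypothetical; the list holds x(0) at index 0 (newest) and x(k-1) at the end (oldest).

```python
def shift_in(delay_line, new_sample):
    """Return the delay line after one sample clock tick.

    The new sample becomes x(0), every stored sample moves one
    position, and the oldest sample x(k-1) is dropped.
    """
    return [new_sample] + delay_line[:-1]


line = [4, 3, 2, 1, 0]      # k = 5, the oldest sample is 0
line = shift_in(line, 5)
print(line)
```

In hardware the same behaviour is a k-stage shift register clocked by the sample clock.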

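Algorithm or ???

[Figure: 5-tap FIR data flow graph: coefficients c0..c4 are multiplied with samples x0..x4 by five multipliers and the products are summed by four adders]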

[Figure: 5-tap FIR data flow graph (c0..c4, x0..x4, five multipliers, four adders) placed in a single control step, C_step 1. Annotations: Sample Clk, System Clk, Critical Path, Delay Line, Adders, Multipliers, Registers, Multiplexors]

[Figure: 5-tap FIR data flow graph (c0..c4, x0..x4, five multipliers, four adders). Annotations: Sample Clk, System Clk, Critical Path, Delay Line, Adders, Multipliers, Registers, Multiplexors]

Multiply Add Accumulate (MAC)

[Figure: a single multiplier (×) and a single adder (+) process the pairs (c0, x0) ... (cn-1, xn-1) one after another]
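The MAC organisation can be sketched as a loop that reuses one multiplier and one adder for every tap. The function name fir_mac and the sample values are hypothetical; the assumption is one multiply-accumulate per system clock cycle.

```python
def fir_mac(coeffs, samples):
    """Compute one FIR output with a single multiply-accumulate unit."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x  # one MAC operation per (assumed) system clock cycle
    return acc


y = fir_mac([1, 2, 3, 2, 1], [5, 4, 3, 2, 1])
print(y)
```

For n taps this takes n iterations per output, so the system clock has to run at least n times faster than the sample clock.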


[Figure: 5-tap FIR data flow graph (c0..c4, x0..x4, five multipliers, four adders). Annotations: Sample Clk, System Clk, Critical Path, Delay Line, Adders, Multipliers, Registers, Multiplexors]

Symmetric FIR Filter
• The coefficients are symmetric: ci = c(k-1-i) for a filter with k taps
• Direct form for k = 5: c0.x0 + c1.x1 + c2.x2 + c3.x3 + c4.x4
• With c0 = c4 and c1 = c3: c0.(x0 + x4) + c1.(x1 + x3) + c2.x2
• Roughly halves the number of multiplications
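The saving from symmetry can be checked numerically. This sketch compares the direct form with the folded form for a five-tap filter in which c0 = c4 and c1 = c3; the function names and the sample values are hypothetical.

```python
def fir_direct(c, x):
    """Direct form: one multiplication per tap."""
    return sum(ci * xi for ci, xi in zip(c, x))


def fir_symmetric(c, x):
    """Folded form: add the mirrored samples first, then multiply once."""
    k = len(c)
    half = k // 2
    acc = sum(c[i] * (x[i] + x[k - 1 - i]) for i in range(half))
    if k % 2:  # an odd tap count leaves an unpaired middle tap
        acc += c[half] * x[half]
    return acc


c = [1, 2, 3, 2, 1]  # symmetric: c0 = c4, c1 = c3
x = [5, 4, 3, 2, 1]
print(fir_direct(c, x), fir_symmetric(c, x))
```

For five taps the folded form needs three multiplications instead of five, at the cost of two extra additions in front of the multipliers.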

Structure of FIR
• The top level should be structural VHDL
• The FIR components: FSM, Coeff ROM, Delay Line and FIR Arithmetic
• The components should be behavioural, or behavioural RTL where necessary

[Figure: block diagram with FSM, Coeff ROM, Delay Line and FIR Arithmetic]

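Fully Parallel
• The delay line: implemented as a shift register (x0 ... x4)
• The coefficients: hardwired (c0 ... c4)
• Adder tree: four adders sum the terms

[Figure: shift register x0..x4 and hardwired coefficients c0..c4 feeding an adder tree]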
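The adder tree can be sketched as a pairwise reduction, which also shows how many adder levels lie on the critical path. The function name adder_tree is hypothetical, and the input is assumed to be a non-empty list of the five products c(i).x(i).

```python
def adder_tree(values):
    """Sum a non-empty list pairwise; return (total, number of adder levels)."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired value passes to the next level
        level = nxt
        depth += 1
    return level[0], depth


total, depth = adder_tree([5, 8, 9, 4, 1])
print(total, depth)
```

Five terms need four adders arranged in three levels, compared with four levels for a plain chain of adders.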
