
IL2200 - High Level Synthesis



Presentation Transcript


  1. IL2200 - High Level Synthesis. Ahmed Hemani, www.it.kth.se/~hemani

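  2. High Level Synthesis. Inputs: an Algorithm (WHILE G < K LOOP F := E*(A+B); G := (A+B)*(C+D); END LOOP;), a Library of functional units (+, -, *, <) and Constraints on Area, Time (clock period, number of clock steps) and Power. Outputs: a Controller (PLA, latches) and a Datapath. [Figure: datapath with inputs A, B, C, D, E, K, registers X and Y, units <, + and *, and outputs F, G.]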

  3. Control & Data Flow Graph (CDFG) for WHILE G < K LOOP F := E*(A+B); G := (A+B)*(C+D); END LOOP;
     • Set of operations: V
     • Data dependencies: D ⊆ V × V
     • Control dependencies: C ⊆ V × V
     • Nodes and edges have placeholders for synthesized information
     • Compiler-like optimisation at source code level
     [Figure: CDFG with inputs K, A, B, C, D, E, one comparison (<), two additions, two multiplications, and outputs F, G.]
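The sets V, D and C above can be sketched as a small data structure. This is an illustrative sketch, not the course's representation: the class name CDFG, the node name cmp, and the choice of which multiplication is called *1 or *2 are assumptions. Primary inputs (A, B, ...) are treated as values rather than operations, and the loop-carried dependency of the comparison on G is left out for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class CDFG:
    ops: dict = field(default_factory=dict)       # V: operation name -> operator symbol
    data_deps: set = field(default_factory=set)   # D: (producer, consumer) pairs
    ctrl_deps: set = field(default_factory=set)   # C: (controlling op, controlled op) pairs

    def add_op(self, name, symbol, operands=()):
        self.ops[name] = symbol
        for src in operands:
            if src in self.ops:                   # primary inputs are not operations
                self.data_deps.add((src, name))

    def predecessors(self, name):
        return sorted(p for p, c in self.data_deps if c == name)

def build_loop_cdfg():
    g = CDFG()
    g.add_op("cmp", "<", ("G", "K"))              # loop condition G < K
    g.add_op("+1", "+", ("A", "B"))               # A+B is computed once and shared
    g.add_op("+2", "+", ("C", "D"))
    g.add_op("*1", "*", ("E", "+1"))              # F := E*(A+B)
    g.add_op("*2", "*", ("+1", "+2"))             # G := (A+B)*(C+D)
    for op in ("+1", "+2", "*1", "*2"):
        g.ctrl_deps.add(("cmp", op))              # loop body executes only if cmp holds
    return g

g = build_loop_cdfg()
print(g.predecessors("*2"))
```

Sharing the +1 node between both multiplications is an instance of the compiler-like optimisation mentioned on the slide (common subexpression elimination).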

  4. Reuse: the cornerstone of the algorithmic synthesis optimisation strategy. Reuse means mapping several operations onto the same hardware resource. Operations with potentially different functional requirements are assigned to the same resource type. Operations that reuse a resource are never executed at the same time: they are either in mutually exclusive control branches or assigned to different states in a state machine.

  5. Allocate type and amount of resource. Allocation is spread across the entire synthesis process. The algorithm specifies the functional requirement, many units in the library satisfy the requirement, and the constraints guide the selection. Allocation should judiciously maximize the reuse potential; it adds information like delay, area and power. For operations +1 and +2 to reuse the same adder, it is essential to allocate an adder that can serve both of them. [Figure: CDFG annotated with operand widths (16, 32) and operations <, +1, +2, *1, *2; library units add32 and mult32 shown against +1, +2 and *1, *2.]

  6. Schedule operations. The algorithm specifies only the relative order. Scheduling is the soul of algorithmic synthesis and is tightly coupled to allocation. There are two variants, time constrained scheduling and area constrained scheduling, which expose area-time trade-offs. [Figure: alternative schedules of +1, +2, *1, *2 over three and four control steps.]
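The area-time trade-off can be illustrated with two small schedulers. This is a sketch under stated assumptions: every operation takes exactly one control step, the dependency table (which multiplication is *1 and which is *2) is inferred from the loop body, and the function names asap and list_schedule are hypothetical. List scheduling is one common heuristic for area constrained scheduling, not necessarily the one used in the course.

```python
# Operation name -> resource type, and operation name -> producers it depends on.
OPS = {"+1": "+", "+2": "+", "*1": "*", "*2": "*"}
DEPS = {"+1": [], "+2": [], "*1": ["+1"], "*2": ["+1", "+2"]}

def asap(ops, deps):
    """As-soon-as-possible schedule: unlimited resources, shortest time."""
    step = {}
    for op in ops:  # assumes ops are listed in dependency (topological) order
        step[op] = 1 + max((step[p] for p in deps[op]), default=0)
    return step

def list_schedule(ops, deps, limits):
    """Area constrained schedule: at most limits[kind] units of each kind per step.

    Every limit must be at least 1, otherwise the loop cannot terminate.
    """
    step, t = {}, 0
    while len(step) < len(ops):
        t += 1
        used = {}
        for op, kind in ops.items():
            if op in step:
                continue
            # Ready only if all producers finished in an earlier control step.
            ready = all(p in step and step[p] < t for p in deps[op])
            if ready and used.get(kind, 0) < limits[kind]:
                step[op] = t
                used[kind] = used.get(kind, 0) + 1
    return step

print(asap(OPS, DEPS))                               # 2 adders, 2 multipliers
print(list_schedule(OPS, DEPS, {"+": 1, "*": 1}))    # 1 adder, 1 multiplier
```

With unlimited resources the loop body fits in two control steps; with one adder and one multiplier it needs three, which is the kind of trade-off the slide refers to.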

  7. Storage Synthesis. The algorithm does not specify the registers. New registers are architected to hold the values that cross clock step boundaries. Registers are necessary to reuse resources. Register usage is optimized using lifetime analysis and is strongly influenced by scheduling. Algorithmic synthesis generates too many registers. [Figure: three-step schedule with inputs A, B, C, D, E, operations +1, +2, *1, *2, registers X and Y, and outputs F, G.]
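Lifetime analysis can be sketched as follows. The schedule and consumer table below are assumptions chosen to match a three-step schedule with one adder and one multiplier; the function names are hypothetical. A value is taken to be written at the end of its producing step and to stay alive until its last consuming step, and only intermediate values are considered (primary inputs and outputs are ignored).

```python
# Control step of each operation (assumed three-step schedule).
SCHEDULE = {"+1": 1, "+2": 2, "*1": 2, "*2": 3}
# Which operations read the result of each producer.
CONSUMERS = {"+1": ["*1", "*2"], "+2": ["*2"]}

def lifetimes(schedule, consumers):
    """Return {producer: (birth, death)} for values that cross a step boundary."""
    life = {}
    for producer, users in consumers.items():
        birth = schedule[producer]
        death = max(schedule[u] for u in users)
        if death > birth:            # crosses at least one clock step boundary
            life[producer] = (birth, death)
    return life

def registers_per_boundary(life, n_steps):
    """Count live values at each boundary b (between step b and step b+1)."""
    return {b: sum(1 for (s, e) in life.values() if s <= b < e)
            for b in range(1, n_steps)}

life = lifetimes(SCHEDULE, CONSUMERS)
print(life)
print(registers_per_boundary(life, 3))
```

The maximum over all boundaries is a lower bound on the number of registers needed for these intermediate values; a different schedule changes the lifetimes and therefore the bound.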

  8. Interconnect Synthesis. Interconnect elements like multiplexers and busses implement the control flow. Interconnect elements are also instrumental in implementing reuse: for every reused resource an interconnect element is architected. [Figure: three-step schedule (+1 in step 1 writing X; +2 and *1 in step 2, writing Y; *2 in step 3) with the resulting datapath, and the datapath after operand interchange.]

  9. Register Merging. Registers can be reused by doing lifetime analysis of the values they hold. The lifetimes of registers Y and G do not overlap, so the two can be merged. [Figure: three-step schedule with registers X, Y and G; datapath after operand interchange, and datapath after register merging.]
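Register merging can be sketched as packing non-overlapping lifetimes into shared registers. The slides do not name a method; the code below uses first-fit over lifetimes sorted by start, which is the idea behind the left-edge algorithm. The lifetime values are assumptions: each value is written at the end of step birth and last read in step death, and G is assumed to be held for one step after it is produced.

```python
# Assumed lifetimes (birth, death) for the values held in registers X, Y and G.
LIFE = {"X": (1, 3), "Y": (2, 3), "G": (3, 4)}

def overlaps(a, b):
    """True if two lifetimes, read as half-open intervals (birth, death], intersect."""
    return a[0] < b[1] and b[0] < a[1]

def merge_registers(life):
    """Greedy first-fit packing of values into registers, sorted by birth."""
    registers = []                       # each register is a list of value names
    for name in sorted(life, key=lambda n: life[n]):
        for reg in registers:
            # The last value in a register dies latest, so checking it suffices.
            if not overlaps(life[reg[-1]], life[name]):
                reg.append(name)
                break
        else:
            registers.append([name])
    return registers

print(overlaps(LIFE["Y"], LIFE["G"]))    # Y and G never hold live data together
print(merge_registers(LIFE))
```

Under these assumptions G can share with either X or Y, and the greedy pass happens to pair it with X, while the slide pairs it with Y. Both choices need two registers; which pairing is preferable depends on the interconnect each one implies.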

  10. Area versus control steps. [Figure: two alternative schedules of +1, +2, *1, *2 with registers X and Y, and a plot of area (ports, busses, registers, adders, multipliers) against 2, 3 and 4 control steps.] Design points shown: 2 ports, 3 registers, 5 busses, 1 adder, 2 multipliers; and 3 ports, 4 registers, 6 busses, 1 adder, 1 multiplier.

  11. Binding

  12. FIR Basics: HW implementation perspective
     • Two vectors of size k: x, the samples vector, and h, the impulse response of the filter, also known as the coefficients
     • The x vector is also known as the delay line because it preserves the previous k-1 samples (the delayed samples)
     • A new sample x(0) is taken every sample period, marked by the sample clock
     • When a new sample arrives, the previous samples are shifted, so that the oldest sample x(k-1) is shifted out
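The delay-line behaviour described above can be modelled in a few lines of software. This is a behavioural sketch, not a hardware description; the class name Fir and the method name step are hypothetical, and the delay line is assumed to start out filled with zeros.

```python
from collections import deque

class Fir:
    def __init__(self, h):
        self.h = list(h)                                  # coefficients h(0)..h(k-1)
        # Delay line x(0)..x(k-1); maxlen makes it behave like a shift register.
        self.x = deque([0] * len(self.h), maxlen=len(self.h))

    def step(self, sample):
        """Called once per sample clock: shift in the new x(0), compute one output."""
        self.x.appendleft(sample)      # the oldest sample x(k-1) is shifted out
        return sum(c * s for c, s in zip(self.h, self.x))

f = Fir([1, 2, 3])
print([f.step(s) for s in (1, 0, 0, 0)])   # impulse in, coefficients out
```

Feeding a single 1 followed by zeros returns the coefficients in order, which is why h is called the impulse response.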

  13. Algorithm or ??? [Figure: 5-tap FIR dataflow with coefficients c0..c4, samples x0..x4, five multipliers (×) and four adders (+).]

  14. [Figure: 5-tap FIR dataflow (c0..c4, x0..x4, five multipliers, four adders) with C_step 1 marked.] Attributes to compare: Sample Clk, System Clk, Critical Path, Delay Line, Adders, Multipliers, Registers, Multiplexors.

  15. [Figure: 5-tap FIR dataflow (c0..c4, x0..x4, five multipliers, four adders).] Attributes to compare: Sample Clk, System Clk, Critical Path, Delay Line, Adders, Multipliers, Registers, Multiplexors.

  16. Multiply Add Accumulate (MAC). [Figure: coefficients c0 .. cn-1 and samples x0 .. xn-1 feeding one multiplier (×) followed by one accumulating adder (+).]

  17. [Figure: 5-tap FIR dataflow (c0..c4, x0..x4, five multipliers, four adders).] Attributes to compare: Sample Clk, System Clk, Critical Path, Delay Line, Adders, Multipliers, Registers, Multiplexors.

  18. [Figure: 5-tap FIR dataflow (c0..c4, x0..x4, five multipliers, four adders).] Attributes to compare: Sample Clk, System Clk, Critical Path, Delay Line, Adders, Multipliers, Registers, Multiplexors.

  19. Symmetric FIR Filter. The coefficients satisfy c(i) = c(k-1-i) for a filter with k taps. For k = 5: y = c0·x0 + c1·x1 + c2·x2 + c3·x3 + c4·x4 with c0 = c4 and c1 = c3, so y = c0·(x0 + x4) + c1·(x1 + x3) + c2·x2. This roughly halves the number of multiplications.
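The folding can be checked numerically. The sketch below compares the direct sum with the folded form and counts multiplications; the function names and the sample values are made up for illustration, and the coefficient list is assumed to be symmetric.

```python
def fir_direct(c, x):
    """One output sample using k multiplications."""
    return sum(ci * xi for ci, xi in zip(c, x))

def fir_symmetric(c, x):
    """Same output for symmetric c, adding mirrored samples before multiplying.

    Returns (output, number of multiplications used).
    """
    k = len(c)
    acc, mults = 0, 0
    for i in range(k // 2):
        acc += c[i] * (x[i] + x[k - 1 - i])   # pre-add the two mirrored samples
        mults += 1
    if k % 2:                                  # middle tap has no mirror partner
        acc += c[k // 2] * x[k // 2]
        mults += 1
    return acc, mults

c = [1, 2, 3, 2, 1]      # symmetric: c0 = c4, c1 = c3
x = [5, 4, 3, 2, 1]
print(fir_direct(c, x))
print(fir_symmetric(c, x))
```

For five taps the folded form uses three multiplications instead of five, at the cost of two extra pre-additions.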

  20. Structure of FIR. The top level should be structural VHDL. The FIR components (FSM, Coeff ROM, Delay Line and FIR Arithmetic) should be behavioural, or behavioural RTL where necessary. [Figure: block diagram with FSM, Coeff ROM, Delay Line and FIR Arithmetic.]

  21. Fully Parallel. The delay line is implemented as a shift register (x0 .. x4). The coefficients (c0 .. c4) are hardwired. [Figure: shift register x0..x4, hardwired coefficients c0..c4, and an adder tree of four adders.]
