Ernest Jamro, Department of Electronics, AGH, Kraków

Hardware Implementation of Algorithms
Multipliers, convolvers

Multiplication: 9 x 11 = 99

        1 0 0 1      (9)
  x     1 0 1 1      (11)
  -------------
        1 0 0 1
      1 0 0 1
    0 0 0 0
+ 1 0 0 1
  -------------
  1 1 0 0 0 1 1      (99)
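The worked example above can be reproduced with a short sketch that lists the shifted partial products of an unsigned multiplication. The function name `partial_products` is an illustrative assumption, not something taken from the slides.

```python
def partial_products(a, b):
    """Return the shifted partial products of a * b (unsigned), LSB row first."""
    products = []
    shift = 0
    while b >> shift:
        bit = (b >> shift) & 1
        # Each row is either the multiplicand shifted left, or zero.
        products.append((a << shift) if bit else 0)
        shift += 1
    return products


rows = partial_products(0b1001, 0b1011)
print(rows, sum(rows))  # [9, 18, 0, 72] 99
```

Summing the rows gives the product; a parallel array multiplier forms all rows at once, while a sequential one adds one row per clock cycle.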

Parallel Array Multipliers

FPGA, Built-in multiplier DSP48

Sequential Multiplier
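A minimal behavioural sketch of a sequential multiplier, assuming the common shift-and-add scheme with one partial product accumulated per clock cycle. The slide's figure is not reproduced in the text, so the exact architecture it shows may differ; the names `sequential_multiply` and `width` are illustrative.

```python
def sequential_multiply(a, b, width=8):
    """Shift-and-add model: one loop iteration stands for one clock cycle.

    Assumes unsigned operands and b < 2**width.
    """
    accumulator = 0
    multiplicand = a
    multiplier = b
    for _ in range(width):
        if multiplier & 1:            # current multiplier bit selects the row
            accumulator += multiplicand
        multiplicand <<= 1            # next row has twice the weight
        multiplier >>= 1              # move on to the next multiplier bit
    return accumulator


print(sequential_multiply(9, 11, width=4))  # 99
```

Compared with a parallel array, this trades throughput (`width` cycles per result) for a single adder.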

Wallace Tree Multiplier (with Carry-Save Adders)
Carry-save adders (CSA) are not recommended in FPGAs.
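The carry-save step used in a Wallace tree can be sketched as a 3:2 compressor that turns three operands into a sum word and a carry word without propagating carries. In this sketch the operands are reduced one group at a time for brevity, whereas a real Wallace tree applies the same step to many groups in parallel at each level; the function names are assumptions.

```python
def carry_save_add(x, y, z):
    """3:2 compressor: x + y + z == sum_word + carry_word, no carry propagation."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def csa_reduce_and_add(operands):
    """Reduce non-negative operands to two words with CSAs, then add once."""
    words = list(operands)
    while len(words) > 2:
        x, y, z = words.pop(), words.pop(), words.pop()
        words.extend(carry_save_add(x, y, z))
    return sum(words)  # the single carry-propagate addition at the end


print(csa_reduce_and_add([9, 18, 0, 72]))  # 99
```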

Multiplication of Signed Numbers
• Sign-magnitude: standard unsigned multiplication of the magnitudes; Sign = Sign1 xor Sign2
• Two's complement: (a1 + a2)·(b1 + b2) = a1·b1 + a1·b2 + a2·b1 + a2·b2
C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm," IEEE Trans. Comput., vol. C-22, pp. 1045–1047, Dec. 1973.
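Both signed schemes can be sketched in a few lines. The interpretation of a1 and b1 as the negatively weighted sign-bit terms and a2, b2 as the remaining bits is an assumption about the slide's notation. This is not the Baugh-Wooley array itself, only the four-term expansion it starts from.

```python
def sign_magnitude_multiply(sign1, mag1, sign2, mag2):
    """Sign-magnitude: unsigned product of magnitudes, sign is the XOR of signs."""
    return sign1 ^ sign2, mag1 * mag2


def twos_complement_multiply(a_word, b_word, n):
    """Multiply two n-bit two's complement words given as raw bit patterns."""
    sign_weight = 1 << (n - 1)
    a1 = -(a_word & sign_weight)        # sign bit carries weight -2**(n-1)
    a2 = a_word & (sign_weight - 1)     # remaining bits are plain unsigned
    b1 = -(b_word & sign_weight)
    b2 = b_word & (sign_weight - 1)
    return a1 * b1 + a1 * b2 + a2 * b1 + a2 * b2


# -3 is 0b1101 and 5 is 0b0101 as 4-bit two's complement words.
print(twos_complement_multiply(0b1101, 0b0101, 4))  # -15
```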

Two's Complement Multiplication

Reduced-Width Multiplier

Truncation Error Compensation

Constant Coefficient Multiplier
Look-Up Table (LUT), example: Y = 5·X

Address  Data
0        0
1        5
2        10
3        15
...      ...

LUT-based constant coefficient multiplier: Y = C·A = C·A(0:3) + 2^4·C·A(4:7)
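A minimal sketch of the LUT-based constant coefficient multiplier, assuming A(0:3) denotes the four least significant bits of an 8-bit input and A(4:7) the four most significant. The helper names are illustrative.

```python
def build_lut(coefficient, address_bits=4):
    """Table of coefficient * address for every possible address."""
    return [coefficient * address for address in range(1 << address_bits)]


def lut_multiply_8bit(lut, a):
    """Y = C*A(0:3) + 2**4 * C*A(4:7), using the same 16-entry table twice."""
    low_nibble = a & 0xF
    high_nibble = (a >> 4) & 0xF
    return lut[low_nibble] + (lut[high_nibble] << 4)


lut = build_lut(5)
print(lut[:4])                      # [0, 5, 10, 15]
print(lut_multiply_8bit(lut, 200))  # 1000
```

Splitting the input keeps each table at 16 entries instead of 256, at the cost of one extra adder.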

Different ROM sizes
Input data width = 6 bits

Heterogeneous memory usage
Virtex: 16×1, 32×1, 4k×1, 2k×2, 1k×4, 512×8, 256×16
Input data and coefficient width = 14

Exchanging distributed RAM (CLB) for BRAM

Equivalent cost of 1 BRAM
[Figure: area [CLB] versus number of BRAMs for different input and coefficient widths K; CLB-only case at scale 1:10]

MM (Multiplierless Multiplication)
• Binary representation, example: B = 14 = 1110₂
  M = A·B = (A<<1) + (A<<2) + (A<<3)
• Sub-structure Sharing (SS), example: B = 27 = 11011₂
  tmp = A + (A<<1)
  M = A·B = tmp + (tmp<<3)
• Canonic Signed Digit (CSD)
  digit set {0, 1, -1} (0: no operation, 1: addition, -1: subtraction)
  example: B = 7 = 111₂ = 100(-1) in CSD
  binary: M = B·A = (A<<2) + (A<<1) + A
  CSD: M = (A<<3) - A
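The three multiplierless techniques can be checked with a short sketch. The CSD conversion below uses the standard non-adjacent-form recoding, which is an assumption about how the slides derive the digits; all function names are illustrative.

```python
def multiply_by_14_binary(a):
    """Binary representation: 14 = 1110 (binary), one shifted term per '1' bit."""
    return (a << 1) + (a << 2) + (a << 3)


def multiply_by_27_shared(a):
    """Sub-structure sharing: the pattern 11 (binary) = 3 is computed once."""
    tmp = a + (a << 1)          # 3 * a
    return tmp + (tmp << 3)     # 3a + 24a = 27a


def to_csd(b):
    """Signed digits of a non-negative integer, least significant digit first."""
    digits = []
    while b != 0:
        if b & 1:
            digit = 2 - (b & 3)   # +1 if b % 4 == 1, -1 if b % 4 == 3
            b -= digit
        else:
            digit = 0
        digits.append(digit)
        b >>= 1
    return digits


def multiply_csd(a, b):
    """Multiply using only shifts, additions and subtractions."""
    result = 0
    for shift, digit in enumerate(to_csd(b)):
        if digit == 1:
            result += a << shift
        elif digit == -1:
            result -= a << shift
    return result


print(to_csd(7))            # [-1, 0, 0, 1], i.e. (A<<3) - A
print(multiply_csd(13, 7))  # 91
```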

BINARNIE CSDinsert symbol ‘1’ only if the total number of operation is reduced Standard Modified

Application of different MM techniques

The MM cost for different coefficients

FIR Filters

FIR Filter (indirect / transposed form)
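A behavioural sketch of the transposed form, assuming the usual structure in which the current input is multiplied by every coefficient at once and the products travel through a chain of adders and registers. The names are illustrative, and the slide's figure may differ in detail.

```python
def fir_transposed(samples, coeffs):
    """Transposed-form FIR: y[n] = h[0]*x[n] + r[0], r[k] = h[k+1]*x[n] + r[k+1]."""
    registers = [0] * (len(coeffs) - 1)
    output = []
    for x in samples:
        products = [h * x for h in coeffs]   # the input is broadcast to all taps
        first_register = registers[0] if registers else 0
        output.append(products[0] + first_register)
        # Update in ascending order so each register reads its old neighbour.
        for k in range(len(registers)):
            neighbour = registers[k + 1] if k + 1 < len(registers) else 0
            registers[k] = products[k + 1] + neighbour
    return output


print(fir_transposed([1, 0, 0, 0], [5, 13, 5]))  # [5, 13, 5, 0]
```

Because every multiplier sees the same input sample, this form pairs naturally with constant coefficient multipliers that share sub-expressions.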

FIR 2D

Examples of 2D FIR filters:

Low-pass      Sobel        Laplace
1  2  1      -1  0  1      1  1  1
2  4  2      -2  0  2      1 -8  1
1  2  1      -1  0  1      1  1  1
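A minimal sketch of applying a 3×3 kernel to an image, using the commonly quoted Sobel kernel as an example. The function computes a sliding sum of products without flipping the kernel (correlation) and returns only positions where the kernel fits entirely inside the image; both choices are assumptions made for brevity.

```python
SOBEL = [[-1, 0, 1],
         [-2, 0, 2],
         [-1, 0, 1]]


def filter_2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kernel_rows, kernel_cols = len(kernel), len(kernel[0])
    result = []
    for r in range(len(image) - kernel_rows + 1):
        row = []
        for c in range(len(image[0]) - kernel_cols + 1):
            acc = 0
            for i in range(kernel_rows):
                for j in range(kernel_cols):
                    acc += kernel[i][j] * image[r + i][c + j]
            row.append(acc)
        result.append(row)
    return result


vertical_edge = [[0, 0, 1],
                 [0, 0, 1],
                 [0, 0, 1]]
print(filter_2d(vertical_edge, SOBEL))  # [[4]]
```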

[Figure: FIR filter, N = 2, built from LUT-based multipliers. Blocks: 8-bit input In, delay z^-1, LUT M0, LUT L0, LUT M1, LUT L1, Adder0 and Adder1 (Multiplier 1, Multiplier 2), Adder2 (adders block).]

FIR: Arithmetic in a Different Order, (Parallel) Distributed Arithmetic
[Figure labels: different bits of the input; input; coefficient]

Distributed Arithmetic
The same input bit weight (smaller LUT widths)
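A sketch of distributed arithmetic for a sum of products with constant coefficients, assuming unsigned input samples processed one bit weight at a time. Bits of equal weight from all inputs form the LUT address, and the LUT holds the sum of the selected coefficients. Names and the bit-serial ordering are illustrative assumptions.

```python
def build_da_lut(coeffs):
    """Entry 'address' holds the sum of coeffs[k] for every set bit k of address."""
    taps = len(coeffs)
    return [sum(h for k, h in enumerate(coeffs) if (address >> k) & 1)
            for address in range(1 << taps)]


def da_sum_of_products(lut, samples, width):
    """Compute sum(coeffs[k] * samples[k]) for unsigned samples < 2**width."""
    accumulator = 0
    for bit in range(width):
        address = 0
        for k, x in enumerate(samples):
            address |= ((x >> bit) & 1) << k   # gather bits of the same weight
        accumulator += lut[address] << bit     # weight the lookup by 2**bit
    return accumulator


lut = build_da_lut([5, 13, 5])
print(da_sum_of_products(lut, [3, 7, 2], width=3))  # 116
```

The table size depends on the number of taps rather than on the input width, which is why grouping bits of the same weight gives smaller LUTs.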

Linear-Phase FIR Filters (symmetric: h(0) = h(N-1), h(1) = h(N-2), ...)
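The symmetry can be exploited by adding the two samples that share a coefficient before multiplying, which roughly halves the number of multipliers. A minimal sketch, assuming `window` holds the N most recent samples and `half_coeffs` holds h(0) up to the middle tap; both names are illustrative.

```python
def fir_symmetric_output(window, half_coeffs):
    """One output of a symmetric FIR filter using pre-adders.

    window: N samples; half_coeffs: the first ceil(N / 2) coefficients.
    """
    n = len(window)
    acc = 0
    for k in range(n // 2):
        # h(k) == h(N-1-k), so the two samples are added first, multiplied once.
        acc += half_coeffs[k] * (window[k] + window[n - 1 - k])
    if n % 2:
        acc += half_coeffs[n // 2] * window[n // 2]   # unpaired middle tap
    return acc


print(fir_symmetric_output([1, 2, 3], [5, 13]))  # 46
```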

FPGA, Built-in multiplier DSP48

Example of sub-structure sharing for FIR filters
H(z) = 5 + 13z^-1 + 5z^-2 = 101₂ + 1101₂·z^-1 + 101₂·z^-2
Example 1: A = 5 = 101₂ (temporary expression)
  H(z) = A + (1000₂ + A)·z^-1 + A·z^-2
Example 2: A = 1 + z^-1
  H(z) = 5A + 8z^-1 + 5z^-2
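Both examples can be checked numerically. In the sketch below, Example 1 shares the product 5·x among the three taps, and Example 2 shares the sum of the current and previous sample. The function names and the zero initial state are assumptions.

```python
def shared_tap_products(x):
    """Example 1: products of x by 5, 13 and 5 with 5*x computed once."""
    a = x + (x << 2)              # A = 5x = 101 (binary) times x
    return a, (x << 3) + a, a     # 5x, 8x + 5x = 13x, 5x


def fir_example_2(samples):
    """Example 2: H(z) = 5A + 8z^-1 + 5z^-2 with A = 1 + z^-1."""
    output = []
    x1 = x2 = 0                   # x[n-1] and x[n-2], initially zero
    for x in samples:
        a = x + x1                            # shared term A
        five_a = a + (a << 2)
        five_x2 = x2 + (x2 << 2)
        output.append(five_a + (x1 << 3) + five_x2)
        x2, x1 = x1, x
    return output


print(shared_tap_products(3))        # (15, 39, 15)
print(fir_example_2([1, 0, 0, 0]))   # [5, 13, 5, 0]
```

The impulse response of the second function matches the coefficients 5, 13, 5, which confirms that the rewritten H(z) is equivalent to the original.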

The END
Additional materials

Fast multiplication in FPGAs: 2^6·(2·a7·b + a6·b)
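The expression above handles the two top bits of an 8-bit operand a in one step. A sketch that extends the same idea to every pair of bits, which is an assumption about how the full multiplier is assembled, could look as follows; the function name is illustrative.

```python
def multiply_two_bits_per_stage(a, b, width=8):
    """Sum of 2**i * (2*a[i+1]*b + a[i]*b) over even bit positions i.

    Assumes unsigned operands and a < 2**width.
    """
    total = 0
    for i in range(0, width, 2):
        a_low = (a >> i) & 1
        a_high = (a >> (i + 1)) & 1
        # For i = 6 this is exactly 2**6 * (2*a7*b + a6*b).
        total += (2 * a_high * b + a_low * b) << i
    return total


print(multiply_two_bits_per_stage(200, 123))  # 24600
```

Each stage combines two partial products in one adder, so an 8-bit operand needs four stage outputs instead of eight rows.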

Multipliers in FPGAs: (a7 and b_i) xor (a6 and b_{i+1})
Example: G4 = a7, G3 = b_i, G2 = a6, G1 = b_{i+1}; F4 = a7, F3 = b_{i-1}, F2 = a6, F1 = b_i
[Figure: fragment of a Virtex Configurable Logic Block (CLB)]
