CUSTOM-FPGA DESIGN AND MAPPING FOR DSP TRANSFORMS
BHARATHWAJ V SANKARA SHUBHIKA TANEJA
OUTLINE
• Mathematics of different DSP algorithms
• Generalization of DSP algorithms
• Systolic array architecture
• Logic block architecture & pipelining
• Mapping different algorithms to hardware
• Statistics
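DSP TRANSFORMS
• DFT: $X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N}$
• DCT: $X(k) = \alpha_k \sum_{n=0}^{N-1} x(n) \cos\left[\frac{\pi(2n+1)k}{2N}\right]$, where $\alpha_k = 1/\sqrt{N}$ for $k = 0$ and $\alpha_k = \sqrt{2/N}$ for $k = 1, \dots, N-1$
• DST: $X(k) = \sum_{n=0}^{N-1} x(n) \sin\left[\frac{\pi(n+1)(k+1)}{N+1}\right]$
• DHT: $X(k) = \sum_{n=0}^{N-1} x(n) \left[\cos\left(\frac{2\pi nk}{N}\right) + \sin\left(\frac{2\pi nk}{N}\right)\right]$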
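MATH BACKGROUND
• Let the transform function be δ(n,k). Then $X(k) = \sum_{n=0}^{N-1} x(n)\,\delta(n,k)$, where $k = 0, 1, \dots, N-1$.
• For k = 0: $X(0) = \sum_{n=0}^{N-1} x(n)\,\delta(n,0)$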
For a 4-point transform:
X(0) = x(0)δ(0,0) + x(1)δ(1,0) + x(2)δ(2,0) + x(3)δ(3,0)
X(1) = x(0)δ(0,1) + x(1)δ(1,1) + x(2)δ(2,1) + x(3)δ(3,1)
X(2) = x(0)δ(0,2) + x(1)δ(1,2) + x(2)δ(2,2) + x(3)δ(3,2)
X(3) = x(0)δ(0,3) + x(1)δ(1,3) + x(2)δ(2,3) + x(3)δ(3,3)
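Consider the DFT:
$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N} = \sum_{n=0}^{N-1} x(n)\left[\cos\left(\frac{2\pi nk}{N}\right) - j\sin\left(\frac{2\pi nk}{N}\right)\right] = \sum_{n=0}^{N-1} x(n)\cos\left(\frac{2\pi nk}{N}\right) - j\sum_{n=0}^{N-1} x(n)\sin\left(\frac{2\pi nk}{N}\right)$
This is similar to the Hartley transform, except that the second (sine) term is multiplied by −j:
$X_h(k) = \sum_{n=0}^{N-1} x(n)\cos\left(\frac{2\pi nk}{N}\right) + \sum_{n=0}^{N-1} x(n)\sin\left(\frac{2\pi nk}{N}\right)$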
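Generalizing the transforms:
• $X(k) = \sum_{n=0}^{N-1} x(n)\,\delta(n,k)$, where
• DFT: $\delta(n,k) = \cos(2\pi nk/N) - j\sin(2\pi nk/N)$
• DCT: $\delta(n,k) = \alpha_k \cos[\pi(2n+1)k/(2N)]$
• DST: $\delta(n,k) = \sin[\pi(n+1)(k+1)/(N+1)]$
• DHT: $\delta(n,k) = \cos(2\pi nk/N) + \sin(2\pi nk/N)$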
LUT ENTRIES δ(n,k) = δ1(n,k) + δ2(n,k)
SYSTOLIC ARRAY ARCHITECTURE
δ(n,k) = δ1(n,k) + δ2(n,k)
x(0) → δ(0,0) δ(0,1) δ(0,2) δ(0,3)
x(1) → δ(1,0) δ(1,1) δ(1,2) δ(1,3)
x(2) → δ(2,0) δ(2,1) δ(2,2) δ(2,3)
x(3) → δ(3,0) δ(3,1) δ(3,2) δ(3,3)
Outputs (one per column): X(0) X(1) X(2) X(3)
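LOGIC BLOCK OF THE FPGA
[Figure: logic block (LB) with a LUT addressed by k1 and k2, a +/− unit, two multipliers (×), an adder (+) that also takes a partial sum from another LB, and Mode-controlled selection of the inputs x(n) and x(n)/y(n).]
δ(n,k) = δ1(n,k1) + δ2(n,k2); Mode = SAD selection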
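PIPELINING OF THE DFG
[Figure: data-flow graph of the logic block (MUX, ×, LUT L, two adders +, MUX, ×, MUX) with delay registers D inserted to pipeline it.]
T_critical = max{T_L, T_M, T_A + T_MUX} = T_M
where T_L, T_M, T_A and T_MUX are the LUT, multiplier, adder and multiplexer delays; the last equality holds when the multiplier is the slowest stage.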
WHAT’S NEW IN THIS?
• Customized for transforms
• C-code CAD
• Systolic array architecture, suited to transforms
• Easier routing
• Specific data path that supports transforms
• Better performance
• Better utilization of resources
• Intuitive
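DFT SOFTWARE CODE
// X holds the cosine sums (real part); Xs holds the sine sums, written X*(k) on the slides,
// so the imaginary part of the DFT is -Xs[k]
#include <math.h>
void dft(const double *x, double *X, double *Xs, int N) {
    const double pi = acos(-1.0);
    for (int k = 0; k < N; k++) {
        X[k] = 0.0;
        Xs[k] = 0.0;
        for (int n = 0; n < N; n++) {
            X[k] += x[n] * cos(2.0 * pi * n * k / N);
            Xs[k] += x[n] * sin(2.0 * pi * n * k / N);
        }
    }
}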
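DFT – HIGHER LEVEL
Two arrays; each bracket is one logic block holding the entries for an input pair. The δ1 columns accumulate X(k) and the δ2 columns accumulate X*(k).
Array for k = 2, 3:
x(0), x(1) → [δ1(0,2) δ1(1,2)] [δ2(0,2) δ2(1,2)] [δ1(0,3) δ1(1,3)] [δ2(0,3) δ2(1,3)]
x(2), x(3) → [δ1(2,2) δ1(3,2)] [δ2(2,2) δ2(3,2)] [δ1(2,3) δ1(3,3)] [δ2(2,3) δ2(3,3)]
Outputs: X(2) X*(2) X(3) X*(3)
Array for k = 0, 1:
x(0), x(1) → [δ1(0,0) δ1(1,0)] [δ2(0,0) δ2(1,0)] [δ1(0,1) δ1(1,1)] [δ2(0,1) δ2(1,1)]
x(2), x(3) → [δ1(2,0) δ1(3,0)] [δ2(2,0) δ2(3,0)] [δ1(2,1) δ1(3,1)] [δ2(2,1) δ2(3,1)]
Outputs: X(0) X*(0) X(1) X*(1)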
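DCT/DST SOFTWARE CODE
// DCT shown; for the DST use sin(pi*(n+1)*(k+1)/(N+1.0)) as the kernel, without alpha
#include <math.h>
void dct(const double *x, double *X, int N) {
    const double pi = acos(-1.0);
    for (int k = 0; k < N; k++) {
        double alpha = (k == 0) ? 1.0 / sqrt((double)N) : sqrt(2.0 / N);
        X[k] = 0.0;
        for (int n = 0; n < N; n++)
            X[k] += alpha * x[n] * cos(pi * (2 * n + 1) * k / (2.0 * N));
    }
}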
DCT/DST – HIGHER LEVEL
Two copies of the following array, each producing its own X(0) to X(3); each bracket is one logic block holding the entries for an input pair:
x(0), x(1) → [δ(0,0) δ(1,0)] [δ(0,1) δ(1,1)] [δ(0,2) δ(1,2)] [δ(0,3) δ(1,3)]
x(2), x(3) → [δ(2,0) δ(3,0)] [δ(2,1) δ(3,1)] [δ(2,2) δ(3,2)] [δ(2,3) δ(3,3)]
Outputs: X(0) X(1) X(2) X(3)
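DHT SOFTWARE CODE
// N-point DHT: X[k] = sum_n x[n] * (cos(2*pi*n*k/N) + sin(2*pi*n*k/N))
#include <math.h>
void dht(const double *x, double *X, int N) {
    const double pi = acos(-1.0);
    for (int k = 0; k < N; k++) {
        X[k] = 0.0;
        for (int n = 0; n < N; n++) {
            double t = 2.0 * pi * n * k / N;
            X[k] += x[n] * (cos(t) + sin(t));
        }
    }
}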
DHT – HIGHER LEVEL
x(0) → δ(0,0) δ(0,1) δ(0,2) δ(0,3)
x(1) → δ(1,0) δ(1,1) δ(1,2) δ(1,3)
x(2) → δ(2,0) δ(2,1) δ(2,2) δ(2,3)
x(3) → δ(3,0) δ(3,1) δ(3,2) δ(3,3)
Outputs (one per column): X(0) X(1) X(2) X(3)
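LOGIC BLOCK LEVEL: DCT/DFT/DST
[Figure: inputs x1 and x2 pass through MUXes to two multipliers (×); the LUT (L), addressed by θ1 and θ2, supplies cos(θ1) and cos(θ2); the products x1cos(θ1) and x2cos(θ2) go to the adders (+), whose outputs are labeled X+.]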
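DHT – LOGIC BLOCK LEVEL
[Figure: the same block with x1 routed to both multipliers; the LUT (L) is addressed by θ1 and 90 − θ1, giving cos(θ1) and cos(90 − θ1); the products x1cos(θ1) and x1cos(90 − θ1) go to the adders (X+).]
Because cos(90° − θ1) = sin(θ1), the two products supply x1cos(θ1) and x1sin(θ1), the two terms of the DHT kernel, from a cosine-only LUT.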
FPGA OR DSP?
• SAMPLE RATE > MHz? FPGA
• CONTEXT SWITCH? DSP/FPGA
• FLOATING POINT? DSP
• C CODE? DSP (FUTURE WORK: FPGA)