Digital Design with FPGAs: Examples and Resource Saving Tips
Screen A
Wu, Jinyuan
Fermilab
IEEE NSS 2007 Refresher Course
Oct. 2007
IEEE NSS Refresher Course
Complexity vs. Resource-Saving
• Complexity causes higher FPGA cost.
• Complexity creates indirect costs such as PCB layout, assembly, power consumption, cooling, etc.
• Complexity confuses people, including designers.
Logic Elements in a Nutshell
Logic Elements
• LUT = Look-Up Table.
• Normal mode: LUT4 (16 RAM cells, inputs A, B, C, D) + DFF.
• Arithmetic mode: 2 x LUT3 (8 cells each) + DFF, with carry input CI and carry output CO.
[Figure: logic element in normal mode and in arithmetic mode; the DFF has D, Q, ENA and CLRN pins.]
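The "16 RAM cells" view of a LUT4 can be sketched in software: the four inputs form a 4-bit address and the stored bit at that address is the output. This is only an illustrative model; the names `make_lut4` and `lut4_read` and the bit ordering of A, B, C, D are assumptions, not taken from the slides or from any vendor tool.

```python
def make_lut4(func):
    """Fill the 16 RAM cells of a LUT4 from any 4-input Boolean function."""
    cells = []
    for addr in range(16):
        # Assumed bit order: A is the most significant address bit.
        a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        cells.append(int(bool(func(a, b, c, d))))
    return cells


def lut4_read(cells, a, b, c, d):
    """Evaluate the LUT: the inputs simply address one of the 16 cells."""
    return cells[(a << 3) | (b << 2) | (c << 1) | d]


# A 4-input NAND costs exactly one LUT4, the same as any other 4-input function.
nand4 = make_lut4(lambda a, b, c, d: not (a and b and c and d))
```

Any 4-input function, simple or complicated, occupies the same 16 cells, which is the flexibility the following slides weigh against transistor cost.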
Synchronous Load and Clear (Cyclone II)
[Figure: logic elements drawn with a 16-bit LUT and with two 8-bit LUTs; the DFF has D, Q, ENA and CLRN pins.]
4-Input NAND, 4-Input NOR, 4-Input NAOR
• 8 transistors each.
[Figure: transistor-level schematics of the three gates with inputs A, B, C, D and output Y.]
The Mirror Adder (Weste93)
• 24-28 transistors.
[Figure: transistor-level schematic of the mirror adder with inputs A, B, Ci and outputs Sb, Cob.]
Cost of Flexibility
• The FPGA logic elements (LE) are extremely flexible.
• However, the transistor usage of an LE is high.
• Use dedicated block RAM and multipliers in the FPGA whenever possible.
[Figure: a multiplier, a block RAM and 16 logic elements.]
TDC Inside FPGA
• Sampling rate: 360 MHz x 4 phases = 1.44 GHz.
• LSB = 0.69 ns.
• Logic elements with critical timing are assigned as shown.
• Logic elements with non-critical timing are freely placed by the fitter of the compiler.
[Figure: block diagram with stages Multiple Sampling (clock phases c0, c90, c180, c270; registers Q0-Q3), Clock Domain Changing (registers QD, QE, QF), Trans. Detection & Encode (outputs T0, T1, DV) and Coarse Time Counter (TS); 4 channels.]
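The numbers on this slide follow from simple arithmetic, and the transition detection step can be modeled in a few lines. The sketch below is an assumption-laden illustration: `tdc_lsb_ns` and `detect_and_encode` are hypothetical names, and the model assumes a rising edge is searched among the four samples of one clock cycle plus the last sample of the previous cycle.

```python
def tdc_lsb_ns(clock_mhz=360.0, phases=4):
    """Time bin width in ns for multi-phase sampling."""
    return 1e3 / (clock_mhz * phases)


def detect_and_encode(prev_last, q):
    """Find a 0->1 transition among the samples of one clock cycle.

    prev_last: last sample of the previous cycle (assumed available).
    q: the four samples [Q0, Q1, Q2, Q3] taken at phases 0, 90, 180, 270.
    Returns (DV, T) where T is a 2-bit fine time, valid only if DV is True.
    """
    seq = [prev_last] + list(q)
    for t in range(4):
        if seq[t] == 0 and seq[t + 1] == 1:
            return True, t
    return False, 0
```

With the default arguments `tdc_lsb_ns()` gives about 0.694 ns, matching the LSB quoted on the slide.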
Digital Phase Follower
• Digital Phase Follower is used as a receiver for serial communication capable of operating with multi-crystal configuration.
• No dedicated clock-data-recovery (CDR) circuitry is needed: suitable for low-cost FPGA.
[Figure: block diagram with stages Multiple Sampling (clock phases c0, c90, c180, c270; registers Q0-Q3), Clock Domain Changing (registers QD, QE, QF), Trans. Detection (was0, was3, is0, is3), Tri-speed Shift Register (SEL, Shift0, Shift2, b0, b1) and Frame Detection; signals In and Data Out.]
ADC Using FPGA
• Analog signals from AMP & Shapers are directly fed to FPGA pins.
• FPGA outputs and a passive RC network are used to generate the ramping reference voltage VREF.
• The input voltages and VREF are compared using FPGA differential input receivers.
• The times of transitions representing input voltage values are digitized by TDC blocks in the FPGA.
[Figure: a conventional chain (four AMP & Shaper channels, each followed by an ADC, then the FPGA) next to the FPGA-based scheme (four AMP & Shaper channels feeding TDC blocks inside the FPGA; VREF generated with R1, R2 and C; input voltages V1-V4 and transition times T1-T4).]
Sorting for Lowest Four Numbers
[Figure: four stages, each with a comparator (A<B) producing X1-X4; control signals SA1-SA4, SP2-SP4 and DUMP.]
SA1=X1&!DUMP
SA4=X4&(!X3)&!DUMP
SP4=X4&X4 + DUMP
Variations of the Registered Adders
The Registered Adder
[Figure: adder computing A+B from inputs A[] and B[], followed by a register (D[], Q[]) with optional synchronous load and clear.]
The Accumulator, a Special Case of Registered Adder
[Figure: registered adder (A[], B[], A+B, D[], Q[]) drawn as an accumulator (Σ) with input D[] and output Q[].]
The Modulo X Counter
[Figure: counter output Q[] compared (A[] == B[]) with X-1; the comparator output drives SCLR.]
Regular (Mod 16): 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
Mod 6: 0,1,2,3,4,5 0,1,2,3,4,5 0,1,2,3,4,5
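A cycle-by-cycle model makes the role of the comparator clear: when the count equals X-1, the synchronous clear wins over the increment on the next clock edge. The function name `modulo_counter` and the 4-bit default width are assumptions chosen to reproduce the sequences listed on the slide.

```python
def modulo_counter(x, n_clocks, width=4):
    """Simulate a counter with a comparator (Q == X-1) driving SCLR."""
    mask = (1 << width) - 1
    q = 0
    outputs = []
    for _ in range(n_clocks):
        outputs.append(q)
        sclr = (q == x - 1)               # comparator output
        q = 0 if sclr else (q + 1) & mask  # SCLR has priority over counting
    return outputs
```

`modulo_counter(6, 12)` walks through 0 to 5 twice, and `modulo_counter(16, 32)` behaves like the regular 4-bit counter.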
The Between Counter
[Figure: counter output Q[] compared (A[] == B[]) with M-1; signals T, SCLR and SLOAD; value N on the load input D[]. The Between Counter addresses a ROM (PC0: instr0, PC1: instr1, ... PCD: instrD) that produces the Control Signals.]
0,1,2,3,4,5,6,7,8,9,A 5,6,7,8,9,A 5,6,7,8,9,A 5,6,7,8,9,A 5,6,7,8,9,A
5,6,7,8,9,A,B,C,D,E,F…
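The sequences on this slide are consistent with a counter that reloads N when it reaches M-1, as long as a control input T is asserted, and simply keeps counting once T is released. That reading of the diagram is an assumption, as are the name `between_counter` and the values N = 5 and M = 0xB used below to reproduce the listed counts.

```python
def between_counter(n, m, t_values, width=4):
    """Simulate a counter that loops between N and M-1 while T is asserted.

    t_values: one value of T per clock cycle (truthy = keep looping).
    """
    mask = (1 << width) - 1
    q = 0
    outputs = []
    for t in t_values:
        outputs.append(q)
        if t and q == m - 1:
            q = n                 # SLOAD: jump back to the loop start
        else:
            q = (q + 1) & mask    # normal counting, falls out of the loop
    return outputs
```

Used as a program counter for a ROM-based sequencer, this gives a loop over instructions N to M-1 with an exit controlled by T.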
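Decimation With Non-Integer Ratio
f_S = 1 sample/(21 µs)
f_D = 128 samples/(1/60 Hz)
f_S / f_D = 6.2003968
2^24 = 128 * N * f_S / f_D
N = 21139
[Figure: 24-bit accumulator Q[23..0] incremented by 21139 per input sample, generating the RAM write address WA; input samples S are summed (Σ) into the RAM, which outputs the decimated data D.]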
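The increment N can be checked numerically from the relation 2^24 = 128 * N * f_S / f_D. The sketch below also models the accumulator under the assumption that the top 7 bits of the 24-bit sum select one of the 128 output bins; that bit selection and the names `phase_increment` and `write_addresses` are illustrative assumptions, not read off the slide.

```python
ACC_BITS = 24
BINS = 128


def phase_increment(fs_hz, fd_hz):
    """Solve 2**24 = 128 * N * fs/fd for N, truncated to an integer."""
    return int(2 ** ACC_BITS / (BINS * fs_hz / fd_hz))


def write_addresses(n, count):
    """Accumulate n per input sample; assume the top 7 bits form the bin address."""
    acc = 0
    addresses = []
    for _ in range(count):
        addresses.append(acc >> (ACC_BITS - 7))
        acc = (acc + n) & (2 ** ACC_BITS - 1)
    return addresses


FS = 1.0 / 21e-6   # one input sample every 21 microseconds
FD = 128 * 60.0    # 128 output samples per 1/60 s
N = phase_increment(FS, FD)
```

On average the address advances once every 6.2 input samples, so a non-integer decimation ratio is obtained with nothing more than an adder and a register.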
LED Brightness Variation
• The LED brightness is varied by changing the output pulse duty-cycle.
• Comparator input A is the brightness and B is the clock cycle count.
• A look-up table can be added to input A for a different brightness variation curve.
[Figure: a counter Q[23..0] feeding input B of an A<B comparator; a second version with a LUT in front of input A.]
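The duty-cycle mechanism can be illustrated by sweeping the counter through one full period and counting how often the comparator output A<B is high. The function name `pwm_duty` is hypothetical, and an 8-bit counter is assumed here only to keep the loop short (the slide shows a 24-bit counter).

```python
def pwm_duty(a, width=8):
    """Fraction of one counter period during which the comparator A<B is high."""
    period = 1 << width
    high = sum(1 for b in range(period) if a < b)
    return high / period
```

For example, `pwm_duty(63)` is 0.75 and `pwm_duty(255)` is 0.0. Whether a high comparator output turns the LED on or off depends on how the pin is wired, which the slide does not specify.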
LED Brightness Exponential Drop
if (CO==1) {Q = Q - Q/32;}
• Narrow pulses are typically stretched for LED display with fixed brightness.
• The circuit here gradually dims the LED for a better visual effect.
[Figure: an accumulator register Q (input SET) that subtracts Q/32 whenever the counter carry-out CO is high; Q feeds input A of the A<B comparator and the counter feeds input B.]
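The update `Q = Q - Q/32` multiplies Q by roughly 31/32 on every counter carry-out, which is a discrete exponential decay. A minimal integer model, assuming Q/32 is implemented as a right shift by 5 bits (the name `decay` is hypothetical):

```python
def decay(q0, steps):
    """Apply Q = Q - Q/32 repeatedly, with Q/32 as an integer right shift."""
    q = q0
    trace = [q]
    for _ in range(steps):
        q = q - (q >> 5)
        trace.append(q)
    return trace
```

With integer arithmetic the value stops decreasing once Q is below 32, because `Q >> 5` becomes 0; a real design would need extra fraction bits or a final clear if the LED must go fully dark.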
Tricks of Using RAM
Pipeline & FIFO (First-In/First-Out) Buffers
Application of Pipeline and FIFO
[Figure: block diagram with TDC, PIPELINE (WA, RA), FIFO (PUSH), L1 FSM and Trigger Primitives; signals TS, T, DV, TN, L1 and T OUT.]
Time Frame FIFO
TDC With Hit Rate Limiter
• Hit Rate Limiter: 4 hits/256 CK212 cycles, i.e. 4 hits/1.2 µs or 3.3 MHz.
[Figure: several TDC/HRL channel blocks with signals T0, T1, DV, DVLD, CLR, LD, L/S, CLRHIT and CLRCNT; a 1.2 µs counter CC[5..2] clocked by CK212; a timing diagram with counts 0 to c and Reset.]
Double Buffers
• Double buffers are often used as an interface for a microprocessor unit (MPU) to read out stable data.
• The buffer registers are implemented with logic elements, which is not economical if many data words are implemented this way.
[Figure: data inputs inside the FPGA captured into buffer registers on Update and read by the MPU.]
RAM for Serial-Parallel Conversion
• Data widths of the two ports can be different.
• -1: Serial-In-Parallel-Out.
• +1: Parallel-In-Serial-Out.
• Data can also be stored for later access.
[Figure: dual-port RAM (port A: DA, AA, WEA, QA; port B: DB, AB, WEB, QB) with a counter CNT and a +-1 offset between addresses AA and AB; 16-bit parallel input PI and output PO, 1-bit serial input SI and output SO.]
Tricks on Look-Up Tables
• For N input lines, the number of memory cells needed is of the order of 2 to the Nth power, i.e., O(2^N).
• Try to organize independent inputs into different look-up tables.
• Use a LUT for single-argument functions, e.g., exp(x).
• Split arguments for higher precision.
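High Precision exp LUT
• A single ROM LUT needs 2^16 = 64K words for a 16-bit argument and 2^32 = 4G words for a 32-bit argument.
• With the argument split into fields A, B and C, two ROM LUTs of 2^8 = 256 words each provide e^(A) and e^(B); multipliers combine them with 1+C (32-bit precision).
e^(A+B+C) = e^(A) * e^(B) * (1+C)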
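The identity e^(A+B+C) = e^(A) * e^(B) * (1+C) can be tried out numerically. The sketch below assumes a 32-bit unsigned fraction x in [0, 1) split into an 8-bit field A, an 8-bit field B and a 16-bit field C; that field layout, the table names and the function `exp_split` are assumptions for illustration, and Python floats stand in for the fixed-point ROM words.

```python
import math

# Two 256-word tables replace one table with 2**32 entries.
LUT_A = [math.exp(a / 2 ** 8) for a in range(256)]    # A = bits 31..24
LUT_B = [math.exp(b / 2 ** 16) for b in range(256)]   # B = bits 23..16


def exp_split(x32):
    """Approximate exp(x32 / 2**32) using two small LUTs and 1 + C."""
    a = (x32 >> 24) & 0xFF
    b = (x32 >> 16) & 0xFF
    c = x32 & 0xFFFF                     # C < 2**-16 as a fraction
    return LUT_A[a] * LUT_B[b] * (1.0 + c / 2 ** 32)
```

Because C is below 2^-16, the neglected term of e^C is about C^2/2, i.e. below 2^-33, which is why replacing e^C by 1+C is compatible with 32-bit precision under these assumptions.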
RAM Based Histograms
Block RAM Based Histogram
• The histogram bin number K is used as the read address RA.
• After the clock edge, the content of the bin is output from the RAM. It is incremented by one and sent back as data input D of the RAM. The bin number K is used as the write address.
• The bin content is updated after the next clock edge.
• This circuit works in all FPGA families as long as the same bin is not hit in two consecutive clock cycles.
• In some families, the restriction above may not be needed.
[Figure: RAM (D, WA, WE, RA) with inputs K and DV, a register (D, Q) delaying K to the write address, and a +1 adder from RAM output Y back to D.]
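The read, increment and write-back sequence, and the reason for the "no consecutive hits" restriction, can be reproduced with a small cycle model. The model assumes a RAM that returns the old content when a location is read in the same cycle in which it is written; the function name `histogram_no_forwarding` is hypothetical.

```python
def histogram_no_forwarding(bins, hits):
    """Model: read bin K at one clock edge, write K's content + 1 at the next."""
    ram = [0] * bins
    pending = None                       # (address, value) to write next edge
    for k in list(hits) + [None]:        # one extra cycle to flush the last write
        read = ram[k] if k is not None else None   # sees the old RAM content
        if pending is not None:
            ram[pending[0]] = pending[1]            # write-back of previous hit
        pending = (k, read + 1) if k is not None else None
    return ram
```

With hits `[3, 5, 3]` bin 3 ends at 2 as expected, but with hits `[3, 3]` it ends at 1: the second read happens before the first write-back lands, so one count is lost.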
Pipelined Block RAM Based Histogram
• The pipelined structure allows a higher operating frequency, yielding higher throughput.
• Restriction: the same bin must not be hit within 4 cycles.
[Figure: RAM (D, WA, WE, RA, Q) with a chain of registers delaying the bin number (K4, K3, K2, K1, K0) and DV, and a registered +1 stage (N0+1, N1).]
RAW Hazard Prevention: Data Forwarding
• Re-hit of a bin is detected (comparators marked &&==).
• If any bin is to be re-hit before data is written back to RAM, it is forwarded to these registers.
[Figure: pipelined histogram (RAM with D, WA, WE, RA, Q; inputs K and DV; +1 adder) with three comparators on the delayed bin numbers and forwarding paths into the pipeline registers.]
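Data forwarding can be modeled by keeping the increments that are still in flight and letting a new hit use the newest in-flight value for its bin instead of the stale RAM content. This is a behavioral sketch only: the latency of 3 cycles, the name `pipelined_histogram` and the queue-based bookkeeping are assumptions, not a transcription of the register structure in the figure.

```python
from collections import deque


def pipelined_histogram(bins, hits, latency=3, forwarding=True):
    """Histogram whose write-back lands `latency` cycles after the read."""
    ram = [0] * bins
    in_flight = deque()                  # (bin, new_value), oldest first
    for k in list(hits) + [None] * latency:   # extra cycles drain the pipeline
        value = None
        if k is not None:
            value = ram[k]               # may be stale if bin k is in flight
            if forwarding:
                for addr, v in in_flight:
                    if addr == k:        # re-hit detected: forward newest value
                        value = v
            value += 1
        in_flight.append((k, value))
        if len(in_flight) > latency:
            addr, v = in_flight.popleft()
            if addr is not None:
                ram[addr] = v            # write-back
    return ram
```

With forwarding enabled, four consecutive hits on bin 3 give a count of 4; with `forwarding=False` the same input leaves the bin at 1, which is the read-after-write hazard the slide refers to.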
Problem of RAM Contents Fast Reset
There is no global reset signal supported in block RAM. To reset the contents of a histogram, one needs to write 0 to N bins, which takes N clock cycles. It may be OK for many applications, but there are cases where a fast reset is needed.
[Figure: RAM (D, WA, WE, RA, Q) with data 0,0,0,0,0…0 written to addresses 0,1,2,3,4…N-1.]
Histogram with Fast Reset
[Figure: two RAMs (each with D, WA, WE, RA, Q), a block RC with enable CE driven by RESET, an equality comparator (==), a chain of registers for K and DV, and a +1 adder with an alternative 0 input.]
Topics on Multipliers
Multipliers
• Multipliers are now available in FPGA devices, but intrinsically they consume a large amount of resources.
• Less-Multipliers (LM):
  • If a design can use fewer multipliers, use fewer.
• Multiplierless (ML):
  • If a design can work without multipliers, use none.
• LM/ML approaches in FPGA and ASIC:
  • They may not save resources significantly in FPGA.
  • But they do save resources significantly in ASIC.
Less-Multipliers
• 32-bit multiplication, (A, B) x (C, D) with 16-bit halves: partial products A x C, A x D, B x C, B x D. Needs 4 16-bit multipliers.
• 32-bit square, (A, B) x (A, B): partial products A^2, A x B, B^2. Needs 3 16-bit multipliers.
• 32-bit square with only the 32 MSB kept: A^2 and (AB)>>15. Needs 2 16-bit multipliers.
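The multiplier counts above can be verified with integer arithmetic by splitting each 32-bit operand into 16-bit halves. The function names below are hypothetical; each `*` between two 16-bit values stands for one 16-bit hardware multiplier.

```python
MASK16 = 0xFFFF


def mul32_from_16(x, y):
    """Full 64-bit product from four 16x16 multiplies."""
    a, b = x >> 16, x & MASK16
    c, d = y >> 16, y & MASK16
    return ((a * c) << 32) + ((a * d + b * c) << 16) + b * d


def square32_from_16(x):
    """Full 64-bit square from three 16x16 multiplies (A*B is used twice)."""
    a, b = x >> 16, x & MASK16
    return ((a * a) << 32) + ((a * b) << 17) + b * b


def square32_msb(x):
    """Upper 32 bits of the square from two multiplies; B*B is dropped."""
    a, b = x >> 16, x & MASK16
    return a * a + ((a * b) >> 15)
```

Dropping the B^2 term and truncating (AB)>>15 can make `square32_msb` at most one unit in the last place smaller than the exact upper 32 bits.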
Multiplierless (ML) Approaches
• Canonic signed digit (CSD) and sum of powers of two (SOPOT) representations:
  • 5 x A = 4 x A + A, 248 x A = 256 x A - 8 x A
• Recursive implementation of finite impulse response (FIR) filters:
  • Sliding sum, sinc^2, etc.
• CORDIC or similar algorithms:
  • ML FFT, rotators, etc.
• Distributed Arithmetic (DA) designs:
  • Look-up tables.
• Single-bit sinc^3 FIR decimation filter:
  • In delta-sigma ADC.
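The CSD idea, a constant written as a short sum or difference of powers of two, can be illustrated with the standard non-adjacent-form recoding. The helper names `csd_terms` and `csd_multiply` are hypothetical, and the sketch handles positive constants only.

```python
def csd_terms(n):
    """Recode a positive integer as [(sign, shift), ...] with sign in {+1, -1}.

    The result satisfies n == sum(sign * 2**shift) and never uses two
    adjacent shifts (non-adjacent form).
    """
    terms = []
    shift = 0
    while n != 0:
        if n & 1:
            digit = 2 - (n & 3)      # +1 if n % 4 == 1, -1 if n % 4 == 3
            terms.append((digit, shift))
            n -= digit
        n >>= 1
        shift += 1
    return terms


def csd_multiply(a, n):
    """Multiply a by the constant n using only shifts and adds/subtracts."""
    return sum(sign * (a << shift) for sign, shift in csd_terms(n))
```

`csd_terms(5)` gives the two terms 1 and 4, and `csd_terms(248)` gives -8 and +256, matching the two examples on the slide.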
Curved Track Fitter
Track Fitting
[Figure: curved track with hit coordinates y0 to y5 at planes (z-z0) = -4, -2, 0 (z=z0), +2, +4; markers -4h and 4h.]
Least Square Fitter
• The parameters can be described as inner-products.
• Hit coordinates and coefficients are fed simultaneously.
• The inner-products can be calculated with multiplier-accumulator structures.
[Figure: hit coordinates y1 to y7 and coefficient sets c1 to c7, d1 to d7, e1 to e7 feeding three multipliers (X), each followed by an accumulator (Σ).]
Multiplier-less (ML) Quasi-Least Square Fitter
• The coefficients are described as “two-bit” numbers, e.g.:
  • 5=4+1; 7=8-1; 56=64-8;
• The multiplication is replaced with two shift & add/sub operations.
• There are two clock cycles to fetch a measurement point (i.e., y1, y2, etc.), allowing two shift & add/sub operations.
[Figure: measurement points y1 to y7 passing through shifters (<<) into three add/subtract accumulators (Σ +/-), with power-of-two coefficients such as +1, -1, 4, 8, -8, -32, 64.]
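An inner product with "two-bit" coefficients needs only a shifter and an add/subtract accumulator, used twice per measurement point. In the sketch below the coefficient table is hypothetical: it reuses the three example values from the slide (5, 7, 56) and is not the fitter's actual coefficient set; `shift_add_inner_product` is likewise an illustrative name.

```python
def shift_add_inner_product(points, coeffs):
    """Accumulate sum(coefficient * y) using two shift & add/sub steps per point.

    coeffs: one pair ((sign1, shift1), (sign2, shift2)) per measurement point.
    """
    acc = 0
    for y, two_terms in zip(points, coeffs):
        for sign, shift in two_terms:    # one operation per clock cycle
            acc += sign * (y << shift)
    return acc


# Hypothetical coefficients, written as signed powers of two.
COEFFS = [
    ((+1, 2), (+1, 0)),   # 5 = 4 + 1
    ((+1, 3), (-1, 0)),   # 7 = 8 - 1
    ((+1, 6), (-1, 3)),   # 56 = 64 - 8
]
```

For points `[3, 10, 2]` this accumulates 5*3 + 7*10 + 56*2 = 197 without any multiplier.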
Inaccuracy Doesn’t Matter, a Lot of the Time
[Figure: results of the Multiplier-less Quasi-Least Square FPGA Fitter shown next to those of the Least Square Fitter.]
This 45 min course ends here. Thanks.
You are welcome to bring in a USB memory stick to get a copy of this course plus the supplemental materials.
The End
Thank you