Digital Fields Board (DFB) FPGA
Ken Stevens, University of Colorado, LASP
Presentation Overview
Design Context
Requirements
Changes/Trades since PDR
Module-level Design Examples
FPGA Verification and Validation Strategy
Current Status
The DFB FPGA generates the following types of data in order to meet science requirements (configured by the DCB):
Trigger Data:
Pseudo-power in 7 or 13 frequency bands in the range of 0 to 8 kHz
Any two EDC, EAC, or SCM values
Time Series (Waveform) Data:
Reporting rates from 1 S/s to 16 kS/s:
V1-V6, V1AC-V6AC, E12DC-E56DC, E12AC-E56AC, FM1-FM3, SCM1-SCM3
Spectra Data:
2048-point FFT with selectable frequency binning and time averaging
Any eight signals except V1-V6
Cross Spectral Data:
Selectable frequency binning and time averaging
Any four pairs from spectra data
Solitary Wave Detector Data:
Counts peaks and bins them according to magnitude
E12AC or E34AC
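A minimal Python sketch of the "count peaks and bin by magnitude" idea. The peak rule (local maximum of the absolute value) and the bin thresholds are assumptions for illustration; the deck does not give the detector's actual criteria.

```python
def count_peaks_by_magnitude(samples, bin_edges):
    """Count local maxima of |x| and histogram them by magnitude.

    bin_edges holds ascending lower thresholds (hypothetical values).
    A peak smaller than bin_edges[0] is ignored.
    """
    counts = [0] * len(bin_edges)
    mags = [abs(s) for s in samples]
    for i in range(1, len(mags) - 1):
        # Assumed peak rule: strictly above the left neighbor and not
        # below the right neighbor, so a flat top is counted once.
        if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]:
            # Credit the highest bin whose lower edge the peak reaches.
            for b in range(len(bin_edges) - 1, -1, -1):
                if mags[i] >= bin_edges[b]:
                    counts[b] += 1
                    break
    return counts


waveform = [0, 3, 0, -12, 0, 40, 41, 2, 0, 7, 0]
print(count_peaks_by_magnitude(waveform, [2, 10, 32]))
```

Here the peaks of magnitude 3 and 7 land in the lowest bin, 12 in the middle bin, and 41 in the highest bin.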
General Approach
Using an FPGA-based implementation allows hardware to be tailored to the mission requirements
Lower power than an equivalent processor-based DSP design
Heritage from THEMIS
LTC1604 ADC and ADC support circuitry
Digital Signal Processing algorithms
Backplane interface protocols
Changes and Improvements from THEMIS
Higher sample rates (16 kS/s)
Addition of onboard Cross Spectral Analysis and Solitary Wave Detection
Use of a single RTAX2000 FPGA instead of three SX72 FPGAs
Rewrite of FPGA VHDL code
Higher performance implementation of DSP algorithms
Utilization of RTAX internal RAM to improve data flow and performance
Parameterization of all modules to accommodate unforeseen design changes
Standardization of all internal interconnect buses to Wishbone architecture
Added circuitry and an additional external SRAM to synchronize data transfer to the 1 pps signal…
Why? …to align data to increments of 1/128 second (the period of the buffer swapping in the DCB) coincident with the 1 pps signal, so that the DCB can correctly buffer and time-tag the data.
How? …added double buffers in the data controller, each capable of holding 1/128 second of data. Data is tagged at acquisition and placed in the appropriate FIFO depending on when the sample was taken. Latency through the DSP is less than 1/128 second for all data types. Data is then burst onto the backplane at the next 1/128 second marker.
This is implemented and in use in the lab using internal FPGA RAM, which cannot hold the full data rate of 4096 words per 1/128 second period. An additional external SRAM has been added to the next board layout to provide 4096-word buffers for full-rate data.
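The double-buffering scheme can be sketched in Python as follows. The 128 slots per second and the 4096-word capacity come from the text; the class name, method names, and the tick-based slot calculation are hypothetical and only illustrate the idea.

```python
SLOTS_PER_SECOND = 128   # buffer-swap period of 1/128 s (from the text)
WORDS_PER_SLOT = 4096    # full-rate words per 1/128 s period (from the text)


def slot_index(sample_tick, ticks_per_second):
    """Map an acquisition tick within the current second to its 1/128 s slot."""
    return ((sample_tick * SLOTS_PER_SECOND) // ticks_per_second) % SLOTS_PER_SECOND


class DoubleBuffer:
    """Two banks: one fills for the current slot while the other is drained."""

    def __init__(self):
        self.banks = [[], []]

    def write(self, slot, word):
        """Store a word in the bank selected by the slot it was acquired in."""
        bank = self.banks[slot % 2]
        if len(bank) >= WORDS_PER_SLOT:
            raise OverflowError("slot buffer full")
        bank.append(word)

    def burst(self, finished_slot):
        """At the next 1/128 s marker, return and clear the finished bank."""
        bank = self.banks[finished_slot % 2]
        words = list(bank)
        bank.clear()
        return words


buf = DoubleBuffer()
buf.write(0, "a")
buf.write(1, "b")   # late data for slot 1 goes to the other bank
buf.write(0, "c")
print(buf.burst(0))
```

Selecting the bank from the acquisition tag, not the arrival time, is what keeps words from adjacent slots apart even when processing latency differs between data types.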
Triggers and Waveforms are included only as backup slides since they were presented at PDR.
Spectra/Xspectra illustrated to give an idea of the steps in the design process.
Solitary Wave Detector to illustrate a “single page” overview.
FFT Generation:
 Select input signal and create 2048-point buffer
 Hanning window
 Perform 2048-point FFT using Radix-2 algorithm
 End up with 1024 real and 1024 imaginary points
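The FFT generation steps above can be sketched as a floating-point Python reference model. This is not the FPGA implementation, which the deck describes as 32-bit fixed-point. The periodic form of the Hann window and the recursive decimation-in-time structure are assumptions, and the function names are hypothetical.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann ("Hanning") window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrum_half(samples):
    """Window the buffer, transform it, and keep the non-redundant half."""
    n = len(samples)
    window = hann_window(n)
    full = fft_radix2([s * w for s, w in zip(samples, window)])
    return full[: n // 2]  # 1024 complex points (real + imag) for n = 2048


# Usage: a unit-amplitude cosine centered on bin 100 of a 2048-point buffer.
tone = [math.cos(2 * math.pi * 100 * k / 2048) for k in range(2048)]
half = spectrum_half(tone)
peak = max(range(len(half)), key=lambda k: abs(half[k]))
print(len(half), peak)
```

For a real input the upper half of the transform mirrors the lower half, which is why only 1024 real and 1024 imaginary points need to be kept.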
Spectra Power:
 Calculate power by adding squares of Real and Imag
 Perform frequency binning
 Accumulate in time
 Pseudo-log compress
 Report at selected cadence
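A short Python sketch of the first three spectra power steps, using integers as a stand-in for the fixed-point data path. Whether the hardware sums or averages within a frequency bin is not stated in the deck; summing is assumed here, and all function names are hypothetical.

```python
def power_spectrum(points):
    """points: list of (re, im) pairs -> re^2 + im^2 for each frequency."""
    return [re * re + im * im for re, im in points]


def bin_frequencies(power, width):
    """Combine `width` adjacent frequency points into one bin (assumed: sum)."""
    return [sum(power[i:i + width]) for i in range(0, len(power), width)]


def accumulate(running, binned):
    """Add one binned spectrum to the running time accumulation."""
    return [a + b for a, b in zip(running, binned)]


fft_points = [(3, 4), (1, 0), (0, 2), (1, 1)]
binned = bin_frequencies(power_spectrum(fft_points), 2)
total = [0] * len(binned)
for _ in range(2):            # accumulate two identical spectra in time
    total = accumulate(total, binned)
print(binned, total)
```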
XSpectra:
 Calculate XSpectra components (see diagram)
 Perform frequency binning
 Accumulate in time
 Pseudo-log compress
 Report at selected cadence
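The diagram of XSpectra components is not reproduced in this transcript. As a hedged sketch, the Python below uses the textbook cross-spectrum definition A·conj(B) for each frequency point, which may differ in sign convention or scaling from the components on the slide. The function name is hypothetical.

```python
def cross_spectrum(a, b):
    """a, b: lists of (re, im) pairs at matching frequencies.

    Returns (real, imag) of A * conj(B) for each frequency point.
    """
    out = []
    for (ar, ai), (br, bi) in zip(a, b):
        real = ar * br + ai * bi
        imag = ai * br - ar * bi
        out.append((real, imag))
    return out


print(cross_spectrum([(1, 2)], [(3, 4)]))   # (1+2j) * conj(3+4j)
print(cross_spectrum([(3, 4)], [(3, 4)]))   # a signal with itself: power, zero phase
```

Crossing a signal with itself gives its power in the real part and zero in the imaginary part, which is a convenient sanity check for this kind of model.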
Control registers:
Implemented in the top-level module
Uses Wishbone interface
Ingress Processor
Compares incoming data from the digital filters with the selection registers and creates a 2048-sample buffer in external RAM for each enabled quantity
Notifies the Control State Machine when input buffers are available
Control State Machine (CSM)
When an input buffer is available, the CSM transfers the data from the external RAM to an internal RAM, manages the calculation of a spectrum, and then transfers the results back into external RAM.
When the spectra calculations are complete, the CSM checks to see if any XSpectra are enabled. If so, then the appropriate spectra are transferred to internal RAM, the CSM manages the calculation of the XSpectra, and then transfers the results back to external RAM.
Arithmetic Logic Unit
Implements the arithmetic logic required for windowing, FFT, unpacking, power calculation, and XSpectral analysis (the data flow for each of these operations is shown shortly)
Performs frequency binning
Performs time averaging
Egress Processor
Transfers spectra and XSpectra results from external RAM
Performs 34-to-8 pseudo-log compression
Transmits data to the telemetry stream at the appropriate cadence
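The deck does not define the pseudo-log encoding. The Python sketch below shows one hypothetical scheme of this kind, assuming "34 to 8" means a 34-bit unsigned input and an 8-bit output: a 5-bit exponent and a 3-bit mantissa with an implied leading one. It illustrates the technique only and is not the flight format.

```python
def pseudo_log_compress(value):
    """Compress an unsigned integer below 2**34 into one byte (hypothetical format)."""
    if not 0 <= value < 2 ** 34:
        raise ValueError("value out of range")
    if value < 8:
        return value                               # exponent 0: stored exactly
    exponent = value.bit_length() - 3              # 1..31 for 4- to 34-bit values
    mantissa = (value >> (exponent - 1)) & 0x7     # 3 bits below the leading one
    return (exponent << 3) | mantissa


def pseudo_log_expand(code):
    """Return the smallest value that compresses to `code`."""
    exponent, mantissa = code >> 3, code & 0x7
    if exponent == 0:
        return mantissa
    return (0x8 | mantissa) << (exponent - 1)


for v in (5, 15, 1000, 2 ** 34 - 1):
    code = pseudo_log_compress(v)
    print(v, code, pseudo_log_expand(code))
```

With this layout, values below 16 are stored exactly, larger values keep their four most significant bits, and the code increases monotonically with the input.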
Spectra/XSpectra ALU is 32-bit fixed-point with appropriate rounding.
…other operations are included in the backup slides.
Verification is performed independently of design
Verification is performed at both the module and chip levels
Functionality and algorithm correctness are verified through module-level simulation
Interaction with other parts of the system is verified through chip-level simulation
Assertion-Based Verification (ABV)
Using Active-HDL
Code coverage is used to quantify verification
Using Active-HDL
Signal into DFB Analog Electronics
From ADC to FPGA
FPGA processing
FPGA output
Compare FPGA, Simulation & IDL results
Record (raw) waveforms at 16kHz
IDL routines performing same tasks as FPGA
IDL formatting of data for ALU module simulation
FPGA Simulation
Selectable values for each field:
S1(10)    S2(10)    Speed(11)   En (2)   En(2)   Bands(2)
E12 DC    E12 DC    1/16 S/s    On       On      7 bins
E34 DC    E34 DC    1/8 S/s     Off      Off     13 bins
E56 DC    E56 DC    1/4 S/s
…         …         …
SCM3      SCM3      64 S/s
Example selection shown on the slide: S1 = E12 DC, S2 = E34 DC, Speed = 1/8 S/s, En = On, En = On, Bands = 7 bins
ETU 1&2: Used to verify algorithms and functionality
Reprogrammable FPGAs – ProASIC3e
1xSRAM to implement either FFT or 1 PPS in external RAM
Currently in use at LASP and Berkeley
ETU 3: Used to verify flight board layout and AX/RTAX specific logic
Add EDAC RAMs where appropriate
AX2000 in a BGA package with adapter to flight pin pattern
2x SRAM to implement both FFT and 1 PPS in external RAM
FPGA to verify flight board characteristics at end of October
FM1: Implements flight functionality
Pinout change because of BGA to QFP adapter used on ETU 3
Flight functionality at end of 2009
DFB FPGA: Field Alignment Flow Diagram
Inputs: E12, E34, E56; B (from FGM)
Adjust gains, offsets
Offsets: 5-minute low-pass filter to find offsets
Adjust gain/offset
Rotation matrix
Gains, offsets (from ground)
Rotate to E system (gains included)
Adjust gain/offset
Interpolate
Rotate E, B simultaneously in the xy plane until Bx = 0 (perform rotations using CORDIC)
Rotate E, B simultaneously in the xz plane until Bz = 0
E_perp = Ey/1.646
E_par = Ex/1.646^2
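The divisors 1.646 and 1.646^2 correspond to the CORDIC gain, which is applied once per rotation pass. The floating-point Python sketch below shows one such pass in a single plane: it rotates B until Bx is near zero and applies the same micro-rotations to E. The iteration count and function names are hypothetical, and the multiplications by 2^-i stand in for the arithmetic shifts a hardware CORDIC would use.

```python
import math


def cordic_gain(iterations):
    """Magnitude growth of an uncompensated CORDIC pass (about 1.6468)."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_align(bx, by, ex, ey, iterations=16):
    """Rotate (bx, by) until bx ~ 0; rotate (ex, ey) by the same angle.

    All outputs are scaled by cordic_gain(iterations).
    """
    for i in range(iterations):
        shift = 2.0 ** -i
        # Rotate counterclockwise when bx and by share a sign, else clockwise;
        # either way B moves toward the y axis.
        d = 1.0 if bx * by >= 0 else -1.0
        bx, by = bx - d * by * shift, by + d * bx * shift
        ex, ey = ex - d * ey * shift, ey + d * ex * shift
    return bx, by, ex, ey


k = cordic_gain(16)
bx, by, ex, ey = cordic_align(3.0, 4.0, 1.0, 0.0)
print(round(k, 4), round(bx, 3), round(by / k, 3), round(ex / k, 3), round(ey / k, 3))
```

A component that passes through two such passes (xy, then xz) picks up the gain twice, which matches the division of Ex by 1.646^2 while Ey is divided by 1.646 once.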
DFB FPGA: Digital Filters Diagram
Each bank has a low-pass section (2:1 decimating FIR filter, 3-tap) and a band-pass section (shift Z^-3, 7-tap FIR filter, +/- summing node, averager)
Bank 1: 8 kS/s in; band-pass 2-4 kHz; 4 kS/s out
Bank 2: 4 kS/s in; band-pass 1-2 kHz; 2 kS/s out
(9 more banks)
Last bank: band-pass 1-2 Hz; 2 S/s out
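A Python sketch of the low-pass chain only: a 3-tap FIR followed by 2:1 decimation, cascaded bank after bank. The tap values (1/4, 1/2, 1/4) are an assumption, since the deck does not list coefficients, and the band-pass branch is omitted.

```python
def decimate_by_2(samples, taps=(0.25, 0.5, 0.25)):
    """3-tap FIR low-pass, keeping every second output (hypothetical taps)."""
    out = []
    for n in range(2, len(samples), 2):
        out.append(taps[0] * samples[n]
                   + taps[1] * samples[n - 1]
                   + taps[2] * samples[n - 2])
    return out


def cascade(samples, banks):
    """Run the decimator `banks` times and return the output of every bank."""
    outputs = []
    for _ in range(banks):
        samples = decimate_by_2(samples)
        outputs.append(samples)
    return outputs


print(decimate_by_2([4.0] * 9))          # a constant passes through unchanged
print(decimate_by_2([1.0, -1.0] * 4))    # a Nyquist-rate tone is rejected
print(8192 // 2 ** 12)                   # 12 halvings take 8 kS/s down to 2 S/s
```

Twelve halvings correspond to the two banks drawn at the top of the diagram, the nine elided banks, and the final bank.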
1. Define the number of clock cycles required for each operation (controls counters, etc.)…
-- define the number of cycles required for each operation for
-- stage 1, stage 2, and egress stage
constant CYCLES : natural_array_2d :=
  ((1,1,1),  -- WINDOW
   (4,6,4),  -- FFT
   (4,4,4),  -- UNPACK1
   (4,6,4),  -- UNPACK
   (2,2,1),  -- POWER
   (4,4,2)); -- XSPECTRA
2. Define the control points in the flow diagrams and assign them according to the operation desired…
-- define the control points for stage #1 for each iteration of each operation:
-- the control points are notated in the alu data flow diagrams in the implementation specification
-- the order is...(mux0, mux1, mux2, mux3, mux4, addsub5, egress_ff_select6)
constant S1_CTRL : natural_array_3d :=
  (((0,0,1,0,1,0,2),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0)),  -- WINDOW
   ((0,0,0,0,0,0,0),(1,0,0,0,0,0,1),(2,0,2,0,1,0,2),(2,0,3,0,1,0,3),(3,0,2,0,1,0,4),(3,0,3,0,1,0,5)),  -- FFT
   ((0,0,0,0,0,0,0),(1,0,0,0,0,0,1),(2,0,0,0,0,0,2),(3,0,0,0,0,0,4),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0)),  -- UNPACK1
   ((0,2,0,1,0,0,0),(1,3,0,1,0,1,1),(1,3,2,1,1,0,2),(1,3,3,1,1,0,3),(0,2,2,1,1,1,4),(0,2,3,1,1,1,5)),  -- UNPACK
   ((0,0,0,0,1,0,2),(1,1,0,0,1,0,3),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0)),  -- POWER
   ((0,2,0,0,1,0,2),(1,3,0,0,1,0,3),(2,1,0,0,1,0,4),(0,3,0,0,1,0,5),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0))); -- XSPECTRA
-- define the control points for stage #2 for each iteration of each operation:
-- the control points are notated in the alu data flow diagrams in the implementation specification
-- the order is...(mux0, mux1, mux2, mux3, addsub4, addsub5, egress_ff_select6)
constant S2_CTRL : natural_array_3d :=
  (((2,0,0,0,0,0,0),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0)),  -- WINDOW
   ((0,2,5,1,1,0,0),(1,3,4,1,0,0,1),(0,2,5,1,1,1,2),(1,3,4,1,0,1,3)),  -- FFT
   ((0,0,1,2,0,0,0),(0,0,0,3,0,0,1),(2,0,0,0,0,0,2),(4,0,0,0,0,0,3)),  -- UNPACK1
   ((0,2,5,1,1,0,0),(1,3,4,1,0,0,1),(0,2,5,1,1,1,2),(1,3,4,1,0,1,3)),  -- UNPACK
   ((0,2,3,2,0,0,0),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0)),  -- POWER
   ((0,2,3,2,0,0,0),(0,4,5,2,1,0,1),(0,0,0,0,0,0,0),(0,0,0,0,0,0,0))); -- XSPECTRA
3. Implement the actual arithmetic logic using the control points…
-- stage 1 arithmetic logic
s1_mux0_dout <= s1sm.iff.reg(S1_CTRL(s1sm.iff.aop,s1sm.ctr,0));
s1_mux1_dout <= s1sm.iff.reg(S1_CTRL(s1sm.iff.aop,s1sm.ctr,1));
s1_mux2_dout <= s1_mux1_dout(31 downto 16) when S1_CTRL(s1sm.iff.aop,s1sm.ctr,2) = 0 else
                s1sm.iff.win when S1_CTRL(s1sm.iff.aop,s1sm.ctr,2) = 1 else
                s1sm.iff.cos when S1_CTRL(s1sm.iff.aop,s1sm.ctr,2) = 2 else
                s1sm.iff.sin;
s1_mux3_dout <= s1_mux0_dout when S1_CTRL(s1sm.iff.aop,s1sm.ctr,3) = 0 else s1_unpack_dout;
s1_mux4_dout <= s1_mux3_dout when S1_CTRL(s1sm.iff.aop,s1sm.ctr,4) = 0 else s1_mult_dout;
s1_unpack_dout <= (s1_mux0_dout+s1_mux1_dout)/2 when S1_CTRL(s1sm.iff.aop,s1sm.ctr,5) = 0 else
                  (s1_mux0_dout-s1_mux1_dout)/2;
s1_mult_dout_temp <= s1_mux3_dout*s1_mux2_dout;
s1_mult_dout <= s1_mult_dout_temp(DAT_WIDTH+SINCOS_WIDTH-1 downto SINCOS_WIDTH);
-- stage 2 arithmetic logic
s2_mux1_dout <= s2sm.iff.reg(S2_CTRL(s2sm.iff.aop,s2sm.ctr,0));
s2_mux2_dout <= s2sm.iff.reg(S2_CTRL(s2sm.iff.aop,s2sm.ctr,1));
s2_mux3_dout <= s2sm.iff.reg(S2_CTRL(s2sm.iff.aop,s2sm.ctr,2));
s2_mux4_dout <= s2_mux1_dout when S2_CTRL(s2sm.iff.aop,s2sm.ctr,3) = 0 else
                s2_add2_dout when S2_CTRL(s2sm.iff.aop,s2sm.ctr,3) = 1 else
                s2_add1_dout when S2_CTRL(s2sm.iff.aop,s2sm.ctr,3) = 2 else
                (others => '0');
s2_add1_dout <= s2_mux2_dout+s2_mux3_dout when S2_CTRL(s2sm.iff.aop,s2sm.ctr,4) = 0 else
                s2_mux2_dout-s2_mux3_dout;
s2_add2_dout <= s2_mux1_dout+s2_add1_dout when S2_CTRL(s2sm.iff.aop,s2sm.ctr,5) = 0 else
                s2_mux1_dout-s2_add1_dout;