System Functionality Verification using FPGA

chapa

Presentation Transcript


  1. System Functionality Verification using FPGA

  2. Contents • Section I • Introduction to reconfigurable computing • FPGA Logic/Routing architecture • Section II • Core-embedded FPGA • ALTERA/XILINX/TRISCEND/SiDSA • Section III • Multiple-FPGA architecture • Emulation/Simulation acceleration using FPGAs

  3. Introduction • Design execution methodology • Hardware • Very fast and efficient • Cannot be altered after fabrication • Redesign and refabrication are expensive • Software-programmed processors • A set of instructions determines the specific operation. • Functionality can be easily changed. • Performance is far below that of an ASIC.

  4. Reconfigurable Computing • Fills the gap between hardware and software • An FPGA is an array of computational elements and the routing wires among them. • The configuration is determined by programmable configuration bits. • Development • 1963 : Concept of “restructurable computing” appeared. • 1980s : FPGA technology developed as a hybrid device between PALs and MPGAs (Mask Programmable Gate Arrays) by Xilinx, Altera, Lucent, QuickLogic, and others. • SRAM-programmable FPGA : high density • 1999-Now : Core-embedded FPGAs incorporate both a programmable processor and an FPGA.

  5. Logic Block • LUT-based logic block • Efficient logic block architecture adopted in many commercial FPGAs • Composed of a LUT, a DFF (latch), and a mux • [Figure: logic block with inputs I1 to I4, a 4-LUT, carry logic (Cin, Cout), a DFF, and output Out]

  6. Logic Block • 4-LUT • Any function of 4 input variables can be implemented. • FF • Used for pipelining and registers • Can be configured as a latch • Clock signals come from global signals routed on special resources (global net) • Carry logic • Speeds up carry-based arithmetic functions • Bypasses the general routing resources and connects directly to the neighboring CLB
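The claim that a 4-LUT can implement any function of 4 input variables follows from the LUT being a 16-entry truth table indexed by its inputs. A minimal sketch in Python, assuming the inputs I1 to I4 form the table index with I1 as the least significant bit (the bit ordering and the function names are illustrative assumptions, not taken from any vendor's documentation):

```python
def make_lut4(func):
    """Compute the 16 configuration bits of a 4-input LUT for a Boolean function."""
    bits = 0
    for index in range(16):
        # Unpack the table index into the four LUT inputs (I1 is bit 0).
        i1 = (index >> 0) & 1
        i2 = (index >> 1) & 1
        i3 = (index >> 2) & 1
        i4 = (index >> 3) & 1
        if func(i1, i2, i3, i4):
            bits |= 1 << index
    return bits


def lut4_read(bits, i1, i2, i3, i4):
    """Evaluate the LUT: the inputs simply select one configuration bit."""
    index = i1 | (i2 << 1) | (i3 << 2) | (i4 << 3)
    return (bits >> index) & 1


# Example: the sum output of a full adder (I4 is unused).
sum_bits = make_lut4(lambda a, b, cin, _unused: a ^ b ^ cin)
```

Because there are only 16 input combinations, the 16 configuration bits can encode every one of the 2^16 possible 4-input functions; changing the function means rewriting the bits, not the hardware.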

  7. Interconnection Architecture • Island-style FPGA routing architecture • The routing architecture used by most FPGAs • Sea of routing resources for connection between rows and columns of logic blocks • Connection blocks : Programmable multiplexers that select which signals in the given routing channel are connected to the logic block’s terminals. • Switch box : Connections between horizontal and vertical routing resources

  8. Interconnection Architecture • [Figure: island-style routing architecture]

  9. Interconnection Architecture • Routing resources with various lengths • Local interconnections : Routing between neighboring logic blocks (e.g. dedicated carry chain) • Medium-length lines : Routing wires that span the width of several logic blocks • Long lines : Routing wires that run the whole chip height or width • Global lines : Routing wires that cover the entire area of the chip • High-speed, low-skew connections to all logic blocks • Usually used for clocks and resets.

  10. Two Routing Architectures • Segmented routing architecture • Local communication traffic is carried by short wires • Long wires are used to travel long distances without passing through many switches • Research questions • How many wires should each channel contain? • How many types of long wires would be efficient? • What proportion of the routing resources should each wire type take? • Companies : Xilinx, Lucent, Vantis

  11. Two Routing Architectures • Hierarchical routing architecture • Cluster-based routing architecture • Routing within a cluster is at the local level, only connecting within that cluster. • Longer wires connect different clusters together. • Each routing level contains several clusters • Background • Most connections between logic blocks are local, with only a limited amount of communication traversing long distances • A good placement algorithm is required. • Company : ALTERA
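The reason a hierarchical architecture depends on good placement can be sketched numerically: a connection only needs the routing level of the smallest cluster that contains both endpoints. The model below is a simplified assumption (logic blocks numbered linearly, every cluster holding `cluster_size` members of the level below); it is not a description of any specific Altera device.

```python
def routing_level(block_a, block_b, cluster_size=4):
    """Lowest hierarchy level whose cluster contains both logic blocks.

    Level 0 means the same block, level 1 the same lowest-level cluster,
    level 2 the same cluster of clusters, and so on.
    """
    level = 0
    while block_a != block_b:
        # Move both blocks up one level of the hierarchy.
        block_a //= cluster_size
        block_b //= cluster_size
        level += 1
    return level


def total_routing_cost(connections, cluster_size=4):
    """Sum of routing levels over all (source, sink) connections."""
    return sum(routing_level(a, b, cluster_size) for a, b in connections)


# The same three-connection netlist under two hypothetical placements.
good_placement = [(0, 1), (1, 2), (2, 3)]    # all blocks in one cluster
poor_placement = [(0, 5), (5, 10), (10, 15)]  # blocks spread over clusters
```

With these numbers the good placement stays entirely at level 1, while the poor placement pushes every connection up to level 2, which is the longer-wire case the slide warns about.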

  12. Two Routing Architectures • [Figure: segmented routing compared with hierarchical routing, showing logic blocks, connection switches, and a cluster]

  13. Heterogeneous architecture • Multiplier embedding • Implementing multipliers in FPGA logic is usually inefficient. • Custom/configurable multiplication hardware with various operand widths and a choice of signed/unsigned operation can be embedded using a reconfigurable array of FABs (special full adder blocks). • (Haynes, Field-Programmable Custom Computing Machines, 1998)
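To illustrate why a multiplier maps naturally onto an array of full adders, here is a minimal sketch of a generic unsigned array multiplier in Python. It is a textbook shift-and-add arrangement, not the FAB design from the cited work; the function names and the unsigned-only restriction are assumptions made for the example.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply(x, y, width):
    """Unsigned width x width multiply using AND gates and full adders only."""
    acc = [0] * (2 * width)            # product bits, least significant first
    for j in range(width):             # one adder row per multiplier bit
        carry = 0
        for i in range(width):
            partial = ((x >> i) & 1) & ((y >> j) & 1)   # AND gate
            acc[i + j], carry = full_adder(acc[i + j], partial, carry)
        acc[j + width] = carry         # this position is still unused before row j
    return sum(bit << k for k, bit in enumerate(acc))
```

The sketch uses `width * width` full adders, so an 8 x 8 multiply already needs 64 of them; building each one from general-purpose LUTs and routing is the inefficiency that motivates embedding dedicated adder blocks.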

  14. Heterogeneous architecture • Embedded memory blocks • Use of available LUTs as RAM structure (Xilinx XC4000, Virtex FPGAs) • Dedicated memory blocks within array (Xilinx Virtex and Altera FPGAs)

  15. Xilinx Virtex architecture • Block SelectRAM is embedded as a column among the logic blocks.

  16. Heterogeneous Architecture • Processor embedding • In late 2000, several commercial FPGA companies announced plans to embed entire microprocessors. • Altera • ARM9-based Excalibur device • Xilinx • PowerPC-based Virtex-II device • Triscend • 8051/ARM-based SoC integration platform

  17. SoC Verification through FPGAs • Core-Embedded FPGA

  18. Core-Embedded FPGAs • ALTERA • Excalibur™ • ARM-embedded FPGA • Stratix™ • Currently without ARM core. Excalibur’s next version is under development. • XILINX • Virtex-II Pro™ • IBM’s PowerPC-embedded FPGA. • Triscend • A7 • ARM-embedded FPGA • E5 • 8051-embedded FPGA

  19. ALTERA’s Excalibur • ARM9 core integrated with FPGA • “SOPC (System On Programmable Chip)” • C/C++ compiler/debugger integrated in the FPGA compiler. • Interface between processor and FPGA • AMBA (Advanced Microcontroller Bus Architecture) • A widely used internal bus architecture for SoCs. • The ARM processor and the FPGA block are connected through the AMBA bus.

  20. ALTERA’s Excalibur • [Figure: clock domains] • Clock Domain 1 (AHB1) : up to 180 MHz • Clock Domain 2 (AHB2) : up to 90 MHz • Clock Domain 3 (PLD) : up to 100 MHz

  21. [Figure: clock domains] • Clock Domain 1 (AHB1) : up to 180 MHz • Clock Domain 2 (AHB2) : up to 90 MHz • Clock Domain 3 (PLD) : up to 100 MHz

  22. ALTERA’s Excalibur • AHB1 • Bridge for AHB2 • Interrupt controller, watchdog timer • Single Port & Dual Port SRAM • The Embedded processor is the only bus master on AHB1

  23. ALTERA’s Excalibur • AHB2 • PLD transfers data with memories, UART or PLD slave • Dedicated interfaces between stripe (Processor and Peripherals) and PLD

  24. AHB2 • PLD transfers data with memories, UART or PLD slave • Dedicated interfaces between stripe (Processor and Peripherals) and PLD

  25. XILINX’s Virtex-II Pro • PowerPC core integrated with FPGA • “Platform FPGA architecture” • Up to four PPC cores can be integrated. • Interface between processor and FPGA • CoreConnect Bus • PLB (Processor Local Bus) • DCR (Device Control Register) bus • OCM(On-Chip Memory) interface • Dedicated interface between the block RAM and OCM signals of PPC core.

  26. Virtex-II Pro Block Diagram • [Figure labels] • PowerPC core (this block diagram contains two PPC cores) • Block RAM and multiplier blocks • Configurable logic block array

  27. PPC Core Block • The OCM controller is a dedicated interface between the PPC and Block RAM. • Block RAM can be configured as Instruction-Side Block RAM (ISBRAM) or Data-Side Block RAM (DSBRAM). • Fixed latency of memory access guarantees higher-speed execution. • Block RAM can be configured as dual-port RAM (data communication between PPC and FPGA). • PLB master interface ports are at the boundary of the PPC core. • [Figure: PPC 405 core with two OCM controllers, four Block RAMs, DCR bus, and control signals]

  28. Triscend’s E5/A7 • E5/A7 • “CSoC (Configurable System-on-Chip)” • E5 contains an 8051 core, a CSL (Configurable System Logic) matrix, and peripheral interfaces (JTAG, DMA, Timer, FIFO) • A7 contains an ARM core instead of the 8051. • CSI (Configurable System Interconnect) • Bus developed by Triscend. • Pipelined bus architecture for performance optimization

  29. Triscend E5/A7 • The bus architecture allows the bus to be expanded throughout the whole chip while preserving high performance. • The internal system bus is extended throughout the user-configurable system logic. • Objectives • Inclusion of any processor is possible. • High performance is assured regardless of the CSL size

  30. Triscend’s A7 Architecture • CSI Bus • Configurable System Interconnect • Masters of CSI • ARM • JTAG (configuration) • DMA0, DMA1, DMA2, DMA3 • Sideband Signals • A small number of dedicated signals for UART, Timer

  31. Triscend’s CSL matrix • Vertical/Horizontal Breakers • Vertical : Address Decoder part of CSI • Horizontal : Data read/write port of CSI • Selector • Decodes address • Registers are arranged in vertical column of CSL cells • Pre-programmed at the initialization

  32. Triscend’s System Architecture • A bus master requires grant signals from the arbiter. • Bus FIFO/arbiter for multiple masters. • The CPU runs boot code initially. The boot code configures the CSL and stores program/data. • [Figure: CPU, DMA, JTAG, bus FIFO/arbiter, CSL, RAM, ROM, and memory interface]

  33. CSI Bus Architecture • Bus signal groups : Master Write (address/data/control), Slave Write (address/data/control), Master Read (data/control), Slave Read (data/control) • [Figure: masters, bus FIFO, arbiters, selectors and pipe registers, dedicated slaves, and CSL]

  34. Pipelined Write Transaction • [Figure: the CSI bus diagram (masters, bus FIFO, arbiters, selectors and pipe registers, dedicated slaves, CSL; Master/Slave Write and Read signal groups) annotated with time slots T, T+1, and T+2]

  35. Pipelined Read Transaction • [Figure: the CSI bus diagram (masters, bus FIFO, arbiters, selectors and pipe registers, dedicated slaves, CSL; Master/Slave Write and Read signal groups) annotated with time slots T, T+1, T+2, and T+3]

  36. Pipeline in view of Bus Logic • [Figure: pipeline over time slots T to T+3; labels: master address/data, bus FIFO and arbiter, configure Selector decode, read from CSL, data from CSL to master]

  37. Wait State • Why is it generated? • 1. The handshake operation inside the logic implemented in CSL. • 2. CSL logic is too slow to respond in one cycle. • Sequence of generation • 1. “Address Selector” in CSL generates wait state if the system tries to access the Selector’s address. • 2. If more than one wait state is required, the CSL function inserts additional wait states.
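The effect of wait states on a read can be sketched as a stalled pipeline. The stage order below (master issues address, arbitration, selector decode and CSL read, data returned) is one reading of the time slots T to T+3 on the pipeline slides and should be treated as an assumption, as should the choice to count the Selector's wait state and the CSL function's additional wait states in a single `wait_states` parameter.

```python
# Assumed stage order for a read with no wait states (time slots T .. T+3).
STAGES = [
    "master issues address",
    "arbitration / bus FIFO",
    "selector decode and CSL read",
    "data returned to master",
]


def read_timeline(wait_states=0):
    """Return (slot, event) pairs for one read, stalling after the decode stage."""
    timeline = []
    slot = 0
    for stage in STAGES:
        timeline.append((slot, stage))
        slot += 1
        if stage == "selector decode and CSL read":
            # Each wait state holds the pipeline for one extra time slot.
            for _ in range(wait_states):
                timeline.append((slot, "wait state"))
                slot += 1
    return timeline


def completion_slot(wait_states=0):
    """Time slot (relative to T) in which the master receives its data."""
    return read_timeline(wait_states)[-1][0]
```

Under these assumptions a read with no wait states completes in slot T+3, and a slow CSL function that needs two wait states pushes completion to T+5.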

  38. Wait State Insertion • [Figure: the bus-logic pipeline over time slots T to T+3 with the Waitnow signals OR’ed together; labels: master address/data, bus FIFO and arbiter, configure Selector decode, read from CSL, data from CSL to master]

  39. CSL Physical Structure • Bus pipeline registers at each bank boundary → Time slots for user logic are independent of the signal transport time between banks. • The write/read bus is distributed throughout the CSL and buffered and piped into each bank, as shown by the red arrows. • The wait signals generated by each bank are propagated to the pipeline registers in all other banks. • [Figure: 16 banks of logic tiles with wait-distribution logic cells, 16x8 RAMs, system logic, and 8K RAM]

  40. Structure Bank/Bus/Selector • Selectors are configured initially for column selection and wait generation. • 4 wires for each tile • The horizontal data line writes data to the CSL cell. The read data is OR’ed onto the horizontal read data line. • [Figure: a bank with a row of Selectors above a grid of tiles]
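OR'ing the read data onto a shared horizontal line works as a bus because only the column whose Selector matches the address drives a value, while every other column contributes zeros. A minimal sketch in Python, assuming each Selector is configured with a hypothetical base address and mask (the actual Selector configuration format is not given in the slides):

```python
def selector_hit(address, base, mask):
    """True when the address matches this Selector's configured range."""
    return (address & mask) == (base & mask)


def read_data_line(address, columns):
    """OR together the contributions of all columns on one read data line."""
    data = 0
    for column in columns:
        if selector_hit(address, column["base"], column["mask"]):
            data |= column["value"]   # the selected column drives its register
        # Unselected columns drive 0, which leaves the OR unchanged.
    return data


# Two hypothetical register columns at adjacent addresses.
columns = [
    {"base": 0x1000, "mask": 0xFFFF, "value": 0x5A},
    {"base": 0x1001, "mask": 0xFFFF, "value": 0x3C},
]
```

With this arrangement no tri-state drivers or central read multiplexer are needed; the scheme relies on the Selectors being configured so that at most one column responds to any given address.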

  41. E5 Physical Implementation • 8051 CPU core • 0.35 µm, 40 MHz CSL operation • [Figure labels: 8051 CPU core and RAM/ROM; CSL matrix]

  42. SiDSA’s FIPSOC • Integration of CAB (Configurable Analog Block) • 8051 microcontroller • FPGA • Configurable analog cells optimized for data acquisition applications • Dynamic reconfiguration • Two configuration bits for each CLB • User can download extra configuration data while the cells are in operation.

  43. Analog Subsystem • Configurable Analog Blocks (CAB) • Differential amplification • Comparison • Data conversion (ADC, DAC) • Digital part • The digital part that configures the CAB is controlled by the µP or the programmable logic.

  44. Comparison • Xilinx • Uses the CoreConnect bus to connect processor and FPGA. • Multiple processor cores can be used simultaneously. • ALTERA • Uses the AMBA bus to connect processor and FPGA. • Triscend • The processor can read/write any register inside the CSL matrix (the CSL matrix can be regarded as a functional block of the processor). • Intensive pipelining is adopted to maintain or increase throughput, because the latency of a bus distributed throughout the CSL matrix would otherwise be excessive.
