670 likes | 892 Views
Ruth Markenson: Version changed Original slides 11, 14, 23, 24-30 removed. FPGA Design Techniques I. http://www.asicfpga.com. Objectives. Use hierarchy effectively Increase circuit reliability and performance by applying synchronous design techniques.
 
                
                E N D
Ruth Markenson: Version changed Original slides 11, 14, 23, 24-30 removed FPGA Design Techniques I http://www.asicfpga.com
Objectives • Use hierarchy effectively • Increase circuit reliability and performance by applying synchronous design techniques After completing this module, you will be able to:
Hierarchical Design • Using Hierarchy leads to greater design readability, re-use,and debug Top Level of Design Infer or instantiate I/O here State Machines Data Paths One-hot Pipelining Binary Muxing/De-muxing Enumerated Arithmetic Counters Building Blocks Adders/Subtractors Standard widths Bit Shifters Pipeline RAMs Accumulators Specific Functions Logiblox RAM Other IP/Cores Coregen FIFOs Parametizable functions FIR Filters RAM Technology Specific Functions
Benefits of Using Hierarchy • Use the best design entry method for each type of logic • Design readability • Easier to understand design functionality and data flow • Easier to debug • Easy to reuse parts of a design • Synthesis tool benefits • Covered in a later section
Ruth Markenson: 3rd bullet changed 3rd bullet, sub-bullet added Design Entry Methods • Use HDL for: • State machines • Control logic • Bused functions • Use schematics for: • Top-level design • Manually optimized logic • “Mixed mode” designs can utilize the best of both worlds • However, they are less portable than an all-HDL design
Ruth Markenson: 4th bullet changed Notes, 1st paragraph, bold text changed Design Readability Tips • Choose hierarchical blocks that have: • Minimum of routing between blocks • Logical data flow between blocks • Choose descriptive labels for blocks and signals • Keep clock domains separated • Makes the interaction between clocks very clear • Keep the length of each source file manageable • Easier to read, synthesize, and debug
Design Reuse Tips • Build a set of blocks that are available to all designers • Register banks • FIFOs • Other standard functions • Custom functions commonly used in your applications • Name blocks by function and target Xilinx family • Easy to locate the block you want • Example: REG_4X8_SP (bank of four eight-bit registers targeting Spartan) • Store in a separate directory from the Xilinx tools • Prevents accidental deletion when updating tools
Ruth Markenson: Originally slide 12 Why Synchronous Design? • Synchronous circuits are more reliable • Events are triggered by clock edges that occur at well-defined intervals • Outputs from one logic stage have a full clock cycle to propagate to the next stage • Skew between data arrival times is tolerated within the same clock period • Asynchronous circuits are less reliable • Delay may need to be a specific amount (for example, 12 ns) • Multiple delays may need to hold a specific relationship (For example, DATA arrives 5-ns before SELECT)
Ruth Markenson: Originally slide 13 2nd bullet and it’s sub-bullet deleted Asynchronous Design:Case Studies • The design I did two years ago no longer works. What did Xilinx change in their FPGAs? • SRAM process improvements and geometry shrinks increase speed • Normal variations between wafer lots • My design passes a timing simulation test but fails in circuit. Is the timing simulation accurate? YES • Timing simulation uses worst-case delays • Actual board-level conditions are usually better
D D Q_A 12.5 A B C Q_B D 3 cycles Clock Q_A Q_B Q_C 2 cycles A & C Clock B Clock 3.0 3.1 3.0 3.3 3.1 Q_A Clock skewed version Expected operation Q_B Q_C Ruth Markenson: Originally slide 15 Clock Skew • This shift register will not work because of clock skew! INPUT Q_C CLOCK
Ruth Markenson: Originally slide 16 2nd bullet, all sub-bullets changed Use Global Buffers to Reduce Clock Skew • Global buffers are connected to dedicated routing • This routing network is balanced to minimize skew • All Xilinx FPGAs have global buffers • Virtex-II devices have sixteen BUFGMUXes • Virtex and Spartan-II have four BUFGs • You can always use a BUFG symbol, and the software will choose an appropriate buffer type • All major synthesis tools can infer global buffers onto clock signals that come from off-chip
Ruth Markenson: Originally slide 17 Traditional Clock Divider • Introduces clock skew between CLK1 and CLK2 • Uses an extra BUFG to reduce skew on CLK2 D Q CLK2 D Q BUFG CLK1 BUFG
Ruth Markenson: Originally slide 18 Recommended Clock Divider • No clock skew between flip-flops D Q CLK2_CE CE D Q CLK1 BUFG
Ruth Markenson: Originally slide 19 Avoid Clock Glitches • Because flip-flops in today’s FPGAs are very fast, they can respond to very narrow clock pulses • Never source a clock signal from combinatorial logic • Also known as “gating the clock” MSB 0111 1000 transition can become 0111 1111 1000 due to faster MSB MSB Shorter routing FF LSB Glitch may occur here Binary Counter
Ruth Markenson: Originally slide 20 Avoid Clock Glitches • This circuit creates the same function, but without glitches on the clock D INPUT Q3 D Q2 Q CE Q1 Q0 FF CLOCK Counter
Ruth Markenson: Originally slide 21 Avoid Set/Reset Glitches • Glitches on asynchronous Set/Reset inputs can lead to incorrect circuit behavior INPUT Q D Asynchronous Reset FF Binary Counter CLR RESET Q[x] Q[0] CLOCK
Ruth Markenson: Originally slide 22 Avoid Set/Reset Glitches • Convert to synchronous set/reset when possible INPUT Q D Synchronous Reset FF Counter R RESET Q[x] Q[0] CLOCK
Ruth Markenson: Originally slide 33 Review Questions • List three benefits of hierarchical design • Why should you use global buffers for your clock signals?
Ruth Markenson: Originally slide 24 Answers • List three benefits of hierarchical design • Allows you to use different design entry methods • Design readability • Design reuse • Why should you use global buffers for your clock signals? • To reduce clock skew
Ruth Markenson: Originally slide 25 3rd bullet, last 2 sub-bullets deleted (change to last 3 if FSM removed) Summary • Proper use of hierarchy aids design readability and debug • Synchronous designs are more reliable than asynchronous designs • FPGA design tips • Global clock buffers and DLLs eliminate skew • Avoid glitches on clocks and asynchronous set/resets
Ruth Markenson: Originally slide 36 Where Can I Learn More? • Application notes on http://support.xilinx.com • Under the Design tab, click App Notes • Software documentation • Development System Reference Guide, Chapter 2 “Design Flow,” FPGA Design Techniques section • Libraries Guide • Documentation for your synthesis tool
FPGA Design Techniques II Version 4
Objectives • Increase design performance by duplicating flip-flops • Increase design performance by adding pipeline stages • Increase board performance by using I/O flip-flops • Build reliable synchronization circuits After completing this module, you will be able to:
Duplicating Flip-Flops • High-fanout nets can be slow and hard to route • Duplicating flip-flops can fix both problems • Reduced fanout shortens net delays • Each flip-flop can fanout to a different physical region of the chip to help with routing • Design tradeoffs • Gain routability and performance • Design area increases fn1 D Q fn1 D Q fn1 D Q
Which Flip-Flops Should I Duplicate? • Duplicate flip-flops that source: • Address and control lines to large memory arrays • Clock enables or output enables • Synchronous reset signals • Do not duplicate flip-flops that are sourced by asynchronous signals • Synchronize the signal first • Feed synchronized signal to multiple flip-flops
Tips on Duplicating Flip-Flops • Name duplicated flip-flops _a, _b NOT _1, _2 • Numbered flip-flops are mapped by default into the same CLB • You want duplicated flip-flops to be separated • Especially if the loads are spread across the chip • Synthesis tips • Most synthesis tools have fanout-control features • However, they do not always pick the best implementation • Xilinx recommends explicitly creating duplicate flip-flops in HDL code • Many synthesis tools will optimize out duplicated flip-flops • Tell your synthesis tool to keep redundant logic
Duplicating Flip-Flops Example • The source flip-flop drives two register banks that are constrained to different regions of the chip • The source flip-flop and pad are not constrained • PERIOD = 5 ns timing constraint • Implemented with default options • Longest path = 6.806 ns • Fails to meet timing constraint
Duplicating Flip-Flops Example • The source flip-flop has been duplicated • Each flip-flop drives a region of the chip • Each flip-flop can be placed closer to the register that it is driving • Shorter routing delays • Longest path = 4.666 ns • Meets timing constraint
two logic levels one level one level Pipelining Concept fMAX = nMHz D Q D Q fMAX = 2nMHz D Q D Q D Q
Pipelining • Inserting flip-flops into a datapath is called pipelining • Pipelining increases circuit performance by reducing the number of logic levels between flip-flops • Fewer logic levels = shorter delays between flip-flops = faster clock speeds • Xilinx FPGA architectures support pipelining • Many flip-flops are available • Slice structure is a Look-Up Table (Logic Level) followed by a flip-flop
Pipelining Considerations • Are there enough flip-flops available? • Refer to the Map Report • You generally will not run out of FFS • Are there multiple logic levels between flip-flops? • If there is only one logic level between flip-flops, pipelining will not improve performance • Exception: Long carry-logic chains can benefit from pipelining • Refer to the Post-Map Static Timing Report or Post-Place & Route Static Timing Report • Can the system tolerate latency?
Latency in Pipelines • Each pipeline stage adds one clock cycle of delay before the first output will be available • Also called “filling the pipeline” • After the pipeline is filled, a new output is available every clock cycle
Pipelining Example • Original circuit • Two logic levels between SOURCE_FFS and DEST_FF • fMAX = ~137 MHz LUT D Q Q D LUT LUT SOURCE_FFS DEST_FF LUT
Pipelining Example • Pipelined circuit • One logic level between each set of flip-flops • fMAX = ~228 MHz D Q LUT D Q D Q LUT D Q LUT SOURCE_FFS DEST_FF D Q LUT PIPE_FFS
Pipelining Questions • Given the original circuit, what is wrong with the pipelined circuit? • How can the problem be corrected? Pipelined Circuit Original Circuit
Pipelining Answers • What is wrong with the pipelined circuit? • Latency mismatch • Older data is mixed with newer data • Circuit output is incorrect • How can the problem be corrected? • Add a flip-flop on SELECT • All data inputs now experience the same amount of latency
I/O Flip-Flop Overview • Each Virtex-II I/O Block contains six flip-flops • For single data rate: • IFD on the input • OFD on the output • ENBFF on the three-state enable (not shown) OFD Data from FPGA D Q Output clock enable CE Clock I/O pad IFD Data to FPGA Q D Input clock enable CE Clock
Benefits of Using I/O Flip-Flops • Guaranteed setup, hold, and clock-to-out times • When the clock signal comes from a BUFG • Programmable delay on input flip-flop • Programmable slew rate on output flip-flop OFD Data from FPGA D Q Output clock enable CE Clock I/O pad IFD Data to FPGA Q D Input clock enable CE Clock
NONE IFD IFD D Q Delay CE Programmable Input Delay • Delay is added to the D input of the IFD or ILD • Guarantees 0-ns hold time when using a global buffer • Trades an increase in setup time for a decrease in hold time • Delay is controlled by attributes • NONE is the default, if the clk input of the IFD/ILD is driven by a DCM clk output • IFD is the default, if the clk input of the IFD/ILD is not driven by a DCM clk output BUFG
NONE IFD IFD D Q Delay CE BUFG Programmable Input Delay Example • XC2V250-5 (taken from timing analysis report with v1.90 Virtex-II speed files) • Clk = DCM output (default is NONE) • IFD Delay tSU = 4.0ns tH = 0.0ns • NONE tSU = 1.3ns tH = 0.0ns • Clk = NOT a DCM output (default is IFD Delay) • IFD Delay tSU = 1.2ns tH = 0.0ns • NONE tSU = -1.5ns tH = 1.7ns
Programmable Output Slew Rate • Slew rate can be SLOW or FAST • By default the slew rate is SLOW • Virtex/E/II/Pro: Only the LVTTL I/O standard supports programmable slew rate • Slew rate is controlled by attributes • Included in the input design, or specified in the Constraints Editor
How to Access I/O Flip-Flops • During synthesis • Based on the input/output timing requirements, the synthesis tool will move a flip-flop into the IOB • The designer may specify which registers should be placed in the IOB • The synthesis tool will specify an IOB=True attribute on a given register in the netlist for both cases above
How to Access I/O Flip-Flops • During implementation • The mapper can move eligible flip-flops into the I/O blocks • In the Implementation Options dialog, select the Optimize and Map tab • Pack I/O Registers/Latches into IOBs for (-pr): • [Off | Inputs Only | Outputs Only | Inputs and Outputs] • -Timing option will map registers in IOB for timing critical paths • Using the Xilinx Constraint Editor’s Misc. tab • Specify individual registers that should be placed in the IOB
IOB Flip-Flop Requirements • There are some restrictions to accessing the IO flip-flops: • Input and output registers cannot have combinatorial logic between pad and register • For three-state IO, all registers must share the same reset • Can have separate clocks and clock enable • For three-state flip-flop, output must be active-low • The input to the three-state flip-flop can be inverted, but there is no inverter between the flip-flop and the three-state buffer
Synchronization Circuits • What is a synchronization circuit? • Captures an asynchronous input signal and outputs it onto a clock edge • Why do I need synchronization circuits? • Prevent setup and hold time violations • End result is a more reliable design • When do I need synchronization circuits? • Signals cross between unrelated clock domains • Chip inputs that are fully asynchronous (not controlled by any clock)
Setup and Hold Time Violations • Violations occur when the flip-flop input changes too close to a clock edge • Three possible results: • Flip-flop clocks in old data value • Flip-flop clocks in new data value • Flip-flop output becomes metastable
Metastability • What is it? • Flip-flop output enters a transitory state • Neither a valid zero nor a valid one logic state • The value may be interpreted as a zero by some loads and as a one by others • The output remains in this state for an unpredictable length of time before settling to a valid zero or one • Only statistical analysis of metastability is possible • When does it occur? • Result of a transition on the input during the tiny timing window where the flip-flop accepts new data • Even smaller than the setup-hold-time window
Guarding Against Metastability • Due to its statistical nature, the occurrence of metastable events can only be reduced, not eliminated • Mean Time Between Failures (MTBF) is exponentially related to the length of time the flip-flop is given to recover • A few extra ns of recovery time can dramatically reduce the chances of a metastable event • The circuits shown in this section allow a full clock cycle for metastable recovery
Asynchronous Inputs • This refers to two types of signals • Signals not controlled by any clock • Signals controlled by a clock that is not seen by the FPGA • Two cases to consider • Input pulses are always at least one clock period wide • Input pulses may be less than one clock period wide
Synchronization Circuit #1 • Use this circuit when input pulses will always be at least one clock period wide • The “extra” flip-flop guards against metastability Guards against metastability Synchronized signal Asynchronous Input D Q D Q FF1 FF2 CLK