Methodologies for Reliable Design Implementation Melanie Berg NASA Office of Logic Design 2004 MAPLD International Conference September 8-10, 2004 Washington, D.C.
Overview • This session will present methodologies for reliable design implementation • Each design will first be explained; following the design description, the corresponding VHDL will be presented • Topics that will be covered: • Counters: Ripple vs. Synchronous • Triple Modular Redundancy (TMR) • Asynchronous Clock domain crossing • FIFO Memories • State Machines and Mitigation
Common VHDL/Synthesis Misperceptions • VHDL and its synthesis tools produce unexpected results • VHDL does not produce efficient circuitry • The synthesis tool will not produce what is desired • VHDL is for software folks
VHDL and Design • One should not start writing VHDL code until the design is well understood and analyzed • Anyone can pick up a VHDL book and learn syntax • Anyone can pick up a synthesis manual and learn directives • But how do we create reliable circuits?
Key Ingredients for Successful and Reliable Designs: VHDL through Synthesis • Remember … the goal is to design reliable hardware. • VHDL – looks like software, however, the designer must understand proper hardware design techniques including the electrical characteristics of the employed technology • VHDL RTL must functionally match gate level (post synthesis) for simulation purposes. This requires enforcing strict coding rules • Designer must be familiar with the synthesis tools and their interpretation of VHDL code • Combinatorial circuits vs. Sequential • Clock structures and potential skew • Proper State machine implementation • Arithmetic circuitry • Clock domain crossings • Reset logic • Mitigation • When to use specific Synthesis directives • Etc… • VHDL Coding Style is very important
What Is the Importance of VHDL Coding Style? • No synthesis tool can be as efficient as proper coding style • ASICs and FPGAs will be smaller and faster • Proper VHDL coding style is easier to verify • We would like to shorten the design cycle. Coding style will affect: • Quality of synthesis: drives the tool to better results • FPGA mapping or design: can take advantage of the technology • Place and route: designs that are well thought out will have a clean route
Coding Style Specifics - Think “Hardware” • Architect with comprehension of your target’s features (ASIC and FPGA) • Separate Combinational and Registered blocks • Watch out for inferred latches • Pay attention to large fan-out nets • Consider how you code state machines • Be careful with designing long paths of logic • Be aware of when you are able to use Resource sharing
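The inferred-latch warning above can be illustrated with a minimal sketch. The entity and signal names (latch_demo, q_latch, q_comb) are hypothetical and chosen only for illustration. The first process leaves the output unassigned on one path, so synthesis has to infer a latch; the second assigns a default first and stays purely combinational.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity: contrasts latch-inferring and latch-free code.
entity latch_demo is
  port (
    sel     : in  std_logic;
    a       : in  std_logic;
    q_latch : out std_logic;  -- driven by the latch-inferring process
    q_comb  : out std_logic   -- driven by the latch-free process
  );
end entity latch_demo;

architecture rtl of latch_demo is
begin
  -- Incomplete assignment: when sel = '0' nothing is assigned, so
  -- q_latch must hold its previous value and a latch is inferred.
  latch_proc : process (sel, a)
  begin
    if sel = '1' then
      q_latch <= a;
    end if;
  end process latch_proc;

  -- A default assignment covers every path through the process,
  -- so the result is plain combinational logic.
  comb_proc : process (sel, a)
  begin
    q_comb <= '0';
    if sel = '1' then
      q_comb <= a;
    end if;
  end process comb_proc;
end architecture rtl;
```

Most synthesis tools report inferred latches as warnings, so the synthesis log is worth checking for them after each run.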
Code Restructuring
if (Aflag = '1') then
  outdata <= Adata;
elsif (Bflag = '1') then
  outdata <= Bdata;
elsif (Cflag = '1') then
  outdata <= Cdata;
else
  outdata <= '0';
end if;
• What circuit structure does this code produce? • Which line of code is likely to be in the critical path?
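Answer: The if/elsif chain produces a priority chain of multiplexers. The last branch tested (Cflag/Cdata) passes through the most multiplexer stages, so it is the most likely to be in the critical path. [Figure: cascaded multiplexers selecting among Adata, Bdata, Cdata, and '0' under control of Aflag, Bflag, and Cflag, driving outdata]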
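Code for Late-Arriving Signal
if (Cflag = '1' and Aflag = '0' and Bflag = '0') then
  outdata <= Cdata;
elsif (Aflag = '1') then
  outdata <= Adata;
elsif (Bflag = '1') then
  outdata <= Bdata;
else
  outdata <= '0';
end if;
[Figure: cascaded multiplexers for Adata, Bdata, Cdata, and '0' driving outdata, with the Cdata select formed from Cflag combined with Aflag and Bflag]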
Duplicating Logic to Improve Speed • Most synthesis tools have a fanout control • Be careful, the tools don’t always pick the best implementation creating a more difficult design to place and route • Better to explicitly duplicate logic in code • you may need to use the syn_keep option for combinational logic • Good Examples to duplicate: • Address and control lines to large RAM • Clock enables • Synchronous reset signal • Other high fanout nets
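As a minimal sketch of explicit duplication, the following registers the same clock enable twice so that each copy can drive half of the load. The entity and signal names are hypothetical. The syn_preserve attribute is assumed to be honored by the synthesis tool in use (it is the attribute used in the mitigation example later in this session); other tools use different attribute names.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: one clock enable registered twice to split fan-out.
entity dup_enable is
  port (
    sysclk : in  std_logic;
    reset  : in  std_logic;  -- active low, asynchronous
    en_in  : in  std_logic;
    en_a   : out std_logic;  -- drives the first half of the load
    en_b   : out std_logic   -- drives the second half of the load
  );
end entity dup_enable;

architecture rtl of dup_enable is
  signal en_a_reg : std_logic;
  signal en_b_reg : std_logic;

  -- Without a directive, the two identical registers would normally be
  -- merged into one during area optimization.
  attribute syn_preserve : boolean;
  attribute syn_preserve of en_a_reg : signal is true;
  attribute syn_preserve of en_b_reg : signal is true;
begin
  process (sysclk, reset)
  begin
    if reset = '0' then
      en_a_reg <= '0';
      en_b_reg <= '0';
    elsif rising_edge(sysclk) then
      en_a_reg <= en_in;
      en_b_reg <= en_in;
    end if;
  end process;

  en_a <= en_a_reg;
  en_b <= en_b_reg;
end architecture rtl;
```

How the load is split between the two copies is a floorplanning decision; grouping each copy with the logic placed nearest to it usually gives the cleanest route.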
Pipelining Logic to Improve Speed • Pipelining is most efficient for critical paths which cannot be fixed by special coding schemes. • Some may think that it will increase area. However, with very long paths, synthesis may be duplicating logic in order to meet timing. Pipelining can reduce the replication. • It also helps during place and route
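A minimal sketch of pipelining, assuming a four-input adder whose single-cycle path is too long: the sum is split into two register stages, so each clock period only has to cover one adder. The entity name, port names, and widths are hypothetical. The result appears two clock cycles after the inputs are presented.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: a + b + c + d computed in two pipeline stages.
entity pipelined_sum is
  port (
    sysclk     : in  std_logic;
    reset      : in  std_logic;  -- active low, asynchronous
    a, b, c, d : in  unsigned(7 downto 0);
    sum        : out unsigned(9 downto 0)
  );
end entity pipelined_sum;

architecture rtl of pipelined_sum is
  signal ab_reg : unsigned(8 downto 0);
  signal cd_reg : unsigned(8 downto 0);
begin
  process (sysclk, reset)
  begin
    if reset = '0' then
      ab_reg <= (others => '0');
      cd_reg <= (others => '0');
      sum    <= (others => '0');
    elsif rising_edge(sysclk) then
      -- Stage 1: two partial sums, widened by one bit to keep the carry
      ab_reg <= resize(a, 9) + resize(b, 9);
      cd_reg <= resize(c, 9) + resize(d, 9);
      -- Stage 2: final sum of the registered partial sums
      sum    <= resize(ab_reg, 10) + resize(cd_reg, 10);
    end if;
  end process;
end architecture rtl;
```

The cost is one extra cycle of latency and two extra registers, which the surrounding design has to tolerate.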
Synthesis Tools • Synthesis Optimization Algorithms are geared towards synchronous designs. • Unexpected synthesis gate level output can occur if the design is extremely asynchronous – tool gets confused • The major parameters within the optimization algorithm are Timing and area • Timing is measured from a starting DFF (or input) through a combinatorial path to the next DFF (or output). Timing can only accurately be measured (during optimization) when the DFFs are connected to the same clock • Redundancy within VHDL code is usually synthesized away (area optimization). The designer must place synthesis directives (attributes) on the nets that are part of the redundant path
Synthesis Tools • Designers should beware of push-button mitigation within synthesis. • Mitigation must be glitch-free • Safe directive for state machines is not effective … false sense of safety • When logic is added after VHDL-RTL phase, it is generally difficult/tedious to verify. Remember, reliable circuits must be verified • Companies are aware and are working on some of these issues • Synthesis output can be trusted if proper synchronous design techniques (including cleanly written VHDL) are followed. However, Formal Verifiers are needed! This will increase our test coverage. • Companies are also aware of this… but … where is it?
Reliability • Design can be verified to work under worst case conditions • Aerospace specifics include verifiable mitigation techniques under worst case conditions • Circuit has predictable behavior • Circuit has a definable reset state • Bottom line is to use the following techniques: • Design for Verification (DFV) • Design for Test (DFT) • Design for Reliability (DFR)
VHDL Example of a Data output feeding a Clock input: Avoid for Reliability
process (sysclk, reset)
begin
  if reset = '0' then
    dff0 <= '0';
  elsif rising_edge(sysclk) then
    dff0 <= not dff0;
  end if;
end process;

process (dff0, reset)
begin
  if reset = '0' then
    dff1 <= '0';
  elsif rising_edge(dff0) then   -- data output of dff0 used as a clock
    dff1 <= not dff1;
  end if;
end process;
Adding TMR to Circuitry • It is up to the designer to pick the strategic places that TMR will be inserted. • TMR must be glitch free • TMR must be verifiable • Synthesis directives (attributes) generally need to be used in order to not optimize away the Mitigation
Shared TMR Logic • In this scheme – The designer must triple the number of DFFs. • The DFFs feed 1 TMR block • If the technology is susceptible to transients, this method will not be effective
Distributed TMR • In this scheme – The designer must triple the number of DFFs and triple the number of inserted TMR blocks. • Although more area intensive, this method adds a level of reliability if the technology is susceptible to transients
More Reliable Ripple Counter – Distributed TMR: Picture Depicts bit 0 and bit 1
More Reliable Ripple Counter – Still a Potential Problem • Path delays due to routing differences are never exact • What happens if (referencing paths A, B, and C) A, B, and C are all logic 0 and are expected to go to a logic 1: • Path A comes in first • Path B comes in second (TMR logic will go to logic 1) • Path B gets hit by an SEU – path C has not come in yet – TMR logic will go to logic 0 … start of GLITCH! • Path C comes in – TMR now turns on once again • Probability can be very small to negligible – depends on clock speed and routing delay differences • The glitch will have a relatively small period – will probably violate device specifications and can damage the device • Also adds a major level of complexity within verification
Synchronous Reliable Solution • Instead of a ripple counter, use a synchronous counter • Without Mitigation Case (no SEUs): • Assumes a reset will clear any SEU hits that cause incorrect counting • With mitigation (glitch free TMR for example). • Assumes counter must always be correct for Single Event Upsets • Clock tree feeds the clock pin – Assuming a hardened clock tree (SET – free), SEU hits should not create DFF clock oscillations • Data feeds data pins • Although the SEU is Asynchronous, the glitch free TMR circuit will override the possible metastable oscillation on the data pin. • The following example shares TMR circuitry per stage (unlike the other example where each path has separate TMR logic per stage). This design choice assumes a technology that will not have (or has an extremely low probability of) internal transient glitches or SETs.
VHDL Synchronous Counter without Mitigation
counter <= counter_in;
counter_plus_1 <= counter_in + 1;   -- need special library for this statement

process (sysclk, reset)
begin
  if reset = '0' then               -- counter is resettable to a constant value
    counter_in <= (others => '0');
  elsif rising_edge(sysclk) then    -- clock pin is connected to system clock
    counter_in <= counter_plus_1;
  end if;
end process;
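The "special library" comment refers to the arithmetic package needed for the "+ 1" on a vector. As one possible complete version, the sketch below wraps the same counter in an entity and uses the standard ieee.numeric_std package with an unsigned register. The entity name and the 4-bit width are assumptions; the slide does not state which package the original code used.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;   -- provides "+" for the unsigned type

-- Hypothetical wrapper around the unmitigated synchronous counter.
entity sync_counter is
  port (
    sysclk  : in  std_logic;
    reset   : in  std_logic;  -- active low, asynchronous
    counter : out std_logic_vector(3 downto 0)
  );
end entity sync_counter;

architecture rtl of sync_counter is
  signal counter_in     : unsigned(3 downto 0);
  signal counter_plus_1 : unsigned(3 downto 0);
begin
  -- Combinational next-state logic, kept separate from the register
  counter_plus_1 <= counter_in + 1;
  counter        <= std_logic_vector(counter_in);

  process (sysclk, reset)
  begin
    if reset = '0' then               -- resettable to a constant value
      counter_in <= (others => '0');
    elsif rising_edge(sysclk) then    -- clock pin driven by the system clock
      counter_in <= counter_plus_1;
    end if;
  end process;
end architecture rtl;
```

The counter wraps from 15 back to 0, since numeric_std addition on a 4-bit unsigned is modulo 16.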
VHDL Synchronous Counter with Mitigation
-----------------------------------------------------------------------
-- signal declarations: For this circuit we will use 3 counters that
-- feed into TMR voting logic.
-- the output of the TMR will be fed into a combinatorial logic for
-- the counter.
-- the output of the combinatorial logic of the counter will be fed
-- back into counter register
-----------------------------------------------------------------------
signal counter_plus_1 : std_logic_vector(3 downto 0);
signal counter0_in    : std_logic_vector(3 downto 0);
signal counter1_in    : std_logic_vector(3 downto 0);
signal counter2_in    : std_logic_vector(3 downto 0);
signal counter_tmr    : std_logic_vector(3 downto 0);
-----------------------------------------------------------------------
-- attributes used as synthesis directives: Important so that
-- mitigation logic will not get optimized away
-----------------------------------------------------------------------
attribute syn_preserve : boolean;
attribute syn_preserve of counter0_in : signal is true;
attribute syn_preserve of counter1_in : signal is true;
attribute syn_preserve of counter2_in : signal is true;
VHDL Synchronous Counter with Mitigation
-- generate TMR logic for each bit of counter … works across the 3 counters
-- TMR is a compiled entity in the work directory
-- TMR_OUT <= (a and b) or (a and c) or (b and c)
-- This TMR entity is written bit-wise, however, the designer can make a
-- more robust entity that will accept vector inputs
counter_string_tmr : for i in 3 downto 0 generate
begin
  ucount_bit : entity work.TMR
    port map (
      A       => counter0_in(i),
      B       => counter1_in(i),
      C       => counter2_in(i),
      TMR_OUT => counter_tmr(i)
    );
end generate;
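The TMR entity instantiated above is not shown in the slides. A minimal bit-wise sketch, built from the port names in the port map and the voting equation given in the comment, might look like the following; the architecture name is an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Bit-wise majority voter matching the port map used by the counter.
entity TMR is
  port (
    A       : in  std_logic;
    B       : in  std_logic;
    C       : in  std_logic;
    TMR_OUT : out std_logic
  );
end entity TMR;

architecture rtl of TMR is
begin
  -- The output follows any two inputs that agree, so a single upset
  -- input is outvoted by the other two.
  TMR_OUT <= (A and B) or (A and C) or (B and C);
end architecture rtl;
```

Whether the synthesized voter is actually glitch free depends on how the tool maps the equation onto the target technology, so the gate-level result should still be checked.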
VHDL Synchronous Counter with Mitigation
counter_plus_1 <= counter_tmr + 1;
-- Output of TMR circuitry
counter <= counter_tmr;

process (sysclk, reset)
begin
  if reset = '0' then               -- counter is resettable to a constant value
    counter0_in <= (others => '0');
  elsif rising_edge(sysclk) then    -- clock pin is connected to system clock
    counter0_in <= counter_plus_1;
  end if;
end process;

process (sysclk, reset)
begin
  if reset = '0' then               -- counter is resettable to a constant value
    counter1_in <= (others => '0');
  elsif rising_edge(sysclk) then    -- clock pin is connected to system clock
    counter1_in <= counter_plus_1;
  end if;
end process;

process (sysclk, reset)
begin
  if reset = '0' then               -- counter is resettable to a constant value
    counter2_in <= (others => '0');
  elsif rising_edge(sysclk) then    -- clock pin is connected to system clock
    counter2_in <= counter_plus_1;
  end if;
end process;
VHDL Synchronous Counter with Mitigation Analysis • No SEUs: • After static timing analysis has been verified and all paths meet timing with slack, all data is considered to be stable at each clock edge – no concerns for metastability or incorrect data capture • SEU Hit • If an SEU hits a DFF, then the glitch free TMR will override any one DFF input change (output of TMR is stable), thus, data is still considered to be stable near a clock edge – no concerns for metastability or incorrect data capture • Synchronous Advantage Meets Reliability Requirements • Easily verifiable in the RTL domain if static timing is met and gate level is functionally equivalent to RTL • Mitigation and general functional behavior is predictable under worst case conditions • TMR circuitry is verifiable (due to it being within the RTL vs. being inserted in synthesis) • A definable reset state exists
Asynchronous Clock Domain Crossing • Most common problem within designs. • Why? • Designers usually don’t design for all corner cases – i.e. data must be synchronized to capturing clock domain before usage • Multiple clock domains add an extra level of complexity to the synthesis optimization algorithm. • The major optimization parameters for the synthesis algorithm are timing and area • Timing is measured for each synchronous path (DFFs connected to the same clock tree) • Generally, extra timing constraints are necessary when crossing clock domains
Design Example • Implement a serial asynchronous input port that receives: DCLK, DATA, and DENV. Skew relative to DCLK will be less than 10 ns • Input speed can range from 5 kHz to 10 MHz. • DATA and DENV will change at the rising edge of the DCLK signal (thus will be stable at the falling edge). • Input data is in the form of 16-bit words, MSB first. • DENV is active high • Technology has glitch-free, hardened-by-design mitigation within the silicon (attached to DFFs)
Determine FPGA System Clock speed • Limiting factor is the 10 MHz input – new data can come every 100 ns • Data is asynchronous – need a metastability filter …
Need an Edge Detection for the Input signal • Remember – there is a difference between the input clock and the FPGA system clock (asynchronous) • Goal: Capture data while data is stable (avoid metastability) • We want to detect an edge of the input control (DCLK), then look at the envelope (DENV) - see if it is active, then grab the data. • Why the edge and not the level …We only want to sample input data once per input clock period …at a stable place in the clock period … • Which edge should we use (falling or rising)?…
Analysis • Most people will want to capture at the falling edge because data is known to be stable there. • However, once you detect the stable edge you only have half an input clock period minus overhead … minus data to clock skew … • Therefore – you will be required to implement a much faster system clock
Best Choice • Look for the rising edge of the input clock. Although data is changing here… by the time you actually detect the clock – data will be valid! • Remember – because the DCLK is asynchronous to your FPGA, it must go through a metastability filter before you can sample – THIS TAKES TIME … how much?…
Design Decision… • Envelope will be captured by one DFF (no metastability filter) and will be “anded” with the clock edge detect in order to enable data capture … • I.e. if the clock edge is detected and the envelope is valid then capture data. Thus the data does not go through a metastability filter either!!! Saves gates and is safe – we only look at data and envelope when they are stable – the input clock edge tells all • We can use only one DFF because the enable is guaranteed to be stable by the time the clock edge is detected … clock edge detection has to take at least 2 system clock cycles – by then envelope is clean. • Why not send the DENV signal through a metastability filter too??
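The design decision above can be sketched as follows. This is only one possible reading of the slides: DCLK passes through a two-DFF metastability filter plus a third DFF for rising-edge detection, DENV is captured by a single DFF, and DATA is shifted in (MSB first) only when both the edge detect and the registered envelope are high. The entity name and all internal signal names are hypothetical, and word framing (counting 16 bits) is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch of the serial input port described in the example.
entity serial_in is
  port (
    sysclk : in  std_logic;
    reset  : in  std_logic;  -- active low, asynchronous
    dclk   : in  std_logic;  -- asynchronous input clock
    denv   : in  std_logic;  -- active-high envelope
    data   : in  std_logic;  -- serial data, MSB first
    word   : out std_logic_vector(15 downto 0)
  );
end entity serial_in;

architecture rtl of serial_in is
  signal dclk_meta : std_logic;  -- first filter stage (may go metastable)
  signal dclk_sync : std_logic;  -- second filter stage (considered stable)
  signal dclk_dly  : std_logic;  -- previous value, used for edge detection
  signal denv_reg  : std_logic;  -- envelope captured by a single DFF
  signal dclk_rise : std_logic;
  signal capture   : std_logic;
  signal shift     : std_logic_vector(15 downto 0);
begin
  -- Rising edge of the filtered DCLK, one system clock wide
  dclk_rise <= dclk_sync and not dclk_dly;
  -- Edge detect "anded" with the envelope enables data capture
  capture   <= dclk_rise and denv_reg;
  word      <= shift;

  process (sysclk, reset)
  begin
    if reset = '0' then
      dclk_meta <= '0';
      dclk_sync <= '0';
      dclk_dly  <= '0';
      denv_reg  <= '0';
      shift     <= (others => '0');
    elsif rising_edge(sysclk) then
      dclk_meta <= dclk;
      dclk_sync <= dclk_meta;
      dclk_dly  <= dclk_sync;
      denv_reg  <= denv;
      -- DATA is sampled directly: by the time the edge is detected,
      -- the filter delay has already exceeded the input skew.
      if capture = '1' then
        shift <= shift(14 downto 0) & data;
      end if;
    end if;
  end process;
end architecture rtl;
```

Only dclk passes through the filter; denv and data rely on the filter delay for their stability, which is why the timing analysis of the system clock period matters.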
Metastability Filter Recap: • Choose only one signal (control signal) to send through the metastability filter – this avoids problems caused by skew between asynchronous input signals near a clock edge • Determine where the signals are stable relative to the incoming control signal • Sample and grab data • Do a complete timing analysis: watch out for skew, setup & hold, input clock frequency vs. system clock frequency. • Watch out for large fan-out on the output of the metastability filter – the synthesis tool may duplicate the filter. This can cause data to be out of sync within different portions of the circuit – use a don’t_replicate attribute on the filter output
Metastability Filter Recap: • Remember – Our example’s timing analysis applies to capturing the data relative to the same edge of the input clock that the input data changes – we can do this because we have a delay due to the metastability filter and this delay is larger than the 10 ns skew (between the clock, data, and envelope) listed in the spec. • We chose to capture data relative to the same clock edge that the input data changes so that we can choose a slower clock – slower clock means less power and easier design implementation requirements.
Metastability Filter Recap: • The designer could decide to capture data relative to the edge where the input data is stable (the falling edge of DCLK) – however, that would require a much faster FPGA system clock for proper implementation – see timing analysis. • The designer could also choose to send the DENV and DATA lines through metastability filters (not a good choice – unnecessary extra logic). However, this can only be done if you detect the input clock where data and envelope are stable … and capture there – otherwise (due to skew) incorrect results can occur
Based on the Two Previous Cases … Best and Worst • In the worst case, we see that we will need at least 3 FPGA system clock cycles to capture the data … however, we also need some overhead (20%) due to variations in environment (voltage, temperature, clock skew, process …) • The clock period x needs to satisfy 3x < 100 ns - 0.20x - 10 ns, so x must be less than 28.125 ns … a clock of about 36 MHz or faster • For ease … let’s choose a 40 MHz FPGA system clock (25 ns clock period … definitely fast enough)
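The arithmetic behind the clock choice can be worked through step by step, with x as the FPGA system clock period in nanoseconds, 100 ns as the minimum input clock period, 10 ns as the maximum skew, and 0.20x as the 20% overhead:

```latex
\begin{align*}
3x &< 100 - 0.20x - 10 \\
3.2x &< 90 \\
x &< 28.125\ \text{ns} \\
f = \frac{1}{x} &> \frac{1}{28.125\ \text{ns}} \approx 35.6\ \text{MHz}
\end{align*}
```

Checking the 40 MHz choice: x = 25 ns gives 3.2 × 25 ns = 80 ns, which is below the 90 ns limit, leaving 10 ns of margin.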