Presentation Transcript

  1. A Simplified Approach to Fault Tolerant State Machine Design for Single Event Upsets Melanie Berg

  2. Overview • Presentation describes “Hardened by Design” techniques at a high level of abstraction: FPGA/ASIC logic design • Background • Definition of Fault Tolerance • State Machines • Synchronous Design Theory • Proposed Method of SEU Detection • Proposed Method of SEU Correction

  3. Definition of Fault Tolerance • Masking or recovering from erroneous conditions in a system once they have been detected • The degree of fault tolerance implemented is defined by your system-level requirements, i.e., what behavior is actually acceptable upon error • Questions that must be answered within the system requirements documentation: • Does your system only need to detect an error? • How quickly must the system respond to an error? • Must your system also correct the error? • Is the system susceptible to more than one error per clock cycle?

  4. Synchronous Design with Asynchronous Events • This discussion focuses on sequential Single Event Upsets (SEUs) within a synchronous design environment. • The SEU is considered a soft (temporary) error that occurs when a DFF is hit by a charged particle. • Configuration or SRAM errors will not be considered. • Although the design is synchronous, it is very important to note that the SEU is an asynchronous event: • This is generally not taken into account • Metastability and unpredictable events can occur • It can invoke a Single Event Functional Interrupt (SEFI)

  5. Common Fault Tolerant Implementation • Triple Modular Redundancy (TMR) is the most commonly implemented solution for SEU tolerance. • Why? Because it is a very simple solution • In many cases it is not implemented correctly • Glitches within the TMR voting logic (due to mitigation across separate clock domains or hazardous combinational logic) must be taken into account in case an SEU occurs near a clock edge • TMR can be very area-intensive
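
As a behavioural illustration of the voting step in TMR, a bitwise majority voter over three redundant copies of a register could be modelled as follows. This is a Python sketch rather than HDL and is not taken from the slides; the name `tmr_vote` and the 4-bit example value are assumptions. It models only the logical function and says nothing about the glitch and timing concerns raised on this slide.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of a register value."""
    return (a & b) | (b & c) | (a & c)


# Three copies of a 4-bit register; one copy has suffered an upset in bit 3.
good = 0b1010
upset = good ^ 0b1000
voted = tmr_vote(good, good, upset)
print(format(voted, "04b"))
```

The voter masks the single corrupted copy, but it needs three copies of every flip-flop plus the voting logic, which is where the area cost comes from.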

  6. Glitches in TMR Circuitry: Example

  7. Glitchy TMR Circuitry Continued

  8. Proposed EDAC Methodology • Goal: The proposed Error Detection and Correction (EDAC) techniques are: • Targeted at synchronous Finite State Machine designs • Less area-intensive than TMR • Glitch-free and synchronous, which reduces the rate of SEFI • Note: Synchronous design techniques referred to in this presentation are derived from the ASIC industry and are implemented using HDL: • DFF data inputs should not change within the setup and hold window of the DFF; otherwise metastability and unpredictable functionality can result • Within a synchronous design, metastability will only happen at clock domain crossings, so metastability filters (synchronizers) must be used to protect against these asynchronous events • Synchronous design theory minimizes clock boundary crossings • This is a challenge when SEUs can occur at any point in time, anywhere in the circuit

  9. Synchronous State Machines • A Finite State Machine (FSM) is designed to deterministically transition through a pattern of defined states • A synchronous FSM uses flip-flops to hold its current state, transitions on a clock edge, and only accepts inputs that have been synchronized to the same clock • FSMs are generally used as control mechanisms • Concern/Challenge: • If an SEU occurs within an FSM, the entire system can lock up in a state that is never reached in normal operation: a SEFI!

  10. Synchronous State Machines • The structure consists of four major parts: • Inputs • Current State Register • Next State Logic • Output Logic

  11. Encoding Schemes • Each state of an FSM must be mapped to some type of encoding (pattern of bits) • Once a state is mapped, its bit pattern is considered a defined (legal) state • Unmapped bit patterns are illegal states

  12. Encoding Schemes

  13. Safe State Machines??? • A “Safe” State Machine has been defined as one that: • Has a set of defined states • Can deterministically jump to a defined state if an illegal state has been reached (due to an SEU). • Synthesis tools offer a “Safe” option (in response to demand from our industry): TYPE states IS ( IDLE, GET_DATA, PROCESS_DATA, SEND_DATA, BAD_DATA ); SIGNAL current_state, next_state : states; attribute SAFE_FSM: Boolean; attribute SAFE_FSM of states: type is true; • However, designers beware! • The synthesis tools' “Safe” option is not deterministic if an SEU occurs near a clock edge!

  14. Binary Encoding: How Safe is the “Safe” Attribute? • If a binary-encoded FSM flips into an illegal (unmapped) state, the safe option will return the FSM to a known state that is defined by the others (VHDL) or default (Verilog) clause • If a binary-encoded FSM flips into another legal (mapped) state, the error will go undetected. • If the FSM is controlling a critical output, this phenomenon can be very detrimental! • How safe is this?
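
To get a feel for how often a binary-encoded FSM can flip into another legal state, the following Python sketch enumerates every single-bit upset for the five states named in the VHDL fragment on the previous slide. The sequential 3-bit code assignment is an assumption (a synthesis tool may choose a different mapping), and the function name `classify_single_flips` is hypothetical.

```python
from enum import IntEnum


class State(IntEnum):
    # Assumed sequential binary encoding of the five states from the VHDL example.
    IDLE = 0b000
    GET_DATA = 0b001
    PROCESS_DATA = 0b010
    SEND_DATA = 0b011
    BAD_DATA = 0b100


WIDTH = 3
LEGAL = {int(s) for s in State}


def classify_single_flips():
    """Return (undetected, detected) counts over every single-bit upset."""
    undetected = detected = 0
    for state in State:
        for bit in range(WIDTH):
            flipped = int(state) ^ (1 << bit)
            if flipped in LEGAL:
                undetected += 1  # lands on another legal state and looks valid
            else:
                detected += 1    # lands on an unmapped code the safe logic can catch
    return undetected, detected


undetected, detected = classify_single_flips()
print(f"undetected: {undetected}, detected: {detected}")
```

Under this assumed mapping, 10 of the 15 possible single-bit upsets land on another legal state, so the “Safe” recovery logic would never see them.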

  15. Safe State Machines???

  16. One-Hot vs. Binary • There used to be a consensus that Binary is “safer” than One-Hot • This was based on the idea that One-Hot requires more DFFs to implement an FSM and thus has a higher probability of incurring an error • This view has changed! • Most of the community now understands that although One-Hot requires more registers, it has the built-in detection that is necessary for safe design • Binary encoding can lead to a very “unsafe” design

  17. Proposed SEU Error Detection: One-Hot • One-Hot requires that exactly one bit be active high per clock period • If more than one bit is turned on (or none at all), an error will be detected: a single upset leaves either zero or two bits set, so the parity of the state word becomes even. • A combinational XNOR over the FSM bits is sufficient for SEU detection, even if an SEU occurs near a clock edge • A MUX can be used to transition the current state into a defined “ERROR STATE” if the parity check fails • If the system cannot receive multiple upsets within one clock period, then the circuitry can never flip (illegally) into a legal state!
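
The parity argument on this slide can be checked with a small behavioural model. The Python sketch below is not HDL and is not taken from the slides; it assumes a five-state, five-bit one-hot register, and the names `seu_error` and `bits` are hypothetical. It models the XNOR reduction as the inverse of the XOR of all state bits.

```python
from functools import reduce
from operator import xor

WIDTH = 5  # assumed: five states, one flip-flop per state


def bits(word: int, width: int = WIDTH):
    """Split a state word into a list of individual bits."""
    return [(word >> i) & 1 for i in range(width)]


def seu_error(word: int, width: int = WIDTH) -> int:
    """XNOR reduction of the state bits: 1 when the parity is even (error)."""
    return 1 - reduce(xor, bits(word, width))


one_hot_states = [1 << i for i in range(WIDTH)]

# A legal one-hot word has odd parity, so the error flag stays low.
assert all(seu_error(s) == 0 for s in one_hot_states)

# Every single-bit upset leaves zero or two bits set, so the flag goes high.
assert all(
    seu_error(s ^ (1 << b)) == 1
    for s in one_hot_states
    for b in range(WIDTH)
)
```

A parity check alone does not flag a word with three bits set, which is why the slide's guarantee depends on at most one upset per clock period.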

  18. FSM SEU Error Correction: Using Companion States • There exist many publications on error correction theory. • None directly address how to correctly implement FSM fault correction with current synthesis tools. • Glitch control: synthesis tools will generally produce “glitchy” logic • Synthesis “optimization” algorithms will erase the redundancy necessary for EDAC • The user must sometimes hand-instantiate logic • The user must place the necessary attributes to prevent the redundant logic from being erased.

  19. Error Correction within One Cycle: Using Companion States • We’ll base the derivation on a 4-state FSM:

  20. Error Correction within One Cycle: Using Companion States • 1. Find an encoding such that the states have a Hamming distance of at least 3 (at least 3 bits must differ between any two states): • 00000 (state A), • 11100 (state B), • 01111 (state C), • 10011 (state D). • Five bits are necessary to encode a four-state machine in order to achieve the required Hamming distance of three.
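
The distance property of the four encodings listed on this slide can be verified directly. The Python sketch below uses the encodings from the slide; the helper name `hamming` and the dictionary layout are illustrative choices.

```python
from itertools import combinations

# State encodings as given on the slide.
ENCODING = {"A": 0b00000, "B": 0b11100, "C": 0b01111, "D": 0b10011}


def hamming(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


distances = {
    (p, q): hamming(ENCODING[p], ENCODING[q])
    for p, q in combinations(ENCODING, 2)
}
min_distance = min(distances.values())

for pair, dist in distances.items():
    print(pair, dist)
print("minimum distance:", min_distance)
```

Every pair differs in three or four bits, so the minimum distance is three, as the slide requires.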

  21. Error Correction within One Cycle: Using Companion States • For each encoding, calculate the companion encodings, i.e., those at a Hamming distance of one from it. For example: • Companion encodings for state A (00000) are: • 00001, 00010, 00100, 01000, 10000 • Companion encodings for state B (11100) are: • 11101, 11110, 11000, 10100, 01100
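
Companion encodings can be generated mechanically by flipping each bit of a state code in turn. The Python sketch below does this for the four state codes from the previous slide and checks that no code word is shared between states; the function name `companions` is hypothetical.

```python
ENCODING = {"A": 0b00000, "B": 0b11100, "C": 0b01111, "D": 0b10011}
WIDTH = 5


def companions(code: int, width: int = WIDTH) -> set:
    """All words at Hamming distance one from `code` (one per flipped bit)."""
    return {code ^ (1 << bit) for bit in range(width)}


companion_sets = {name: companions(code) for name, code in ENCODING.items()}

for name, codes in companion_sets.items():
    print(name, sorted(format(c, "05b") for c in codes))

# With a minimum distance of three, no companion is shared between two states
# and no companion coincides with another state's own code.
all_codes = list(ENCODING.values())
for codes in companion_sets.values():
    all_codes.extend(codes)
assert len(all_codes) == len(set(all_codes))
```

The four states and their companions together occupy 24 of the 32 possible five-bit words; the remaining eight are more than one bit away from every state.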

  22. Error Correction within One Cycle: Using Companion States • When implementing the state machine, state A is encoded as 00000 and then (theoretically) “OR-ed” with all of its companion encodings. This covers all possible single-bit SEUs • Do the same for all other states • Use the output of the “OR-ed” states to determine the next-state logic. • Thus if a bit flips, the companion state will catch it and the FSM will still be able to correctly determine the next state • Be careful! The “OR” logic is more complex than simply using a string of “OR” gates.
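
The effect of “OR-ing” a state with its companions can be modelled as a decoder that accepts any word within one bit of a state code. The Python sketch below is a behavioural model only, not the gate-level logic the slide warns about. The transition order in `NEXT` is hypothetical because the state diagram from the slides is not part of the transcript, and `decode` and `next_word` are illustrative names.

```python
ENCODING = {"A": 0b00000, "B": 0b11100, "C": 0b01111, "D": 0b10011}

# Hypothetical transition order, assumed for illustration only.
NEXT = {"A": "B", "B": "C", "C": "D", "D": "A"}


def decode(word: int):
    """Return the state whose code or companion matches `word`, else None."""
    for name, code in ENCODING.items():
        if bin(word ^ code).count("1") <= 1:
            return name
    return None  # more than one bit upset: outside what this scheme corrects


def next_word(word: int) -> int:
    """Next-state logic driven by the decoded (corrected) current state."""
    state = decode(word)
    if state is None:
        raise ValueError(f"uncorrectable state word {word:05b}")
    return ENCODING[NEXT[state]]


# State B (11100) with bit 2 upset becomes 11000, one of B's companions.
corrupted_b = ENCODING["B"] ^ 0b00100
assert decode(corrupted_b) == "B"
assert next_word(corrupted_b) == ENCODING["C"]
```

Because the next state is loaded as a clean code word, the upset is corrected within one clock cycle in this model.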

  23. Error Correction within One Cycle: Glitch Control • One major issue that is frequently overlooked is SEUs occurring near clock edges • If this occurs, your error checking logic may produce a glitch • Due to routing delay differences, this can cause incorrect values to be latched into the current state registers. • Refer to a Karnaugh map for a glitch-free implementation • The designer may have to hand-instantiate the logic if the synthesis tool does not implement the VHDL as expected
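
One common reading of the Karnaugh map advice is the textbook static-hazard rule: in a sum-of-products circuit, two adjacent input combinations that both produce 1 should be covered by a single product term, otherwise the output can glitch while one term turns off and another turns on. The Python sketch below applies that rule to a generic 2:1 multiplexer. It is an assumption about what the slide intends, not the author's circuit, and the names `covers`, `static1_hazards`, `s`, `d0`, and `d1` are hypothetical.

```python
from itertools import product


def covers(term: dict, assignment: dict) -> bool:
    """True if the product term evaluates to 1 under the assignment."""
    return all(assignment[var] == val for var, val in term.items())


def static1_hazards(terms, variables):
    """Adjacent input pairs where the output stays 1 but no single term holds it."""
    hazards = []
    for values in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, values))
        for var in variables:
            if a[var] == 1:
                continue  # visit each adjacent pair once, from the 0 side
            b = dict(a, **{var: 1})
            both_high = (any(covers(t, a) for t in terms)
                         and any(covers(t, b) for t in terms))
            shared = any(covers(t, a) and covers(t, b) for t in terms)
            if both_high and not shared:
                hazards.append((a, var))
    return hazards


VARS = ("s", "d0", "d1")
# Minimal 2:1 mux: out = (not s and d0) or (s and d1)
mux = [{"s": 0, "d0": 1}, {"s": 1, "d1": 1}]
# Same function with the logically redundant consensus term (d0 and d1) added.
mux_with_consensus = mux + [{"d0": 1, "d1": 1}]

print(static1_hazards(mux, VARS))
print(static1_hazards(mux_with_consensus, VARS))
```

The minimal multiplexer has one hazard, when `s` changes while both data inputs are 1, and the consensus term removes it. That term is logically redundant, which is exactly the kind of logic an optimizing synthesis tool tends to strip.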

  24. Error Correction within One Cycle: Glitch Control

  25. Error Correction within One Cycle: Glitch Control • The designer will have to include synthesis directives in order to turn off the tool's “optimization”: • Preserve_driver • Preserve_signal • Always check the gate-level output of the synthesis tool.

  26. Conclusion • This presentation proposes methods of fault-tolerant state machine implementation to address potential IC SEU susceptibility. • Be aware of potential glitches due to asynchronous SEUs occurring near a clock edge: • Mitigation techniques must be glitch-free! • Mitigation may need a synchronization circuit • Due to metastability and routing delay differences, the effects of an SEU can be more catastrophic than expected • Special directives must be used to drive the synthesis tools when implementing fault-tolerant redundant logic, because the tools are generally focused on area and speed optimization.