Joel Seely Technical Marketing Manager Military & Aerospace Business Unit

Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection


Presentation Transcript


  1. Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection • Joel Seely, Technical Marketing Manager, Military & Aerospace Business Unit

  2. Single Event Upset (SEU) Overview for SRAM-Based FPGAs

  3. Definitions • SEU: Single Event Upset • Unwanted Change in State of a Latch or a Memory Cell • SER: Soft Error Rate • SEU Rate • SEFI: Single Event Functional Interrupt • Functional Failure by SEU • Not All SEUs are SEFIs • Generally Takes 5-10 SEUs to Cause SEFI

  4. Circuit Components of SRAM-Based FPGAs • I/O Registers & I/O Configuration • No Issue, Very Robust Registers, < 1 FIT • Logic Registers (LEs) • No Issues, Very Robust Registers, < Hard Error Rate • User Memory • Typically On-Chip Memories are “By 9” for Parity Checking • IP Available for ECC • Configuration RAM (CRAM) for LUTs & Routing • Area of Focus
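The "By 9" organization on slide 4 gives each byte of user memory a ninth bit that can hold parity. A minimal sketch of how that ninth bit could be used, assuming even parity and hypothetical helper names (the slides specify neither):

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the number of set bits is odd."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") & 1


def encode9(byte: int) -> int:
    """Pack the data byte and its parity bit into a 9-bit word (parity in bit 8)."""
    return (parity_bit(byte) << 8) | byte


def check9(word: int) -> bool:
    """Return True if the 9-bit word still has even overall parity."""
    return bin(word & 0x1FF).count("1") % 2 == 0


word = encode9(0b1011_0010)   # hypothetical stored value
upset = word ^ (1 << 3)       # simulate a single-bit upset in the data
print(check9(word), check9(upset))
```

Parity of this kind flags any odd number of flipped bits in a word but cannot say which bit flipped, which is why the slide lists ECC IP separately for correction.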

  5. Upset of a CRAM Cell [Figure: 6-transistor cell schematic (Data In, Data Out, Clear, Vcc, Vss); plot of noise current (µA) versus time (ps) for 10 fC collected charge; plot of voltage versus time]

  6. SEU-Induced Failure Rate* [Failure-rate data not captured in transcript] • *Data at Sea Level • **MTBF: Mean Time Between Functional Interrupt

  7. Number of CRAM Bit Upsets for Each Occurrence of Functional Upset [Figure: two distributions, with medians of ~6 and 5]

  8. Addressing System-Level Issues

  9. SER Improvements/Mitigation • Chip Design Enhancements • New Materials & Process Enhancements • Larger CRAM Structure • Increase in Capacitance on Critical Node • Smaller Process => Smaller Die => Lower SEU Probability • Built-In Error Detection/Correction Circuitry

  10. SER Per SRAM Bit Trend [Figure: SER per SRAM Mbit (FITs, axis marks at 100 and 1,000) versus process technology and year, from 0.5 µm (1995) to 0.13 µm (2002), with a 90 nm projection]

  11. System-Level Improvements/Mitigation • ECC for User Memory • Use Detection/Correction Feature • Triple Module Redundancy (TMR) • To Achieve Lower Error Rate & Less Downtime • Migrate to Structured ASIC

  12. Soft Error Detection Methods • Configuration RAM Readout • Read-Out Full Bitstream • Compare with Stored Bitstream • Can Determine Where in Configuration Error Occurred • Caveat: Security Issues with Reading Out Bitstream [Diagram: a microprocessor or CPLD compares the FPGA's CRAM data with the stored data: same or different?]
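The readout method on slide 12 amounts to a bit-for-bit comparison performed outside the FPGA. A minimal sketch of that comparison, assuming both images are available as equal-length byte strings and that bits are numbered LSB-first within each byte (the function name and the sample data are hypothetical, not from the slides):

```python
def find_upsets(readback: bytes, golden: bytes) -> list[int]:
    """Return the bit offsets at which the readback image differs from the golden image."""
    if len(readback) != len(golden):
        raise ValueError("readback and golden images must be the same length")
    upsets = []
    for byte_index, (r, g) in enumerate(zip(readback, golden)):
        diff = r ^ g          # set bits mark positions that disagree
        bit = 0
        while diff:
            if diff & 1:
                upsets.append(byte_index * 8 + bit)
            diff >>= 1
            bit += 1
    return upsets


golden = bytes([0x12, 0x34, 0x56, 0x78])   # hypothetical stored bitstream
readback = bytearray(golden)
readback[2] ^= 0x10                        # simulate an upset: bit 4 of byte 2
print(find_upsets(bytes(readback), golden))  # [20]
```

Because the comparison yields exact bit offsets, this approach can locate the error, which matches the slide's claim; the cost is that the full bitstream has to leave the device.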

  13. Soft Error Detection Methods • On-Chip SEU Detection • Dedicated Comparison Circuitry • e.g. CRC Engine Comparing Stored CRC with That Calculated from Configuration RAM • Detection Circuitry Running Continuously • Error Detection Rate Variable Based on Implementation of Hardware, Number of CRAM Bits & Input Clock Frequency • Error Signal Available Internally or Externally • Caveat: Cannot Determine Where in Configuration Error Occurred [Diagram: inside the FPGA, the stored value is compared with the computed value and the result goes to the core]

  14. On-Chip Detection Example • Dedicated CRC Circuit • Configuration RAM Verification Capability • 32-Bit Cyclic Redundancy Code Check • Verified Against Internally Stored Value • Runs in the Background Without Impacting Device Performance • Close to Real-Time Detection • Variable Clock Frequency • Depends on Number of CRAM Bits • Multi-Event Detection • Up to 3-Bit for 32-Bit CRC • Result Output to Either Core or Pin • Use with Either Internal or External Hardware for Error Correction
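The on-chip check described on slides 13 and 14 can be modeled in software as recomputing a CRC over the configuration bits and comparing it with a stored value. A minimal sketch, assuming the standard CRC-32 from Python's `zlib` as a stand-in (the slides do not give the device's polynomial) and a hypothetical byte-string model of the CRAM:

```python
import zlib


def crc_of(cram: bytes) -> int:
    """32-bit CRC of the configuration image (zlib's CRC-32, used here as a stand-in)."""
    return zlib.crc32(cram) & 0xFFFFFFFF


def seu_detected(cram: bytes, stored_crc: int) -> bool:
    """True if the recomputed CRC no longer matches the stored value."""
    return crc_of(cram) != stored_crc


config = bytes(range(256)) * 4      # hypothetical configuration image
stored = crc_of(config)             # value stored at configuration time

upset = bytearray(config)
upset[100] ^= 0x01                  # simulate a single-bit upset

print(seu_detected(config, stored), seu_detected(bytes(upset), stored))
```

The result is a single pass/fail flag, which illustrates the caveat on slide 13: a CRC mismatch shows that some bit changed but not which one.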

  15. Correction Methods • FPGA Detection, System-Level Correction • Lower Total Cost • Downtime Is Limited & Manageable • Used in Non-Critical Applications • Triple Module Redundancy • Two Flavors • All On-Chip in FPGA • Separate Chips & Voter • Correction Can Be Real-Time • Used in Critical Applications

  16. Single System Detection & Correction • Step One: Detect the Soft Error • 75% of Reported Errors Are “Don’t Care” Errors • Step Two: Alert the System • Step Three: Fix the Error • In Some Cases, Re-Program the FPGA • In Some Cases, Reboot the Sub-System • In Some Cases, Reboot the System • Need to Focus on System “Downtime” • Each System Has Unique Requirements • Re-Programming FPGA Takes < 250 ms • Rebooting Time Varies & Can Be Fast “by Design”

  17. TMR Method 1 • Identical Hardware in FPGAs • Use Voter Implemented in FPGA or CPLD • Utilize Either Hardware Output or CRC Error Pin • Voter Also Used to Signal Reconfiguration on Difference or Error [Diagram: three FPGAs (Hardware 1, Hardware 2, Hardware 3) feeding a voting FPGA or CPLD]

  18. TMR Method 2 • Multiple Instantiations of Hardware in Single FPGA • For Low-Rate SEUs • SEU Events May Occur Much More Frequently than Functional Error (De-Rating) • Voter Signals Reconfiguration of FPGA • FPGA Must Be Reconfigured [Diagram: Hardware 1, Hardware 2 and Hardware 3 feeding a voting circuit, all inside one FPGA]
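Both TMR methods rely on a voter that passes the majority output and flags any disagreement so reconfiguration can be requested. A minimal software sketch of that voting logic, assuming the three outputs are modeled as integers and using a hypothetical function name:

```python
def vote(a: int, b: int, c: int) -> tuple[int, bool]:
    """Bitwise 2-of-3 majority vote; the flag is True if any copy disagrees."""
    majority = (a & b) | (a & c) | (b & c)
    disagreement = not (a == b == c)
    return majority, disagreement


# Hypothetical outputs from three copies of the same logic; the third has one upset bit.
out, reconfigure = vote(0b1010, 0b1010, 0b1110)
print(bin(out), reconfigure)  # 0b1010 True
```

The majority output stays correct with one faulty copy, while the disagreement flag plays the role the slides give the voter: signaling that the affected FPGA should be reconfigured.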

  19. De-Rating Methodology • Only a Fraction of Configuration Bits Are Actually Programmed • e.g. Using Only Two Inputs of 4-Input LUT Leaves 75% of LUT as “Don’t Care” • Only About 20% of Routing Is Used • Depends on Utilization & Application • Some Un-Programmed Bits Still Matter • Flipping Could Change Function of the Device • Extensive Experimentation Shows a Range From 1/8 to 1/3 of the Bits Matter
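The de-rating range on slide 19 can be applied as a simple scaling of the raw CRAM upset rate. A minimal sketch, assuming the raw rate is expressed in FIT (failures per 10^9 device-hours) and using a hypothetical raw rate of 1,000 FIT that does not come from the slides:

```python
HOURS_PER_YEAR = 24 * 365


def mtbf_years(raw_upset_fit: float, derating: float) -> float:
    """Mean time between functional interrupts, in years, after de-rating.

    raw_upset_fit: raw CRAM upset rate in FIT (failures per 1e9 device-hours).
    derating: fraction of CRAM bits whose upset changes device function.
    """
    if raw_upset_fit <= 0 or not 0 < derating <= 1:
        raise ValueError("need a positive FIT rate and a de-rating factor in (0, 1]")
    functional_fit = raw_upset_fit * derating
    return 1e9 / functional_fit / HOURS_PER_YEAR


RAW_FIT = 1000.0  # hypothetical raw upset rate, for illustration only
for factor in (1 / 8, 1 / 3):  # range quoted on the slide
    print(f"de-rating {factor:.3f}: {mtbf_years(RAW_FIT, factor):.0f} years")
```

A de-rating factor of 1/8 corresponds to roughly eight raw upsets per functional interrupt, which is in line with the 5-10 SEUs per SEFI quoted on slide 3.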

  20. Structured ASIC: Ultimate SEU Protection • PLD Architecture with ASIC Routing • No Configuration Memory = Estimated SER Is Below Hard Failure Rate for the Device [Diagram: FPGA and structured ASIC]

  21. Summary • SEU Is a Well-Understood Phenomenon • Many Chip-Level Enhancements Mitigate SEUs • Process • Design • Manufacturing Techniques • Easy Detection of SEU Events Is Key • After Detection, Other Methods Must Be Employed to Deal with the Event • Critical Nature of Application Determines Level of SEU Response • Structured ASICs from FPGA Designs Offer a Much More Robust Solution Due to Removal of All CRAM
