Actel FPGAs: Overall Experience

Actel SX-S FPGAs: Mission- and Safety-Critical Systems. A Summary and Snapshot of a Dynamic Situation.

Presentation Transcript


  1. Actel SX-S FPGAs: Mission- and Safety-Critical Systems. A Summary and Snapshot of a Dynamic Situation. Note: This version is a subset of what was presented at the seminar on September 22, 2004 and consists solely of released material. A more complete version, in the form of a report, is being prepared for release. Only a few edits were made to this presentation.

  2. Actel FPGAs: Overall Experience
     • In use in NASA spaceflight electronics for > 10 years
     • Investigated dozens of failures over that period
     • Devices have had reliable structures
     • Device idiosyncrasies were found
     • Design errors common
        • Application of FPGA at the board level
        • Internal design of FPGA logic
     • Trend has been a reduction of internal logic errors and an increase in board application errors

  3. Application Challenges: Failures
     • Leading cause of failure: lack of knowledge of and incorrect management of clock skew
     • Incorrect termination of special pins
        • JTAG pins misunderstood and misapplied; bad IEEE spec does not help
     • Hazards or “glitchy” circuits used as clocks
        • Incorrect decoding of counters; multiplexors/decoders switching
     • Lack of knowledge of device startup characteristics
        • Inputs can function as outputs: POR circuits
        • Device uncontrolled during start: SMEX/WIRE, SX-S “burps”
     • Lack of knowledge of design produced by logic synthesizer
        • Lockup states in FSM
        • Flip-flop replication
     • Power sequencing for dual supply devices
        • SX 0.6 µm devices; potential damage
        • SX-S 0.25 µm devices; large inrush currents
     • Others
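
One item in the list above, lockup states in an FSM, lends itself to a mechanical check. The sketch below is only an illustration under stated assumptions: the state encoding, the `unsafe_step` and `safe_step` next-state functions, and all other names are hypothetical and are not taken from any design discussed in the presentation. It enumerates every encoding of the state register and reports those from which the machine can never return to a state reachable from reset.

```python
from collections import deque


def reachable(start, step, inputs):
    """Return every state reachable from `start` under any input sequence."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for value in inputs:
            nxt = step(state, value)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def lockup_states(n_bits, reset_state, step, inputs):
    """List encodings that can never re-enter the set reachable from reset."""
    legal = reachable(reset_state, step, inputs)
    stuck = []
    for code in range(2 ** n_bits):
        if code in legal:
            continue
        # An unused encoding is a lockup state if no input sequence
        # leads from it back into the legal set.
        if not (reachable(code, step, inputs) & legal):
            stuck.append(code)
    return stuck


# Hypothetical 3-state machine held in a 2-bit register; code 3 is unused.
CYCLE = {0: 1, 1: 2, 2: 0}


def unsafe_step(state, go):
    """Unused code 3 holds its value, as an optimized netlist might do."""
    if state in CYCLE:
        return CYCLE[state] if go else state
    return state


def safe_step(state, go):
    """Unused code 3 is forced back to the reset state."""
    if state in CYCLE:
        return CYCLE[state] if go else state
    return 0


print(lockup_states(2, 0, unsafe_step, (0, 1)))  # [3]
print(lockup_states(2, 0, safe_step, (0, 1)))    # []
```

The check only has value if the next-state function reflects the synthesized netlist rather than the source description, which is the point the slide makes about knowing what the logic synthesizer produced.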

  4. SX-A/SX-S Clock Skew: “OLD News #13,” July 15, 2003
     • Initiated by A54SX32A failure (CPU_SIM), December 2002
     • Minimum delay numbers calculated by the timing analysis tools are not guaranteed.
     • Any flip-flop pair, with a common edge, is guaranteed to have sufficient hold time margin under all conditions and placements, when clocked by HCLK.
     • For an arbitrary flip-flop pair, with a common edge (either rising or falling), when clocked by a global routed array clock:
        • No guarantee that it will be correct by construction under all conditions and placements.
        • There is no certified technique to prove adequate margin by analysis with the current tool set.
        • Skew-tolerant design techniques are recommended.
     • For an arbitrary flip-flop pair, with a common edge (either rising or falling), when clocked by a quadrant routed array clock:
        • Within a single quadrant: Any flip-flop pair, with a common edge (either rising or falling), is guaranteed to have sufficient hold time margin under all conditions and placements.
        • Over multiple quadrants:
           • When using internal routing to connect quadrant drivers: There is no guarantee that circuits will be correct by construction under all conditions and placements.
           • When using dedicated routing to connect quadrant drivers: There is currently no guarantee that circuits will be correct by construction under all conditions and placements. This case is currently being analyzed and tested by the manufacturer.

  5. SX-A/SX-S Clock Skew: “Clock Skew and Short Paths Timing,” March 2004
     • The hardwired and quadrant clock resources (HCLK, QCLKA, QCLKB, QCLKC and QCLKD) in A54SX-A, RT54SX-S … are designed to prevent design errors due to clock skew. The skew will always be less than the shortest possible data path.
     • Routed global networks, such as CLKA and CLKB, offer reduced clock skew. However, Actel recommends adding design margin since all possible configurations of routing and clock loading cannot be accurately characterized.
     • … Since great emphasis is placed on satisfying the maximum delay values, manufacturers do not typically specify the minimum delay values. In other words, manufacturers will guarantee devices to be faster than X, but do not guarantee that devices will be slower than Y.
     • … the best ways to avoid functional errors are to use design techniques that do not depend on the minimum delay values for proper functionality OR use dedicated hardwired resources such as the HCLK network in SX-A and RTSX-S devices.
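
The hold-time condition behind the clock-skew notes on slides 4 and 5 can be written as a one-line check. The sketch below is a minimal illustration, not a manufacturer-approved analysis: the function name and every delay value are hypothetical. It uses the textbook condition for a flip-flop pair on a common clock edge, namely that the shortest clock-to-Q plus data-path delay must exceed the receiving flip-flop's hold time plus the clock skew between the two flip-flops.

```python
def hold_margin_ns(t_clk_to_q_min, t_data_min, t_hold, clock_skew):
    """Hold margin for a launch/capture flip-flop pair on the same clock edge.

    A negative result means new data can reach the capturing flip-flop
    before its (late) clock edge has finished capturing the old data.
    All arguments are in nanoseconds and are hypothetical inputs.
    """
    return t_clk_to_q_min + t_data_min - t_hold - clock_skew


# Hypothetical short path: 0.5 ns clock-to-Q, 0.25 ns routing, 0 ns hold.
print(hold_margin_ns(0.5, 0.25, 0.0, 0.5))  # 0.25: positive margin
print(hold_margin_ns(0.5, 0.25, 0.0, 1.0))  # -0.25: hold violation
```

Because slide 4 states that the minimum delays reported by the timing tools are not guaranteed, a positive result from a check like this proves nothing by itself on routed clocks. That is why the slides recommend skew-tolerant design techniques or the hardwired clock resources instead.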

  6. Example RTSX-S Startup “Burp”: TTL, pull down resistor, VCCI only

  7. Speed, Size, and Complexity Trends: Actel Only for This Presentation
     • Speed Issues
        • Signal Integrity
        • Simultaneous Switching Issues
        • Higher Drive I/Os
        • Increased Number of I/Os
     • Size: Increased From ~2k to ~36k Gates
     • Complexity
        • Basically Constant
        • Significant Increase for Next Generation Devices

  8. Basic Antifuse Types
     • ONO: Oxide Nitride Oxide (~ 250 Ω)
        • A1020 (MEC) and RH1020 (L-M)
        • A1280A (MEC) and RH1280 (L-M)
        • A1460A (MEC) and A14100A (MEC)
        • Workhorse Devices
     • M2M: Metal to Metal (~ 25 Ω)
        • SX: 0.6 µm, 3.3V devices, MEC
           • SX16, SX32: (short lived)
        • SX32S, SX72S: 0.25 µm, 2.5V devices, MEC
           • SEU Hard Latches
           • Used everywhere
        • SX32SU, SX72SU: 0.25 µm, 2.5V devices, UMC
           • Upgrade from SX-S (inrush current fix, etc.)
           • New Device

  9. SX-S: Widely Used FPGA
     • MESSENGER (Mercury Orbiter)
     • Cockpit Avionics Upgrade (Shuttle)
     • STEREO
     • SWIFT
     • GLAST
     • MER
     • Department of Defense
     • Etc., etc., and etc.

  10. Initial Reported Failures
     • Mars Exploration Rovers and NASA IAT
        • 16 Reported Failed Out of ~ 75 Devices
        • Failures Reviewed
           • Induced by Faulty Test Equipment (ATE, Burn-In Boards)
           • Incorrect Test Results Reported (ATE)
           • No Failures
           • One Programmed Antifuse Failed
              • Failed to Program on First Attempt: No linkage to failure
              • Electrically Overstressed: Suggested linkage to failure
        • Recommended Completion of Failure Analysis
     • DoD Applications
        • High SSO
        • Large amounts of over and undershoot
        • Increased tPD ranges from ~ 10 ns to > 1 µs

  11. Are Damaged Antifuses Detectable?
     • Not Always
        • Parametric Change, Not Hard Functional Failure
        • No completely open programmed antifuse has yet been detected
        • Current draw not directly affected
        • Delta tPD can be small enough to “hide in the timing slack”
        • No suitable accelerating mechanism yet determined
     • Action probe can be exploited to detect some but not all damaged paths
        • Action probe pickoff point prior to antifuse
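
The remark that a delta tPD can “hide in the timing slack” can be made concrete with a small calculation. The sketch below is an illustration only: the clock period, path delay, and setup time are hypothetical values chosen for the example, while the two delay increases are the endpoints of the range quoted on slide 10 (about 10 ns and 1 µs).

```python
def setup_slack_ns(clock_period, t_path, t_setup):
    """Time left over in a clock period after the path delay and setup time."""
    return clock_period - t_path - t_setup


def delta_is_hidden(clock_period, t_path, t_setup, delta_tpd):
    """True when the extra delay fits inside the slack, so the path still
    meets setup and a functional test at this clock rate sees nothing wrong."""
    return delta_tpd <= setup_slack_ns(clock_period, t_path, t_setup)


# Hypothetical path: 50 ns clock period, 30 ns path delay, 1 ns setup time.
print(setup_slack_ns(50, 30, 1))         # 19
print(delta_is_hidden(50, 30, 1, 10))    # True: a 10 ns increase is absorbed
print(delta_is_hidden(50, 30, 1, 1000))  # False: a 1 µs increase breaks the path
```

A damaged antifuse on a path with generous slack can therefore pass functional testing, which is consistent with the slide's point that the change is parametric and not a hard functional failure.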

  12. Are Damaged Antifuses Stable? Frequency measurement of three devices with damaged programmed antifuses (MEC). Note the discontinuity, showing instability of the damaged element. Additional test data confirmed the instability of damaged programmed antifuses in a different S/N device.

  13. Testing Programs
     • Actel Corp. Internal Testing
     • “Industry Tiger Team” Testing
        • Boeing Leg
           • 500 MEC RTSX32S; Old programming algorithm; Room temp, VCCA = 2.5 VDC
        • Aerospace Leg (changes)
           • 600 MEC RTSX32S; Mix of old and new programming algorithms; Room temp, VCCA = varied (tough keeping track); low dynamic loads
        • rk Leg
           • 300 to 400 UMC A54SX32A
     • NASA Testing
        • 300 UMC RTSX32SU; 300 MEC RTSX32S
        • -55 °C and +125 °C; VCCA ≥ 2.75V; internal loading 20% > max

  14. Key Results, MEC SX-S
     • Old Programming Algorithm
        • Fully utilized RTSX32S; Roughly double for RTSX72S
        • High degree of observability
        • Room temperature testing, VCCA = 2.5 VDC
        • Low number of outputs switching
        • Approximately 6% to 7% failed (based on SX32S; double for SX72S)
        • Failure occurred > 300 hour point
        • Effects of overstress not clear
     • New Programming Algorithm
        • Reduces failure rate; analysis and testing in process
        • Modified new programming algorithm in development
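
To see what a per-device failure fraction of this size could mean for a system that carries several FPGAs, the sketch below computes the chance that at least one device fails. It is a rough illustration under loud assumptions: failures are treated as independent, the 6% and 7% figures from the slide are applied as per-device probabilities outside the test conditions in which they were observed, and the device count of 4 is hypothetical.

```python
def prob_any_failure(p_device, n_devices):
    """Probability that at least one of n independent devices fails,
    given the same per-device failure probability for each."""
    return 1.0 - (1.0 - p_device) ** n_devices


# Per-device fractions from the slide; the count of 4 devices is hypothetical.
for p in (0.06, 0.07):
    print(p, round(prob_any_failure(p, 4), 3))
```

The result grows quickly with the number of devices, which is one reason the later slides describe the decisions as project specific.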

  15. SX-S TMR: Redundancy will not mask damaged programmed antifuses. [Figure 3. K-Latch schematic, simplified. The asynchronous structure and interlocks eliminate the need for a free running clock to scrub SEUs.]

  16. Recovery Plans
     • Test “modified new algorithm” when available
     • Rigorously test SX-SU/UMC, SX-S/MEC, and SX-A/UMC FPGAs
     • Cut stress levels in applications

  17. Conclusions
     • Programmed antifuses in SX-S (and SX-A/MEC) FPGAs will fail under low-stress conditions
     • Not all damaged programmed antifuses can be detected
     • Damaged programmed antifuses are not stable
     • No damaged programmed antifuses have yet been detected in the SX-A/UMC or SX-SU/UMC FPGAs
        • SX-SU is a new part

  18. Recommendations: Safety-Critical Systems
     • These recommendations are valid today
        • Fluid situation
        • Lots more testing ahead
     • For FPGA-based systems:
        • test, Test, TEST; analyze, Analyze, ANALYZE
     • Each project will have its own considerations, and decisions are project specific. OLD will work with all projects; there is no “one size fits all” general recommendation.
     • User logic design is an inherent part of device reliability, and many items must be verified:
        • Special pin termination
        • Clock skew management
        • Signal integrity
        • Etc.
