Combination of Multiple Mechanism for Post-Silicon Reliability Prediction

April 30, 2014
Joseph B. Bernstein, Ofir Delly, Moti Gabbay (Ariel University)
Yizhak Bot (BQR)
josephbe@ariel.ac.il

We always try to learn from the past in order to improve the future. One problem: everyone sees the past differently!

“It is possible to fail in many ways... while to succeed is possible only in one way...” (Aristotle)
“If we don’t learn from the past, we are condemned to repeat it...” (George Santayana, 1952)

So, what’s the big problem? Why is lifetime PREDICTION so difficult?

The Semiconductor Test Industry Today
We test the parts “blindly” and then “see how they run…”

Field Data Analysis Results
Cumulative data for over 10,000,000 military electronic systems.
• In the MTBF region, β = 1 ± 0.2 for all systems.
• Field failures are generally constant-rate occurrences; β = 1 is Poisson.
• So, we should keep MTBF and FIT.
[Figure label: Physics of Failure]

Some Observations:
• Modern electronics have a nearly constant failure rate.
• There are few (very rare) exceptions.
• Keep the idea of a constant rate and work within the framework of Failure-In-Time (FIT).

So what are the problems with FIT?
• Handbooks are badly outdated.
• MIL 217 is OLD and USELESS.
• FIDES is updated but applies only a single-mechanism approach.
• The Physics of Failure (PoF) approach looks at time to failure (TTF), not FIT.
• Probabilistic DfR requires unique distributions for each mechanism.
• HALT/HASS cannot predict λ.

JEDEC Publication JEP 122G (Rev. Oct. 2011)
I bet you didn’t know JEDEC says this:
2 Terms and definitions (cont’d)
• quoted failure rate: The predicted failure rate for typical operating conditions. (This is the FIT.)
• NOTE: The quoted failure rate is calculated from the observed failure rate under accelerated stress conditions multiplied by an acceleration factor; e.g…..
• “When multiple failure mechanisms and thus multiple acceleration factors are involved, then a proper summation technique, e.g., sum-of-the-failure rates method, is required.”

Semiconductor Industry ‘Joke’: The Magical Mysterious Decreasing FIT
[Figure: quoted FIT for Intel, Maxim and PLX.]
1 FIT = 1 failure per 10,000 parts in 12 years.
If ONLY this were true!
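The "1 failure per 10,000 parts in 12 years" rule of thumb can be checked against the definition of FIT (failures per billion part-hours). The sketch below assumes continuous operation at 8,760 hours per year; the function names are illustrative and not from the presentation.

```python
HOURS_PER_YEAR = 8760  # assumption: parts run continuously, non-leap year


def part_hours(parts, years):
    """Total accumulated part-hours for a population operated continuously."""
    return parts * years * HOURS_PER_YEAR


def fit(failures, total_part_hours):
    """Failure rate in FIT, i.e. failures per 10^9 part-hours."""
    return failures * 1e9 / total_part_hours


hours = part_hours(10_000, 12)
print(hours)          # 1051200000, roughly one billion part-hours
print(fit(1, hours))  # slightly below 1 FIT
```

Under these assumptions the population accumulates about 1.05 billion part-hours, so one failure is close to, but slightly below, 1 FIT.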

Measured Component FIT (λ) vs. Year Produced
Field return data: ACTUAL failures per billion part-hours
• 0.25 µm: ~20-50 FIT
• 130 nm: ~90-120 FIT
• 90 nm: ~150-300 FIT
• 65 nm: ~300-450 FIT
• 45-22 nm: ???
Avionic and Military Expectation!
• Compared to previous avionic system data, the trend continues at a much greater than expected rate.
• Bernstein’s Law: ~10x increase in FIT every 10 years

Benefits to Accurate Prediction!
• More applications means more Sale$.
• Performance is designed for a required reliability specification.
• Suggestion: two products, one design. More customers for the same design.
• A small reduction in performance can bring a huge gain in reliability (illustrative).
[Figure: reliability vs. performance, with two operating points (1 and 2) marked.]
Multiple Accelerated Test Matrix for Reliability Prediction

Performance vs. Reliability
Why not operate here?
• I could double the speed for free.
• If I KNOW the reliability, maybe I CAN improve performance!

Qualification TODAY
Industry ‘standard’ FIT (failures in time) model: the acceleration factor (AF) is the product of the voltage and temperature acceleration factors.
3 KILLER problems:
• This does NOT fit with KNOWN failure models.
• When ZERO failures are reported, there is NO statistical meaning to the acceleration factor.
• Uncertainty is assumed for 0/1 fails, while AF has ZERO uncertainty; there is no accounting for error in AF!

Multiple Mechanisms Are Here to Stay
• The traditional reliability approach fails to predict field failures.
• Modeling, simulation and acceleration alone will NOT yield true results without accurate failure analysis.
• HOWEVER: We CAN model and PREDICT failure rate under known conditions with a more complete picture of the mechanisms.

Multiple Mechanisms Don’t Add Up!
Single Mechanism Model:
• $AF_{system} = AF_{Thermal} \times AF_{Electrical}$
• So, $1/MTTF_{use} = 1/(MTTF_{test} \times AF_{MM})$
Multiple Mechanism Model:
• $1/MTTF_{use} = P_1/(MTTF_{test} \times AF_{mech1}) + P_2/(MTTF_{test} \times AF_{mech2})$
• Therefore, the effective AF for multiple mechanisms is:
$$AF_{MM} = \frac{1}{\dfrac{P_1}{AF_{mech1}} + \dfrac{P_2}{AF_{mech2}}}$$
The true acceleration factor is the SMALLER one, not the one which exposes a failure at accelerated test.
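The effective acceleration factor for competing mechanisms can be sketched as a weighted harmonic combination. This is a minimal illustration, assuming the weights P_i sum to 1 and generalizing the two-mechanism expression to any number of mechanisms; the function name and the numbers are hypothetical.

```python
def effective_af(weights, afs):
    """Effective acceleration factor: 1 / sum(P_i / AF_i).

    weights: relative contribution P_i of each mechanism (assumed to sum to 1)
    afs:     acceleration factor AF_i of each mechanism
    """
    if len(weights) != len(afs):
        raise ValueError("weights and afs must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return 1.0 / sum(p / af for p, af in zip(weights, afs))


# Hypothetical values: one strongly accelerated mechanism, one weakly accelerated.
af_mm = effective_af([0.5, 0.5], [10_000.0, 50.0])
print(af_mm)  # just under 100
```

With these illustrative values the result stays just under 100 however large the first AF becomes, which is the sense in which the smaller acceleration factor controls the prediction.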

Traditional Methodology
Single Mechanism Model (old JEDEC Standard):
• 77 devices tested for 1000 hours with 0 failures at high V and high T.
• For example: $AF_T = 100$ and $AF_V = 130$, so $AF_S = 100 \times 130 = 13000$!
• Assume 1 failure after 1000 hours.
• Thus FIT: $10^9 / (77 \times 1000 \times 13000) \approx 1$ FIT. NICE!
Now, we have done a great job and can go home and celebrate our success! NOT!
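The traditional calculation can be written out to show where the "assume 1 failure" step enters. The sketch below also includes the standard zero-failure upper confidence bound for a constant failure rate, which is an addition for illustration and not something the slide states; function names and the confidence levels are assumptions.

```python
import math


def fit_point_estimate(failures, devices, hours, af):
    """FIT from a failure count and equivalent device-hours (devices * hours * AF)."""
    return failures * 1e9 / (devices * hours * af)


def fit_upper_bound_zero_failures(devices, hours, af, confidence):
    """One-sided upper bound on FIT when zero failures were observed.

    For a constant failure rate and zero failures, the bound on the rate is
    -ln(1 - confidence) divided by the equivalent device-hours.
    """
    return -math.log(1.0 - confidence) * 1e9 / (devices * hours * af)


af_single = 100 * 130  # AF_T * AF_V from the example
print(fit_point_estimate(1, 77, 1000, af_single))               # close to 1 FIT
print(fit_upper_bound_zero_failures(77, 1000, af_single, 0.60))
print(fit_upper_bound_zero_failures(77, 1000, af_single, 0.90))
```

Both numbers depend entirely on the assumed AF of 13000, which is treated as exact. Any error in AF passes straight through to the quoted FIT.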

The Reality of Multiple Mechanisms
BUT… multiple mechanisms compete!
Same example: $AF_V$ from HCI and $AF_T$ from EM
• EM has $E_a = 1$ eV and voltage $\gamma \sim 1$.
• HCI has $E_a \sim 0$ eV and voltage $\gamma \sim 14$.
NOW, with equal weights, $AF_S = 2/(1/100 + 1/130) \approx 113$.
So our correct calculation for the same data gives FIT: $10^9 / (77 \times 1000 \times 113) \approx 115$ FIT!
This is compared to 1 FIT based on HTOL.
Traditional FIT is ALWAYS too low as compared to considering multiple mechanisms.
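Evaluating the two calculations side by side with the example's numbers (77 devices, 1000 hours, one assumed failure, AF of 100 for EM and 130 for HCI) gives an effective AF of about 113 and a FIT of about 115 for the competing-mechanism case. This is a sketch with equal weights of 0.5 per mechanism, as implied by the factor of 2 in the expression; variable names are illustrative.

```python
DEVICES = 77
HOURS = 1000
FAILURES = 1       # assumed single failure, as in the example
AF_EM = 100.0      # temperature acceleration attributed to EM
AF_HCI = 130.0     # voltage acceleration attributed to HCI

# Single-mechanism model: acceleration factors are multiplied.
af_product = AF_EM * AF_HCI
fit_product = FAILURES * 1e9 / (DEVICES * HOURS * af_product)

# Competing mechanisms with equal weights: 1 / (0.5/AF_EM + 0.5/AF_HCI).
af_competing = 1.0 / (0.5 / AF_EM + 0.5 / AF_HCI)
fit_competing = FAILURES * 1e9 / (DEVICES * HOURS * af_competing)

print(round(af_competing, 1), round(fit_competing, 1))
print(round(fit_competing / fit_product, 1))  # how optimistic the product model is
```

For these inputs the product model is optimistic by a factor of about 115 relative to the competing-mechanism estimate.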

Failure Rate Estimation at System Level
New System Reliability Model Replacement Program (collaboration)
[Figure: components up to the Nth component, each with failure mechanisms FM1, FM2 and FM3.]
• The base failure rate can be determined at various accelerated conditions in order to normalize the matrix and make a physics-based reliability assessment from test data combined with knowledge of the application.
• Each component comprises several sub-components in proportion to their function and relative reliability stress.

FIXtress™: A MORE ACCURATE FIT
• $\lambda \approx 1/MTTF_1 + 1/MTTF_2 + \dots + 1/MTTF_n$
• The manufacturers have the data; we can make the prediction (BQR software tool)!
[Figure: calculated PDF (FIT, λ) vs. time to fail (years, 2 to 12), with curves for λTDDB, λHCI, λNBTI, λEM and λPackage.]
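The sum-of-failure-rates idea can be sketched in a few lines. The MTTF values below are hypothetical placeholders chosen only to make the arithmetic visible; they are not data from the presentation or from the BQR tool.

```python
def total_fit(mttf_hours_by_mechanism):
    """Sum of per-mechanism failure rates, each 1/MTTF, expressed in FIT."""
    return sum(1e9 / mttf for mttf in mttf_hours_by_mechanism.values())


# Hypothetical per-mechanism MTTF values in hours (illustration only).
mttf_hours = {
    "TDDB": 5e7,
    "HCI": 2e7,
    "NBTI": 1e7,
    "EM": 4e7,
    "Package": 1e8,
}

system_fit = total_fit(mttf_hours)
system_mttf_hours = 1e9 / system_fit
print(system_fit, system_mttf_hours)
```

The combined MTTF is always shorter than the shortest individual MTTF, because every mechanism adds to the total rate.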

Our Guiding Principle: “It is better to be roughly right than precisely wrong.” ― John Maynard Keynes

Post-Silicon Test Strategy
How can we match data from reliability models with the experimentally obtained AF from HTOL?
PROPOSAL: Run multiple tests at different conditions while monitoring degradation.
• AF from burn-in at different T, V
• Physics of Failure models (JEDEC)
• A matrix solution can match them.

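Our New Approach (ARIEL)
[Figure: flow diagram of the approach; the recoverable elements are listed below.]
• Input: JEDEC or TSMC physics models (24 failure mechanisms over 4 categories), giving relative AF and relative MTBF/FIT for λTDDB, λHCI, λBTI and λEM.
• Input: DOE burn-in at test conditions (T1,V1), (T2,V2), (T3,V3), (T4,V4), with system (TEST) measurements of DPPM per Fmax limit (real FIT at V, T test).
• Matrix solution: the relative AF matrix (rows TDDB, HCI, BTI, EM) times the proportionality parameter X equals the measured values.
• Output: reliability solution (FIT, DPPM).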
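One way to read the matrix step is as a linear system: each test condition contributes one equation in which the measured failure rate is the sum of per-mechanism base rates multiplied by model-derived relative acceleration factors. The sketch below is an interpretation under that assumption, not the authors' implementation; the matrix entries, the measured values and the function name are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; test conditions do not separate the mechanisms")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


# Hypothetical relative AFs (use condition = 1). Rows: four test conditions.
# Columns: TDDB, HCI, BTI, EM.
relative_af = [
    [200.0, 5.0, 50.0, 20.0],
    [2.0, 300.0, 1.0, 0.5],
    [10.0, 2.0, 400.0, 30.0],
    [5.0, 1.0, 40.0, 500.0],
]
measured_fit = [112.5, 152.2, 462.0, 1041.0]  # hypothetical measurements

base_fit = solve_linear(relative_af, measured_fit)  # per-mechanism FIT at use
use_fit = sum(base_fit)
print(base_fit, use_fit)
```

The system is only solvable when each test condition emphasizes a different mechanism, which is why the choice of temperature and voltage corners matters. With noisy data a least-squares fit constrained to non-negative rates would likely be more appropriate than an exact solve.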

Contributions from JEDEC Models
Different dominant mechanism at each test condition

HTOL is OVERWHELMINGLY measuring only TDDB
• This is very convenient when zero failures arise during the 1000-hour HTOL test.
• Foundries design the gate oxides very well, so there WILL be NO TDDB failures during HTOL testing.
• The 3 other mechanisms are just ignored during final test and qualification.

Separation of Mechanisms
• Failure mechanisms can be separated by properly selecting test conditions.
• High temperature and low voltage tests for EM
• High temperature and high voltage tests for NBTI and for TDDB
• Low temperature and high voltage tests for HCI
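Why different corners expose different mechanisms can be illustrated with the activation energies and voltage factors quoted earlier for EM (Ea = 1 eV, γ ≈ 1) and HCI (Ea ≈ 0 eV, γ ≈ 14). The sketch below assumes an Arrhenius temperature term and an exponential voltage term with γ in units of 1/V, a hypothetical use condition of 55 °C and 1.8 V, and equal base rates for both mechanisms; all of these are assumptions for illustration.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

# Parameters quoted for the two mechanisms; gamma is assumed to be in 1/V.
MECHANISMS = {
    "EM": {"ea": 1.0, "gamma": 1.0},
    "HCI": {"ea": 0.0, "gamma": 14.0},
}
USE_CONDITION = (55.0, 1.8)  # hypothetical use point: (deg C, volts)


def acceleration_factor(ea, gamma, test, use=USE_CONDITION):
    """Arrhenius temperature term times an exponential voltage term (assumed model)."""
    t_test = test[0] + 273.15
    t_use = use[0] + 273.15
    thermal = math.exp(ea / K_BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_test))
    electrical = math.exp(gamma * (test[1] - use[1]))
    return thermal * electrical


def dominant_mechanism(test):
    """Mechanism with the largest AF at the given test condition."""
    return max(MECHANISMS, key=lambda name: acceleration_factor(test=test, **MECHANISMS[name]))


print(dominant_mechanism((140.0, 1.8)))  # high temperature, low voltage
print(dominant_mechanism((-35.0, 2.4)))  # low temperature, high voltage
```

Under these assumptions the hot, low-voltage corner accelerates EM far more than HCI, while the cold, high-voltage corner does the opposite.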

Two Distinct Mechanisms!
• HCI: frequency dependence; seen at LOW T and high V.
• NBTI: no frequency dependence; seen at high T and high V.
[Figure: two panels plotted against F (MHz), one at -35 °C, 2.4 V and one at 140 °C, 2.4 V.]
Note: -35 °C has >2.5X the failure rate of 140 °C for the same voltage!

TDDB from NBTI
[Figure: panels for Negative Bias-Temperature Instability (NBTI) and Time-Dependent Dielectric Breakdown (TDDB).]

Prediction for 28 nm
[Figure: prediction curves labeled 2 GHz, 1.5, 1.0, 0.5 and 0.1.]
The dominant mechanisms are EM and BTI, so there is a strong T and frequency dependence but a weak V dependence.

Observation
• Increase voltage by 20%
• Increase performance by 20%
• Increases FIT by only a factor of 2
• Increased customer satisfaction
• Increased sales for FREE!

Main Observations
• The dominant mechanism at the HTOL test is never the dominant mechanism at USE conditions.
• An acceleration factor based on a 1-mechanism model significantly overestimates reliability.
• Foundry models today are quite sophisticated and consider N- and P-MOS based on their own data, AND companies trust these models.
• The chip companies WANT to consider the true contributions of EACH mechanism.

Conclusions
• We have developed a prediction model that is based on 4 failure mechanisms.
• Our model is more accurate than the single-failure model currently in use.
• Collaboration with industry is necessary to verify our models and to keep pace with advancing technology.

Thank You
