
Online Timing Analysis for Wearout Detection

Online Timing Analysis for Wearout Detection. Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke, University of Michigan. Wearout Mechanisms. There are a lot of them: Electromigration (EM), Time-dependent dielectric breakdown (TDDB), Negative-bias temperature instability (NBTI)


Presentation Transcript


  1. Online Timing Analysis for Wearout Detection. Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke, University of Michigan

  2. Wearout Mechanisms
     • There are a lot of them:
       • Electromigration (EM)
       • Time-dependent dielectric breakdown (TDDB)
       • Negative-bias temperature instability (NBTI)
       • Hot carrier injection (HCI)
       • …
     • All highly dependent on temperature and current density
       • Both increasing fast!

  3. Goals of this Research
     • Low-cost reliable system design
     • How do physical wearout mechanisms progress?
     • How to determine that a device has failed?
     • How do we maintain operation given failed components?

  4. Traditional and Recent Approaches
     • Traditional detection techniques expensive
       • Redundant checking structures
     • Predictive techniques
       • Canary circuits
       • RAMP

  5. Proposed Technique
     • Key Insight:
       • Degradation in silicon → decrease in performance
       • Long incubation time followed by rapid deterioration
     • Examples:
       • TDDB: increases leakage, shifting voltage curves
       • EM: increases resistance
       • NBTI: shifts threshold voltage

  6. Outline
     • Microprocessor model
     • Wearout simulation methodology
     • Wearout simulation results
     • The wearout detection unit (WDU)
     • WDU Analysis
     • Conclusion

  7. Simulation Setup

  8. Simulation Flow, Step 1: Temperature and Activity Analysis
     [Flow diagram. Inputs: Benchmark, Netlist, Timing, Parasitics. Tools: Synopsys VCS, PrimePower, HotSpot. Outputs: Activity Trace, Power Trace, Temperature Trace.]

  9. Simulation Flow, Step 2: Wearout Simulation
     • Device Delay = Original Delay * RWF * AI * RV
       • RWF: Relative amount of wearout for a device
       • AI: Performance degradation parameterized by age
       • RV: Random variable
     [Flow diagram. Labels: Benchmark, Netlist, Timing, Temperature, Activity, MTTF Calculation, Relative Wearout Factors, Age Index, Synopsys VCS, Wearout Simulation, Signal Latency Data.]

  10. Simulation Flow, Step 2: Wearout Simulation
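
The Step 2 delay model, Device Delay = Original Delay * RWF * AI * RV, can be sketched in Python as follows. The slides do not say how RV is distributed, so the Gaussian with mean 1 used here, the `sigma` parameter, and the function name `aged_delay` are assumptions made only for illustration.

```python
import random


def aged_delay(original_delay_ps, rwf, age_index, rng, sigma=0.02):
    """Scale a device's original delay by the three factors from the slide.

    rwf       -- relative wearout factor (RWF) for this device
    age_index -- performance degradation at the current age (AI)
    rng       -- a random.Random instance, passed in for reproducibility
    sigma     -- spread of the random variable RV (assumed Gaussian, mean 1)
    """
    rv = rng.gauss(1.0, sigma)  # assumption: the slides only say "random variable"
    return original_delay_ps * rwf * age_index * rv


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical values: a 100 ps gate, 20% above-average wearout, AI of 1.5.
    print(aged_delay(100.0, rwf=1.2, age_index=1.5, rng=rng))
```

Passing the generator in explicitly keeps repeated simulation runs reproducible, which matters when comparing latency traces across ages.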

  11. Wearout Simulation Results
     [Plots: Signal Latency (ps) and Sample Mean Latency (ps) versus Time (years).]

  12. Exploiting Performance Degradation
     • Exponential moving average:
       • EMA = α * (sample − EMA_previous) + EMA_previous
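
The exponential moving average on this slide translates directly into a one-line update. The sketch below is a minimal Python version; the function names and the choice to seed the average with the first sample are assumptions, since the slides do not say how the average is initialized.

```python
def ema_update(previous, sample, alpha):
    """One EMA step: EMA = alpha * (sample - EMA_previous) + EMA_previous."""
    return alpha * (sample - previous) + previous


def ema_series(samples, alpha):
    """Run the EMA over a list of latency samples.

    Assumption: the average is seeded with the first sample.
    """
    ema = samples[0]
    out = []
    for sample in samples:
        ema = ema_update(ema, sample, alpha)
        out.append(ema)
    return out


if __name__ == "__main__":
    # Hypothetical latency samples in picoseconds.
    print(ema_series([100.0, 110.0, 120.0], alpha=0.5))
```

This form needs only the previous average and the new sample, so no history of samples has to be stored.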

  13. Trend Analysis
     • TRIX can be used to accurately track both local and long-term latency trends
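
The slides do not give the TRIX formula they use. The sketch below follows the conventional definition from technical analysis, the one-period rate of change of a triple-smoothed EMA, and should be read as an assumption rather than the authors' circuit. The two smoothing factors (0.5 for a local trend, 0.05 for a long-term trend) and the sample data are hypothetical.

```python
def ema_series(samples, alpha):
    """EMA over a list, seeded with the first sample (an assumption)."""
    ema = samples[0]
    out = []
    for sample in samples:
        ema = alpha * (sample - ema) + ema
        out.append(ema)
    return out


def trix(samples, alpha):
    """Rate of change of a triple-smoothed EMA (conventional TRIX).

    Returns one value per sample; the first entry is 0.0 because there is
    no earlier point to compare against. Samples must be non-zero.
    """
    triple = ema_series(ema_series(ema_series(samples, alpha), alpha), alpha)
    changes = [(cur - prev) / prev for prev, cur in zip(triple, triple[1:])]
    return [0.0] + changes


if __name__ == "__main__":
    # Hypothetical latency trace: flat for 50 samples, then rising 2 ps per sample.
    latency = [100.0] * 50 + [100.0 + 2.0 * i for i in range(1, 21)]
    local = trix(latency, alpha=0.5)     # reacts quickly to recent samples
    long_term = trix(latency, alpha=0.05)  # reacts slowly, tracks the long-term trend
    print(local[-1], long_term[-1])
```

A larger smoothing factor makes the indicator respond sooner to the onset of degradation, while a smaller one filters short-lived variation at the cost of lag.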

  14. Wearout Analysis Circuit
     [Block diagram. Labels: input signal, Latency Sampling, TRIXl Calculation, TRIXg Calculation, Prediction.]

  15. System Integration
     [Block diagram. Labels: Latency Sampling, TRIXl Calculation, TRIXg Calculation, Prediction.]

  16. Dynamic Variation
     • Temperature
       • 50 °C → ~4% increase in latency at 130 nm
     • Clock jitter
       • Impact on latency varies
       • Mean jitter typically modeled as 0
       • Worst-case variation would need to be sampled 12 times over 4 days

  17. WDU Implementation

  18. WDU Prediction Results
     • Each unit calibrated for a 30-year MTTF
     • The WDU flagged at least one output from each module prior to the MTTF

  19. Lifetime Enhancement

  20. Conclusion
     • Low-cost reliable system design
     • Physical wearout mechanisms affect timing
     • Failure prediction can be much cheaper than detection
     • Wearout detection unit:
       • Online timing analysis a good detector of wearout, predictor of failure
       • Generic/self-calibrating

  21. Simulation Results: Temperature and MTTF

  22. Technology Scaling
     • Quickly shrinking feature sizes
     • Sharp increase in frequency
     • Slow decrease in supply voltage
     [Figure: OR1200 Power Densities.]
