
Online Timing Variation Tolerance for Digital Integrated Circuits



Presentation Transcript


  1. Online Timing Variation Tolerance for Digital Integrated Circuits Guihai Yan & Xiaowei Li State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)

  2. Sources of timing variation • PVT variation • Dynamic: voltage & temperature fluctuations • Static: process variation • Aging degradation • NBTI, PBTI • TDDB • Soft errors (in non-regular logic) • SEU & SET

  3. Process variation • Sub-wavelength lithography: "what you get is not what you want" (systematic) • Random dopant fluctuations: Vth variation (random) • P variation is time-independent: the "DC component" • Maximum frequency can differ by 20%! [Teodorescu, ISCA'08]

  4. Temperature variation • Application-specific • Slow-varying, on the order of milliseconds • Typical thermal time constant: 2 ms [Donald, ISCA'06] • T variation is therefore the "low-frequency component"
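To make "slow-varying" concrete, the sketch below assumes a first-order (single RC) thermal model; only the 2 ms time constant comes from the slide, and the function names are illustrative. It shows how little of a temperature step completes within a thousand cycles at 2 GHz compared with a few milliseconds.

```python
import math

# Assumption: single-pole (first-order RC) thermal model. Only the 2 ms
# time constant is from the slide; the model shape is illustrative.
TAU_S = 2e-3

def fraction_settled(t_s: float, tau_s: float = TAU_S) -> float:
    """Fraction of a temperature step completed t_s seconds after a power change."""
    return 1.0 - math.exp(-t_s / tau_s)

def temperature(t_s: float, t_start_c: float, t_final_c: float, tau_s: float = TAU_S) -> float:
    """Die temperature t_s seconds after a step in power (first-order response)."""
    return t_start_c + (t_final_c - t_start_c) * fraction_settled(t_s, tau_s)

CYCLE_S = 0.5e-9  # one clock cycle at 2 GHz
for label, t in [("1,000 cycles", 1000 * CYCLE_S), ("1 ms", 1e-3),
                 ("2 ms", 2e-3), ("10 ms", 10e-3)]:
    print(f"{label:>12}: {fraction_settled(t):.4f} of the step")
```

Under this model about 63% of a step is complete after one time constant and over 99% after five (10 ms), while a thousand cycles cover well under 0.1%, which is why T shows up as a low-frequency component.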

  5. Voltage variation • Fast-changing • Inductive noise, a.k.a. the L·di/dt problem • IR drop • Why is it hard to keep a constant voltage level? Example: with a 100 W power budget and a 1 V supply, the current is 100 A; keeping the voltage fluctuation within ±5% (50 mV) requires R_PDN < 0.5 mΩ • V variation is fast-changing: the "high-frequency component" • [Figure: PDN hierarchy model]
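The ±5% example can be checked with a short calculation. This sketch covers only the resistive (IR-drop) part of the budget and ignores L·di/dt noise; the function name is illustrative.

```python
def max_pdn_resistance(power_w: float, vdd_v: float, tolerance: float) -> float:
    """Largest purely resistive PDN impedance that keeps IR drop within tolerance.

    L di/dt noise is ignored; this is only the IR-drop budget.
    """
    current_a = power_w / vdd_v          # I = P / V
    allowed_drop_v = tolerance * vdd_v   # e.g. 5% of 1 V = 50 mV
    return allowed_drop_v / current_a    # R = dV / I

r = max_pdn_resistance(power_w=100.0, vdd_v=1.0, tolerance=0.05)
print(f"R_PDN < {r * 1e3:.2f} mOhm")   # R_PDN < 0.50 mOhm
```

Since R = tolerance·V²/P, lowering the supply voltage at constant power tightens the requirement quadratically: at 0.8 V and 100 W the limit drops to 0.32 mΩ.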

  6. Aging degradation • Aging mechanisms: NBTI (PMOS), PBTI (NMOS), TDDB • [Figure: failure rate over the lifetime, showing infant mortality, useful life, and aging; about 20% degradation after 10 years]
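The slide gives only the end point (about 20% after 10 years). As a rough illustration of how such degradation accumulates, the sketch below assumes a power law in stress time with exponent n = 1/6, a value commonly used for NBTI; the exponent and the model are assumptions, not taken from the slides.

```python
# Assumptions (not from the slides): delay degradation grows as a power law
# in stress time, d(t) = A * t**n, with n = 1/6, a value often used for NBTI.
# A is calibrated so that d(10 years) = 20%, the figure quoted on the slide.
N_EXP = 1.0 / 6.0
LIFETIME_YEARS = 10.0
DEGRADATION_AT_LIFETIME = 0.20

def degradation(t_years: float, n: float = N_EXP) -> float:
    """Fractional delay increase after t_years of continuous stress."""
    if t_years < 0:
        raise ValueError("stress time must be non-negative")
    return DEGRADATION_AT_LIFETIME * (t_years / LIFETIME_YEARS) ** n

for years in (0.1, 1.0, 5.0, 10.0):
    print(f"{years:5.1f} years -> {degradation(years):.1%} slower")
```

Under this assumed exponent, roughly two thirds of the 10-year degradation has already appeared after the first year.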

  7. Soft errors • SEU (Single Event Upset): unintentional bit-flip in storage cells • SET (Single Event Transient): transient voltage pulse propagating in combinational logic • [Figure: SEU and SET]

  8. Outline • TEA-TM: Timing emergency-aware thread migration • PVT variation co-optimization • SVFD: Stability violation based fault detection • Online fault detection via timing sensing • Delay faults, aging delay, soft errors • MicroFix: Margin reduction with timing sensing • Application to DVFS • ReviveNet: Aging-delay tolerance

  9. TEA-TM: Timing Emergency-Aware Thread Migration • Focus on the essential issue: timing • Process, voltage, and temperature variation can each shift timing in either direction (+/-) • Their effects are not necessarily cumulative; in some cases they cancel each other out. Hence "complementary".

  10. Some terms • Timing emergency (TE): the delay exceeds the threshold • Emergency level (EL): the "density" of TEs, defined as the number of TEs per 100 million cycles • [Figure: delay vs. time against a threshold; "violent" = slow process corner, large voltage fluctuation, hot; "mild" = fast process corner, small voltage fluctuation, cool]
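The EL definition maps directly onto a counter. In this sketch the per-cycle delay trace and the threshold are hypothetical inputs (on real hardware they would come from delay sensors); the function only applies the "TEs per 100 million cycles" normalization.

```python
from typing import Sequence

CYCLES_PER_WINDOW = 100_000_000  # EL is normalized per 100 million cycles

def emergency_level(delays_ps: Sequence[float], threshold_ps: float) -> float:
    """Timing emergencies per 100 million cycles.

    delays_ps holds one (hypothetical) sensed critical-path delay per cycle;
    a cycle counts as a timing emergency when its delay exceeds threshold_ps.
    """
    if not delays_ps:
        raise ValueError("need at least one cycle of samples")
    emergencies = sum(1 for d in delays_ps if d > threshold_ps)
    return emergencies * CYCLES_PER_WINDOW / len(delays_ps)

# A 1,000-cycle trace with 3 emergencies scales to an EL of 300,000.
trace = [400.0] * 997 + [510.0] * 3
print(emergency_level(trace, threshold_ps=500.0))
```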

  11. How Do PVT Variations Complement Each Other? • Observation in the time domain • Core1 (T mild, V mild): excessive headroom, large margin, low EL • Core2 (T violent, V violent): little margin, high EL • [Figure: delay vs. time against the threshold for both cores] • What if we exchange the threads on Core1 and Core2? Each core then pairs a mild component with a violent one (Core1: T mild, V violent; Core2: T violent, V mild).
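A toy version of the exchange question: the sketch assumes an additive delay model in which each core keeps its static P offset and each thread carries its own dynamic stress, and it exhaustively tries every thread-to-core mapping. All numbers and names are hypothetical. TEA-TM itself uses an FFT-like heuristic, not this brute-force search.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

def worst_delay(core_p: Sequence[float], thread_stress: Sequence[float]) -> float:
    """Worst normalized delay across cores for one thread-to-core mapping.

    Assumed additive model: delay = 1 + P offset of the core (static, stays
    with the core) + dynamic stress of the thread (moves with the thread).
    """
    return max(1.0 + p + s for p, s in zip(core_p, thread_stress))

def best_mapping(core_p: Sequence[float], threads: Dict[str, float]) -> Tuple[List[str], float]:
    """Exhaustively search thread-to-core mappings for the lowest worst-case delay."""
    if len(threads) != len(core_p):
        raise ValueError("expects exactly one thread per core")
    best: List[str] = []
    best_delay = float("inf")
    for perm in permutations(threads):
        delay = worst_delay(core_p, [threads[name] for name in perm])
        if delay < best_delay:
            best, best_delay = list(perm), delay
    return best, best_delay

# Hypothetical numbers: Core1 is a fast corner, Core2 a slow corner.
cores = [-0.05, +0.05]
threads = {"mild": 0.02, "violent": 0.08}
print(worst_delay(cores, [threads["mild"], threads["violent"]]))  # original mapping
print(best_mapping(cores, threads))
```

With these numbers the original mapping peaks at about 1.13 (above a hypothetical threshold of 1.10), while swapping the threads lowers the worst case to about 1.07. Exhaustive search is only practical for a handful of cores (24 mappings for a quad-core).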

  12. Frequency-domain analysis • Migrating a thread = "grafting" its V component onto another core

  13. Frequency-domain analysis (cont.) • Relative frequency-spectrum deviations on a 2 GHz quad-core processor • Bands: P: 0-100 Hz, T: 100 Hz-1 MHz, V: 1 MHz-250 MHz • Potential: Core3 and Core4 are mild • Strategy: exchange the threads on Core1 and Core4, and on Core2 and Core3
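The band split above can be illustrated by summing spectral energy per band. This dependency-free sketch uses a naive DFT, so it is only suitable for short traces; the function name and the scaled-down test signal are illustrative assumptions, and only the band edges in `SLIDE_BANDS` come from the slide.

```python
import cmath
import math
from typing import Dict, Sequence, Tuple

# Band edges quoted on the slide (Hz); bands are half-open [lo, hi).
SLIDE_BANDS = {"P": (0.0, 100.0), "T": (100.0, 1e6), "V": (1e6, 250e6)}

def band_energy(samples: Sequence[float], fs_hz: float,
                bands: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Split the spectral energy of a delay trace into named frequency bands.

    Uses a naive O(N^2) DFT so the sketch has no dependencies; bin k sits at
    k * fs_hz / N, and only the one-sided spectrum (k <= N/2) is summed.
    """
    n = len(samples)
    energy = {name: 0.0 for name in bands}
    for k in range(n // 2 + 1):
        x_k = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        f = k * fs_hz / n
        for name, (lo, hi) in bands.items():
            if lo <= f < hi:
                energy[name] += abs(x_k) ** 2
    return energy
```

Applying `SLIDE_BANDS` directly requires a trace sampled at 500 MHz or more (the Nyquist rate for 250 MHz) that is long enough to resolve 100 Hz, i.e. millions of samples; at that size an FFT library should replace the naive DFT.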

  14. TEA-TM Summary • Analyzes the complementary effect in both the time and frequency domains • Presents a delay-sensor-based scheme (TEA-TM) to exploit the complementary effect • Simple, cost-efficient • FFT-like heuristic • Throughput: 30% • Fairness: 80% • Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors, Guihai Yan, Xiaoyao Liang, Yinhe Han, Xiaowei Li, in Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA'10), Saint-Malo, France, pp. 485-496, Jun. 2010.

  15. Stability Violation • Stable period vs. variable period • Stability violation (SV): a signal transition occurs in the stable period.
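A behavioral sketch of the definition: transitions are classified by where they fall within the clock cycle. The timing model (each cycle = a variable period followed by a stable period that ends at the next capturing edge) and the parameter names are assumptions for illustration, not the SVFD circuit.

```python
from typing import List, Sequence

def stability_violations(transitions_ns: Sequence[float], period_ns: float,
                         stable_len_ns: float) -> List[float]:
    """Return the transition times that fall inside a stable period.

    Assumed timing model (hypothetical parameters): every clock cycle of length
    period_ns starts with a variable period, in which the combinational output
    may toggle, followed by a stable period of stable_len_ns that ends at the
    next capturing clock edge.
    """
    if not 0 < stable_len_ns < period_ns:
        raise ValueError("stable period must be shorter than the clock period")
    variable_len = period_ns - stable_len_ns
    return [t for t in transitions_ns if (t % period_ns) >= variable_len]

# With a 1 ns clock and a 0.2 ns stable period, 0.85 ns and 1.95 ns are SVs.
print(stability_violations([0.3, 0.75, 0.85, 1.5, 1.95], period_ns=1.0, stable_len_ns=0.2))
```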

  16. In what situations would SVs occur? • Delay faults resulting from: delay defects (introduced in manufacturing processes) and aging (wearout)-induced performance degradation • [Figure: setup time vs. setup-time violation due to a delay fault, clock period T] • Thus, SVs caused by delay faults differ little from setup-time violations: YES! • But can soft errors be modeled as SVs?

  17. How do soft errors cause SVs? • SET: the flip-flop input Si violates the stability requirement! • SEU: the flip-flop output So violates the stability requirement! • Note: ONLY SVs occurring in the "vulnerable window" (within which the flip-flops are updated) can cause failures.
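The "vulnerable window" note corresponds to the familiar latching-window effect: a transient pulse only matters if it overlaps the interval in which the flip-flop samples its input. The sketch assumes that interval is [edge - setup, edge + hold]; all parameter names and values are hypothetical.

```python
def set_can_be_latched(pulse_start_ns: float, pulse_width_ns: float,
                       edge_ns: float, setup_ns: float, hold_ns: float) -> bool:
    """True if a SET pulse overlaps the flip-flop's vulnerable (latching) window.

    Assumption: the window is [edge - setup, edge + hold]; a pulse that ends
    before it or starts after it is masked and cannot cause a failure.
    """
    pulse_end = pulse_start_ns + pulse_width_ns
    window_start = edge_ns - setup_ns
    window_end = edge_ns + hold_ns
    return pulse_start_ns <= window_end and pulse_end >= window_start

# A 0.1 ns pulse at 0.5 ns is masked; the same pulse at 0.9 ns reaches the edge at 1.0 ns.
print(set_can_be_latched(0.5, 0.1, edge_ns=1.0, setup_ns=0.05, hold_ns=0.02))
print(set_can_be_latched(0.9, 0.1, edge_ns=1.0, setup_ns=0.05, hold_ns=0.02))
```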

  18. The next problem: delay faults and soft errors can be modeled as stability violations • How do we detect stability violations? • A low-cost stability checker

  19. Some Results • Implementation: SVFD-protected FPU • 65 nm PTM, HSPICE simulation • A Unified Online Fault Detection Scheme via Checking of Stability Violation, Guihai Yan, Yinhe Han, Xiaowei Li, IEEE/ACM Design, Automation and Test in Europe (DATE'09), pp. 496-501, 2009. • SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation, Guihai Yan, Yinhe Han, Xiaowei Li, IEEE Transactions on Very Large Scale Integration Systems (T-VLSI), 19(9), Sep. 2011.

  20. Beyond fault detection, what else can we do with SVFD? • Dynamic margin reduction: MicroFix, an application to DVFS • Aging tolerance: ReviveNet, fine-grained aging-delay tolerance

  21. Dynamic margin reduction • Timing-sensor setup

  22. Operational Principles

  23. Exploiting fine-grained margin • Localized timing imbalance • Generous Flip-flop (GFF) • Forward Adaptable Flip-flop (FAFF) • Backward Adaptable Flip-flop (BAFF) • Unadaptable Flip-flop (UAFF)

  24. Case study results • Applied to an FPU • 32 nm PTM models • TH = 0.2-0.3 is an optimal choice! • Efficiency improvement: 35% EDP • MicroFix: Using Timing Interpolation and Delay Sensors for Power Reduction, Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, ACM Transactions on Design Automation of Electronic Systems (TODAES), 16(2), 1-21, 2011. • MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency, Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'09), pp. 395-400, 2009.
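EDP is simply energy times delay. The sketch below shows why margin reduction improves it, assuming dynamic energy dominates (E = C_eff·Vdd², leakage ignored) and the clock period stays fixed; the operating points are hypothetical and are not meant to reproduce the 35% figure from the authors' FPU case study.

```python
def dynamic_energy(c_eff_f: float, vdd_v: float) -> float:
    """Dynamic switching energy per operation, E = C_eff * Vdd^2 (leakage ignored)."""
    return c_eff_f * vdd_v ** 2

def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product; lower is better."""
    return energy_j * delay_s

def edp_reduction(base_edp: float, new_edp: float) -> float:
    """Fractional EDP reduction of new_edp relative to base_edp."""
    return 1.0 - new_edp / base_edp

# Hypothetical operating points: margin reduction lets Vdd drop from 1.0 V
# to 0.9 V while the clock period (delay per operation) stays at 0.5 ns.
C_EFF = 1e-9     # farads, illustrative only
PERIOD = 0.5e-9  # seconds
base = edp(dynamic_energy(C_EFF, 1.0), PERIOD)
scaled = edp(dynamic_energy(C_EFF, 0.9), PERIOD)
print(f"EDP reduction: {edp_reduction(base, scaled):.0%}")
```

Because energy scales with Vdd², even a 10% voltage reduction at constant frequency cuts EDP by about 19% in this simplified model.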

  25. Localized Aging Tolerance • The chance for aging adaptation • We have a chance to "act before it's too late"

  26. Nudge for timing margin • Dynamic time borrowing • Path-grained, NOT stage-grained
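Time borrowing can be sketched per flip-flop. Delaying the capturing flip-flop's clock gives the late incoming path extra time, but the outgoing path launched from that same flip-flop loses the same amount. The model and parameter names below are hypothetical; "path-grained" means the decision is taken per flip-flop rather than for a whole pipeline stage.

```python
from typing import Optional

def borrow_needed(stage_delay_ns: float, next_stage_delay_ns: float,
                  period_ns: float, max_shift_ns: float) -> Optional[float]:
    """Clock shift (ns) for the capturing flip-flop that absorbs a late path.

    Hypothetical model: delaying the capturing flip-flop's clock by `shift`
    gives the incoming path period + shift, and leaves the outgoing path
    (launched from the same flip-flop) only period - shift.
    Returns 0.0 if no borrowing is needed, None if it cannot be absorbed.
    """
    shift = max(0.0, stage_delay_ns - period_ns)
    if shift > max_shift_ns:
        return None  # exceeds the tunable clock-delay range
    if next_stage_delay_ns > period_ns - shift:
        return None  # downstream path has too little slack to lend
    return shift

# A 1.1 ns path in a 1.0 ns cycle borrows 0.1 ns from a 0.8 ns downstream path.
print(borrow_needed(1.1, 0.8, period_ns=1.0, max_shift_ns=0.2))
```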

  27. Aging-sensor setup • Coarse-grained detection

  28. Trial-based adaptation • Fine-grained adaptation • Adaptation latency is non-critical • Trial until success
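Because adaptation latency is non-critical, a simple sequential search is acceptable: apply candidate settings one at a time until the sensor stops alarming. In this sketch, `settings` and the `alarm_after` callback are hypothetical stand-ins for hardware knobs and sensor readouts.

```python
from typing import Callable, Iterable, Optional

def trial_until_success(settings: Iterable[int],
                        alarm_after: Callable[[int], bool]) -> Optional[int]:
    """Apply candidate settings in order; keep the first one that silences the sensor.

    `settings` might be, hypothetically, clock-delay tap indices for one
    flip-flop; `alarm_after(s)` stands for "apply s, run for a while, and
    report whether the aging sensor still alarms".
    """
    for setting in settings:
        if not alarm_after(setting):
            return setting
    return None  # no candidate setting silenced the alarm

# Hypothetical path that needs at least 3 delay taps to stop alarming.
print(trial_until_success(range(8), lambda taps: taps < 3))
```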

  29. Implementation • False-alarm filter • Sharing filters to reduce overhead ReviveNet: A Self-adaptive Architecture for Improving Lifetime Reliability via Localized Timing Adaptation Guihai Yan, Yinhe Han, Xiaowei Li, IEEE Transactions on Computers (TC), 60(9), Sep. 2011.
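One way to picture a false-alarm filter is a sliding-window counter that confirms an alarm only after several raw alarms arrive close together. Sharing can then be modeled as one filter fed by the OR of several sensors. The class, thresholds, and sharing scheme below are hypothetical illustrations, not ReviveNet's circuit.

```python
from collections import deque
from typing import Deque

class AlarmFilter:
    """Confirm an alarm only if `threshold` raw alarms occur within `window` cycles."""

    def __init__(self, threshold: int, window: int) -> None:
        self.threshold = threshold
        self.window = window
        self._hits: Deque[int] = deque()  # cycle numbers of recent raw alarms

    def update(self, cycle: int, raw_alarm: bool) -> bool:
        if raw_alarm:
            self._hits.append(cycle)
        # Drop raw alarms that have slid out of the window.
        while self._hits and self._hits[0] <= cycle - self.window:
            self._hits.popleft()
        return len(self._hits) >= self.threshold

# Hypothetical sharing: one filter serves several sensors via OR-ed alarms.
shared = AlarmFilter(threshold=3, window=10)
for cycle, sensor_alarms in [(0, [True, False]), (5, [False, True]), (8, [True, True])]:
    print(cycle, shared.update(cycle, any(sensor_alarms)))
```

Sharing trades isolation for area: a burst on any of the OR-ed sensors confirms the alarm for the whole group, so the group would then need finer-grained diagnosis.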

  30. Conclusion • Dynamic timing variation is increasingly critical • Online timing-variation detection and tolerance is a promising approach to dynamic variation • Application-specific timing-variation tolerance • MicroFix for DVFS • ReviveNet for aging tolerance • A holistic solution can be more cost-effective • TEA-TM • Architectural optimization for a circuit-level symptom

  31. Publications (chronological order)
• Guihai Yan, Yinhe Han, Xiaowei Li, ReviveNet: A Self-adaptive Architecture for Improving Lifetime Reliability via Localized Timing Adaptation, IEEE Transactions on Computers (TC), Vol. 60, No. 9, pp. 1219-1232, Sep. 2011.
• Guihai Yan, Yinhe Han, Xiaowei Li, SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation, IEEE Transactions on Very Large Scale Integration Systems (T-VLSI), Vol. 19, No. 9, pp. 1627-1640, Sep. 2011.
• Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, MicroFix: Using Timing Interpolation and Delay Sensors for Power Reduction, ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol. 16, No. 2, pp. 1-21, Mar. 2011.
• Jianbo Dong, Lei Zhang, Yinhe Han, Guihai Yan, Xiaowei Li, Performance-asymmetry-aware Scheduling for Chip Multiprocessors with Static Core Coupling, Journal of Systems Architecture, Vol. 56, pp. 534-542, 2010.
• Guihai Yan, Xiaoyao Liang, Yinhe Han, Xiaowei Li, Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors, in Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA'10), Saint-Malo, France, pp. 485-496, Jun. 2010.
• Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency, ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'09), pp. 395-400, 2009.
• Song Jin, Yinhe Han, Lei Zhang, Huawei Li, Xiaowei Li, Guihai Yan, M-IVC: Using Multiple Input Vectors to Minimize Aging-induced Delay, Proc. of IEEE Asian Test Symposium (ATS'09), 2009.
• Guihai Yan, Yinhe Han, Xiaowei Li, A Unified Online Fault Detection Scheme via Checking of Stability Violation, IEEE/ACM Design, Automation and Test in Europe (DATE'09), pp. 496-501, 2009.
• Guihai Yan, Yinhe Han, Xiaowei Li, Hui Liu, BAT: Performance-Driven Crosstalk Mitigation Based on Bus-grouping Asynchronous Transmission, IEICE Transactions on Electronics, Vol. E91-C, No. 10, pp. 1690-1697, Oct. 2008.

  32. Book Chapters • Fault Tolerance Designs for Digital Integrated Circuits: Tolerating defects/faults, parameter variations, and soft errors (in Chinese), Beijing, Science Press, 2011. ISBN 978-7-03-030576-3.

  33. When I’ve done a program…
