Analyzing Circuit-aware Microarchitectural Reliability

Taniya Siddiqua, Paul Lee (taniya@cs.virginia.edu, pl4u@cs.virginia.edu), University of Virginia, Charlottesville


Presentation Transcript


  1. Analyzing Circuit-aware Microarchitectural Reliability
     Taniya Siddiqua, Paul Lee (taniya@cs.virginia.edu, pl4u@cs.virginia.edu)
     University of Virginia, Charlottesville

  2. Motivation
     [Figure: transistor size versus time, annotated with transient faults and hard errors (EM, TC, SM, TDDB, NBTI)]

  3. Problem Description
     • Architects address reliability at architecture-level granularity
     • The focus is on architectural structures, e.g. caches and ALUs
     • Reliability predictions are circuit-agnostic
     • There is a potential gap between architecture-level and circuit-level reliability estimation

  4. Problem Description
     We:
     • Show that circuit-level detail affects the results of architecture-level reliability simulations
     • Examine two hard-error mechanisms, NBTI (Negative Bias Temperature Instability) and TDDB (Time-Dependent Dielectric Breakdown), at the architecture and circuit levels on an ALU
     • Determine how technology scaling down to 22 nm affects NBTI and TDDB in the ALU
     • Propose an NBTI-aware ALU design that uses both architecture-level and circuit-level optimizations

  5. NBTI: A quick guide
     • A key reliability issue for P-channel MOS (PMOS) devices
     • Occurs when MOS devices are stressed with negative gate voltages
     • Manifests as an increase in threshold voltage and a decrease in drain current
     • Consequently the circuit slows down and may violate its timing constraints
     • Good news: recovery starts as soon as the stress is removed

  6. Architecture-level Reliability Simulation
     We:
     • Simulate a 2-wide-issue core with two INT ALUs
     • Use SimpleScalar 3.0 to model processor behavior
     • Use Wattch and HotSpot to simulate power and temperature, respectively
     • Estimate the lifetime of the 1st INT ALU
     • Project ALU lifetimes from the MTTF for NBTI

  7. Circuit-level Reliability Simulation
     We:
     • Use a Kogge-Stone adder circuit for the ALU
     • Take the average temperature of the 1st ALU from the architecture-level reliability simulation and feed it to the Cadence framework
     • Calculate stress and recovery times from the utilization pattern obtained in the architecture-level reliability simulation
     • Calculate lifetime based on the circuit delay degradation reaching 25% of the original delay
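The stress and recovery calculation on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' tooling: it assumes the architecture-level simulation yields a per-cycle busy/idle trace for the ALU, and that busy cycles count as stress and idle cycles as recovery. The names `StressProfile`, `stress_profile`, and `cycles_to_seconds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StressProfile:
    """Stress and recovery totals for one ALU (hypothetical helper type)."""
    stress_cycles: int
    recovery_cycles: int

    @property
    def duty_cycle(self) -> float:
        """Fraction of all cycles spent under stress."""
        total = self.stress_cycles + self.recovery_cycles
        return self.stress_cycles / total if total else 0.0


def stress_profile(busy_trace) -> StressProfile:
    """Summarize a per-cycle utilization trace.

    busy_trace: iterable of booleans, True when the ALU is in use.
    Assumption: a busy cycle is a stress cycle, an idle cycle is a
    recovery cycle. Real PMOS stress also depends on input values.
    """
    trace = list(busy_trace)
    stress = sum(1 for busy in trace if busy)
    return StressProfile(stress_cycles=stress,
                         recovery_cycles=len(trace) - stress)


def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to wall-clock time at a given clock rate."""
    return cycles / clock_hz


if __name__ == "__main__":
    profile = stress_profile([True, True, False, True,
                              False, False, False, True])
    print(profile.stress_cycles, profile.recovery_cycles, profile.duty_cycle)
```

The resulting duty cycle, together with the average temperature, is the kind of input a circuit-level aging simulation would need.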

  8. Comparison of Approaches

  9. Scaling Effect
     We:
     • Show the scaling effect for 65 nm, 45 nm, 32 nm, and 22 nm
     • Show the NBTI-induced output delay degradation for each technology node after 7 years: 65 nm (25%), 45 nm (27%), 32 nm (31%), 22 nm (46%)
     • Conclude that an NBTI-aware ALU design is required
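As a rough reading of the scaling numbers above, the sketch below checks each node against a delay budget. It assumes the percentages are delay increases relative to the fresh circuit and reuses the 25% figure from the circuit-level lifetime criterion as the budget; both are interpretations rather than something the slides state outright. The function names are hypothetical.

```python
# Delay increase after 7 years, in percent, keyed by technology node in nm
# (values taken from the slide).
DELAY_INCREASE_AFTER_7_YEARS = {65: 25.0, 45: 27.0, 32: 31.0, 22: 46.0}

# Assumed budget: the 25% delay criterion used for circuit-level lifetime.
DELAY_BUDGET_PERCENT = 25.0


def nodes_over_budget(increase_by_node, budget_percent):
    """Return nodes (largest feature size first) whose delay increase
    strictly exceeds the budget."""
    return sorted(
        (node for node, inc in increase_by_node.items() if inc > budget_percent),
        reverse=True,
    )


def degraded_delay(nominal_delay_ps, increase_percent):
    """Delay after aging, assuming the percentage is relative to nominal."""
    return nominal_delay_ps * (1 + increase_percent / 100)


if __name__ == "__main__":
    print(nodes_over_budget(DELAY_INCREASE_AFTER_7_YEARS, DELAY_BUDGET_PERCENT))
```

Under these assumptions every node below 65 nm exceeds the budget within 7 years, which is consistent with the slide's call for an NBTI-aware design.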

  10. NBTI-aware ALU Design
     We:
     • Determine that 50% of operands in the SPEC2000 INT benchmarks are 16 bits wide
     • Partition the 64-bit ALU into four 8-bit and two 16-bit independent blocks to support 8-, 16-, 32-, and 64-bit operations
     • Aim to use idle time and narrow-width operands to increase the recovery time of PMOS devices
     • Use power gating
     • Use a round-robin mechanism so that all ALU blocks experience equal recovery time
     • After 7 years the delay degradation is only 10%, a 60% improvement over the non-NBTI-aware ALU
     • Tradeoff!!
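The partitioning and round-robin idea can be illustrated with a small behavioral model. The slides do not say how operand widths map onto blocks, so the `GROUPS` table below is an assumed mapping, and `operand_width` and `RoundRobinALU` are hypothetical names. Operands are treated as unsigned and carry-out is ignored.

```python
from itertools import cycle

# Block indices 0-3 are the 8-bit blocks, 4-5 the 16-bit blocks (64 bits total).
BLOCK_WIDTHS = [8, 8, 8, 8, 16, 16]

# Assumed mapping from operation width to candidate block groups.
GROUPS = {
    8: [(0,), (1,), (2,), (3,)],
    16: [(4,), (5,), (0, 1), (2, 3)],
    32: [(4, 5), (0, 1, 2, 3)],
    64: [(0, 1, 2, 3, 4, 5)],
}


def operand_width(value: int) -> int:
    """Smallest supported width (8/16/32/64) that holds an unsigned value."""
    bits = max(value.bit_length(), 1)
    for width in (8, 16, 32, 64):
        if bits <= width:
            return width
    raise ValueError("operand wider than 64 bits")


class RoundRobinALU:
    """Tracks which blocks are active; all other blocks are treated as
    power-gated and therefore recovering for that cycle."""

    def __init__(self):
        # One independent round-robin iterator per operation width.
        self._next_group = {w: cycle(groups) for w, groups in GROUPS.items()}
        self.recovery_cycles = [0] * len(BLOCK_WIDTHS)

    def issue(self, a: int, b: int):
        """Pick the block group for one operation and credit recovery
        time to every block left idle."""
        width = max(operand_width(a), operand_width(b))
        active = next(self._next_group[width])
        for block in range(len(BLOCK_WIDTHS)):
            if block not in active:
                self.recovery_cycles[block] += 1
        return active


if __name__ == "__main__":
    alu = RoundRobinALU()
    for _ in range(4):
        print(alu.issue(1, 2))
    print(alu.recovery_cycles)
```

In this model a stream of narrow operations rotates across the 8-bit blocks, so no single block absorbs all of the stress.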

  11. TDDB: A quick guide
     • The gate dielectric wears down over time under the electric field; failure occurs when a short forms through the gate oxide
     • Breakdown of ultra-thin gate oxide depends strongly on temperature, and also on Vgs

  12. Circuit-level Reliability Simulation
     We:
     • Use Pin to sample the inputs seen while running gzip, and derive an input pattern from those samples
     • Use the Cadence Spectre simulator
     • Use a Kogge-Stone adder circuit for the ALU
     • Take the average temperature of the 1st ALU from the architecture-level reliability simulation and feed it to the Cadence framework
     • Extract Vgs from every device in the Kogge-Stone adder
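One possible way to turn sampled operands into a single input pattern is a per-bit majority vote, sketched below. The slides do not describe how the pattern was actually derived, so this is only an assumed approach, with hypothetical function names and unsigned operands.

```python
def bit_one_probability(samples, width=64):
    """For each bit position, the fraction of samples with that bit set.

    samples: non-empty sequence of unsigned integer operands, e.g. values
    captured by an instrumentation tool during a benchmark run.
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = [0] * width
    for value in samples:
        for bit in range(width):
            counts[bit] += (value >> bit) & 1
    return [count / len(samples) for count in counts]


def representative_pattern(samples, width=64):
    """Majority-vote pattern: a bit is 1 if it is set in at least half
    of the samples (an assumed heuristic, not the authors' method)."""
    pattern = 0
    for bit, probability in enumerate(bit_one_probability(samples, width)):
        if probability >= 0.5:
            pattern |= 1 << bit
    return pattern


if __name__ == "__main__":
    samples = [0b1010, 0b1000, 0b0011]
    print(format(representative_pattern(samples, width=4), "04b"))
```

The per-bit probabilities could also be kept as they are if the circuit simulation needs signal probabilities rather than one fixed vector.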

  13. Comparison of Approaches

  14. Scaling Effect
     • We measured Vgs, but the effect of temperature still needs to be investigated.

  15. Conclusion
     • For some problems, like TDDB, the gap between architecture-level and circuit-level simulation is almost nonexistent
     • For other problems, like NBTI, the gap is significant, and combining both approaches can yield better designs

  16. Thank you. Questions?
