
Increasing Reliability of Performance-critical Pipeline structures

Increasing Reliability of Performance-critical Pipeline structures. Niranjan Soundararajan. Advisors: Vijaykrishnan Narayanan, Anand Sivasubramaniam. Computer Systems Lab (CSL), Microsystems Design Lab (MDL), Computer Science and Engineering, The Pennsylvania State University.



Presentation Transcript


  1. Increasing Reliability of Performance-critical Pipeline structures • Niranjan Soundararajan • Advisors: Vijaykrishnan Narayanan, Anand Sivasubramaniam • Computer Systems Lab (CSL), Microsystems Design Lab (MDL) • Computer Science and Engineering, The Pennsylvania State University

  2. Reliability – Increasing Importance • Decreasing transistor size • More transistors • Power/Temperature Hotspots • Increasing Market Segments • HARDWARE RELIABILITY

  3. Performance critical pipeline structures • Out-of-order entry activity • Back-to-Back wakeup • Multi-width pipeline • Clock frequency increase • [Pipeline diagram labels: FRONT END, BACK END, BHT, BTB, Load/Store Queue, Dcache, Inst Issue Queue, Fetch, Decode, ALU, RAT, Inst Retires, Alloc, Icache, Reorder Buffer, ARF]

  4. Transistor Failure • [Chart: Failure Rate vs. Time, with regions labeled Manufacturing Defects, Random Errors, Wearout] • Solutions to address impact of Process Variations on Issue Queue • Solutions to reduce non-uniform aging due to NBTI, HCE on microprocessor structures • Soft Error impact of DVFS on vulnerability of GALS architectures • Bounding vulnerability of processor structures to provide reliability guarantees

  5. Outline • Motivation • Contributions • Vulnerability bounding mechanisms • Other solutions • Impact of DVFS on architectural vulnerability of GALS architectures • Address process variations in issue queue • Mitigate NBTI, HCE degradation in structures • Conclusion and Future work

  6. Introduction to Soft Errors • A strike creates electron-hole pairs that can be absorbed by the source/diffusion areas of the transistor, changing the state of the device • [Figure: strike on a transistor with n+ regions in a p substrate, producing an error] • Source: M. Tahoori

  7. Impact of Soft Errors • Severity: In 2003, Fujitsu released SPARC64 with 80% of 200,000 latches covered by transient fault protection • Single Event Upset (SEU) model • Metrics • MTBF: Mean Time Between Failures • FIT: Failure in Time; 1 FIT = 1 failure in a billion hours • FITeff = FITraw * AVF • [Chart: Severity of Soft Error Rates. Source: Shekhar Borkar, Intel 2004]
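The metrics on the slide above can be related with a few lines of arithmetic. The following is a minimal sketch, assuming a constant failure rate (so that MTBF is the reciprocal of the failure rate); the function names and the example numbers are illustrative and do not come from the presentation.

```python
HOURS_PER_BILLION = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def effective_fit(fit_raw: float, avf: float) -> float:
    """FITeff = FITraw * AVF, as stated on the slide."""
    if not 0.0 <= avf <= 1.0:
        raise ValueError("AVF is a fraction and must lie in [0, 1]")
    return fit_raw * avf


def mtbf_hours(fit: float) -> float:
    """MTBF in hours for a given FIT rate (assumes a constant failure rate)."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_BILLION / fit


# Hypothetical structure: raw rate of 1000 FIT, 25% of bits vulnerable on average.
fit_eff = effective_fit(1000.0, 0.25)
print(fit_eff, mtbf_hours(fit_eff))
```

Under these assumed numbers, derating by AVF lowers the effective rate to a quarter of the raw rate, and the MTBF grows by the same factor.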

  8. Architectural Vulnerability Factor (AVF) • AVF: fraction of bits in a structure vulnerable to soft errors • AVF = ACE bits / (ACE bits + unACE bits) • AVF = Fn (Size, Time) • [Example instruction stream: LD A, BR, ST B, ADD, ST B, User Visible Output; labels: Architecturally Correct Execution (ACE) Instruction, unACE Instruction, Wrong Path, Dead Store]
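Because AVF depends on both structure size and time, one common way to read the ratio above is as ACE bit-cycles over total bit-cycles. The sketch below assumes that interpretation and a hypothetical per-cycle trace of ACE bit counts; neither the function name nor the trace format is taken from the slides.

```python
def avf(ace_bits_per_cycle: list[int], total_bits: int) -> float:
    """Time-averaged AVF = ACE bit-cycles / (total bits * cycles)."""
    cycles = len(ace_bits_per_cycle)
    if cycles == 0 or total_bits <= 0:
        raise ValueError("need at least one cycle and a positive structure size")
    if any(b < 0 or b > total_bits for b in ace_bits_per_cycle):
        raise ValueError("ACE bit count must lie between 0 and total_bits")
    return sum(ace_bits_per_cycle) / (total_bits * cycles)


# Hypothetical 64-bit structure observed for four cycles.
print(avf([0, 32, 64, 32], total_bits=64))
```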

  9. AVF: Why is it important to Micro-architects? • [Design-flow diagram labels: System Specification, Architectural Design, Logic Synthesis, Circuit Design, Physical Design, Fabrication and Packaging; inputs: AVF per structure, FITraw] • System Reliability = ∑ (FITraw * AVF)
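The summation on the slide above can be sketched as a sum over structures, each with its own raw FIT rate and AVF. The structure names and numbers below are hypothetical placeholders, not measurements from the presentation.

```python
def system_fit(structures: dict[str, tuple[float, float]]) -> float:
    """Sum of FITraw * AVF over all structures (name -> (fit_raw, avf))."""
    return sum(fit_raw * avf for fit_raw, avf in structures.values())


# Hypothetical per-structure (FITraw, AVF) pairs.
example = {
    "reorder_buffer": (400.0, 0.5),
    "issue_queue": (200.0, 0.25),
    "register_file": (100.0, 0.125),
}
print(system_fit(example))
```

A per-structure breakdown like this shows which structure dominates the total, which is where a micro-architect would look first when trying to reduce AVF.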

  10. State-of-Art • Microprocessor design: Multi-dimensional problem involving Performance, Power and Reliability • Transient Fault Tolerance • Simultaneous Redundant Threading (SRT) • Lockstepping • Optimization techniques • Parashar et al., ISCA’04 • Gomaa et al., ISCA’05 • Parashar et al., ASPLOS’06 • Reddy et al., ASPLOS’06 • Performance Overhead • Single point in Performance-Reliability space

  11. Micro-architectural Reliability Knob • FITeff = FITraw * AVF, FITraw and AVF being constants • Ideal Solution: FITraw inflexible; tune AVF to meet specifications (FITrequired) • [Plot: Reliability vs. Performance, with ends labeled "More Reliable, Less Performance" and "Less Reliable, More Performance"] • “Challenge for computer architects is not to provide absolute guarantees in reliability, but rather how to provide the adequate amount of reliability at the lowest cost for the target market segment” Architecture Design for Soft Errors – Shubu Mukherjee, Intel
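If FITraw is fixed and the specification gives a required FIT, the relation FITeff = FITraw * AVF can be rearranged to give the largest AVF a structure may exhibit. This is a minimal sketch of that rearrangement; the function name and the example budget are assumptions made for illustration.

```python
def avf_bound(fit_required: float, fit_raw: float) -> float:
    """Largest AVF satisfying fit_raw * AVF <= fit_required, capped at 1.0."""
    if fit_raw <= 0 or fit_required < 0:
        raise ValueError("fit_raw must be positive and fit_required non-negative")
    return min(1.0, fit_required / fit_raw)


# Hypothetical budget: 100 FIT allowed for a structure with a raw rate of 400 FIT.
print(avf_bound(fit_required=100.0, fit_raw=400.0))
```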

  12. Contributions • First work that provides micro-architectural knobs to satisfy processor reliability budgets for transient faults • Proactive and Reactive mechanisms to monitor and bound vulnerabilities of processor structures at cycle-level granularity

  13. AVF Monitoring: Reorder Buffer/Physical Register File • 1. Large pipeline structure holding a number of instructions • 2. Each instruction spends a significant percentage of its lifetime in the ROB • [Pipeline diagram labels: RAT, ARF, Issue Queue, ALU, Fetch, Decode, Reorder Buffer (ROB), Commit, Reorder Buffer (PRF); regions: Pipeline In-order, Pipeline out-of-order, Pipeline In-order]

  14. AVF Monitoring Mechanism: Reorder Buffer (ROB) • [Diagram: Reorder Buffer with N entries, each entry B bits, Result R bits; fields labeled Filled at Dispatch and Filled at WB; events: Dispatch Event, Writeback Event, Commit Event, Mis-speculation]

  15. Vulnerability Control via Throttling (VCT) • STALL DISPATCH AND WRITEBACK • Writeback cannot be stalled • Entire Entry ACE at Dispatch • Size = Fn (AVF Bound) • [Diagram: DISPATCH and WRITEBACK into an N-entry REORDER BUFFER]
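One way to read the throttling slide is that, since writeback cannot be stalled, each entry is conservatively counted as fully ACE from dispatch, so the AVF bound translates into a cap on ROB occupancy. The sketch below models only that reading; the class, its methods, and the floor-based sizing rule are assumptions, not the mechanism as implemented in the work.

```python
import math


class ThrottledROB:
    """Toy occupancy model: dispatch stalls once the ACE fraction would exceed the bound."""

    def __init__(self, n_entries: int, avf_bound: float) -> None:
        if n_entries <= 0 or not 0.0 <= avf_bound <= 1.0:
            raise ValueError("need a positive size and a bound in [0, 1]")
        self.n_entries = n_entries
        # Every dispatched entry counts as fully ACE, so the effective
        # size is a function of the AVF bound (assumed: floor(bound * N)).
        self.max_occupancy = math.floor(avf_bound * n_entries)
        self.occupancy = 0

    def dispatch(self) -> bool:
        """Try to allocate one entry; return False when dispatch must stall."""
        if self.occupancy >= self.max_occupancy:
            return False
        self.occupancy += 1
        return True

    def commit(self) -> None:
        """Retire one entry, freeing room for a later dispatch."""
        if self.occupancy > 0:
            self.occupancy -= 1

    def vulnerability(self) -> float:
        """Conservative instantaneous ACE fraction of the structure."""
        return self.occupancy / self.n_entries


rob = ThrottledROB(n_entries=64, avf_bound=0.25)
accepted = sum(rob.dispatch() for _ in range(20))
print(accepted, rob.vulnerability())
```

In this toy model the bound is never exceeded in any cycle, at the cost of dispatch stalls whenever the capped occupancy is reached.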

  16. VCT Performance [Chart: VCT; axis labels: High Integrity, Low Integrity]

  17. Advantages of a Reactive Bounding Mechanism Reorder Buffer AVF Bound Exceeded Verify Results Early Accounting of Writebacks Mis-speculated Instructions

  18. Simultaneous Redundant Threading (SRT): Importance of Selective Redundancy ARF RAT ISQ ALU Fetch Decode Reorder Buffer (PRF) ARF RAT Redundant Thread After Primary Thread Result Verification Reduces AVF Redundant Execution protects entire pipeline AVF goes down

  19. Vulnerability Control via Selective Redundancy (VCSR) Infrastructure • AVF Bound Exceeded • Greedy Heuristic • [Pipeline diagram labels: ARF, RAT, ISQ, ALU, Fetch, Decode, Reorder Buffer (ROB), RAT, ARF, Result Buffer]

  20. VCSR Performance [Chart: VCSR, SRT, VCT; axis labels: High Integrity, Low Integrity]

  21. Optimizations: Primary Thread Out Of Order Commit • Non-compacting Reorder Buffer • Reduces AVF • Performance boost since fewer instructions are re-executed • Writeback – Commit: ROB AVF affected • Sec. Thread maintains architected state • [Pipeline diagram labels: RAT, ARF, ISQ, ALU, Fetch, Decode, Reorder Buffer (PRF), ARF, RAT, Result Buffer]

  22. VCH with OOO Commit Performance [Chart: VCH(OOO), SRT, VCSR, VCT; axis labels: High Integrity, Low Integrity]

  23. Impact of vulnerability bounding • Per-cycle vulnerability bounds, guaranteeing FIT rates are met • Future Work • Looking at developing a system-level AVF monitoring and bounding infrastructure

  24. Outline • Motivation • Contributions • Vulnerability bounding mechanisms • Summary of other works • Impact of DVFS on architectural vulnerability of GALS architectures • Address process variations in issue queue • Mitigate NBTI, HCE degradation in structures • Conclusion and Future work

  25. Need for vulnerability analysis in GALS Architectures • Multiple domains, each driven by individual clocks • Need for global clock network avoided • GALS enables fine-grained VF scaling tuned to individual domains • DVFS provides high performance per watt • DVFS algorithms for GALS architectures are studied w.r.t. IPC per watt • Voltage scaling affects FITraw, frequency scaling affects AVF • Impact on AVF due to applying different DVFS algorithms • Help designers choose DVFS algorithms meeting reliability requirements • Reliability impact ignored

  26. AVF impact across algorithms • Significant AVF variations when applying different algorithms • Most DVFS algorithms lead to worse AVF than Non-DVFS • [Chart: 38% variation; lower is better]

  27. Outline • Motivation • Contributions • Vulnerability bounding mechanisms • Other solutions • Impact of DVFS on architectural vulnerability of GALS architectures • Address process variations in issue queue • Mitigate NBTI, HCE degradation in structures • Conclusion and Future work

  28. Process Variation (PV) - Introduction • Process Variation: variation in characteristics between two identically designed circuits • Performance and power impact significant • Lack of predictability in timing characteristics leads to loss of yield • Definite need to address PV at circuit and microarchitectural level • [Taxonomy diagram labels: Process Variation; Dynamic, Static; Aging, Thermal Effects; Systematic, Random; Sub-wavelength Lithography, Overlay, Dose, RDF] [J. Tschanz et al., DAC 2005]

  29. Contributions • Study the impact of PV on the Issue Queue of a microprocessor • PV-unaware design has about 21% performance degradation w.r.t. Non-PV design • PV is a non-deterministic phenomenon, so design-time static partitioning is not possible. Our solution enables the fast and slow entries to co-exist • Instruction steering and sub-component switching schemes to reduce the impact of PV • Performance loss is about 1.3% w.r.t. Non-PV design

  30. Issue Queue Entry [Diagram labels: Issue Read; Tag1, Tag N; Forwarding Comparison; Forwarding Write; entry fields Opcode, R, Tag, Operand, R, Tag, Operand, Dest Tag, V; Select Logic; Dispatch Write] [Timeline labels over t, t+1, t+2, t+3: INSTRUCTION ISSUE; SELECT INST. READY; Valid Bit Reset; Alloc stalls Dispatch; ALLOC LOGIC; DISPATCH WRITE; Valid Bit Set; ISQ Full; FORWARDING; Operand Ready Bit Set; Instructions wait for Ready Operands]

  31. Results • Stalls reduced w.r.t. specific activity • Operand and port-switching further reduce stalls to a minimum • [Chart comparing Non-PV, Shutdown, MCD, PV-Aware; labeled values: 1.3%, 7.3%, 12%]

  32. Outline • Motivation • Contributions • Vulnerability bounding mechanisms • Other solutions • Impact of DVFS on architectural vulnerability of GALS architectures • Address process variations in issue queue • Mitigate NBTI, HCE degradation in structures • Conclusion and Future work

  33. Increasing impact of transistor wearout • Transistor lifetime decreasing with newer technologies • Conservative guardbands impact performance • System longevity affects revenue • More than 50% of organizations have machines older than 10 years • [Chart axis: Decreasing Technology] Source: Intel Poll by Gartner Research, Source: J. Blome, Micro 2007

  34. Contributions • NBTI, HCE impact increasing in upcoming technologies • Conventional collapsing issue queues have unwanted instruction movement across entries • Collapsing required for age-based selection • Round-Robin scheme to provide restricted collapsing • Restricted collapsing balances switching activity without losing much of age-based selection

  35. Implementation • SPEC2K Benchmark, 100M instructions • Simplescalar architectural simulator [ISQ] • Capture Rd / Wr / Sw / Data probabilities per cell • Transistor-level degradation model • HSpice (32nm, 380K), 10-year degradation • Read Delay Degradation • Typically, solutions look at worst-case probabilities that might rarely occur

  36. Results 1% reduction 32% reduction

  37. Conclusion • Growing Reliability concern: “Pop culture of reliability has arrived” - Dr. Phil Emma, IBM [Architecture Design for Soft Errors] • Work looks at increasing the fault-tolerance in the back-end • Soft errors • Process variation • Wearout

  38. Current Work • Multi-core designs have come to prominence • While caches have ECC, the multiple pipelines involve structures holding data, where ECC is hard • Total vulnerability to soft errors increases • Study the impact on AVF of different structures in a multi-core environment

  39. Future Work • Multi-core • Cores increase, market segments increase • ILP vs TLP vs Clock frequency increase • Application/Hardware sense best configuration • Reconfigurable Hardware • Defect Tolerance • Verification time increasing • “Firmware update” to control functionality

  40. Thank you

  41. Backup slides

  42. DVFS Algorithms • Threshold • VF scaling uses fixed thresholds. Preset thresholds affect algorithm efficiency • Attack-Decay (AD) • Based on util. in adjacent intervals. Attack whenever there is a big util. change, otherwise decay. Greedy nature affects efficiency • Modified Attack-Decay (ModAD) • Attack phase modified to correspond to util. change. Large VF swing can affect performance per watt • PI • $\mu_k = \mu_{k-1} + K_I (q'_k - q_{ref}) + K_P (q'_k - q'_{k-1})$, $f_k = \mu_k / \mathrm{IPC}$ • Greedy • Sample and Hold phase. VF scaling based on ED2 of past 2 intervals
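The PI update on the slide above can be sketched as a small stateful controller. The slide does not define its symbols, so this sketch assumes q'_k is an observed occupancy (or utilization) sample for interval k, q_ref is its target value, and µ is an internal rate estimate; the class name, gains, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """mu_k = mu_{k-1} + K_I (q_k - q_ref) + K_P (q_k - q_{k-1}); f_k = mu_k / IPC."""

    k_i: float     # integral gain K_I
    k_p: float     # proportional gain K_P
    q_ref: float   # reference value q_ref (assumed: target occupancy)
    mu: float      # previous estimate mu_{k-1}
    q_prev: float  # previous sample q'_{k-1}

    def step(self, q_k: float, ipc: float) -> float:
        """Consume the sample for interval k and return the frequency f_k."""
        if ipc <= 0:
            raise ValueError("IPC must be positive")
        self.mu += self.k_i * (q_k - self.q_ref) + self.k_p * (q_k - self.q_prev)
        self.q_prev = q_k
        return self.mu / ipc


# Hypothetical gains and samples, chosen only to make the arithmetic easy to follow.
ctrl = PIController(k_i=0.5, k_p=0.25, q_ref=8.0, mu=2.0, q_prev=8.0)
f_k = ctrl.step(q_k=12.0, ipc=2.0)
print(f_k)
```

With the sample above the target, both terms raise µ and therefore the returned frequency; a sample below the target would lower it.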

  43. Vulnerability Efficiency • Non-DVFS has the best vulnerability efficiency • On average, AD and PI provide the best vulnerability efficiency • [Chart: 40% variation; lower is better]

  44. Round Robin scheme [Diagram labels: Head, Tail, PseudoHead (PH), New Inst, Clk, Ctrl Bit 0, Ctrl Bit N, Later Entries Collapse, Control Vector 1 1 1 0 0]

  45. Reliability Issues of Importance • Solutions that are robust but overhead-aware as well

  46. Contributions • Bounding vulnerability of processor structures to provide reliability guarantees • Study impact of DVFS on vulnerability of GALS architectures • Solutions to reduce non-uniform aging due to NBTI, HCE on microprocessor structures • Solutions to address impact of process variations on issue queue • [Taxonomy diagram labels: Hardware Failure; Permanent, Temporary; Wearout, Transient, Intermittent, Process variation; Radiation, Non-Radiation; Soft Errors, Power supply. Source: ISCA 2005 tutorial]

  47. Results [Chart: SR with T(OOO), SRT, SR, Throttling (T); axis labels: High Integrity, Low Integrity]

  48. PV-aware steering - OptiSteer [Diagram labels: Non-Collapsing Issue Queue with fields Op, STag1, STag2, DTag; Dest Tag; RAT; ISQ Entry id; STALL; Slow Entry Bit; Alloc; Decoder; Demux; Assigns ISQ Entry; Stall Optimization Table; Source Tags (STag1, STag2)]

  49. Intra-Entry Variation schemes: Operand- and Port-Switching [Diagram labels: Issue Read; Op, STag1, Operand1, STag2, DTag; V, Opcode, R, Tag, Operand, R, Tag, Operand, Dest Tag; Op, STag2, STag1, Operand1, DTag; Operand Switch; Dispatch Write; Dispatch Port Switch; Op, STag1, Operand1, STag2, DTag]

  50. Timeline of ISQ activities [Timeline labels over t, t+1, t+2, t+3: SELECT INST. READY; Port Switch; Slow issue read; Fewer instructions selected; INSTRUCTION ISSUE; Valid Bit Reset; ALLOC LOGIC; Alloc stalls Dispatch; DISPATCH WRITE; Valid Bit Set; Operand Switch; Port Switch; FORWARDING; Operand Ready Bit Set; ISQ Full; SOT Fill; Slow Dispatch Write; Instructions wait for Ready Operands; SOT Value Required; Forwarding Stall]
