Understanding Fault Implications in Prediction Arrays for Reliable Performance

Performance Implications of Faults in Prediction Arrays
Nikolas Ladas, Yiannakis Sazeides, Veerle Desmet
University of Cyprus, Ghent University
DFR'10, Pisa, Italy, 24/1/2010, HiPEAC 2010

Motivation
• Technology scaling: opportunities and challenges
• Reliability and computing tomorrow
    • Failures will not be exceptional
• Various sources of failures
    • Manufacturing: imperfections, process variation
    • Physical phenomena: soft errors, wear-out
    • Power constraints: operation below Vcc-min
• Key challenge: provide reliable operation with little or no performance degradation in the presence of faults, using low-overhead solutions

Architectural vs Non-Architectural Faults
• So far, research has mainly focused on correctness
    • Emphasis on architectural structures, e.g. caches, registers, buses, ALUs
• However, faults can also occur in non-architectural structures, e.g. predictor and replacement arrays
• Faults in non-architectural structures may degrade performance
    • Not an issue for soft errors
    • Can be a problem for persistent faults: wear-out, process variation, operation below Vcc-min

Non-architectural Resources
• Arrays
    • line predictor
    • branch direction predictor
    • return address stack
    • indirect jump predictor
    • memory dependence prediction
    • way, hit/miss, bank predictors
    • replacement arrays (various caches)
    • hysteresis arrays (various predictors)
    • ...
• Non-arrays
    • branch target address adder
    • memory prefetch adder
    • ...
[Figure: breakdown of array bits in an EV6-like core]

This talk…
• Quantify the performance implications of faults in non-architectural array structures
• Identify which non-architectural array structures are the most sensitive to faults
• Do we need to worry about protecting these structures?

Outline
• Fault model / experimental framework
• Performance implications of faults when all non-architectural arrays are faulty
• Criticality of the non-architectural arrays studied
• Fault semantics
• Conclusions and future direction

Faults and Arrays
• Faults may occur in different parts of an array
• We only consider cell faults
[Figure: physical array organization showing the decoder, wordline (WL) drivers, wordlines, bitline pairs (BL, BL'), and the grid of cells]

Array Fault Modeling: Key Parameters
• Number of faults
    • Consider the percentage of cells that are faulty: 0.125% and 0.5%
    • Understand performance trends with an increasing number of faults
• Fault locations
    • Consider random fault locations, each affecting one cell
    • Try to capture average behavior
• Model for each fault
    • Each faulty cell is randomly set to either stuck-at-1 or stuck-at-0
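The fault model on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the function names, the seed handling, and the rule for turning a percentage into a fault count (rounding, with at least one fault) are assumptions.

```python
import random

def make_fault_map(num_entries, bits_per_entry, fault_fraction, seed=0):
    """Pick random faulty cells and give each a random stuck-at value.

    Returns a dict mapping (entry, bit) -> stuck value (0 or 1).
    The rounding rule below is an assumption, not taken from the slides.
    """
    rng = random.Random(seed)
    total_bits = num_entries * bits_per_entry
    num_faults = max(1, round(total_bits * fault_fraction))
    cells = rng.sample(range(total_bits), num_faults)  # distinct cells
    return {(c // bits_per_entry, c % bits_per_entry): rng.randint(0, 1)
            for c in cells}

def read_with_faults(entry, stored_value, bits_per_entry, fault_map):
    """Return the value actually read from an entry with stuck-at cells."""
    value = stored_value
    for bit in range(bits_per_entry):
        stuck = fault_map.get((entry, bit))
        if stuck is None:
            continue              # healthy cell, keep the stored bit
        if stuck:
            value |= 1 << bit     # stuck-at-1 forces the bit high
        else:
            value &= ~(1 << bit)  # stuck-at-0 forces the bit low
    return value

# Example: a 32K-entry, 2-bit array with 0.125% faulty cells.
fault_map = make_fault_map(32 * 1024, 2, 0.00125, seed=1)
print(len(fault_map), "faulty cells")
```

A simulator would apply something like `read_with_faults` on every read of the array, so that writes to a faulty cell are silently lost.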

Processor Model
• EV7-like processor with a 15-stage pipeline
• 4-way out-of-order; mispredictions resolved at commit
• Non-architectural arrays considered
    • Line Predictor Array: 4K entries, 11 bits/entry
    • Line Predictor Hysteresis Array: 4K entries, 2 bits/entry
    • LRU array for 2-way 64KB 64B/block I$: 512 entries, 1 bit/entry
    • LRU array for 2-way 64KB 64B/block D$: 512 entries, 1 bit/entry
    • Gshare Direction Predictor: 32K entries, 2 bits/entry
    • Return address stack: 16 entries, 31 bits/entry
    • Memory dependence predictor (load-wait): 1024 entries, 1 bit/entry
• sim-alpha simulator
• SPEC CPU 2000 benchmarks, 100M instructions
    • Representative regions

Experiments
• Baseline performance: runs with no faults
• For experiments with faults
    • For each run, all faulty arrays have the same percentage of faulty bits: 0.125% or 0.5%
    • All experiments use the same 100 randomly generated fault maps (50 for each percentage of faulty bits)

Number of faulty bits per array:

Array                                 Size (bits)   0.125%   0.5%
Gshare Direction Predictor                  65536       82    328
Line Predictor Array                        45056       56    225
Line Predictor Hysteresis Array              8192       10     41
Memory dependence predictor                  1024        1      5
2-way 64KB 64B/block I$ LRU array             512        1      3
2-way 64KB 64B/block D$ LRU array             512        1      3
Return address stack                          496        1      3
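As a quick check of the arithmetic behind the table, the array sizes follow from entries times bits per entry. The sketch below also prints the nominal (unrounded) fault counts; the slides do not state how these were rounded to whole cells, so no rounding rule is assumed here. The dictionary and function names are illustrative only.

```python
# (entries, bits per entry) as listed on the Processor Model slide.
ARRAYS = {
    "Gshare Direction Predictor": (32 * 1024, 2),
    "Line Predictor Array": (4 * 1024, 11),
    "Line Predictor Hysteresis Array": (4 * 1024, 2),
    "Memory dependence predictor": (1024, 1),
    "I$ LRU array": (512, 1),
    "D$ LRU array": (512, 1),
    "Return address stack": (16, 31),
}

def total_bits(entries, bits_per_entry):
    """Size of an array in bits."""
    return entries * bits_per_entry

def nominal_faults(bits, percent):
    """Unrounded number of faulty cells for a given percentage."""
    return bits * percent / 100.0

for name, (entries, width) in ARRAYS.items():
    bits = total_bits(entries, width)
    print(f"{name:34s} {bits:6d} bits  "
          f"{nominal_faults(bits, 0.125):7.2f}  {nominal_faults(bits, 0.5):7.2f}")
```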

Performance with 0.125% Faulty Bits (all arrays faulty)

Performance with 0.5% of Faulty Bits (all arrays faulty)

Observations with all arrays faulty
• Performance degradation is substantial even with a small percentage of faulty bits
• Both INT and FP benchmarks can degrade

Faulty bits            0.125%   0.5%
Average degradation        1%   3.5%
Max degradation           39%    53%

• Degradation is benchmark specific
    • Instruction mix (different number and type of vulnerable instructions)
    • Programs with high prediction accuracy are more vulnerable than those with low accuracy
    • When a program accesses few array entries, it takes a large number of faults before a faulty entry is accessed
    • Some benchmarks are memory dominated
• Worst-case degradation is much greater than the average
    • Will cause performance variation between otherwise identical cores/chips
• Are all bits equally vulnerable? Which unit(s) matter the most?

Performance for Each Structure (0.125% faulty bits)
• 26 benchmarks x 50 experiments for each section

Performance for Each Structure (0.5% faulty bits)
• 26 benchmarks x 50 experiments for each section

Observations
• For the processor configuration used in this study, the various non-architectural units are not equally vulnerable to the same fraction of faults
• RAS and BPRED are the most sensitive to faults
• The line predictor and load-wait predictor degrade performance significantly at 0.5% faults
• The 2-way I$ and D$ are not sensitive even at 0.5% faults in the LRU array

Reasons for Variable Vulnerability across Units
• The semantics of faults vary across units
    • Some faults cause a pipeline flush, others delay the execution of an instruction, others cause a one-cycle bubble
    • Faults that cause delays can be less severe, since the delay can be hidden in the shadow of a misprediction or by out-of-order execution
• Units with typically higher accuracy are more vulnerable (RAS and conditional predictor)
• Even within a unit, faults can have different semantics

Semantics of Faults for a 2-bit Replacement State

State   Action
0x      Replace (R)
1x      No replace (N)

0/1: stuck-at value
[Figure: the four 2-bit states (00, 01, 10, 11) and their actions under stuck-at-0/1 faults; the two extreme cases are labeled "Always Replace" and "Never Replace"]
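The state/action table above can be turned into a small sketch that shows why the position of the faulty bit matters. It models only the read side (which action a stored state produces once a cell is stuck), not how the counter transitions; the function names are illustrative and not from the talk.

```python
def apply_stuck_at(state, bit, stuck_value):
    """Return the 2-bit state as read when one cell is stuck."""
    if stuck_value:
        return state | (1 << bit)
    return state & ~(1 << bit)

def action(state):
    """0x -> Replace (R), 1x -> No replace (N): only the high bit decides."""
    return "N" if state & 0b10 else "R"

def observed_actions(bit, stuck_value):
    """Action seen for each intended state 00, 01, 10, 11."""
    return [action(apply_stuck_at(s, bit, stuck_value)) for s in range(4)]

for bit in (1, 0):
    for stuck in (0, 1):
        print(f"bit {bit} stuck-at-{stuck}:", observed_actions(bit, stuck))
```

Under these assumptions, a stuck high-order bit pins the action (always replace or never replace), while a stuck low-order bit leaves the action for a given stored state unchanged.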

Repair mechanism: XOR Remapping
[Figure: fault map, access map, and the access map after remapping; annotations: "XOR 1", "Faulty accesses: 14370"]
• Access map: counts accesses per entry during an interval
• Fault map: indicates which entries are faulty (can be determined at manufacturing test or at very coarse intervals using BIST)
• Remap the index using XOR to minimize faulty accesses
• At regular intervals, search for the optimal XOR value using the access map and fault map
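The search described on this slide can be sketched as an exhaustive scan over all XOR values. This is a minimal illustration, assuming a power-of-two number of entries and a remapping of the form physical index = logical index XOR value; the function names and the example maps are made up for illustration and are not the search algorithm from the talk.

```python
def faulty_accesses(access_map, fault_map, xor_value):
    """Accesses that would land on faulty entries for a given XOR value.

    access_map[i] counts accesses to logical index i during an interval;
    fault_map[p] is True if physical entry p is faulty.
    """
    return sum(count for index, count in enumerate(access_map)
               if fault_map[index ^ xor_value])

def best_xor(access_map, fault_map):
    """Try every XOR value and return the one with the fewest faulty accesses.

    Assumes len(access_map) is a power of two so index ^ x stays in range.
    """
    n = len(access_map)
    return min(range(n),
               key=lambda x: faulty_accesses(access_map, fault_map, x))

# Hypothetical 8-entry example: entries 0 and 3 are faulty and heavily used.
access_map = [100, 5, 0, 40, 0, 0, 1, 0]
fault_map = [True, False, False, True, False, False, False, False]
x = best_xor(access_map, fault_map)
print("XOR value:", x,
      "faulty accesses:", faulty_accesses(access_map, fault_map, x))
```

A practical version would probably remap only when the gain is large enough, since the Results slide notes that remapping when there is no need can make things worse.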

Results
• 26 benchmarks x 10 fault maps per category
• Recovers most of the performance degradation
• Possible to make things worse if we remap when there is no need

Summary and Conclusions
• Faults in non-architectural arrays can degrade processor performance
• Not all faults are equally important; fault semantics vary
• The RAS and the conditional branch predictor are the most critical
• Faults can cause performance non-determinism across otherwise identical chips or across the cores of the same chip

Future Work
• Develop an analytical model to predict the performance distribution for a given failure rate
• Understand the implications of faults for other architectural and non-architectural structures

Acknowledgments
• Costas Kourougiannis
• Funding: University of Cyprus, Ghent University, HiPEAC, Intel

Thanks!

BACKUP SLIDES

Fault Semantics
• Line Predictor Array: incorrect prediction
    • Conditional branches and returns get corrected within a cycle; indirect jumps are resolved much later
• Line Predictor Hysteresis Array
    • Always update the prediction on a misprediction
    • Never update
• 2-way 64KB 64B/block I$ and D$ LRU arrays
    • Sets with a faulty LRU bit become direct-mapped sets; more misses, but these can be hidden
• Gshare Direction Predictor
    • Faulty entries always predict taken or always not-taken
    • Incorrect prediction that gets resolved late (25% chance of being lucky)
• Return address stack
    • Return misprediction is resolved late
• Memory dependence predictor (load-wait)
    • An independent load waits (in the common case we should not wait); can be partially hidden
    • A dependent load does not wait (this should rarely be a serious problem)
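The LRU case can be illustrated with a toy model of a single 2-way set. It is a sketch under simplifying assumptions that are not from the slides: the LRU bit names the way to evict next, the victim is chosen from that bit even when the other way is empty, and the class and function names are hypothetical.

```python
class TwoWaySet:
    """One 2-way cache set with a single LRU bit that may be stuck."""

    def __init__(self, stuck_at=None):
        self.ways = [None, None]   # tags stored in way 0 and way 1
        self.stuck_at = stuck_at   # None = healthy, 0 or 1 = stuck value
        self._lru = 0              # way to evict next

    def _read_lru(self):
        # A stuck cell ignores whatever was written to it.
        return self._lru if self.stuck_at is None else self.stuck_at

    def access(self, tag):
        """Return True on a hit, False on a miss (and fill the victim way)."""
        if tag in self.ways:
            way, hit = self.ways.index(tag), True
        else:
            way, hit = self._read_lru(), False
            self.ways[way] = tag
        self._lru = 1 - way        # the other way becomes the next victim
        return hit

def count_misses(tags, stuck_at=None):
    cache_set = TwoWaySet(stuck_at)
    return sum(not cache_set.access(t) for t in tags)

pattern = ["A", "B"] * 4           # two blocks competing for one set
print("healthy:", count_misses(pattern))
print("stuck-at-0:", count_misses(pattern, stuck_at=0))
```

In this toy model the stuck bit makes every fill go to the same way, so the set behaves like a single-entry, direct-mapped set for this access pattern.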

Processor Pipeline

Line predictor logical structure

Functional Faults and Array Logical View
• Not practical to study faults at the physical level
• Functional models: abstractions that ease the study of faults
• Fault locations: cell, input address, input/output data
• We only consider cell faults

BIST for Detecting Faults and Updating Fault Map

Example Remapping Search Algo

Interleaved vs Non-Interleaved Design Style (1)
• Each array wordline contains many entries
• Entries in the physical implementation are bit-interleaved
    • More area efficient

Interleaved vs Non-Interleaved Design Style (2)
• But a cluster of faults affects more entries in an interleaved design
• For architectural structures
    • Soft errors: prefer interleaved
    • Hard errors: map to spare, or disable block/set
• For non-architectural structures
    • Soft errors: no need for protection
    • Hard errors: prefer non-interleaved (if area is not an issue)
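The effect of a fault cluster on the two layouts can be sketched with simple index arithmetic. The layout rules are assumptions chosen for illustration: in the non-interleaved case the bits of an entry occupy adjacent columns of the wordline, and in the interleaved case adjacent columns belong to successive entries. The function name and the sizes in the example are hypothetical.

```python
def entries_hit(cluster_start, cluster_len, entries_per_wordline,
                bits_per_entry, interleaved):
    """Return the set of entries touched by a run of adjacent faulty cells."""
    affected = set()
    for col in range(cluster_start, cluster_start + cluster_len):
        if interleaved:
            # Adjacent columns belong to different entries.
            affected.add(col % entries_per_wordline)
        else:
            # Adjacent columns belong to the same entry.
            affected.add(col // bits_per_entry)
    return affected

# Hypothetical wordline: 4 entries of 11 bits; 3 adjacent faulty cells.
print("non-interleaved:", entries_hit(5, 3, 4, 11, interleaved=False))
print("interleaved:    ", entries_hit(5, 3, 4, 11, interleaved=True))
```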

4K LP: No Interleaving vs Interleaving (average random)

Random results without and with remapping

Expected Invariants
• More faults lead to more performance degradation
• Frequently accessed entries are more critical than less frequently accessed entries
• A cell stuck-at-1 is more critical if the bits stored in the cell are biased towards zero

Worst-case - Hit rate

Random results without and with remapping
