Cost-Effective Register File Soft Error Reduction Pablo Montesinos, Wei Liu and Josep Torrellas, University of Illinois at Urbana-Champaign
Overview • Study of register file vulnerability to SDC (Silent Data Corruption) • Shield: cost-effective protection for register files • Highlighting the policies and techniques used in Shield • Experiments and results
Register File AVF • The RF-AVF (register file Architectural Vulnerability Factor) is the probability that a fault occurring in the register file leads to an error. • The lifetime of a register version is divided into PreWrite, Useful, and PostLastRead periods. • For the AVF calculation, the lifetime of each bit is divided into ACE (Architecturally Correct Execution) and un-ACE cycles.
Register File AVF • During the PreWrite period the register is un-ACE. • If the register is read at least once after the write, it is in the ACE state from the write until its last read. • After the last read, the register switches back to un-ACE for the PostLastRead period.
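The lifetime split above can be turned into a small AVF estimate. The following is a minimal sketch, assuming register-level (not bit-level) accounting and a hypothetical trace format in which each register version records its write cycle and the cycles at which it is read; the names `useful_cycles` and `rf_avf` are illustrative and do not come from the slides.

```python
def useful_cycles(write_cycle, read_cycles):
    """ACE cycles of one register version: from its write to its last read.

    A version that is never read stays un-ACE for its whole lifetime.
    """
    if not read_cycles:
        return 0
    return max(read_cycles) - write_cycle


def rf_avf(versions, num_regs, total_cycles):
    """Fraction of register-cycles that are ACE (assumed AVF approximation)."""
    ace = sum(useful_cycles(v["write"], v["reads"]) for v in versions)
    return ace / (num_regs * total_cycles)


# Hypothetical two-register trace over 20 cycles.
trace = [
    {"write": 2, "reads": [5, 10]},  # Useful from cycle 2 to cycle 10
    {"write": 3, "reads": []},       # written but never read: un-ACE
]
avf = rf_avf(trace, num_regs=2, total_cycles=20)
```

PreWrite and PostLastRead cycles never enter the sum, which is why a register file whose versions are mostly short-lived or never read ends up with a low AVF.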
Highlighting Insights (1) • The combined %-USEFUL time of all registers is small
Highlighting Insights (1) • The average number of useful (live) registers is less than 20 (SPECint) and 17 (SPECfp). • It is thus possible to reduce the vulnerability of the register file by protecting only a carefully chosen subset of registers at a time.
Highlighting Insights (2) • A few long-lived register versions account for a large share of the total useful time. • On average, fewer than 10% of register versions are long-lived.
Highlighting Insights (2) • On average 40% of useful time comes from the few long-lived versions. • In SPECfp, 5% of long-lived versions account for 46% of the useful time.
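To make the two percentages concrete, here is a minimal sketch of how the share of long-lived versions and their share of useful time could be measured. The threshold that separates long-lived from short-lived versions is not given in the slides, so `threshold` is a hypothetical parameter, and the sample data is invented purely for illustration.

```python
def long_lived_share(useful_times, threshold):
    """Return (fraction of versions that are long-lived,
               fraction of total useful time they account for)."""
    long_lived = [t for t in useful_times if t >= threshold]
    total = sum(useful_times)
    frac_versions = len(long_lived) / len(useful_times)
    frac_time = sum(long_lived) / total if total else 0.0
    return frac_versions, frac_time


# Invented useful times (in cycles) for ten register versions.
sample = [2, 3, 1, 4, 5, 2, 3, 2, 3, 75]
frac_versions, frac_time = long_lived_share(sample, threshold=50)
```

In this made-up sample one version in ten holds three quarters of the useful time, which is the kind of skew that makes protecting only a few versions worthwhile.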
Motivation • Register files have a very high access rate. • The resulting high temperature lowers the critical charge (Qcrit) of the devices. • An error in the RF can propagate, with a high probability of causing a failure. • If we isolate a few register versions by predicting their lifetimes and protect only those versions, high reliability can be achieved with limited overhead.
Shield - Architecture • Lifetime Prediction • Register Error Check • Shielding Decision • Error Recovery
Reg-Version Lifetime Prediction • P12 => Used(1), Renamed(1) • P7 => Used(0), Renamed(1)
Shielding Decision • The prediction bits are stored as status in the ECC table. • Whether a newly written register version is shielded depends on: • Availability of a free ECC-table entry. • If the same register number is already present in the ECC table, its entry is replaced with the new one. • An existing register version with a shorter predicted lifetime than the incoming version is replaced. • Replacement policy:
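The three conditions above can be sketched as a small table model. This is an illustration under stated assumptions, not the hardware design: `EccTable`, `Entry`, and `try_shield` are hypothetical names, and the predicted lifetime is modelled as a single integer rank, whereas the slides only say that prediction bits (Used, Renamed) are stored as status.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    reg: int        # physical register number (the tag)
    lifetime: int   # hypothetical rank derived from the prediction bits


class EccTable:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def try_shield(self, reg, lifetime):
        """Return True if the newly written register version gets shielded."""
        # Same register number already present: replace it with the new version.
        for entry in self.entries:
            if entry.reg == reg:
                entry.lifetime = lifetime
                return True
        # A free entry is available.
        if len(self.entries) < self.capacity:
            self.entries.append(Entry(reg, lifetime))
            return True
        # Otherwise evict the entry with the shortest predicted lifetime,
        # but only if the incoming version is predicted to live longer.
        victim = min(self.entries, key=lambda e: e.lifetime)
        if victim.lifetime < lifetime:
            self.entries.remove(victim)
            self.entries.append(Entry(reg, lifetime))
            return True
        return False


table = EccTable(capacity=2)
results = [
    table.try_shield(12, lifetime=3),  # free entry
    table.try_shield(7, lifetime=1),   # free entry
    table.try_shield(9, lifetime=1),   # table full, not longer-lived: rejected
    table.try_shield(9, lifetime=2),   # evicts register 7
]
shielded = sorted(e.reg for e in table.entries)
```

Checking for a matching register number before looking for a free entry keeps at most one entry per register, which is one plausible reading of the second condition.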
Register Error Check & Recovery • On a read request, the register data is sent both to the original datapath and to Shield. • If the register number matches a tag entry, the register data is checked for errors by the ECC checker. • If an error is detected: • The processor stalls the instruction I that is reading register P. • The register data is corrected and written back into the RF. • The oldest instruction in the ROB that reads register P, and all succeeding instructions, are flushed. • The processor resumes from the flushed instruction.
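The check-and-recover sequence above can be walked through in software. The sketch below assumes a Hamming-style single-error-correcting code with separately stored check bits, a 64-bit register width, and a ROB modelled as a list of instructions ordered oldest first; the slides do not specify the ECC code or these data structures, so all names here are hypothetical.

```python
def position_codes(width):
    """Non-power-of-two integers used as Hamming position codes for data bits."""
    codes, p = [], 1
    while len(codes) < width:
        if p & (p - 1):          # nonzero only when p is not a power of two
            codes.append(p)
        p += 1
    return codes


CODES = position_codes(64)                       # assumed 64-bit registers
CODE_TO_BIT = {c: i for i, c in enumerate(CODES)}


def check_bits(value):
    """XOR of the position codes of all set data bits."""
    s = 0
    for i, code in enumerate(CODES):
        if (value >> i) & 1:
            s ^= code
    return s


def read_register(rf, ecc_table, rob, reg):
    """Return (value, flushed instructions) for a read of `reg`."""
    value = rf[reg]
    if reg not in ecc_table:                     # not shielded: no check
        return value, []
    syndrome = ecc_table[reg] ^ check_bits(value)
    if syndrome not in CODE_TO_BIT:
        # Zero syndrome: no error. A nonzero power-of-two syndrome would point
        # at a stored check bit rather than the data; this sketch ignores it.
        return value, []
    value ^= 1 << CODE_TO_BIT[syndrome]          # correct the flipped data bit
    rf[reg] = value                              # write the fix back to the RF
    # Flush the oldest ROB instruction reading `reg` and everything after it.
    oldest = next((i for i, ins in enumerate(rob) if reg in ins["srcs"]), None)
    if oldest is None:
        return value, []
    flushed = rob[oldest:]
    del rob[oldest:]
    return value, flushed


rf = {"P12": 0xDEADBEEF}
ecc_table = {"P12": check_bits(rf["P12"])}
rob = [
    {"op": "add", "srcs": ["P3"]},
    {"op": "sub", "srcs": ["P12"]},
    {"op": "mul", "srcs": ["P12", "P4"]},
]
rf["P12"] ^= 1 << 17                             # inject a single-bit soft error
value, flushed = read_register(rf, ecc_table, rob, "P12")
```

After the call, the register file holds the corrected value and only the instructions older than the first reader of P12 remain in the ROB; re-execution would resume from the first flushed instruction.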
Experiments - Results • AVF computation for the RF with Shield
Experiments - Results • Reduction in integer register file (intREG) AVF under different replacement policies: • LRU = 31% • Effective = 63% • OptEffective = 84% (Effective plus pinning of global pointers to particular ECC entries) • The AVF of the floating-point register file (fpREG) can be reduced by up to 100%, because fewer fp registers are in the useful state.
Power and Area Impact • Shield uses only 3 ECC generators and 3 ECC checkers. • Shield has a 45% power overhead over a plain register file. (Full ECC has 2X) • Shield introduces an overall 10% area overhead.
Conclusion • A cost-effective architectural technique has been proposed that reduces the vulnerability of the RF by 84%. • The area and power overheads are a modest price for the reliability achieved.