
Cost-Effective Register File Soft Error Reduction

Cost-Effective Register File Soft Error Reduction. Pablo Montesinos, Wei Liu and Josep Torrellas, University of Illinois at Urbana-Champaign. Overview: a study of register file vulnerability to SDC (Silent Data Corruption), and Shield – cost-effective protection for register files.


Presentation Transcript


  1. Cost-Effective Register File Soft Error Reduction • Pablo Montesinos, Wei Liu and Josep Torrellas, University of Illinois at Urbana-Champaign

  2. Overview • Study of register file vulnerability to SDC (Silent Data Corruption) • Shield – cost-effective protection for register files • Highlighting the policies and techniques used in Shield • Experiments and results

  3. Register File AVF • The register file AVF (RF-AVF, Architectural Vulnerability Factor) is the probability that a fault in the register file leads to an error. • A register's lifetime is divided into PreWrite, Useful, and PostLastRead periods. • For the AVF calculation, the lifetime of each bit is divided into ACE (required for Architecturally Correct Execution) and un-ACE cycles.

  4. Register File AVF • During the PreWrite period the register is un-ACE. • If the register is read at least once after the write, it switches to the ACE state. • After the last read, the register switches back to un-ACE for the PostLastRead period.
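The lifetime split on slides 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the `RegisterVersion` record, its field names, and the `register_file_avf` helper are hypothetical, and the AVF is approximated as ACE register-cycles divided by total register-cycles.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegisterVersion:
    """One value held in a physical register (hypothetical record layout)."""
    alloc: int        # cycle the physical register is allocated
    write: int        # cycle the value is written
    reads: List[int]  # cycles at which the value is read
    free: int         # cycle the register is deallocated

    def useful_cycles(self) -> int:
        """ACE cycles: from the write to the last read (0 if never read)."""
        if not self.reads:
            return 0
        return max(self.reads) - self.write

    def phases(self) -> Dict[str, int]:
        """Split the lifetime into PreWrite, Useful, and PostLastRead cycles."""
        last_use = max(self.reads) if self.reads else self.write
        return {
            "PreWrite": self.write - self.alloc,
            "Useful": self.useful_cycles(),
            "PostLastRead": self.free - last_use,
        }


def register_file_avf(versions: List[RegisterVersion],
                      num_registers: int,
                      total_cycles: int) -> float:
    """Approximate AVF: ACE register-cycles over all register-cycles."""
    ace = sum(v.useful_cycles() for v in versions)
    return ace / (num_registers * total_cycles)


if __name__ == "__main__":
    v = RegisterVersion(alloc=0, write=10, reads=[15, 30], free=50)
    print(v.phases())
    print(register_file_avf([v], num_registers=2, total_cycles=100))
```

A version that is written but never read contributes no ACE cycles in this model, which matches the rule that a register only becomes ACE if it is used at least once after the write.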

  5. Highlighting Insights (1) • The combined %-USEFUL time of all registers is small

  6. Highlighting Insights (1) • The average number of useful (live) registers is less than 20 (SPECint) and 17 (SPECfp). • It is thus possible to reduce the vulnerability of the register file by protecting only a carefully chosen subset of registers at a time.

  7. Highlighting Insights (2) • A few long-lived register versions account for a large share of the total useful time. • On average, less than 10% of register versions are long-lived.

  8. Highlighting Insights (2) • On average 40% of useful time comes from the few long-lived versions. • In SPECfp, 5% of long-lived versions account for 46% of the useful time.

  9. Motivation • Register files have a very high access rate. • The resulting high temperature lowers the critical charge (Qcrit) of the devices. • An error in the RF can propagate with a high failure probability. • If we isolate a few register versions by predicting their lifetimes and protect only those versions, high reliability can be achieved with limited overhead.

  10. Shield - Architecture • Lifetime Prediction • Shielding Decision • Register Error Check • Error Recovery

  11. Reg-Version Lifetime Prediction • P12 => Used(1), Renamed(1) • P7 => Used(0), Renamed(1)

  12. Shielding Decision • The prediction bits are stored as status bits in the ECC table. • Whether a newly written register version is shielded depends on: • Whether a free ECC table entry is available. • Whether the same register number is already present in the ECC table, in which case the old entry is replaced with the new one. • Whether an existing register version has a shorter predicted lifetime than the incoming one, in which case it is replaced. • Replacement policy:
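The shielding decision on slide 12 can be sketched as a small table model. This is a sketch under stated assumptions, not the hardware design: the `EccTable` and `EccEntry` names are hypothetical, the predicted lifetime is encoded as an integer rank (larger means longer-lived), and the same-register check is done first so that a register never occupies two entries.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EccEntry:
    reg: int       # physical register number (the tag)
    lifetime: int  # predicted lifetime rank; larger = longer-lived (assumed encoding)


class EccTable:
    """Toy model of a small ECC table that shields a subset of registers."""

    def __init__(self, num_entries: int):
        self.entries: List[Optional[EccEntry]] = [None] * num_entries

    def is_shielded(self, reg: int) -> bool:
        return any(e is not None and e.reg == reg for e in self.entries)

    def shield(self, reg: int, lifetime: int) -> bool:
        """Try to protect a newly written register version.

        Returns True if the version got an entry, False otherwise.
        """
        new = EccEntry(reg, lifetime)
        # 1. Same register number already present: replace the old version.
        for i, e in enumerate(self.entries):
            if e is not None and e.reg == reg:
                self.entries[i] = new
                return True
        # 2. A free entry is available: take it.
        for i, e in enumerate(self.entries):
            if e is None:
                self.entries[i] = new
                return True
        if not self.entries:
            return False
        # 3. Table is full: evict the entry with the shortest predicted
        #    lifetime, but only if it is shorter than the incoming one.
        victim = min(range(len(self.entries)),
                     key=lambda i: self.entries[i].lifetime)
        if self.entries[victim].lifetime < lifetime:
            self.entries[victim] = new
            return True
        return False
```

With a two-entry table holding ranks 1 and 3, an incoming version of rank 2 evicts the rank-1 entry, while a later version of rank 1 is left unprotected.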

  13. Register Error Check & Recovery • On a read request, the register data is sent to both the original datapath and Shield. • If the register number matches a tag entry, the register data is checked for errors by the ECC checker. • If an error is detected: • The processor stalls the instruction I that is reading register P. • The register data is corrected and written back into the RF. • The oldest instruction in the ROB that reads register P, and all succeeding instructions, are flushed. • The processor resumes from the flushed instruction.
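The check-and-recover path on slide 13 can be illustrated with a software model. The slides do not say which ECC code Shield uses, so this sketch substitutes a plain single-error-correcting Hamming code and assumes single-bit errors in the register data only; the `Shield` class, its methods, and the `"flush"` action string are hypothetical stand-ins for the pipeline behavior.

```python
def data_positions(width):
    """1-indexed codeword positions that hold data bits (non-powers of two)."""
    pos, out = 1, []
    while len(out) < width:
        if pos & (pos - 1):  # skip powers of two, reserved for check bits
            out.append(pos)
        pos += 1
    return out


def check_bits(value, width=64):
    """Hamming check bits: XOR of the positions of all set data bits."""
    code = 0
    for bit, pos in enumerate(data_positions(width)):
        if (value >> bit) & 1:
            code ^= pos
    return code


def check_and_correct(value, stored_code, width=64):
    """Return (corrected_value, data_error_found) for a single-bit error."""
    syndrome = check_bits(value, width) ^ stored_code
    positions = data_positions(width)
    if syndrome in positions:
        # The syndrome names the flipped data bit: flip it back.
        return value ^ (1 << positions.index(syndrome)), True
    # Zero syndrome, or one that points at a check bit: data is intact.
    return value, False


class Shield:
    """Toy read/write path; self.ecc stands in for the ECC table."""

    def __init__(self):
        self.ecc = {}  # register number -> stored check bits

    def on_write(self, rf, reg, value):
        rf[reg] = value
        self.ecc[reg] = check_bits(value)

    def on_read(self, rf, reg):
        value = rf[reg]
        if reg not in self.ecc:
            return value, "unprotected"
        fixed, error = check_and_correct(value, self.ecc[reg])
        if error:
            rf[reg] = fixed        # corrected data is written back to the RF
            return fixed, "flush"  # caller flushes the oldest reader and younger instructions
        return value, "ok"
```

In this model the caller is expected to react to `"flush"` by squashing the oldest instruction that reads the register and everything after it, then re-executing from that point, as the slide describes.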

  14. Experiments - Results • AVF computation for the RF with Shield

  15. Experiments - Results • Reduction in the AVF of the integer register file (intREG) under different replacement policies: • LRU = 31% • Effective = 63% • OptEffective = 84% (Effective plus pinning of global pointers to particular ECC entries) • The AVF of the floating-point register file (fpREG) can be reduced by up to 100%, because fewer fp registers are in the useful state.

  16. Power and Area Impact • Shield uses only 3 ECC generators and 3 ECC checkers. • Shield has a 45% power overhead over a plain register file. (Full ECC has 2X.) • Shield introduces an overall 10% area overhead.

  17. Conclusion • A cost-effective architectural technique has been proposed that reduces the vulnerability of the RF by up to 84%. • The area and power overheads are a modest price for the reliability achieved.
