
StageNet: A Reconfigurable CMP Fabric for Resilient Systems

Shantanu Gupta, Shuguang Feng, Jason Blome, Scott Mahlke. 2nd Workshop on Reconfigurable and Adaptable Architecture, Dec 1, 2007.


Presentation Transcript


  1. StageNet: A Reconfigurable CMP Fabric for Resilient Systems
      Shantanu Gupta, Shuguang Feng, Jason Blome, Scott Mahlke
      2nd Workshop on Reconfigurable and Adaptable Architecture, Dec 1, 2007

  2. Reliability Challenge
      • Increasing defect rates are a major challenge [ITRS’03]
      • ↑ power density, ↓ feature sizes, ↑ failures in time (FIT)
      • Permanent faults
          • Manufacturing defects
          • Time-dependent dielectric breakdown (TDDB)
          • Negative bias temperature instability (NBTI)
          • Electromigration (EM)
          • ….
      [Srinivasan, DSN’04]
      For the 32nm technology node, an 8-core CMP would face ~30 faults in 4 years.
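To put the "~30 faults in 4 years" figure in FIT terms, here is a back-of-the-envelope sketch. It assumes a constant failure rate and faults spread evenly over the 8 cores; neither assumption is stated on the slide, and the helper name `implied_fit` is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365   # 8760, ignoring leap days
FIT_HOURS = 1e9             # 1 FIT = one failure per 10^9 device-hours


def implied_fit(faults: float, years: float, units: int = 1) -> float:
    """Average FIT rate per unit implied by `faults` observed over `years`.

    Assumes a constant failure rate and an even split across `units`
    (both are simplifying assumptions, not claims from the slides).
    """
    device_hours = years * HOURS_PER_YEAR * units
    return faults / device_hours * FIT_HOURS


chip_fit = implied_fit(30, 4)            # whole 8-core chip
core_fit = implied_fit(30, 4, units=8)   # per core, if evenly spread

print(f"chip: {chip_fit:,.0f} FIT, per core: {core_fit:,.0f} FIT")
```

Under these assumptions the chip-level rate is 30 / 35,040 hours, which is on the order of 10^5 to 10^6 FIT.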

  3. Tolerating Permanent Faults
      • Traditional solutions
          • TMR
          • Tandem / HP NonStop
          • Impractical for mainstream
              • Cost
              • Power
              • Low gain
      • Current approaches
          • Detection/Prediction
              • Using sensors
              • Analytical models
              • Redundant execution
              • BIST
          • Repair
              • Replacement
              • Reconfiguration
      [Figure labels: K-pos DP-31/32; Teramac (1995)]

  4. Reconfiguration Granularity
      • Range of choices for the reconfiguration granularity
          • CORE level
          • STAGE level (FETCH, DEC, EXEC, MEM, WB)
          • MODULE level
      • Cited work
          • ElastIC, DT’06
          • Reunion, MICRO’06
          • Configurable Isolation, ISCA’07
          • Online Diagnosis of Hard Faults, MICRO’05
          • Ultra Low-Cost Defect Protection, ASPLOS’06
      [Figure labels: Better resource utilization; Lower design complexity; Lower overheads]

  5. Mean Time to Failure Comparison
      • CORE level
          + Easiest to do in practice
          -- Poorest MTTF gains
      • STAGE level
          + Circuit/logical boundary
          + Improved MTTF gains
          -- Architectural complexity
      • MODULE level
          + Best MTTF gains
          -- Hardest to repair
      [Plot: MTTF increase (%) vs. area increase (%) for MODULE level, STAGE level, and CORE level]

  6. Throughput Comparison
      • Monte-Carlo study
          • Randomly injected failures
          • Assumes that stages are shared resources
      [Plot: throughput for STAGE level vs. CORE level]
      STAGE level reconfiguration allows significantly more graceful throughput degradation.
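The kind of Monte-Carlo comparison described on this slide can be sketched as follows. This is not the authors' simulator: the model (8 cores, 5 stages, failures landing on distinct stage instances, no interconnect slowdown) and the names `working_pipelines` and `simulate` are assumptions chosen for illustration. Core-level repair loses a whole core on any stage failure; stage-level repair, with stages as shared resources, can form as many pipelines as the scarcest stage type allows.

```python
import random


def working_pipelines(failed, n_cores, n_stages):
    """Return (core_level, stage_level) usable pipeline counts.

    `failed` is a collection of distinct (core, stage) pairs.
    """
    # Core level: any failed stage disables the whole core.
    dead_cores = {core for core, _ in failed}
    core_level = n_cores - len(dead_cores)
    # Stage level: stages are shared, so the scarcest stage type
    # bounds the number of logical pipelines.
    stage_level = min(
        n_cores - sum(1 for _, s in failed if s == stage)
        for stage in range(n_stages)
    )
    return core_level, stage_level


def simulate(n_faults, n_cores=8, n_stages=5, trials=2000, seed=0):
    """Average fraction of fault-free throughput for both schemes."""
    rng = random.Random(seed)
    slots = [(c, s) for c in range(n_cores) for s in range(n_stages)]
    core_total = stage_total = 0
    for _ in range(trials):
        failed = rng.sample(slots, n_faults)  # distinct stage instances
        core, stage = working_pipelines(failed, n_cores, n_stages)
        core_total += core
        stage_total += stage
    return core_total / trials / n_cores, stage_total / trials / n_cores


for n in (0, 5, 10, 20):
    core_frac, stage_frac = simulate(n)
    print(f"{n:2d} faults: core-level {core_frac:.2f}, stage-level {stage_frac:.2f}")
```

In this simplified model the stage-level count can never fall below the core-level count, because every failure of a given stage type sits in a different core. The sketch ignores the router and transmission delays discussed on later slides, so it overstates stage-level throughput in absolute terms.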

  7. Goal of this Research
      • Design a computing substrate
          • Fault tolerant
          • Graceful performance degradation with defects
          • Highly reconfigurable
          • Adaptable to the workload
      A design that can meet the challenge of facing ~100s of faults while maintaining 70-80% throughput.

  8. CMP Fabric
      [Figure: four cores (Core 0 to Core 3), each containing Stage1, Stage2, Stage3, ..., StageN]

  9. StageNet CMP Fabric
      [Figure: four rows of Stage1, Stage2, Stage3, ..., StageN; labels: Logical pipeline, Allocator, Configuration Manager]

  10. StageNet CMP Fabric - Benefits
      [Figure: four rows of Stage1, Stage2, Stage3, ..., StageN; label: Configuration Manager]

  11. StageNet CMP Fabric - Issues
      • Performance / Efficiency
          • Scaling with number of stages
          • Impact of router delay
              • Transmission delay (tdelay)
              • Congestion delay
      • Design overheads
          • Area
          • Power
      • Micro-architectural concerns
          • Data forwarding logic
          • Control flow handling
      [Figure labels: Allocator; 64, 256 bits]

  12. Experimental Setup
      • MiBench suite
      • SimpleScalar 4.0: simulates an in-order core with default parameters
      • Statistics: stores statistics for the benchmarks
          • No. of instructions
          • No. of cycles
          • Branch mispredicts
          • I/D cache misses
          • ….
      • StageNet Model: parameterizable performance model for StageNet
      • CPI results

  13. Effect of varying pipeline depth
      [Plot; fixed parameter: tdelay = 1]

  14. Effect of varying transmission delay
      [Plot; fixed parameter: stages = 10]

  15. Performance enhancement
      • Router delay is the leading cause of the slowdown
      • Need some way to improve system utilization
      • Let us send macro-ops (MOP)
          • MOP is an instruction bundle
          • Upper bound on length
          • Upper bound on live-ins / live-outs
          • No branches in between
      • Advantages
          • Amortizes delay / contention
          • Increases resource utilization
      [Figure: example dataflow graph with operations >>, LD, LD, +, /, &, +, >>, <<, ST, ST; Max length 4, Max live-ins 2]
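One way to picture MOP formation is a greedy pass over a straight-line instruction sequence. The sketch below is not the StageNet algorithm: the `Instr` record, the greedy policy, and the choice to emit each branch as its own bundle are illustrative assumptions. It enforces the bounds shown on the slide (max length 4, max live-ins 2) and omits the live-out bound, which would need liveness information the sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[str] = None      # register written, if any
    srcs: Tuple[str, ...] = ()      # registers read
    is_branch: bool = False


def live_ins(bundle: List[Instr]) -> Set[str]:
    """Registers read by the bundle before being written inside it."""
    defined: Set[str] = set()
    ins: Set[str] = set()
    for instr in bundle:
        ins.update(s for s in instr.srcs if s not in defined)
        if instr.dest is not None:
            defined.add(instr.dest)
    return ins


def form_mops(instrs: List[Instr], max_len: int = 4,
              max_live_ins: int = 2) -> List[List[Instr]]:
    """Greedily pack consecutive instructions into macro-ops."""
    mops: List[List[Instr]] = []
    current: List[Instr] = []
    for instr in instrs:
        if instr.is_branch:
            # Assumption: a branch closes the open bundle and travels alone.
            if current:
                mops.append(current)
            mops.append([instr])
            current = []
            continue
        candidate = current + [instr]
        too_big = (len(candidate) > max_len
                   or len(live_ins(candidate)) > max_live_ins)
        if current and too_big:
            mops.append(current)
            current = [instr]       # a lone instruction is always accepted
        else:
            current = candidate
    if current:
        mops.append(current)
    return mops


program = [
    Instr("LD", "r1", ("r0",)),
    Instr(">>", "r2", ("r1",)),
    Instr("+", "r3", ("r2", "r4")),
    Instr("ST", None, ("r3", "r5")),
    Instr("BR", None, ("r3",), is_branch=True),
    Instr("&", "r6", ("r7", "r8")),
]
mops = form_mops(program)
print([[i.op for i in m] for m in mops])
```

Here the store starts a new bundle because adding it would raise the live-in count to three (r0, r4, r5). Larger bundles mean fewer transfers between stages, which is how a MOP amortizes the router delay.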

  16. Effect of varying MOP size
      [Plot; fixed parameters: tdelay = 4, stages = 10]

  17. Conclusions
      • Reliability-aware architectures with finer-grained reconfiguration are desirable for:
          • Better MTTF gains
          • Graceful throughput degradation
      • StageNet, a potential solution, allows stage-level reconfiguration and offers:
          • Easy reconfiguration
          • Inherent redundancy
          • Potentially scalable issue width
      • Using StageNet, significant reconfiguration flexibility can be gained in exchange for a small loss in performance

  18. Future Work
      • Micro-architectural issues
          • Data bypass handling
          • Control flow handling
          • Sharing state between pipeline stages
      • Network design
          • Design of routers
          • Design of interconnection
      • Simulation setup
          • Validation of results using a cycle-accurate simulator

  19. StageNet: A Reconfigurable CMP Fabric for Resilient Systems

  20. Backup slides

  21. Repair
      [Figure labels: DECODER, IF/ID, ID/EX, BIST, CHECKER (majority), Test Vectors]
      • Ultra Low-Cost Defect Protection for Microprocessor Pipelines, ASPLOS’06
      • ElastIC, DT’06
      • F. Bower, Tolerating Hard Faults in Microprocessor Array Structures, DSN’04
      • H. Qin, UC Berkeley
