drain distributed recovery architecture for inaccessible nodes in multi core chips n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips PowerPoint Presentation
Download Presentation
DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips

Loading in 2 Seconds...

  share
play fullscreen
1 / 16
molly

DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips - PowerPoint PPT Presentation

74 Views
Download Presentation
DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips
An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips Andrew DeOrio†, KonstantinosAisopos‡§Valeria Bertacco†, Li-ShiuanPeh§ †University of Michigan‡Princeton University§Massachusetts Institute of Technology DAC 2011

  2. Reliable Networks on Chip fault-tolerant routing Drain up $ recovery detection diagnosis recon-figuratio R Diagnose what fault has occurred Detect if fault has occurred Reconfigure network to account for fault Recover and resume normal operation processor nodes are disconnected, state is lost! cache router

  3. Recovery Approaches • Checkpoint/recovery approaches • Drain takes a reactive approach, incurring performance overhead only when errors occur uP uP $ $ checkpoint buffers R R MEM high performance overhead! data stuck in checkpoint buffer!

  4. Data Recovery with Drain • Recover data lost during reconfiguration • Emergency links provide alternate path • Transfers cache contents and architectural state processor u P u P u P . . . . core . . . . . . . . . . . . . . local cache ... ... $ $ $ DRAIN emergency link ... ... . . . . . . . . . Router Router Router primary link . . . memory . . . Mem . . . controller

  5. Drain Example up up $ $ up up $ $ M

  6. Drain Example up up $ $ up up link failure X $ $ M

  7. Drain Example up up $ $ reconfigure interconnect up up $ $ M

  8. Drain Example up up $ $ up up $ $ X M link failure

  9. Drain Example up up $ $ node isolated! up up $ $ M

  10. Drain Example up up $ $ drain connected nodes via primary links up up $ $ M

  11. Drain Example up up $ $ up up $ $ drain disconnected node via emergency link M

  12. Drain Example up up $ $ up up $ $ drain connected node again M

  13. Drain Example up up $ $ resume normal operation up up $ $ M

  14. Drain Performance as Links Fail decreasing functional network size increasing emergency link time

  15. Memory Latency Before and After

  16. Conclusions • DRAIN is a lightweight recovery mechanism for CMPs • 5,000 gates per node • Recoup cache data and architectural state from disconnected nodes • Performance overhead only during recovery • ~3ms at 1GHz