System-Directed Resilience for Exascale Platforms
LDRD Proposal 09-0016
Ron Oldfield (1423), Neil Pundit (1423), FY09-11, Total Costs $1500
Problem: Current apps cannot survive a node failure
Goal: Develop methods to suspend application activity without hindering progress of other applications
Goal: Develop efficient methods for extracting and managing state
Goal: Dynamically recover a failed node without restarting the whole application
Our approach represents a fundamental change in how systems support resilience.