Reliability-Aware OS Support for FPGA-Based Systems

This presentation discusses the importance of reliability in system design, with a focus on reliability-aware OS scheduling for FPGA-based systems. The authors propose a method that uses otherwise idle FPGA space to duplicate processes and improve reliability. It also covers the issues that arise when duplicating processes and introduces a quality of service (QoS) parameter that indicates the maximum tolerable performance degradation.

Presentation Transcript


  1. Reliability-Aware OS Support for FPGA-Based Systems. M. Kandemir, G. Chen, and F. Li, Department of Computer Science & Engineering, The Pennsylvania State University, USA. 224/MAPLD 2004

  2. Introduction and Acronyms • Increasing soft-error rates make reliability an important factor in system design • Our focus: Reliability-aware OS scheduling for FPGA-based systems • FPGA: Field Programmable Gate Array • CLB: Configurable Logic Block • STG: SubTask Graph

  3. The Reconfigurable System [Figure: a 6×8 array of configurable logic blocks (CLBs) hosting Process 1, Process 2, and Process 3; the interconnects and input-output blocks are omitted]

  4. Improving Reliability • Traditionally, the OS scheduler schedules parallel execution of multiple processes to maximize FPGA space utilization • Data dependencies between different processes might prevent the full utilization of FPGA space • Our approach uses the available FPGA space to duplicate processes and improve reliability

  5. Duplicating Processes [Figure: the CLB array holding Process 1, Process 2, Process 3, a duplicate of Process 1, and a duplicate of Process 3]

  6. Issues in Duplicating Processes • Tasks (processes) have different criticality • Each task may require a different amount of FPGA space • Duplications can cause performance degradation • We use a QoS parameter to indicate the maximum tolerable performance degradation • A checker task is scheduled for each duplicated task to check the outputs of the primary task and the duplicate
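
The checker task from slide 6 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`run_with_checker`, `square`, `faulty_square`) and the single-bit fault model are assumptions made for illustration only. The point it shows is that comparing the two outputs detects a mismatch but does not identify which copy was wrong.

```python
from typing import Callable, Tuple


def run_with_checker(primary: Callable[[int], int],
                     duplicate: Callable[[int], int],
                     value: int) -> Tuple[int, bool]:
    """Run the primary task and its duplicate, then compare their outputs.

    Returns the primary output and a flag that is True when both copies agree.
    """
    primary_out = primary(value)
    duplicate_out = duplicate(value)
    # Detection only: a mismatch says an error occurred, not which copy is wrong.
    return primary_out, primary_out == duplicate_out


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    # Hypothetical soft error: the lowest output bit is flipped.
    return (x * x) ^ 1


print(run_with_checker(square, square, 7))
print(run_with_checker(square, faulty_square, 7))
```
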

  7. Subtask Graph (STG) • Each process to be scheduled is represented by a subtask graph • Each node represents a process code portion (subtask) that will be executed in a single quantum of time once it gets scheduled • The jth node of process i is denoted as STGij • An edge from Vi to Vj indicates a data or control dependence from Vi to Vj

  8. Subtask Graph • Since our processes are extracted from the same application, there might be data dependences between different processes
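
One way to picture the subtask graphs, including dependences that cross process boundaries, is the Python sketch below. The class name `SubtaskGraphs`, its methods, and the example edges are hypothetical; the slides do not describe a concrete data structure. Nodes are `(i, j)` pairs standing for STGij, and `ready` lists the subtasks whose predecessors have all finished, which is what a scheduler could pick from in the next quantum.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (process i, subtask j), standing for STGij


@dataclass
class SubtaskGraphs:
    # preds[v] holds every node that v depends on (data or control dependence).
    preds: Dict[Node, Set[Node]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.preds.setdefault(node, set())

    def add_dependence(self, src: Node, dst: Node) -> None:
        """Record an edge src -> dst; src and dst may belong to different processes."""
        self.add_node(src)
        self.add_node(dst)
        self.preds[dst].add(src)

    def ready(self, done: Set[Node]) -> List[Node]:
        """Subtasks not yet executed whose predecessors have all finished."""
        return sorted(v for v, p in self.preds.items()
                      if v not in done and p <= done)


graphs = SubtaskGraphs()
graphs.add_dependence((1, 1), (1, 2))  # inside process 1
graphs.add_dependence((2, 1), (2, 2))  # inside process 2
graphs.add_dependence((1, 1), (2, 2))  # across processes

print(graphs.ready(set()))
print(graphs.ready({(1, 1)}))
```
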

  9. Our Approach • Five steps: annotation, QoS specification, task identification, task ranking, and scheduling • Task duplication under QoS guarantees • Current implementation focuses only on error detection

  10. Our Approach • Annotation step: the application programmer uses annotations to indicate which data structures are critical from the reliability viewpoint • QoS specification step: the application programmer also indicates the tolerable latency during application execution as a result of the reliability provided

  11. Our Approach • Task identification step: an automatic application code analyzer analyzes the source code and identifies tasks • Task ranking step: the tasks are ranked based on how they operate on critical data, and ordered from the most important task to the least important one
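
The ranking step could look like the following Python sketch. It assumes the ranking key is the number of accesses each task makes to critical data, which is the criterion named in the experimental setup; the function name `rank_tasks`, the task names, and the counts are made up for illustration.

```python
from typing import Dict, List


def rank_tasks(critical_accesses: Dict[str, int]) -> List[str]:
    """Order tasks from most to least important.

    Importance is assumed to be the count of accesses to critical data;
    ties are broken alphabetically so the order is deterministic.
    """
    return sorted(critical_accesses,
                  key=lambda name: (-critical_accesses[name], name))


# Hypothetical access counts produced by a code analyzer.
counts = {"filter": 12, "io": 0, "key_update": 40}
print(rank_tasks(counts))
```
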

  12. Our Approach • Scheduling step: the OS scheduler is modified so that, whenever there is an opportunity, it duplicates tasks that run on the FPGA device. Whenever the scheduler predicts that the QoS limit is about to be reached, it stops duplicating tasks
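
The scheduling decision can be sketched in Python as below. This is an illustrative model, not the modified scheduler itself: the `Task` fields, the per-task `checker_cost`, and the way the predicted slowdown is computed (accumulated checker cost divided by a baseline execution time) are all assumptions. The sketch walks the tasks in rank order, duplicates those that fit in the free CLBs, and stops as soon as the predicted degradation would exceed the QoS limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    clbs: int            # FPGA space the task (and thus its duplicate) needs
    rank: int            # 0 = most critical
    checker_cost: float  # assumed extra time caused by duplicating and checking


def choose_duplicates(tasks: List[Task], free_clbs: int,
                      baseline_time: float, qos_limit: float) -> List[str]:
    """Pick tasks to duplicate in leftover FPGA space without breaking QoS."""
    chosen: List[str] = []
    overhead = 0.0
    for task in sorted(tasks, key=lambda t: t.rank):
        if task.clbs > free_clbs:
            continue  # no room for this duplicate; try a less critical task
        predicted = (overhead + task.checker_cost) / baseline_time
        if predicted > qos_limit:
            break  # QoS limit about to be reached: stop duplicating
        chosen.append(task.name)
        free_clbs -= task.clbs
        overhead += task.checker_cost
    return chosen


tasks = [
    Task("A", clbs=8, rank=0, checker_cost=2.0),
    Task("B", clbs=20, rank=1, checker_cost=1.0),
    Task("C", clbs=6, rank=2, checker_cost=2.0),
    Task("D", clbs=2, rank=3, checker_cost=3.0),
]
# 16 free CLBs, baseline of 100 time units, at most 5% degradation.
print(choose_duplicates(tasks, free_clbs=16, baseline_time=100.0, qos_limit=0.05))
```

In this example B is skipped because it does not fit, and D is rejected because duplicating it would push the predicted degradation past 5%.
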

  13. Experimental Setup • An error injection module injects errors with a specified probability • Two real-life embedded applications: encr and usonic • The performance of our reliability-aware scheduler is compared with that of a normal Shortest-Job-First scheduler • Tolerate at most 5% performance degradation • Rank tasks according to the frequency of accesses to critical data • Fatal errors: errors that would lead to a crash of the application
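
An error injection module of the kind listed in the setup could be sketched in Python as follows. The slides only say that errors are injected with a specified probability; the single-bit-flip fault model, the function name `inject_error`, and the 32-bit word width are assumptions.

```python
import random


def inject_error(value: int, probability: float,
                 rng: random.Random, width: int = 32) -> int:
    """With the given probability, flip one randomly chosen bit of value."""
    if rng.random() < probability:
        return value ^ (1 << rng.randrange(width))
    return value


rng = random.Random(2004)  # fixed seed so a run can be repeated
clean = inject_error(0xCAFE, 0.0, rng)   # never corrupted
flipped = inject_error(0xCAFE, 1.0, rng)  # always corrupted in exactly one bit
print(hex(clean), bin(flipped ^ 0xCAFE).count("1"))
```
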

  14. Experimental Data

  15. Ongoing Work • Experimenting with a diverse set of benchmarks • Implementing task duplication within other types of OS schedulers such as First-Come-First-Served

  16. Conclusion • The OS scheduler tries to provide reliability through task duplication under QoS guarantees • Improving FPGA space utilization by duplicating for reliability • Providing reliability for critical tasks first • Catching most fatal errors
