Authors: Xue Liu, Hui Ding, Kihwal Lee, Qixin Wang, Lui Sha

ORTEGA: An Efficient and Flexible Software Fault Tolerance Architecture for Real-Time Control Systems. Presentation by Evan Frenn.


Presentation Transcript


  1. ORTEGA: An Efficient and Flexible Software Fault Tolerance Architecture for Real-Time Control Systems. Authors: Xue Liu, Hui Ding, Kihwal Lee, Qixin Wang, Lui Sha. Presentation by Evan Frenn.

  2. Overview: Introduction, Software Faults, General Assumptions, Related Work, ORTEGA, Limitations, Design Challenges/Solutions, Evaluation, Conclusion.

  3. Introduction. Problem: how to design fault-tolerant architectures for real-time control systems. Examples of real-time control systems? What are faults? Hardware malfunction, communication medium malfunction, software malfunction.

  4. Software Faults. Resource sharing faults: corruption of memory; handled by address space protection. Timing faults: failure to meet timing constraints (e.g. an infinite loop); handled by a real-time scheduling method, e.g. Generalized Rate-Monotonic Scheduling. Semantic faults: producing the wrong output; handled by using a high assurance controller (HAC). Assumption: the HAC is always correct.
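Generalized Rate-Monotonic Scheduling extends classic rate-monotonic analysis. The sketch below illustrates how a scheduler can guarantee timing. It applies the Liu & Layland utilization bound, which is a sufficient (not necessary) rate-monotonic test and not the full GRMS analysis. The function names and the task set are hypothetical.

```python
def rm_utilization_bound(n: int) -> float:
    """Liu & Layland bound: n periodic tasks are RM-schedulable if U <= n(2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def rm_schedulable(tasks: list[tuple[float, float]]) -> bool:
    """Sufficient (not necessary) RM test; each task is (execution_time, period)."""
    if not tasks:
        return True
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


# Hypothetical task set: HPC, HAC and one background task (times in ms).
tasks = [(2.0, 10.0), (1.0, 10.0), (4.0, 40.0)]
print(rm_schedulable(tasks))  # U = 0.4, bound for n = 3 is about 0.78
```

A task set that fails this test may still be schedulable. Deciding that case would require an exact response-time analysis.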

  5. General Assumptions. The authors assume two distinct controllers exist. High Assurance Controller (HAC): proven reliable; its simple construction allows formal methods to be used for verification and validation. High Performance Controller (HPC): uses advanced control techniques for higher performance, e.g. additional features or a more complex control structure (such as neural networks). Is it common to have a choice of controllers?
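To make the "simple construction" of an HAC concrete, here is a hypothetical HAC written as linear state feedback `u = -K x`, clipped to the actuator limits. The gain vector `k`, the limit `u_max`, and the function name are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def hac_control(x: Sequence[float], k: Sequence[float], u_max: float) -> float:
    """Hypothetical HAC: u = -K x, saturated to [-u_max, u_max]."""
    u = -sum(k_i * x_i for k_i, x_i in zip(k, x))
    return max(-u_max, min(u_max, u))


# Hypothetical gains for a two-state plant (e.g. angle and angular velocity).
print(hac_control([0.1, 0.5], k=[10.0, 2.0], u_max=5.0))
print(hac_control([1.0, 0.0], k=[10.0, 2.0], u_max=5.0))  # clipped to -5.0
```

A controller this small can be analyzed exhaustively. This is why the HAC, and not the HPC, is the component trusted to be correct.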

  6. Related Work. Simplex: runs the HAC and HPC in parallel to allow a rapid response to an HPC fault. Limitations: inefficient, since the HAC is always running, even when no faults are present; inflexible, since the HAC and HPC are required to have the same sampling/control periods; and it does not allow the HAC to make up the time lost to the fault.

  7. ORTEGA. On-demand Real-TimE GuArd (ORTEGA) has 3 major components. Decision module: determines which control command to use in each period; it simply uses a semaphore to lock the suspended control module, so that the HPC alone runs during normal operation. HPC module. HAC module.
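The decision logic can be sketched as follows, under several simplifying assumptions:
- The HPC signals a timing fault by returning `None`.
- A caller-supplied `is_recoverable` check stands in for ORTEGA's fault detection.
- A boolean flag stands in for the semaphore that keeps the HAC suspended.

The class and all of its names are hypothetical. The sketch ignores the one-period recovery delay discussed under Limitations and never switches back to the HPC.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

State = Sequence[float]


@dataclass
class DecisionModule:
    """Hypothetical sketch of an ORTEGA-style decision module (not the authors' code)."""

    hpc: Callable[[State], Optional[float]]  # returns None if it missed its deadline
    hac: Callable[[State], float]
    is_recoverable: Callable[[State], bool]  # e.g. a stability-region check
    hac_active: bool = False  # stands in for the semaphore that keeps the HAC suspended

    def step(self, state: State) -> float:
        """Choose the control command for one period."""
        if not self.hac_active:
            command = self.hpc(state) if self.is_recoverable(state) else None
            if command is not None:
                return command  # normal operation: only the HPC has run
            self.hac_active = True  # fault detected: release the HAC on demand
        return self.hac(state)
```

Because the HAC is invoked only after a fault, the fault-free path costs one HPC execution per period. This is where ORTEGA's CPU savings over Simplex come from.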

  8. ORTEGA (cont.). Comparison to Simplex: the decision module allows efficient CPU utilization, and the decision structure removes the requirement that the HAC and HPC run in lockstep. (Figure: CPU usage savings)
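Back-of-the-envelope utilization arithmetic shows where the savings come from. In this sketch, Simplex executes both controllers every period, while fault-free ORTEGA executes only the HPC. The execution times and period are hypothetical and are not the paper's measurements.

```python
def simplex_utilization(c_hpc: float, c_hac: float, period: float) -> float:
    """Simplex: HPC and HAC both execute every period."""
    return (c_hpc + c_hac) / period


def ortega_utilization(c_hpc: float, period: float) -> float:
    """ORTEGA, fault-free: only the HPC executes; the HAC stays suspended."""
    return c_hpc / period


# Hypothetical execution times (ms) and a shared 10 ms period.
c_hpc, c_hac, period = 2.0, 1.0, 10.0
simplex = simplex_utilization(c_hpc, c_hac, period)
ortega = ortega_utilization(c_hpc, period)
saving = 1 - ortega / simplex
print(f"Simplex {simplex:.2f}, ORTEGA {ortega:.2f}, saving {saving:.0%}")
```

Under these assumptions the saving equals `c_hac / (c_hpc + c_hac)`. It therefore grows with the cost of the HAC relative to the HPC.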

  9. Limitations. ORTEGA's on-demand operation introduces a one-period delay into the recovery procedure. This can be overcome with a state projection technique, which requires projecting the next state of the plant… How? ORTEGA's ability to dynamically change the HAC's period minimizes the delay.
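One possible answer to "How?" assumes a linear plant model discretized at the control period; the slide does not specify a model. Under that assumption, the next state is a one-step projection `x[k+1] = A x[k] + B u[k]`. The matrices below describe a hypothetical plant, and the function name is illustrative.

```python
from typing import Sequence


def project_next_state(a: Sequence[Sequence[float]], b: Sequence[float],
                       x: Sequence[float], u: float) -> list[float]:
    """One-step projection x[k+1] = A x[k] + B u[k] for a single-input linear plant."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i * u
            for row, b_i in zip(a, b)]


# Hypothetical discretized plant: position/velocity with a 10 ms step.
A = [[1.0, 0.01], [0.0, 1.0]]
B = [0.0, 0.01]
print(project_next_state(A, B, x=[0.1, 0.0], u=1.0))
```

The projected state could be fed into a stability-region check. A decision module could then react to an HPC command that is about to push the state out of the region, rather than one period after it has.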

  10. Design Challenges. Maximum stability region: ORTEGA requires that the HPC always keep the plant state within the stability region that the HAC can handle. If an HPC fault occurs while the state is outside the HAC's stability region, the HAC will be unable to recover the system. To reduce the restrictions on the HPC, the HAC's stability region must therefore be maximized.
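Suppose, as in Simplex-style designs, that the stability region is the ellipsoid `{x : x^T P x < 1}` derived from a Lyapunov function `V(x) = x^T P x`. Checking whether a state lies inside it is then a single quadratic form. The matrix `P` and the function names here are hypothetical.

```python
from typing import Sequence


def lyapunov_value(p: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """V(x) = x^T P x for a symmetric positive-definite P."""
    n = len(x)
    return sum(x[i] * p[i][j] * x[j] for i in range(n) for j in range(n))


def in_stability_region(p: Sequence[Sequence[float]], x: Sequence[float]) -> bool:
    """True if x lies strictly inside the ellipsoid {x : x^T P x < 1}."""
    return lyapunov_value(p, x) < 1.0


P = [[2.0, 0.0], [0.0, 0.5]]  # hypothetical P, e.g. produced by an LMI solver
print(in_stability_region(P, [0.5, 0.5]))  # V = 0.625, inside
print(in_stability_region(P, [0.8, 0.0]))  # V = 1.28, outside
```

A larger ellipsoid (a "smaller" `P`) admits more states. That is the sense in which maximizing the HAC's stability region relaxes the constraints on the HPC.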

  11. Design Solution. The authors look at constraining the next plant state based on its current state and the current control output. Lyapunov stability criteria are used to calculate the stability region under the given state constraints; the result is an ellipsoid of stable states. This stable-state ellipsoid is then expressed as a Linear Matrix Inequality (LMI) problem. They prove that the state can never leave the stability region under HAC control, provided it starts inside the region. Explained using the stable states of an inverted pendulum.
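The following is a sketch of the standard Simplex-style formulation, not necessarily the paper's exact one. It assumes a continuous-time linear plant $\dot{x} = Ax + Bu$, HAC state feedback $u = -Kx$, and $m$ linear state constraints $a_k^\top x \le 1$.

```latex
% Closed-loop dynamics under the HAC
\dot{x} = (A - BK)\,x = A_c\,x

% Lyapunov function and candidate stability region (an ellipsoid)
V(x) = x^\top P x, \qquad \Omega = \{\, x : x^\top P x \le 1 \,\}, \qquad P \succ 0

% V decreases along every trajectory, so a state starting in \Omega stays in \Omega
A_c^\top P + P A_c \prec 0

% Largest such ellipsoid inside the state constraints, with Q = P^{-1}
\max_{Q} \; \log\det Q
\quad \text{s.t.} \quad
Q \succ 0, \qquad
Q A_c^\top + A_c Q \prec 0, \qquad
a_k^\top Q\, a_k \le 1 \quad (k = 1, \dots, m)
```

Multiplying the Lyapunov inequality by $Q$ on both sides turns it into an LMI in $Q$. The condition $a_k^\top Q a_k \le 1$ holds exactly when the ellipsoid lies inside the half-space $a_k^\top x \le 1$. Maximizing $\log\det Q$ maximizes the ellipsoid's volume.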

  12. Evaluation. ORTEGA was evaluated controlling an inverted pendulum; the device's stability region is measured by the pendulum's angle. Two configurations were tested: non-faulty HPC and HAC, used as the baseline against Simplex for CPU savings; and non-faulty HAC with faulty HPC, which tested ORTEGA's ability to keep the system under control.

  13. Evaluated Bugs: infinite loop; non-performing bug (HPC crashes and outputs zero); maximum control output (HPC faults to outputting the maximum value); bang-bang (HPC faults to outputting the maximum value, then the minimum value); positive feedback control (HPC outputs the opposite of the correct values); divide by zero.
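A fault-injection harness for these bugs might wrap a correct controller as shown below. This is a hypothetical sketch: the fault names, the wrapper, and the symmetric limits ±`u_max` used for bang-bang are all assumptions. The infinite loop is omitted because it is a timing fault, which a deadline check would catch rather than a wrong output value.

```python
import itertools
from typing import Callable, Sequence

Controller = Callable[[Sequence[float]], float]


def make_faulty_hpc(kind: str, correct: Controller, u_max: float) -> Controller:
    """Wrap a correct controller so it exhibits one of the evaluated bug types."""
    if kind == "non_performing":
        return lambda x: 0.0  # HPC "crashes" and outputs zero
    if kind == "max_output":
        return lambda x: u_max  # stuck at the maximum command
    if kind == "bang_bang":
        levels = itertools.cycle([u_max, -u_max])  # assumes symmetric limits
        return lambda x: next(levels)
    if kind == "positive_feedback":
        return lambda x: -correct(x)  # sign-flipped command
    if kind == "divide_by_zero":
        return lambda x: correct(x) / 0.0  # raises ZeroDivisionError when called
    raise ValueError(f"unknown fault kind: {kind!r}")
```

Each wrapped controller can be plugged in as the HPC of a decision module, so that every bug type is exercised against the same HAC.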

  14. Conclusion. Evaluation results: ORTEGA saves 30% of CPU resources over Simplex when the HPC and HAC have the same period, and up to 50% when the sampling rate is dynamic. ORTEGA tolerates all faults: true? How does this apply to plant control systems? What about faults that were not tested? What about instances where the delay matters?

  15. Thank You!
