Reliability study of an embedded operating system for industrial applications

Pardo, J., Campelo, J.C., Serrano, J.J.

Juan Pardo

Fault Tolerant Systems Group, Polytechnic University of Valencia, Spain

Research Objectives

  • Critical industrial applications and fault-tolerant applications need operating systems (OS) that guarantee correct and safe behaviour even when errors occur.

  • Software fault injection techniques can be used to validate the behaviour of an operating system in the presence of errors.

  • These techniques can corrupt the information passed in operating system calls, to see how the system reacts to invalid or corrupted values at the kernel calls.

WSRS '04

Research Objectives

  • This work presents the development and results of software fault injection in an embedded system composed of a Real-Time Operating System (RTOS) and a microcontroller.

  • A software fault injection tool has been developed. The proposed methodology treats the operating system as a black box whose source code is not available.

  • To this end, a layer has been developed between the operating system and the application to be executed.

  • OS error detection coverage has been measured, and the OS data structures found to be critical are discussed with a view to improving the final robustness of the operating system.

Introduction

  • Computer software touches many aspects of our lives. Despite its enormous expansion, it is still far from perfect.

  • Measuring software quality requires testing.

  • Fault tolerance deals with software’s ability to hide problems, specifically the effects of faults [Voas98].

  • Robustness is the degree to which a system operates correctly in the presence of exceptional inputs or stressful environmental conditions.

  • Robustness can thus be viewed as an indication of the OS's capacity to resist or react to faults induced by the applications running on top of it, or originating from the hardware layer or from device drivers [DBench02].

Introduction

  • Fault Tolerant System

    • Fault tolerance is intended to preserve the delivery of correct service in the presence of active faults. It is generally implemented by error detection and subsequent system recovery.

    • A system able to continue working despite the appearance of errors

    • Safe behaviour: a known state that poses no risk to the system

  • Dependability

    • To avoid the loss of human lives or significant economic losses

    • Final product quality → validation before going to market

    Introduction


    Dependability of a computing system is the ability to deliver service that can justifiably be trusted.

    A. Avizienis, J.-C. Laprie, B. Randell

    State of the art

    Fault Injection


    Advantages & drawbacks (SWIFI)

    • Total control over when and where to inject → controllability

    • Simulation of higher-level faults

    • Reduced cost

    • Higher reachability

    • Higher portability → flexibility

    • Low risk of damaging the circuit under test

    • Easy automation of the injection campaigns

    • Good observability: processors increasingly include internal debugging tools

    Advantages & drawbacks (SWIFI)

    • There are areas that software cannot reach.

    • Less precise timing measurements → interference with the system, overhead, etc.

    • Injection and activation agents add overhead to the system

    • Runtime injection → little intrusion

      • Objective: minimize the overhead

        • Drawback for RTOS

      • Easy automation of injection campaigns

    • Pre-runtime injection → less intrusion

    SW Fault Injection

    • SW Fault Injection tools:

      • FIAT: Fault Injection Based Automated Testing Environment, Carnegie Mellon University.

      • EFI, PROFI: Processor Fault Injector, Dortmund University.

      • FERRARI: Fault and ERRor Automatic Real-time Injector, Texas University.

      • SFI, DOCTOR: integrateD sOftware implemented fault injeCTiOn enviRonment, Michigan University.

      • FINE: Fault Injection and moNitoring Environment, University of Illinois.

      • FTAPE: Fault Tolerance and Performance Evaluator, University of Illinois.

      • XCEPTION: Coimbra University.

      • MAFALDA, MAFALDA-RT: Microkernel Assessment by Fault injection AnaLysis and Design Aid, LAAS-CNRS, Toulouse.

      • BALLISTA: Carnegie Mellon University.

    XRAM

    • MicroC/OS-II RTOS

    • Infineon C166  Microcontroller

    • Tasking  Compiler, Debugger..

    • Infineon Microcontroller Characteristics:

      • 16 bits High performance

      • On-chip CMOS

      • 16.5 MIPS, 25/33 MHz

      • Advantages from CISC & RISC

      • High functionality for peripheral

      • Typical for automotive

    COTS components

    • The main motivation for using Commercial Off-The-Shelf (COTS) components in a system design is the notable cost reduction in final product development.

    • Using COTS components is a cost-effective method for rapid prototyping of complex software systems.

    • On the other hand, the use of COTS software components raises serious certification problems because their design process is unknown.

    COTS components

    • COTS software is composed of general purpose components which have poor dependability specifications.

    • COTS components are usually black boxes: the source code is not available and their internal architecture (structure and data flow) is not adequately documented.

    µC/OS-II Operating System

    • It was selected because it has been in wide use for many years.

      First version (MicroC/OS): 1992

    • Industrial robots, motor control, medical instruments, etc.

    • It is 99% compliant with the Motor Industry Software Reliability Association (MISRA) C Coding Standards.

    • All Modified Condition Decision Coverage (MCDC) code in MicroC/OS-II has been removed, improving code quality for RTCA / EUROCAE DO-178B Level A-certified environments for avionics applications.

    Validated Software Comp.

    µC/OS-II: Characteristics

    • Portable: µC/OS-II is written in highly portable ANSI C, with target microprocessor-specific code written in assembly language.

    • ROMable: µC/OS-II was designed for embedded applications. With the proper tool chain (i.e., C compiler, assembler, and linker/locator), you can embed µC/OS-II as part of a product.

    • Scalable: it is possible to use only the services needed by the application, which reduces the amount of memory (both RAM and ROM) needed. Scalability is accomplished through conditional compilation.

    • Preemptive: µC/OS-II is a fully preemptive real-time kernel. This means that µC/OS-II always runs the highest priority task that is ready.

    • Multitasking: µC/OS-II can manage up to 64 tasks; however, the current version of the software reserves eight of these tasks for system use. This leaves your application up to 56 tasks. Each task has a unique priority assigned to it, which means that µC/OS-II cannot do round-robin scheduling.

    Jean J. Labrosse
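
The preemptive and multitasking points above (one task per priority, always run the highest-priority ready task) can be illustrated with a small model. The sketch below assumes the 8×8 ready bitmap described in Labrosse's µC/OS-II book, with a lower number meaning a higher priority. It is a Python model for illustration only, not kernel code, and the class and method names are hypothetical.

```python
def _lowest_set_bit(x):
    """Index of the least significant set bit of a positive integer."""
    return (x & -x).bit_length() - 1


class ReadyList:
    """Toy model of a 64-priority ready list (lower number = higher priority)."""

    def __init__(self):
        self.group = 0        # bit y is set when row y holds a ready task
        self.table = [0] * 8  # row y covers priorities 8*y .. 8*y + 7

    def make_ready(self, prio):
        if not 0 <= prio < 64:
            raise ValueError("priority must be in 0..63")
        self.group |= 1 << (prio >> 3)
        self.table[prio >> 3] |= 1 << (prio & 7)

    def make_not_ready(self, prio):
        row = prio >> 3
        self.table[row] &= ~(1 << (prio & 7))
        if self.table[row] == 0:
            self.group &= ~(1 << row)

    def highest_ready(self):
        """Return the highest-priority ready task, or None if none is ready."""
        if self.group == 0:
            return None
        row = _lowest_set_bit(self.group)
        return (row << 3) + _lowest_set_bit(self.table[row])


ready = ReadyList()
ready.make_ready(40)
ready.make_ready(12)
first = ready.highest_ready()   # task 12 outranks task 40
```

Because each priority maps to exactly one bit, two tasks cannot share a priority in this model, which is why round-robin scheduling among equal priorities is not possible. The lookup takes the same two steps however many tasks are ready.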

    µC/OS-II: Characteristics

    • Deterministic: The execution time of all µC/OS-II functions and services is deterministic. You can always know how much time µC/OS-II will take to execute a function or a service. Furthermore, the execution time of µC/OS-II services does not depend on the number of tasks running in your application.

    • Task Stacks: Each task requires its own stack; µC/OS-II allows each task to have a different stack size. This allows you to reduce the amount of RAM needed in your application.

    • Services: system services such as mailboxes, queues, semaphores, fixed-sized memory partitions, time-related functions, etc.

    • Interrupt Management: Interrupts can suspend the execution of a task. If a higher priority task is awakened as a result of the interrupt, the highest priority task will run as soon as all nested interrupts complete. Interrupts can be nested up to 255 levels deep.

    • Robust and Reliable: µC/OS-II is based on µC/OS, which has been used in hundreds of commercial applications since 1992.

    Jean J. Labrosse

    Black-box approach

    • The aim was to study the OS using a black-box approach.

    • The OS source code was therefore left unmodified, to keep intrusion on OS behaviour to a minimum.

    • To this end, a layer named Meta-Kernel was developed between the OS and the application to be executed.

    • Through this layer, faults were injected into any of the system call parameters to measure OS robustness.

    • In black-box testing, input is fed into a program and the output is checked. What goes on inside the program (the black-box) is unimportant. [Voas98]
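
The idea of an interposition layer can be sketched in a few lines. This is a minimal Python model, assuming the application reaches every OS service through the layer. The `MetaKernel` class here, its methods, the `fake_sem_post` stub and its return codes are all hypothetical stand-ins, not the authors' implementation.

```python
def flip_bit(value, bit, width=16):
    """Return `value` with one bit inverted, truncated to `width` bits."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


class MetaKernel:
    """Hypothetical layer between application and OS; the OS stays a black box."""

    def __init__(self, os_calls):
        self._os_calls = os_calls  # service name -> callable, never modified
        self._armed = None         # (service name, parameter index, bit) or None
        self.log = []              # record of every injection performed

    def arm(self, call_name, param_index, bit):
        """Schedule a single bit-flip for the next matching call."""
        self._armed = (call_name, param_index, bit)

    def call(self, name, *args):
        args = list(args)
        if self._armed is not None and self._armed[0] == name:
            _, index, bit = self._armed
            original = args[index]
            args[index] = flip_bit(original, bit)
            self.log.append((name, index, bit, original, args[index]))
            self._armed = None     # only one fault per experiment
        return self._os_calls[name](*args)


def fake_sem_post(handle):
    """Stand-in OS service: 0 = success, 1 = invalid handle (made-up codes)."""
    return 0 if handle == 0x0010 else 1


mk = MetaKernel({"sem_post": fake_sem_post})
golden = mk.call("sem_post", 0x0010)       # fault-free reference run
mk.arm("sem_post", param_index=0, bit=3)
faulty = mk.call("sem_post", 0x0010)       # handle arrives as 0x0018
after = mk.call("sem_post", 0x0010)        # injector disarmed again
```

The OS side only ever sees ordinary calls, so nothing in it needs to change. The price is the extra indirection on every call, which is the intrusion the black-box approach tries to keep small.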


    System Design

    • MicroC/OS-II OS → black box

    • OS source code not modified

    • Injector → layer between the OS and the application

    • Injection into the parameters of system calls

    Injector Attributes

    • Injector Attributes:

    • Prediction, elimination

    • Pre-runtime & Runtime

    • High Level

    • Transient faults

    • One bit changed in a system call (bit-flip)

    • One fault injected per experiment

    • Workload for tool testing


    Workload Design

    • Characteristics:

    • Maximum use of system calls

    • System calls for synchronization, semaphores, memory, queues, messages, task handling, timing management, etc.

    • Open module for adding computations.

    • Workload for testing the injection tool and the OS

    Workload Design

    • The system workload was continuously running and consisted of a series of tasks executing the application.

    • Meanwhile, a purpose-built injection agent injected faults and invalid values into the kernel calls in order to monitor system robustness.

    Error Classification

    • Errors which could affect the system

    • Classification related to the detection mechanisms

    • Measures of error detection coverage and latency

    After the fault injection

    Injection Model

    • The faultload is the most critical dimension of an OS benchmark and, more generally, of any dependability benchmark.

    • Two techniques can be used to corrupt system call parameters: the ‘bit-flip technique’, which systematically flips bits of the target parameters,

    • and the ‘selective substitution technique’, in which invalid data values are introduced into the system call parameters.

    • Studies have demonstrated the equivalence of the errors provoked by the two techniques [DBench02].
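
The two corruption techniques can be contrasted on a single parameter. The sketch below is illustrative only: the function names, the 8-bit width and the list of invalid values are assumptions, chosen so that the parameter resembles a task priority with a valid range of 0..63.

```python
def bit_flip_values(value, width=8):
    """Systematic bit-flip: every value that differs from `value` in one bit."""
    return [value ^ (1 << bit) for bit in range(width)]


def selective_substitution(value, invalid_values):
    """Selective substitution: replace `value` by hand-picked invalid values."""
    return [v for v in invalid_values if v != value]


priority = 5                                  # a valid priority in this example
flipped = bit_flip_values(priority)           # one candidate per bit position
substituted = selective_substitution(priority, [64, 255])

# Flipped values that leave the assumed valid range 0..63
out_of_range = [v for v in flipped if v > 63]
```

In this toy case the two high-order flips push the value out of range, so part of the bit-flip faultload exercises the same kind of range check that the substituted values target. The remaining flips produce valid but wrong priorities, which only bit-flip generates here.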

    Injection Model

    • BIT-FLIP technique

    • Chosen randomly at runtime:

      • System call

      • Parameter

      • Bit

  • Consequence of physical faults

    • Electromagnetic interference (EMI)

    • Noise

    • Hardware faults

    • ...
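
The random choice of system call, parameter and bit can be sketched as follows. The catalogue of calls and parameter widths is entirely hypothetical (the names are not µC/OS-II services), and a seeded generator is used only so that a campaign can be repeated.

```python
import random

# Hypothetical catalogue: service name -> bit width of each parameter.
SYSCALLS = {
    "sem_pend": [16, 16],
    "task_create": [16, 16, 8],
    "time_delay": [16],
}


def pick_injection(rng, syscalls):
    """Draw one (system call, parameter index, bit) triple at random."""
    name = rng.choice(sorted(syscalls))
    param = rng.randrange(len(syscalls[name]))
    bit = rng.randrange(syscalls[name][param])
    return name, param, bit


def inject(args, param, bit):
    """Return a copy of `args` with a single bit flipped in one parameter."""
    corrupted = list(args)
    corrupted[param] ^= 1 << bit
    return corrupted


rng = random.Random(2004)            # fixed seed: repeatable campaign
name, param, bit = pick_injection(rng, SYSCALLS)
example = inject([3, 10], param=1, bit=0)
```

Each experiment draws one triple and applies one flip, which matches the single transient fault per experiment stated in the injector attributes.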

    Analysis of the obtained results

    • Coding of the different outcomes:

    • D0: No error, correct output (the fault injection did not affect the system).

    • D1: Error detected by the operating system (µC/OS-II error code).

    • D2: Error detected by the application (the application result was not correct).

    • D3: Error that caused the system to hang (system failure).

    • D4: Error detected by the microcontroller.
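
The five outcome codes map naturally onto a small classifier. In the sketch below the observation fields and the order in which they are checked are assumptions (the slides do not say how ties are resolved), and an OS error code of 0 is taken to mean "no error".

```python
from enum import Enum


class Outcome(Enum):
    D0 = "no error, correct output"
    D1 = "error detected by the operating system"
    D2 = "error detected by the application"
    D3 = "system hang (failure)"
    D4 = "error detected by the microcontroller"


def classify(hung, trap_raised, os_error_code, output, expected):
    """Map the observations of one experiment to an outcome code.

    Assumed precedence: hang, then hardware trap, then OS error code,
    then wrong application output.
    """
    if hung:
        return Outcome.D3
    if trap_raised:
        return Outcome.D4
    if os_error_code != 0:
        return Outcome.D1
    if output != expected:
        return Outcome.D2
    return Outcome.D0


no_effect = classify(False, False, 0, output=42, expected=42)
os_detected = classify(False, False, 42, output=None, expected=42)
```

With exactly one fault per experiment, every run falls into exactly one class, so the class percentages of a campaign add up to 100 %.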

    Analysis of the obtained results


    [Powell95, Constantinescu95]

    Complete system (µC/OS-II + microcontroller):

    C_CS = D0 + D1 + D2 + D4 = 65.7 + 21 + 2 + 2.5 = 91.2 %

    Operating system (µC/OS-II):

    C_OS = D0 + D1 = 65.7 + 21 = 86.7 %
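
The coverage arithmetic can be checked directly from the reported percentages. The only assumption in the sketch below is that the five outcome classes are exhaustive and mutually exclusive, which would put the share of hangs (D3) at the remainder; that figure is derived here, not quoted from the slides.

```python
# Percentages reported above for the outcome classes that count as covered.
percent = {"D0": 65.7, "D1": 21.0, "D2": 2.0, "D4": 2.5}

c_cs = sum(percent.values())             # complete system: OS + microcontroller
c_os = percent["D0"] + percent["D1"]     # operating system alone
d3 = 100.0 - c_cs                        # remainder, assumed to be hangs (D3)

print(f"C_CS = {c_cs:.1f} %, C_OS = {c_os:.1f} %, D3 = {d3:.1f} %")
```

Under that assumption, roughly one injection in eleven ends in a hang. The gap between C_CS and C_OS is the 4.5 % of faults caught by the application or the microcontroller after the OS has missed them.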

    Analysis of the obtained results

    • Error detection latencies

      • Time between the injection and its detection by the OS

      • Mean value obtained: 304 μs

      • Measured with one of the microcontroller's built-in timers

        • High precision
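
Measuring latency with a free-running hardware timer usually means subtracting two counter readings and converting ticks to time. The sketch below assumes a 16-bit counter and a 400 ns tick. Both figures and the sample readings are placeholders for illustration, not values from the study.

```python
def ticks_to_us(start, end, tick_ns, width=16):
    """Elapsed time in microseconds between two readings of a wrapping counter.

    The modulo handles at most one counter overflow between the readings.
    """
    elapsed = (end - start) % (1 << width)
    return elapsed * tick_ns / 1000.0


def mean_latency_us(samples, tick_ns):
    """Mean detection latency over (injection, detection) counter pairs."""
    latencies = [ticks_to_us(start, end, tick_ns) for start, end in samples]
    return sum(latencies) / len(latencies)


# Placeholder readings: the second pair wraps around the 16-bit counter.
samples = [(100, 850), (65000, 300)]
mean_us = mean_latency_us(samples, tick_ns=400)
```

Reading the counter at the injection point and again where the OS returns its error code adds only two register reads per experiment, which keeps the measurement overhead small compared with a software clock.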

    Other Results

    The most frequent error code was ‘E1’, which is ‘OS_ERR_EVENT_TYPE’. It was produced when the fault was injected into a semaphore, message queue or mailbox. The system reacted by entering a hung state.

    The second most frequent was ‘E42’, ‘OS_PRIO_INVALID’, obtained when the injection targeted task management system calls.

    Frequency tables of the most frequent error codes returned by the OS
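
A frequency table of this kind can be built from the list of error codes logged during a campaign. In the sketch below only the two code names come from the text above. The logged codes are placeholder data, and the function name is hypothetical.

```python
from collections import Counter

# Names taken from the results above; other codes are left unnamed.
ERROR_NAMES = {1: "OS_ERR_EVENT_TYPE", 42: "OS_PRIO_INVALID"}


def frequency_table(codes):
    """Return (label, name, count, percent) rows, most frequent first."""
    counts = Counter(codes)
    total = sum(counts.values())
    return [
        (f"E{code}", ERROR_NAMES.get(code, "unknown"), n, 100.0 * n / total)
        for code, n in counts.most_common()
    ]


logged = [1, 42, 1, 1, 42, 10]          # placeholder campaign log
rows = frequency_table(logged)
for label, name, n, pct in rows:
    print(f"{label:<4} {name:<20} {n:>3} {pct:5.1f} %")
```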

    Other Results

    Moreover, the injection campaigns made it possible to see how errors propagated through the system. For each experiment, the corrupted system call was recorded, together with the system call that finally detected the error and the time the system took to detect it.

    Error Propagation
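
    One way to hold such propagation data is one record per experiment, grouped by the pair (corrupted call, detecting call). The record layout, the call names and the latencies below are hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PropagationRecord:
    injected_call: str    # system call whose parameter was corrupted
    detecting_call: str   # system call that finally reported the error
    latency_us: float     # time from injection to detection


def propagation_summary(records):
    """Map (injected, detecting) -> (number of cases, mean latency in µs)."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[(rec.injected_call, rec.detecting_call)].append(rec.latency_us)
    return {pair: (len(v), sum(v) / len(v)) for pair, v in buckets.items()}


records = [
    PropagationRecord("queue_post", "queue_pend", 250.0),
    PropagationRecord("queue_post", "queue_pend", 350.0),
    PropagationRecord("sem_create", "sem_create", 20.0),
]
summary = propagation_summary(records)
```

Pairs where the two calls differ are the cases of propagation: the corrupted value passed through one service unnoticed and was caught later by another.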

    Other Results

    • Finally, the most critical system calls were identified, with the aim of improving their robustness and hence the final OS dependability.

    • For example, injection into some data structures related to the event control block produced many failures, and most of the time the system hung.

    • This is because these structures store the list of tasks waiting for an event: if the injection corrupts that information, the system loses track of the next actions and enters an unsafe state without knowing how to react (the system hangs).

    • This shows where special attention is needed, since an error in those data structures could provoke critical system failures.

    Conclusions

    • The experiments yielded the error detection coverage, error detection latencies, error propagation, typical OS error codes, etc.

    • Fault injection into the code and data memory segments of the microkernel will be implemented too.

    • Possible improvements to the dependability of MicroC/OS-II should take into account that errors in certain data structures could provoke critical failures of the system.

    • The data structures identified should implement some mechanism to protect the information they hold.
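
    The slides do not name a protection mechanism. As one possible illustration, the sketch below keeps a bitwise-complement shadow copy next to a critical word (for example a wait-list bitmap) and checks the pair before every use. The class, its names and the 8-bit width are assumptions, and this detects single bit-flips without correcting them.

```python
class CorruptionError(Exception):
    """Raised when a guarded word no longer matches its shadow copy."""


class GuardedWord:
    """A small integer stored together with its bitwise complement."""

    def __init__(self, value=0, width=8):
        self._mask = (1 << width) - 1
        self.write(value)

    def write(self, value):
        self.value = value & self._mask
        self.shadow = ~self.value & self._mask   # complement within `width` bits

    def read(self):
        # Any single bit-flip in either copy breaks the complement relation.
        if (self.value ^ self.shadow) != self._mask:
            raise CorruptionError("guarded word and shadow copy disagree")
        return self.value


wait_list = GuardedWord(0b00010100)   # two tasks waiting, as an example
before = wait_list.read()

wait_list.value ^= 1 << 3             # simulate an injected bit-flip
try:
    wait_list.read()
    detected = False
except CorruptionError:
    detected = True
```

    With such a check, a corrupted wait list would be reported as an error code instead of sending the scheduler to a task that is not actually waiting. The cost is one extra word and one comparison per access.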

    Future Research

    • In future work, these results are to be compared with those of other COTS RTOSs working under the same conditions.

    • RT fault injector to minimize intrusion (without internal debug support, intrusion > 0)

      • Nexus-implemented fault injection

        • Other architecture: Motorola MPC565

        • Intrusion → null

        • Preliminary results

        • Better controllability and observability

        • Best option to validate RTOS and applications

    Contact Data

    Juan Pardo

    Fault Tolerant Systems Group

    Polytechnic University of Valencia
