System Reliability - 3

    Presentation Transcript
    1. System Reliability - 3
       • Processor Problems
         • Power-on aspects
         • Run-time issues
       • Hardware-based Fault Tolerance
         • Fault-tolerant structures
         • Matching structures to requirements

    2. Processor Problems
       • Techniques to answer these questions:
         • Is it OK to use the processor system at all?
         • Is it safe to continue using it in its normal running mode?
         • If problems are detected, can we recover?
       • We want, as far as possible, graceful recovery from failures
       • Power-on Aspects
         • CPU Tests
           • Verify that the processor correctly executes its range of instructions
           • Ensure condition flags can be set and read correctly
           • Check that on-chip registers etc. can be written and read correctly
           • Verify numeric coprocessors
           • You can write your own code to perform these tests, insofar as they are applicable to your application code
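
The register write/read check listed above can be sketched as a pattern test. This is a simulation in Python under stated assumptions, not target code: `Register`, `pattern_test`, and the stuck-at-zero fault model are hypothetical names introduced here for illustration. On a real processor the same write/read-back loop would normally be written in assembly or C against the actual registers.

```python
class Register:
    """Simulated register (hypothetical stand-in for a hardware register)."""

    def __init__(self, width=8, stuck_at_zero_mask=0):
        self.width = width
        # Bits set in this mask can never hold a 1 (simulated hardware fault).
        self.stuck_at_zero_mask = stuck_at_zero_mask
        self._value = 0

    def write(self, value):
        full = (1 << self.width) - 1
        self._value = value & full & ~self.stuck_at_zero_mask

    def read(self):
        return self._value


def pattern_test(reg):
    """Write test patterns and read each one back; True if all match."""
    full = (1 << reg.width) - 1
    alternating = 0x5555555555555555 & full  # 0101... (widths up to 64 bits)
    patterns = [0, full, alternating, alternating ^ full]
    patterns += [1 << bit for bit in range(reg.width)]  # walking ones
    for pattern in patterns:
        reg.write(pattern)
        if reg.read() != pattern:
            return False
    return True
```

A register with no faults passes, while one with any stuck bit fails as soon as a pattern sets that bit.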

    3. Processor Problems
       • Power-on Aspects
         • ROM Tests
           • ROM contents could become corrupted by EM pulses / radiation
           • Flash can be “accidentally” reprogrammed
           • Write tests to check for validity on power-up
       • Run-time Issues
         • Stack overflow
           • Inhibit interrupts until the current one is serviced (not always feasible)
           • Service the interrupt before the next arrives (not always possible)
           • Monitor with software
           • Hardware-based monitoring (e.g. ARM)
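
One common way to check ROM validity on power-up is to store a checksum alongside the code and recompute it at start-up. The sketch below assumes a CRC-32 appended to the end of the image; the slides do not specify a checksum algorithm or image layout, and `make_rom_image` and `rom_is_valid` are hypothetical helper names.

```python
import zlib


def make_rom_image(code: bytes) -> bytes:
    """Build the stored image: code followed by its 4-byte big-endian CRC-32."""
    crc = zlib.crc32(code) & 0xFFFFFFFF
    return code + crc.to_bytes(4, "big")


def rom_is_valid(image: bytes) -> bool:
    """Recompute the CRC-32 over the code and compare it with the stored value."""
    if len(image) < 4:
        return False
    code = image[:-4]
    stored = int.from_bytes(image[-4:], "big")
    return (zlib.crc32(code) & 0xFFFFFFFF) == stored
```

At power-up the system would call `rom_is_valid` on the ROM contents and refuse to enter normal running mode if it returns `False`.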

    4. Processor Problems
       • Run-time Issues
         • Corruption of critical variables
           • Caused by electrical noise, power supply fluctuations, …
           • Use checksumming
             • Make a (checksummed) copy, compare with the original
             • Make 3 copies, use majority voting
         • Instruction pointer corruption
           • PC may point to unprogrammed locations
             • Fill with no-op codes
           • or it may point to programmed memory locations
             • More difficult to deal with
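
The two protection schemes for critical variables can be sketched as follows. This is an illustrative sketch only: `CheckedVar` uses a bitwise-complemented copy as the simplest stand-in for a checksum (an assumption, since the slides do not name a checksum), and `VotedVar` keeps three copies and votes on every read. Both class names are hypothetical.

```python
from collections import Counter


class CheckedVar:
    """16-bit value stored with a complemented copy to detect corruption."""

    MASK = 0xFFFF

    def __init__(self, value):
        self.value = value & self.MASK
        self.check = ~value & self.MASK  # complemented copy acts as the check

    def is_intact(self):
        # For an uncorrupted pair, value XOR check has every bit set.
        return (self.value ^ self.check) == self.MASK


class VotedVar:
    """Value stored as three copies; reads return the majority value."""

    def __init__(self, value):
        self.copies = [value, value, value]

    def read(self):
        winner, count = Counter(self.copies).most_common(1)[0]
        if count < 2:
            raise ValueError("no two copies agree")
        self.copies = [winner] * 3  # repair the dissenting copy
        return winner
```

The checked variable only detects corruption, whereas the voted variable can also correct a single corrupted copy, at the cost of three times the storage.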

    5. Hardware-based Fault Tolerance
       [Diagram: tree of operational requirements; labels as follows]
       • Operational requirements
         • Continuous operation required
         • Non-continuous operation acceptable
       • Conditions
         • Reduced performance acceptable
         • Must be returned quickly to full service
         • Full performance needed
         • Stops & locks out safely
       • System classes
         • Fail-operational
         • High-availability
         • Fail-active
         • Fail-safe
       • Repair options
         • No repair possible
         • Automated repair permissible
         • Automatic re-config
         • Manual repair without re-config
         • Manual repair with auto re-config
         • Manual repair with manual recovery
         • Automatic re-config with automatic recovery

    6. Hardware-based Fault Tolerance
       • Fault-tolerant structures
         • Passive redundant systems
       [Diagram labels: Sensor 1, Sensor 2, Processor 1, Processor 2, Processor 3, Processor 4, Majority voter, Control actuator]
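
The majority voter in a passive redundant system can be sketched in a few lines. The sketch assumes that each processor channel produces an exactly comparable output and that a channel which produces nothing is represented by `None`; `majority_vote` is a hypothetical name, and requiring a strict majority of all channels is a design choice made here, not something the slide prescribes.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value agreed by a strict majority of channels, else None."""
    valid = [o for o in outputs if o is not None]  # None = channel gave no output
    if not valid:
        return None
    value, count = Counter(valid).most_common(1)[0]
    # Majority is measured against all channels, including silent ones.
    return value if count > len(outputs) / 2 else None
```

With four channels, three must agree before a value is passed on. What the control actuator should do when the voter returns `None` is application-specific.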

    7. Hardware-based Fault Tolerance
       • Fault-tolerant structures
         • Active redundant systems
           • Standby processor could be running “cold” or “hot”, in the latter case perhaps fully synchronised with the active processor
       [Diagram labels: Sensors, Processor 1 (active), Processor 2 (standby), Processor 3 (fault detection), Bus switch, output]
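
The switchover behaviour of an active redundant system can be sketched as below. Several simplifications are assumptions: fault detection is folded into the bus switch rather than running on a separate processor, a fault is modelled as an exception, and the control law is a placeholder. `Processor` and `BusSwitch` are hypothetical names.

```python
class Processor:
    """Simulated processor that either computes an output or reports failure."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def compute(self, sensor_value):
        if not self.healthy:
            raise RuntimeError(f"{self.name} failed")
        return sensor_value * 2  # placeholder control law


class BusSwitch:
    """Routes the output of the active processor; fails over on a fault."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def step(self, sensor_value):
        try:
            return self.active.compute(sensor_value)
        except RuntimeError:
            # Fault detected: swap roles and retry on the former standby.
            self.active, self.standby = self.standby, self.active
            return self.active.compute(sensor_value)
```

If the standby has also failed, the second `compute` call raises and the error propagates to the caller, which corresponds to a loss of service.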

    8. Hardware-based Fault Tolerance
       • Fault-tolerant structures
         • Hybrid redundant systems
       [Diagram labels: Sensor 1, Processor 1 (active), Processor 2 (active), Processor 3 (active), Processor 4 (spare), Majority voter, Switching circuit, Bus switch, output]
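
A hybrid redundant arrangement combines voting with spare switching: the active units are voted on, and a unit that disagrees with the majority is replaced by a spare. The sketch below models each processor as a plain function; `hybrid_step`, `good_unit`, and `faulty_unit` are hypothetical names, and replacing a unit after a single disagreement is an assumption made for brevity.

```python
from collections import Counter


def good_unit(sensor_value):
    """Correctly working unit (placeholder computation)."""
    return sensor_value + 1


def faulty_unit(sensor_value):
    """Unit with a fault: always produces the same wrong output."""
    return -1


def hybrid_step(active, spares, sensor_value):
    """Vote over the active units and swap dissenting units for spares.

    `active` and `spares` are lists of callables and are modified in place.
    Returns the voted output, or None if there is no strict majority.
    """
    outputs = [unit(sensor_value) for unit in active]
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(active) / 2:
        return None
    for i, out in enumerate(outputs):
        if out != value and spares:
            active[i] = spares.pop(0)  # switch the spare in for the dissenter
    return value
```

Calling `hybrid_step([good_unit, faulty_unit, good_unit], [good_unit], 4)` returns the majority output and leaves three working units active, so the system can tolerate a further fault.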

    9. Hardware-based Fault Tolerance
       • Fault-tolerant structures
         • Very high availability systems
           • Active redundant structures
           • DB servers etc.
           • Very fast “failover” times
           • Ready availability of disk data to standby systems
         • Hot-swap systems
           • Replacement of boards while the system is on-line, without disturbing operations
           • CompactPCI standard: four levels of capability
             • Basic hot-swap
             • Full hot-swap
             • High-availability hot-swap (single processor)
             • High-availability hot-swap (multiple processor)

    10. Hardware-based Fault Tolerance
        • Fault-tolerant structures
          • CompactPCI - Basic hot-swap
        [Diagram labels: Host Processor single board computer (SBC), Hot-swap I/O boards, PCI bus, Devices, networks]

    11. Hardware-based Fault Tolerance
        • Fault-tolerant structures
          • CompactPCI - Full & high-availability (single processor) hot-swap
            • Full: interface to system software allowing it to oversee swap of boards
            • HA: system monitors/controls boards with platform management software
        [Diagram labels: Platform management controller SBC, Host Processor SBC, Hot-swap I/O boards, PCI bus, management bus, Devices, networks]

    12. Hardware-based Fault Tolerance
        • Fault-tolerant structures
          • CompactPCI - Full & high-availability (multiple processor) hot-swap
            • Distributed computing:
              • Load balancing
              • Re-allocation of work from a failed processor
            • Fault diagnosis, repair managed by the host SBC
        [Diagram labels: Primary (host) SBC, Dedicated I/O controller, SBC 1, SBC 2, SBC n, PCI bus, Devices, networks]
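
Re-allocation of work from a failed processor can be sketched as a simple rebalancing step run by the host. The sketch assumes that every task costs the same and that the host tracks assignments in a dictionary; `reallocate` and the SBC and task names in the usage note are hypothetical.

```python
def reallocate(assignments, failed):
    """Move each task of the failed processor to the least-loaded survivor.

    `assignments` maps a processor name to its list of task names.
    Returns a new mapping; the input is left unchanged.
    """
    survivors = {p: list(tasks) for p, tasks in assignments.items() if p != failed}
    if not survivors:
        raise RuntimeError("no processors left to take over the work")
    for task in assignments.get(failed, []):
        # Least-loaded survivor first; ties are broken by processor name.
        target = min(survivors, key=lambda p: (len(survivors[p]), p))
        survivors[target].append(task)
    return survivors
```

For example, if `SBC3` fails while holding three tasks, those tasks are spread across `SBC1` and `SBC2` so that their loads stay as even as possible.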

    13. Hardware-based Fault Tolerance
        [Diagram: tree of operational requirements, with repair options numbered 1 to 7; labels as follows]
        • Operational requirements
          • Continuous operation required
          • Non-continuous operation acceptable
        • Conditions
          • Reduced performance acceptable
          • Must be returned quickly to full service
          • Full performance needed
          • Stops & locks out safely
        • System classes
          • Fail-operational
          • High-availability
          • Fail-active
          • Fail-safe
        • Repair options
          • 1: No repair possible
          • 2: Automated repair permissible
          • 3: Automatic re-config
          • 4: Manual repair without re-config
          • 5: Manual repair with auto re-config
          • 6: Manual repair with manual recovery
          • 7: Automatic re-config with automatic recovery

    14. Hardware-based Fault Tolerance
        • Matching structures to requirements