Fault-Tolerant Design for Long-Life Deep Space Missions

Yiğit Kültür 2006702835

Contents
  • Introduction
  • Fault-Tolerant System Considerations and Techniques
  • Historical Perspective
  • Future Approach
  • Conclusion
Introduction
  • Mars has recently been a focal point of attention because it will play a key role in humanity’s expansion into deep space
  • Future Mars transportation will require reliable operation over a lifespan of years, unlike:
    • The Space Shuttle, which requires operation over months
    • The Space Station, which is close enough to Earth for maintenance logistics
Introduction
  • The long operating periods associated with deep space missions demand:
    • Innovative fault-tolerant technology development
    • Applications of advanced redundancy techniques
  • To enable Mars exploration, safety, reliability and autonomy must be improved
  • A new technology plan is needed to guide the development of next-generation fault-tolerant computing technology
Fault Tolerant System Considerations
  • Traditionally, avionic systems achieved fault-tolerance through redundancy management
  • A redundancy management technique:
    • Detects and isolates a failure
    • Performs hardware reconfiguration
  • A combination of self-monitoring and cross-comparison strategies leads to comprehensive fault coverage at reduced risk and cost
Fault Tolerant System Considerations
  • Primary Flight Control System (PFCS) Baseline Requirements
    • Mission reliability: 0.95 success probability at 10 years with no repair
    • Throughput: 100 million instructions per second (MIPS)
    • Expandable I/O: 100 Mbits/sec
    • Expandable Memory: 1 GByte
    • Mass Storage Capacity: 1 Terabyte
    • Cycle Rate: 100 Hz
    • Hardware N-fail operation
    • Low life-cycle cost
    • Low power and mass
    • Radiation tolerance
    • Building block approach (look for existing solutions to parts of the problem and combine them)
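The mission reliability requirement can be turned into a rough failure-rate budget. The sketch below assumes a constant failure rate (the exponential model R(t) = exp(-λt)), which the slides do not state; the function names and the hours-per-year constant are illustrative choices.

```python
import math

HOURS_PER_YEAR = 8766.0  # 365.25 days; an averaging assumption


def max_failure_rate(reliability: float, years: float) -> float:
    """Largest constant failure rate (per hour) that still meets the target.

    Assumes R(t) = exp(-rate * t), i.e. no repair and no wear-out.
    """
    if not 0.0 < reliability <= 1.0 or years <= 0.0:
        raise ValueError("need 0 < reliability <= 1 and years > 0")
    return -math.log(reliability) / (years * HOURS_PER_YEAR)


def reliability_at(rate_per_hour: float, years: float) -> float:
    """Survival probability after the given number of years."""
    return math.exp(-rate_per_hour * years * HOURS_PER_YEAR)


if __name__ == "__main__":
    rate = max_failure_rate(0.95, 10.0)
    print(f"allowed system failure rate: {rate:.3e} per hour")
    print(f"check: R(10 years) = {reliability_at(rate, 10.0):.4f}")
```

Under this model, a 0.95 success probability at 10 years corresponds to a system-level failure rate of roughly 6 failures per 10 million hours, which is the kind of budget that motivates redundancy rather than reliance on a single string.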
Fault Tolerant Techniques for Mars Applications
  • Ultra-reliable systems for long-life applications like human Mars exploration are required to sustain:
    • Permanent faults
    • Transient (temporary) faults
    • Intermittent (not continuous) faults
    • Timing faults
    • Latent (hidden) faults
    • Worst-case fault scenarios with a lower probability of occurrence
Fault Tolerant Techniques for Mars Applications
  • Distributed architectures are better suited to long-life space applications because they offer:
    • Function integration
    • Parallel computation
    • Graceful performance growth
    • Selective technology upgrade
    • Appropriate levels of function reliability
    • Graceful degradation of system capabilities in the presence of faults
    • Efficient use of hardware resources
Historical Perspective
  • Long-Life Unmanned Redundant Systems

    • Viking
    • Voyager
    • Galileo

Historical Perspective
  • Safety Critical High Reliability Systems

    • Space Shuttle orbiters: Columbia, Challenger, Discovery, Atlantis, Endeavour

Long-Life Unmanned Redundant Systems – Viking
  • Viking is an instance of the pre-1970 Thermoelectric Outer Planets Spacecraft (TOPS) concept
  • Viking was the first spacecraft to use a computer as a fault manager that attempts to reconfigure and restore the spacecraft to an operational configuration
  • The fundamental strategy was to switch power on and off to various alternative subsystems until either the built-in fault monitoring indicated that operation was restored or, in the case of faults in the communication chain, commands from Earth were detected
  • There was no real-time masking of faults, so if a fault occurred during a maneuver, an incorrect maneuver would be performed
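The switching strategy described above amounts to a search over alternative configurations. The following is a minimal sketch of that idea, not Viking's flight software: the function name, the callback parameters and the configuration labels are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def restore_by_switching(
    configurations: Sequence[str],
    power_on: Callable[[str], None],
    monitor_ok: Callable[[], bool],
    ground_command_detected: Callable[[], bool],
) -> Optional[str]:
    """Try each alternative in turn until operation appears restored.

    power_on is assumed to also switch off the previously tried alternative.
    Returns the configuration that worked, or None if all were exhausted.
    """
    for config in configurations:
        power_on(config)
        # Success is judged either by the built-in fault monitor or, for the
        # communication chain, by hearing commands from Earth again.
        if monitor_ok() or ground_command_detected():
            return config
    return None


if __name__ == "__main__":
    tried = []
    result = restore_by_switching(
        ["receiver-1", "receiver-2"],       # hypothetical alternatives
        tried.append,
        lambda: False,                      # monitor sees no improvement
        lambda: tried[-1] == "receiver-2",  # uplink returns on the spare
    )
    print(result, tried)
```

Because the loop only runs after a fault has been detected, nothing masks the fault while the search is in progress, which is the limitation noted in the last bullet.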

Viking Fault-Tolerant Architecture

CCS: Command Computer Subsystem

FDS: Flight Data Subsystem

Long-Life Unmanned Redundant Systems – Voyager
  • Like Viking, Voyager is an instance of the pre-1970 Thermoelectric Outer Planets Spacecraft (TOPS) concept.
  • It improves on Viking in only limited ways, such as the addition of a pair of separate computers for attitude and articulation control
  • Both spacecraft used standby redundancy. The standby spares were cross-strapped so that either unit could be switched in to communicate with the other units
  • Cross-strapping and switching allowed reconfiguration around failed components, either automatically or by ground command

Voyager Fault-Tolerant Architecture

CCS: Command Computer Subsystem

FDS: Flight Data Subsystem

AACS: Attitude and Articulation Control Subsystem

Long-Life Unmanned Redundant Systems – Galileo
  • The Galileo mission is a follow-on to the Voyager Jupiter fly-by mission
  • The Galileo design borrows heavily from the Voyager experience
  • Block redundancy (whole functional blocks are duplicated so that a spare block can take over from a failed one) is used throughout the subsystems
  • All subsystems except the CDS operate as active/standby pairs
  • The CDS uses active redundancy, in which each block can issue independent commands or both can operate in parallel on the same critical activity

Galileo Fault-Tolerant Architecture

CDS: Command and Data Subsystem

AACS: Attitude and Articulation Control Subsystem

Long-Life Unmanned Redundant Systems – Galileo
  • The major departure from the Voyager architecture is the extensive use of microprocessors and the consequent use of a bus-oriented architecture to facilitate communication among them
  • Galileo on-board fault detection software is designed to alleviate the effects and symptoms of faults, rather than to pinpoint the exact faults
  • Fault identification and isolation are performed through ground intervention

Galileo Fault-Tolerant Architecture

CDS: Command and Data Subsystem

AACS: Attitude and Articulation Control Subsystem

Safety Critical High Reliability Systems – Shuttles
  • Operational differences from planetary probes:
    • The system must guarantee that no fault propagates to the effectors during a relatively short operating cycle
    • Rather than relying on fault monitors to interrupt processing and trigger a reconfiguration, several redundant strings are powered on and operated in parallel
Safety Critical High Reliability Systems – Shuttles
  • Voting occurs both in the General Purpose Computers (GPCs) and at the final effectors
  • Voting is a more brute-force approach than fault monitoring, requiring more hardware but also providing greater fault coverage
  • Voting is much better suited to real-time, safety-critical maneuver control than a reconfiguration-oriented strategy as in Viking, Voyager and Galileo
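Voting over redundant channels can be illustrated with a simple majority voter. This is a generic sketch, not the Shuttle's actual voting logic: it assumes the channels are synchronized and produce bit-identical outputs when healthy, and the function name and sample values are illustrative.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def majority_vote(
    outputs: Sequence[Hashable],
) -> Tuple[Optional[Hashable], List[int]]:
    """Return (voted value, indices of disagreeing channels).

    The voted value is None when no value has a strict majority, in which
    case every channel is reported as suspect.
    """
    if not outputs:
        raise ValueError("at least one channel output is required")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, list(range(len(outputs)))
    dissenters = [i for i, v in enumerate(outputs) if v != value]
    return value, dissenters


if __name__ == "__main__":
    # Four redundant channels compute the same command; channel 2 is faulty.
    print(majority_vote([7, 7, 9, 7]))
```

The faulty output is masked in the same cycle in which it occurs, so no reconfiguration delay sits between the fault and a correct command. That is the property the reconfiguration-oriented designs lack.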

Conceptual Shuttle Orbiter Fault-Tolerant Architecture

GPC: General Purpose Computer

Mars Advanced Fault Tolerant Computing Approach – Future Manned Mars Missions
  • Parallel-Hybrid Redundancy will be the basis for future long-life deep space missions:
    • It combines the attractive features of parallel processing and redundant computation
    • Computational elements can be arranged to provide high throughput, ultra-high reliability or a combination of both, depending on the mission phase
Mars Advanced Fault Tolerant Computing Approach – Future Manned Mars Missions
  • Parallel-Hybrid Redundancy was first used in 1979, when the Fault Tolerant Multi-Processor (FTMP) was designed and built:
    • FTMP used a conventional shared-memory multiprocessor architecture
    • Each virtual processor consisted of three real processors working as a triad to provide real-time fault masking
    • Upon detection of a fault in a processor, the faulty unit was replaced from a pool of spares
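The triad-plus-spares scheme can be sketched as follows. This is an illustration of the concept only, not FTMP's implementation: the `Triad` class, the processor names and the exact-match comparison are assumptions.

```python
from collections import Counter
from typing import Dict, List


class Triad:
    """Three processors acting as one virtual processor, backed by spares."""

    def __init__(self, members: List[str], spares: List[str]) -> None:
        if len(members) != 3:
            raise ValueError("a triad needs exactly three members")
        self.members = list(members)
        self.spares = list(spares)

    def vote(self, outputs: Dict[str, int]) -> int:
        """Mask a single faulty output, then swap the faulty unit for a spare."""
        values = [outputs[m] for m in self.members]
        value, count = Counter(values).most_common(1)[0]
        if count < 2:
            raise RuntimeError("no majority: more than one member disagrees")
        for i, member in enumerate(self.members):
            # A dissenting member stays in place if the spare pool is empty.
            if outputs[member] != value and self.spares:
                self.members[i] = self.spares.pop(0)
        return value


if __name__ == "__main__":
    triad = Triad(["P0", "P1", "P2"], spares=["P3"])
    print(triad.vote({"P0": 5, "P1": 5, "P2": 6}), triad.members)
```

Masking happens within the vote itself, while replacement restores the triad to full strength so that a later second fault can also be tolerated.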
Mars Advanced Fault Tolerant Computing Approach – Future Manned Mars Missions
  • Parallel-Hybrid Redundancy as implemented in FTMP had certain drawbacks:
    • It was not explicitly designed to meet the rigorous requirements of Byzantine resilience (the correctly functioning components of a Byzantine fault-tolerant system reach the same group decisions regardless of what the Byzantine-faulty components do), which is necessary to provide:
      • Coverage of random hardware faults
      • Ultra-high reliability
      • Ease of validation
    • It lacked ease of expandability due to redundant bus connections between processors and main memory
    • It did not support mixed redundancy because processors were arranged to work in triads regardless of the criticality of the application
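The slides do not spell out the "rigorous requirements" of Byzantine resilience. The sketch below uses the classical bounds from the Byzantine generals literature (Lamport, Shostak and Pease), which are an addition here rather than something taken from the presentation; the function and key names are illustrative.

```python
from typing import Dict


def byzantine_minimums(f: int) -> Dict[str, int]:
    """Minimum resources needed to tolerate f simultaneous Byzantine faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return {
        # Independent participants (fault containment regions)
        "fault_containment_regions": 3 * f + 1,
        # Disjoint communication paths between any two participants
        "disjoint_paths_between_regions": 2 * f + 1,
        # Rounds of message exchange before a decision is taken
        "exchange_rounds": f + 1,
    }


if __name__ == "__main__":
    for faults in (1, 2):
        print(faults, byzantine_minimums(faults))
```

For a single arbitrary fault this gives four participants, which is why a triad on its own can mask a wrong value but cannot guarantee agreement when a faulty unit sends different values to different receivers.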
Mars Advanced Fault Tolerant Computing Approach – Future Manned Mars Missions
  • To address the deficiencies of FTMP, a new architecture called the Fault Tolerant Parallel Processor (FTPP) was conceived
  • It meets the Byzantine resilience requirements needed to cover random hardware faults
  • FTPP will be the basis of fault tolerance for future manned Mars missions

FTPP Architecture

Mars Advanced Fault Tolerant Computing Approach – Features of FTPP – Parallel Processing
  • Parallel Processing is provided by:
    • 40 Processing Elements (PEs) in 5 Fault Containment Regions (FCRs)
    • 2 Input/Output Controllers (IOCs) per FCR

FTPP Architecture

Mars Advanced Fault Tolerant Computing Approach – Features of FTPP – Scalable Performance
  • Increasing the number of PEs in a single cluster creates a communication bottleneck in the Network Elements (NEs)
  • FTPP relies on a hierarchical approach to scaling performance, assembling clusters via IOCs

FTPP Architecture

Mars Advanced Fault Tolerant Computing Approach – Features of FTPP – Mixed Redundancy
  • Most fault-tolerant computers are designed to operate in a redundant mode only, which wastes resources on non-critical tasks
  • FTPP allows the processing elements to be configured as
    • Simplex: non-critical tasks
    • Triplex: tasks that require real-time fault masking
    • Quadruplex or higher: when two or more sequential faults must be tolerated in a small time window without the benefit of reconfiguration
  • In the figure:
    • 4 quads
    • 3 triplexes
    • 15 simplexes
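As a quick consistency check, the mix of virtual groups in the figure can be tallied against the 40 processing elements mentioned for the cluster. The dictionary layout and function name below are illustrative, not part of FTPP.

```python
from typing import Dict

# Number of processing elements consumed by one virtual group of each kind
REDUNDANCY_LEVEL: Dict[str, int] = {
    "simplex": 1,
    "triplex": 3,
    "quadruplex": 4,
}


def processing_elements_used(groups: Dict[str, int]) -> int:
    """Total PEs needed for the given count of virtual groups per kind."""
    return sum(REDUNDANCY_LEVEL[kind] * count for kind, count in groups.items())


# The configuration listed for the figure: 4 quads, 3 triplexes, 15 simplexes
figure_config = {"quadruplex": 4, "triplex": 3, "simplex": 15}

if __name__ == "__main__":
    print(processing_elements_used(figure_config))
```

The total is 4×4 + 3×3 + 15×1 = 40, so the configuration in the figure uses every processing element in the cluster. It delivers 22 independent virtual processors, compared with 13 if all 40 PEs were grouped into triads.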

FTPP Architecture

Mars Advanced Fault Tolerant Computing Approach – Features of FTPP – Dynamic Reconfiguration
  • A mission consists of several phases, such as launch, ascent, cruise from Earth orbit to Mars, Mars orbit injection and Mars landing
  • For each phase the throughput, latency, iteration rates and criticality change over a wide range, so the architecture must be flexible
  • Reconfiguration from high throughput to high reliability
    • Three PEs operating as independent simplex elements can be synchronized to run the same task (e.g. S2, S3, S13)
  • Replacing failed members
    • A simplex in the same FCR as the failed member is synchronized with the non-failed members of the virtual group (e.g. if Channel A of Q1 fails, S2, S7 or S12 can replace it)
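Both reconfiguration rules can be expressed as small checks over a map from processing element to fault containment region. The placement below is hypothetical: only the facts that S2, S7 and S12 share an FCR with Channel A of Q1 and that S2, S3 and S13 can form a group come from the slide. The rule that members of one group sit in distinct FCRs is an assumption of this sketch, as are the function names.

```python
from typing import Dict, List, Sequence

# Hypothetical placement; only the S2/S7/S12 and Q1 channel A grouping is given
FCR_OF: Dict[str, str] = {
    "Q1.A": "A", "S2": "A", "S7": "A", "S12": "A",
    "S3": "B", "S13": "C",
}


def replacement_candidates(
    failed: str, fcr_of: Dict[str, str], simplexes: Sequence[str]
) -> List[str]:
    """Simplexes that live in the same FCR as the failed group member."""
    target = fcr_of[failed]
    return [s for s in simplexes if fcr_of[s] == target]


def can_form_group(pes: Sequence[str], fcr_of: Dict[str, str]) -> bool:
    """True if every PE is in a different FCR (assumed grouping rule)."""
    regions = [fcr_of[p] for p in pes]
    return len(set(regions)) == len(regions)


if __name__ == "__main__":
    simplexes = ["S2", "S3", "S7", "S12", "S13"]
    print(replacement_candidates("Q1.A", FCR_OF, simplexes))
    print(can_form_group(["S2", "S3", "S13"], FCR_OF))
```

Choosing the replacement from the failed member's own FCR keeps the group's members spread across independent regions, so one regional fault still affects at most one member.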

FTPP Architecture

Mars Advanced Fault Tolerant Computing Approach – Features of FTPP – Low Fault Tolerance Overhead
  • Frequent fault-tolerance functions such as fault/error detection, error masking (voting) and synchronization are implemented in the Network Element (NE)
  • Less frequent functions such as identification of faulty modules, reconfiguration and reintegration are implemented in software which executes on PEs.
  • Each NE services 8 PEs

FTPP Architecture

Mars Advanced Fault Tolerant Computing Approach – Features of FTPP – Open Architecture
  • FTPP provides an open architecture for both hardware and software, including:
    • Processors
    • I/O modules
    • Fiber optic links
    • Operating Systems

FTPP Architecture

Mars Advanced Fault Tolerant Computing Approach – Features of FTPP – Small Physical Size
  • The key element in meeting the weight, volume and power requirements is the packaging technology
  • Multi-Chip Modules (MCMs) will be used:
    • An NE fits on a single MCM of less than 4 cm²

FTPP Architecture

Conclusion
  • Future manned deep space missions will require reliable operation over years and real-time masking of critical faults
  • Current approaches are not sufficient, and a new fault-tolerant approach is needed
  • FTPP is a strong candidate for the spacecraft that will take humans to Mars