
Failure in the PATHFINDER Mission


  1. Failure in the PATHFINDER Mission
     Chandan Kumar
     EE 585: Fault Tolerant Computing

  2. Outline
     • Background
     • Simplified view of H/W architecture
     • S/W architecture
     • Failure
     • Cause
     • Correction
     EE 585: Case Study

  3. Background
     • Launched December 4, 1996
     • Landed July 4, 1997
     Mission objectives:
     • To prove that the development of "faster, better and cheaper" spacecraft is possible (three years of development and a cost under US$150 million).
     • To show that a load of scientific instruments can be sent to another planet with a simple system, at one fifth the cost of a Viking mission.

  4. Background (contd.)
     • To demonstrate NASA's commitment to low-cost planetary exploration by finishing the mission with a total expenditure of US$280 million, including the launch vehicle and mission operations.
     • To demonstrate the mobility and usefulness of a micro-rover on the surface of Mars.
     • The mission carried a number of scientific instruments. On the Mars Pathfinder lander:
       • Imager for Mars Pathfinder (IMP), which includes a magnetometer and an anemometer
       • Atmospheric and meteorological sensors (ASI/MET)

  5. Background (contd.)
     On the Sojourner rover:
     • Imaging system (three cameras: a front black-and-white stereo pair and one rear color camera)
     • Laser striper hazard detection system
     • Alpha Proton X-ray Spectrometer (APXS)
     • Wheel Abrasion Experiment
     • Material Adherence Experiment
     • Accelerometers
     • Potentiometers
     Mission results:
     • Final transmission: September 27, 1997
     • 16,500 images sent from the lander and 550 from the rover
     • 15 analyses of rocks

  6. Simplified view of the Hardware Architecture
     • A single CPU controls the spacecraft.
     • The CPU resides on a VME bus.
     • Interface cards for the radio and the camera.
     • An interface to the 1553 bus.
     • The 1553 bus connects to the ‘cruiser’ and ‘lander’ stages.
     • Hardware on the cruiser controls the thrusters, etc.
     • Hardware on the lander interfaces to instruments such as the accelerometer, the radar altimeter, and ASI/MET.

  7. The Software Architecture
     Timing of one 8 Hz bus cycle:

          |<---------------------------- 0.125 seconds ---------------------------->|
     CPU  |****************|<- bc_dist active ->|********|<- bc_sched active ->|****|
     Bus  |<- bus active ->|
          t1               t2                   t3       t4                    t5   t1

     *** marks periods when tasks other than the ones labeled are executing; there is also some idle time.
     • t1: the bus hardware starts, under hardware control, on the 8 Hz boundary. The transactions for this cycle were set up by the previous execution of the bc_sched task.
     • t2: 1553 traffic is complete and the bc_dist task is awakened.
     • t3: the bc_dist task has completed all of the data distribution.
     • t4: the bc_sched task is awakened to set up the transactions for the next cycle.
     • t5: bc_sched activity is complete.
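
The timeline above implies the ordering constraint that matters for the failure: bc_dist must finish distributing data (t3) before bc_sched is awakened (t4). A minimal sketch of that check for one cycle follows; the `CycleTrace` record, the example times and the returned strings are hypothetical, since the slides give only the task names and the 0.125 s period.

```python
from dataclasses import dataclass
from typing import Optional

CYCLE_S = 0.125  # one 8 Hz cycle, as in the timing diagram


@dataclass
class CycleTrace:
    """Hypothetical record of event times, in seconds after t1, for one cycle."""
    t2: float            # 1553 traffic complete; bc_dist awakened
    t3: Optional[float]  # bc_dist finished distributing data (None = not finished)
    t4: float            # bc_sched awakened to set up the next cycle


def bc_sched_check(trace: CycleTrace) -> str:
    """Check made when bc_sched wakes at t4: bc_dist must already be done."""
    if not (0.0 < trace.t2 < trace.t4 < CYCLE_S):
        raise ValueError("events do not fit one 0.125 s cycle in order")
    if trace.t3 is None or trace.t3 > trace.t4:
        # A missed hard deadline; on Pathfinder this led to a full system reset.
        return "error: bc_dist did not complete -> reset"
    return "ok"


# Illustrative times only; the slides give no durations.
print(bc_sched_check(CycleTrace(t2=0.030, t3=0.050, t4=0.090)))  # ok
print(bc_sched_check(CycleTrace(t2=0.030, t3=None, t4=0.090)))   # error: bc_dist did not complete -> reset
```

The check only looks at whether t3 precedes t4; it says nothing about why bc_dist was late, which is the subject of the following slides.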

  8. The Failure
     • The spacecraft began experiencing total system resets.
     • Each reset reinitializes all of the hardware and software. It also terminates the execution of the current ground-commanded activities.
     • The remaining activities for that day were not accomplished until the next day.

  9. The Cause
     • The failure was a case of priority inversion.
     • In scheduling, priority inversion is the scenario in which a low-priority task holds a shared resource that a high-priority task requires.
     • The high-priority task is blocked until the low-priority task releases the resource, effectively "inverting" the relative priorities of the two tasks.
     • If a medium-priority task that does not need the resource becomes ready in the interim, it preempts the low-priority task, so it takes precedence over both the low-priority and the high-priority task and prolongs the blocking.
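
The three-task scenario above can be replayed with a toy, tick-based, preemptive fixed-priority scheduler. This is a minimal sketch under stated assumptions: the task names LOW, MEDIUM and HIGH, their priorities, release times and step lists are all hypothetical, each step takes one tick, and there is a single shared lock with no priority inheritance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    priority: int          # larger number = higher priority
    release: int           # tick at which the task becomes ready
    steps: List[str]       # "lock", "unlock" or "work"; each takes one tick
    pc: int = 0            # index of the next step to execute
    finished_at: Optional[int] = None


def simulate(tasks: List[Task], max_ticks: int = 100) -> List[str]:
    """Run a preemptive fixed-priority scheduler; return who ran on each tick."""
    owner: Optional[Task] = None  # current holder of the single shared lock
    timeline: List[str] = []
    for tick in range(max_ticks):
        # A task is runnable if released, unfinished, and not blocked on the lock.
        ready = [t for t in tasks
                 if t.release <= tick and t.finished_at is None
                 and not (t.steps[t.pc] == "lock"
                          and owner is not None and owner is not t)]
        if not ready:
            if all(t.finished_at is not None for t in tasks):
                break
            timeline.append("idle")
            continue
        task = max(ready, key=lambda t: t.priority)  # highest priority runs
        step = task.steps[task.pc]
        if step == "lock":
            owner = task
        elif step == "unlock":
            owner = None
        timeline.append(task.name)
        task.pc += 1
        if task.pc == len(task.steps):
            task.finished_at = tick
    return timeline


def scenario() -> List[Task]:
    """LOW locks first, HIGH arrives and blocks, then MEDIUM preempts LOW."""
    return [
        Task("LOW", priority=1, release=0, steps=["lock", "work", "work", "unlock"]),
        Task("HIGH", priority=3, release=1, steps=["lock", "work", "unlock"]),
        Task("MEDIUM", priority=2, release=2, steps=["work"] * 5),
    ]


tasks = scenario()
print(simulate(tasks))
print({t.name: t.finished_at for t in tasks})  # {'LOW': 8, 'HIGH': 11, 'MEDIUM': 6}
```

MEDIUM never touches the lock, yet it finishes at tick 6 while HIGH waits until tick 11: the high-priority task's blocking time grows with however long medium-priority work keeps running.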

  10. The Cause (contd.)
     • The spacecraft identified the failure as the bc_dist task not completing its execution before the bc_sched task started.
     • The ASI/MET task receives its data via an interprocess communication (IPC) mechanism.
     • The IPC mechanism is based on pipes.
     • The higher-priority bc_dist task was blocked by the much lower-priority ASI/MET task, which was holding a shared resource.

  11. The Cause (contd.)
     • The resource that caused the problem was a mutual exclusion semaphore used within the select() mechanism.
     • The ASI/MET task had acquired this semaphore and was then preempted by several medium-priority tasks.
     • The bc_dist task attempted to send the newest ASI/MET data via the IPC mechanism, which wrote to a pipe. The pipe write blocked while trying to take the semaphore.

  12. The Cause (contd.)
     • The medium-priority tasks kept running, still not allowing the ASI/MET task to run, until the bc_sched task was awakened.
     • At that point, the bc_sched task determined that the bc_dist task had not completed its cycle (a hard deadline in the system) and declared the error that initiated the reset.

  13. Correction
     • The creation flags of the semaphore were changed to enable priority inheritance.
     • Modifying the semaphore associated with the pipe used for communication from the bc_dist task to the ASI/MET task in this way corrected the problem.
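
Priority inheritance removes the unbounded part of the inversion: while a higher-priority task waits on a mutex, the owner temporarily runs at the waiter's priority, so medium-priority tasks can no longer preempt it. The class below is a hypothetical, single-mutex sketch of that bookkeeping, not the VxWorks semaphore API, and the priorities are illustrative. On POSIX systems the comparable setting is `pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT)`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # tasks compare by identity
class Task:
    name: str
    base_priority: int                          # larger number = higher priority
    effective_priority: int = field(init=False)

    def __post_init__(self) -> None:
        self.effective_priority = self.base_priority


class InheritanceMutex:
    """Hypothetical mutex with basic priority inheritance.

    Simplification: an owner is assumed to hold only this one mutex, so on
    release its priority drops straight back to its base priority.
    """

    def __init__(self) -> None:
        self.owner: Optional[Task] = None
        self.waiters: List[Task] = []

    def acquire(self, task: Task) -> bool:
        """Return True if `task` now owns the mutex, False if it must block."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        # Boost the owner so medium-priority tasks cannot preempt it.
        self.owner.effective_priority = max(self.owner.effective_priority,
                                            task.effective_priority)
        return False

    def release(self) -> Optional[Task]:
        """Drop the owner's boost and hand the mutex to the top waiter."""
        if self.owner is None:
            raise RuntimeError("release() called on an unowned mutex")
        self.owner.effective_priority = self.owner.base_priority
        if not self.waiters:
            self.owner = None
            return None
        nxt = max(self.waiters, key=lambda t: t.effective_priority)
        self.waiters.remove(nxt)
        self.owner = nxt
        # The new owner inherits from anyone still waiting.
        for w in self.waiters:
            nxt.effective_priority = max(nxt.effective_priority, w.effective_priority)
        return nxt


asi_met = Task("ASI/MET", 1)
medium = Task("medium", 2)
bc_dist = Task("bc_dist", 3)

sem = InheritanceMutex()
sem.acquire(asi_met)   # ASI/MET takes the semaphore
sem.acquire(bc_dist)   # bc_dist blocks; ASI/MET now runs at priority 3
print(asi_met.effective_priority > medium.effective_priority)  # True
sem.release()          # ASI/MET drops back to 1; bc_dist now owns the semaphore
print(sem.owner.name, asi_met.effective_priority)              # bc_dist 1
```

Because the boosted owner outranks the medium-priority tasks, it runs to its release point promptly and the blocking time of the high-priority task is bounded by the critical section rather than by unrelated work.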

  14. S/W modification on the spacecraft
     • Patching is a specialised process.
     • The ground sends only the difference between what is onboard and what is wanted on the spacecraft.
     • Software on the spacecraft then modifies the onboard copy.
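
The difference-based patching idea can be illustrated with a byte-level sketch: the ground computes the byte ranges where the desired image differs from the onboard copy, and the spacecraft rewrites only those ranges. The functions `make_patch` and `apply_patch` and the (offset, bytes) patch format are hypothetical, not the actual Pathfinder uplink format, and equal-length images are assumed for simplicity.

```python
from typing import List, Tuple

Patch = List[Tuple[int, bytes]]  # hypothetical format: (offset, replacement bytes)


def make_patch(onboard: bytes, wanted: bytes) -> Patch:
    """Ground side: collect each run of bytes that differs between the images."""
    if len(onboard) != len(wanted):
        raise ValueError("this sketch only handles images of equal length")
    patch: Patch = []
    i = 0
    while i < len(wanted):
        if onboard[i] == wanted[i]:
            i += 1
            continue
        start = i
        while i < len(wanted) and onboard[i] != wanted[i]:
            i += 1
        patch.append((start, wanted[start:i]))
    return patch


def apply_patch(onboard: bytes, patch: Patch) -> bytes:
    """Spacecraft side: rewrite only the listed byte ranges of the onboard copy."""
    image = bytearray(onboard)
    for offset, data in patch:
        image[offset:offset + len(data)] = data
    return bytes(image)


onboard = b"abcdef"
wanted = b"aXYdeZ"
patch = make_patch(onboard, wanted)
print(patch)                        # [(1, b'XY'), (5, b'Z')]
print(apply_patch(onboard, patch))  # b'aXYdeZ'
```

Only the changed ranges travel over the link, which is the point of sending the difference rather than a full new image.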

  15. Questions??

