

Missing in Action: Timing Analysis and Soft Error Protection

Frank Mueller

Center for Embedded Systems Research (CESR)

Department of Computer Science

North Carolina State University



Example: A380 Overheat Detection

  • With Hamilton Sundstrand / United Technologies

  • Overall system has 54 sensors

  • When too hot, isolate air channels

    • Close valves over AFDX network

  • Avoids overheating in case of a leak

    • The plane’s hull is a hybrid carbon/metal structure; a leak can burn a hole into it!

  • SW has to adhere to the RTCA DO-178B standard

    • Level A: modified condition/decision coverage (MC/DC), plus branch/decision/statement coverage

    • Level B: branch/decision/statement coverage

    • Level C: statement coverage

  • SW is written as cyclic executives



Requirements

  • SW standard requirements – some examples:

    • All switch statements must have a default case

    • Single entry and single exit functions only

    • Strict type checking required

  • SW certification requirements

    • Qualified tools to check for adherence to the standard

    • Simulation environment for testing functionality

    • Explicit tests for every low-level requirement

    • Programmer independence: code verified by someone other than its author

    • New: timing guarantees (required by Airbus!) via worst-case execution time (WCET) analysis



Missing in Action 1: Timing Analysis

  • WCET: Worst-case execution time

    • needed for schedulability analysis

  • WCET bounds: determined by timing analysis

    • should be safe and tight

    • derived by tools: only semi-automated, and only for small programs

    • restrictions: known loop bounds, no heap, no function pointers

    • requires a predictable architecture

  • Problems:

    • WCET >> actual execution time → under-utilization

    • Complexity wall:

      • timing analysis tools lagging behind architectural innovation

      • not getting closer (maybe even losing ground)

  • Tools and methods lag behind → what to do?



Timing Analysis: Status Quo and Needs

  • Capabilities of static timing analysis

    • In-order scalar pipeline, static branch prediction, split I/D caches

  • Contemporary processors

    • Out-of-order, multiple issue, dynamic branch prediction, multi-level caches, deep speculation, etc.

  • Analyzability is fundamental to the design of safe systems

    • which today excludes contemporary microarchitectures

    • Long-term implications

  • Complexity wall → need new methods for timing analysis

  • Promote a hybrid HW/SW solution

    • Timings on actual processor in special execution mode

    • Steer execution through SW → realistic! (ARM)

  • Rigorous methodology and tools needed!



Another Failure: Single Event Upset

  • Radiation from space, e.g., due to solar flares, can cause bit flips

    • Heavy-ion strikes on flip-flops, RAM, …

    • Issue in the upper atmosphere → planes flying over the poles

  • Typically sufficient to consider single (bit) event upset (SEU)

    • Multi-bit upsets are statistically too rare to matter

    • Also caused by smaller fab processes → smaller noise margins → errors

  • Protect RAM w/ ECC

  • Caches/processors unprotected

      • Unless radiation hardened → expensive

  • Examples: solar flares

    • Many failed servers in 1999

    • Nozomi Mars Probe rendered inoperable

  • IBM has built-in checks for 80% of server-chip circuits



SEU on the Airbus 380

  • Uses PowerPC 750CXe

    • Off-the-shelf

    • RAM has ECC

    • L2 has ECC but L1 does not

    • No protection against SEU in processor core

  • Options:

    • Disable L1 and make a best effort to “code against” SEU

    • Use EDDI: error detection by duplicating instructions

      • But who wants to pay the overhead?

    • Selective use of fault (SEU) resilient development techniques

      • Pure software or hybrid (minimal HW support + SW)

      • Protection only where needed in code

  • Rigorous methodology and tools needed!



Conclusion

  • Off-the-shelf processors everywhere

    • Airbus 380, Boeing 787

    • Automotive industry (waking up!)

  • Lack of predictability and protection

  • New methods for timing analysis

    • Increasing complexity gap

    • Promote a hybrid HW/SW solution

      • Timings on actual processor in special execution mode

      • Steer execution through SW → realistic! (ARM)

  • New methods for soft error protection

    • Either pure software or hybrid (min. HW + SW)

    • Fault (SEU) resilient software development, selective

  • Missing in action: methods and tools needed today / yesterday !!!

