
Missing in Action:

Timing Analysis and Soft Error Protection

Frank Mueller

Center for Embedded Systems Research (CESR)

Department of Computer Science

North Carolina State University

Example: A380 Overheat Detection
  • With Hamilton Sundstrand / United Technologies
  • Overall system has 54 sensors
  • When too hot, isolate air channels
    • Close valves via the AFDX network
  • Prevents overheating when hot air leaks
    • The plane’s hull is a hybrid carbon/metal structure; leaking hot air can burn a hole into it!
  • Software must adhere to the RTCA DO-178B standard
    • Level A: modified condition/decision coverage (MC/DC), plus branch/decision and statement coverage
    • Level B: branch/decision and statement coverage
    • Level C: statement coverage
  • Software is structured as cyclic executives
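A cyclic executive runs a fixed, precomputed schedule: a periodic timer triggers minor frames, and a major cycle repeats the same sequence of frames. The C sketch below is a minimal, hypothetical illustration; the task names, the four-frame schedule, and the `run_frames` driver (a stand-in for the timer-driven main loop) are assumptions, not the actual A380 software. Dispatch uses a `switch` with a `default` case rather than a table of function pointers, in keeping with the coding rules and timing-analysis restrictions listed in these slides.

```c
#define MINOR_FRAMES 4U  /* minor frames per major cycle (assumed) */

/* Hypothetical task bodies: a real system would read sensors, compare
   temperatures against limits, and command valves. Here they only count calls. */
static int sensor_runs, check_runs, valve_runs;

static void read_sensors(void)   { sensor_runs++; }
static void check_overheat(void) { check_runs++; }
static void update_valves(void)  { valve_runs++; }

/* Executes the fixed task list of one minor frame. */
static void run_minor_frame(unsigned frame)
{
    switch (frame % MINOR_FRAMES) {
    case 0U:                  /* frame 0: full sense-check-act chain */
        read_sensors();
        check_overheat();
        update_valves();
        break;
    case 2U:                  /* frame 2: sense and check (checks run every other frame) */
        read_sensors();
        check_overheat();
        break;
    case 1U:
    case 3U:                  /* frames 1 and 3: sense only */
        read_sensors();
        break;
    default:                  /* unreachable here; kept because every switch needs a default */
        break;
    }
}

/* Stand-in for the main loop: a real executive would block on a periodic
   timer interrupt before each frame instead of running frames back to back. */
static void run_frames(unsigned count)
{
    unsigned f;
    for (f = 0U; f < count; f++) {
        run_minor_frame(f);
    }
}
```

Because the schedule is fixed at design time, the worst case of each minor frame is the sum of the WCETs of the tasks it calls plus dispatch overhead, and that sum must fit within the frame length.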
Requirements
  • Software standard requirements (some examples):
    • All switch statements must have a default case
    • Functions must have a single entry and a single exit
    • Strict type checking required
  • Software certification requirements
    • Qualified tools to check adherence to the standard
    • Simulation environment for functional testing
    • Explicit tests for every low-level requirement
    • Programmer independence
    • New: timing guarantees (required by Airbus!) via worst-case execution time (WCET) analysis
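A small C sketch of what the listed coding rules look like in practice: a distinct `enum` type instead of raw integers, one `return` per function (single entry, single exit), and a `default` case in every `switch`. The function names, the temperature thresholds, and the fail-safe policy of closing the valve on invalid input are hypothetical illustrations, not requirements taken from DO-178B or the A380 project.

```c
#include <stdint.h>

typedef enum { TEMP_NORMAL, TEMP_WARN, TEMP_OVERHEAT, TEMP_INVALID } temp_status_t;
typedef enum { VALVE_OPEN, VALVE_CLOSED } valve_cmd_t;

/* Hypothetical limits, in tenths of a degree Celsius. */
#define WARN_LIMIT_DC      1500  /* 150.0 C */
#define OVERHEAT_LIMIT_DC  2000  /* 200.0 C */

/* Single entry, single exit: the result is built up and returned once. */
static temp_status_t classify(int16_t temp_dc, uint8_t sensor_ok)
{
    temp_status_t status;
    if (sensor_ok == 0U) {
        status = TEMP_INVALID;
    } else if (temp_dc >= OVERHEAT_LIMIT_DC) {
        status = TEMP_OVERHEAT;
    } else if (temp_dc >= WARN_LIMIT_DC) {
        status = TEMP_WARN;
    } else {
        status = TEMP_NORMAL;
    }
    return status;
}

/* Every switch has a default case, even when all enumerators are listed. */
static valve_cmd_t valve_command(temp_status_t status)
{
    valve_cmd_t cmd;
    switch (status) {
    case TEMP_NORMAL:
    case TEMP_WARN:
        cmd = VALVE_OPEN;
        break;
    case TEMP_OVERHEAT:
    case TEMP_INVALID:   /* assumed fail-safe policy: isolate on bad sensor data */
        cmd = VALVE_CLOSED;
        break;
    default:             /* out-of-range value, e.g. corrupted by a bit flip */
        cmd = VALVE_CLOSED;
        break;
    }
    return cmd;
}
```

The `default` branch is not dead weight: if a soft error corrupts `status` into a value outside the enumeration, the function still returns a defined, safe command.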
Missing in Action 1: Timing Analysis
  • WCET: worst-case execution time
    • needed for schedulability analysis
  • WCET bounds: determined by timing analysis
    • should be safe (never below the true WCET) and tight (close to it)
    • derived by tools: only semi-automated, limited to small programs
    • restrictions: known loop bounds, no heap, no function pointers
    • predictable architecture required
  • Problems:
    • WCET ≫ actual execution time → under-utilization
    • Complexity wall:
      • timing analysis tools lag behind architectural innovation
      • the gap is not closing (it may even be widening)
  • Tools and methods lag behind → what to do?
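To make the link between WCET bounds and schedulability concrete, here is a C sketch of classic response-time analysis for fixed-priority preemptive tasks, a standard technique from the real-time literature rather than one proposed in these slides. Each task's worst-case response time R_i is the fixed point of R_i = C_i + Σ_{j<i} ⌈R_i / T_j⌉ · C_j, where C_i is the WCET bound, T_i the period, and deadlines are assumed equal to periods. The `task_t` type and the task parameters below are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned wcet;    /* C_i: WCET bound from timing analysis (any time unit) */
    unsigned period;  /* T_i, also the deadline (implicit deadlines assumed); must be > 0 */
} task_t;

/* Response-time analysis for fixed-priority preemptive scheduling.
   tasks[] is sorted by priority, index 0 = highest. Returns true if every
   task's worst-case response time R_i satisfies R_i <= T_i. */
static bool rta_schedulable(const task_t *tasks, size_t n)
{
    bool ok = true;
    size_t i, j;
    for (i = 0; (i < n) && ok; i++) {
        unsigned r = tasks[i].wcet;
        unsigned prev = 0U;
        /* Iterate until R_i stops changing or exceeds the deadline. */
        while ((r != prev) && (r <= tasks[i].period)) {
            prev = r;
            r = tasks[i].wcet;
            for (j = 0; j < i; j++) {
                /* ceil(prev / T_j) preemptions by higher-priority task j */
                r += ((prev + tasks[j].period - 1U) / tasks[j].period) * tasks[j].wcet;
            }
        }
        if (r > tasks[i].period) {
            ok = false;
        }
    }
    return ok;
}
```

With tasks (C, T) = (1, 4), (2, 6), (3, 13) in priority order, the lowest-priority task's response time converges to 10 ≤ 13, so the set is schedulable. If pessimistic timing analysis inflates that task's WCET bound from 3 to 6, the iteration exceeds 13 and the set is rejected even though the actual execution times have not changed: the under-utilization problem from the slide.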
Timing Analysis: Status Quo and Needs
  • Capabilities of static timing analysis
    • In-order scalar pipeline, static branch prediction, split I/D caches
  • Contemporary processors
    • Out-of-order, multiple issue, dynamic branch prediction, multi-level caches, deep speculation, etc.
  • Analyzability fundamental to design of safe systems
    • excludes contemporary microarchitectures
    • Long-term implications
  • Complexity wall → need new methods for timing analysis
  • Promote a hybrid HW/SW solution
    • Timings on the actual processor in a special execution mode
    • Steer execution through SW → realistic! (ARM)
  • Rigorous methodology and tools needed!
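The measurement side of such a hybrid approach can be illustrated, very loosely, in C: software chooses the inputs that steer the code under test down each path, and each run is timed on the real processor. This is only a sketch under stated assumptions: `code_under_test` is a hypothetical two-path routine, POSIX `clock_gettime` with `CLOCK_MONOTONIC` stands in for a hardware cycle counter, and an ordinary execution mode is used. Unlike the special execution mode proposed on the slide, plain measurement like this yields an observed maximum, not a safe WCET bound.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static volatile uint32_t sink;  /* keeps the computation from being optimized away */

/* Hypothetical code under test: the input selects a short or a long path. */
static void code_under_test(uint32_t input)
{
    uint32_t i, acc = 0U;
    uint32_t iters = ((input & 1U) != 0U) ? 1000U : 10U;
    for (i = 0U; i < iters; i++) {
        acc += i ^ input;
    }
    sink = acc;
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    (void)clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Runs code_under_test on each software-chosen input, `reps` times each,
   and returns the largest observed duration in nanoseconds. *worst receives
   the index of the input that produced it. Observed maximum only. */
static uint64_t max_observed_ns(const uint32_t *inputs, size_t n,
                                unsigned reps, size_t *worst)
{
    uint64_t max = 0U;
    size_t k;
    unsigned r;
    *worst = 0U;
    for (k = 0U; k < n; k++) {
        for (r = 0U; r < reps; r++) {
            uint64_t t0 = now_ns();
            code_under_test(inputs[k]);
            uint64_t dt = now_ns() - t0;
            if (dt > max) {
                max = dt;
                *worst = k;
            }
        }
    }
    return max;
}
```

Caches, pipelines, and interrupts make each measurement depend on the hidden hardware state, which is exactly why the slide calls for a special execution mode before measured timings can be trusted.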
Another Failure: Single Event Upset
  • Radiation from space (e.g., from solar flares) can cause bit flips
    • Heavy-ion strikes on flip-flops, RAM, …
    • Issue at high altitude → planes flying over the poles
  • Typically sufficient to consider single (bit) event upsets (SEUs)
    • Multi-bit upsets are statistically too rare to matter
    • Also caused by shrinking feature sizes → lower noise margins → errors
  • Protect RAM with ECC
  • Caches/processors remain unprotected
    • Unless radiation-hardened → expensive
  • Examples: solar flares
    • Many failed servers in 1999
    • Nozomi Mars probe rendered inoperable
  • IBM has built-in checks for 80% of server-chip circuits
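ECC memory corrects exactly the single-bit upsets described above. As a textbook illustration (not the code of any particular memory controller, which typically applies SEC-DED codes over 64-bit words), here is a Hamming(7,4) code in C: 3 parity bits protect 4 data bits, and the syndrome computed on read gives the position of a single flipped bit, which is then corrected.

```c
#include <stdint.h>

/* Codeword layout: bit (pos - 1) holds Hamming position pos, pos = 1..7.
   Parity bits sit at positions 1, 2, 4; data bits at 3, 5, 6, 7. */
static uint8_t bit_at(uint8_t cw, unsigned pos)
{
    return (uint8_t)((cw >> (pos - 1U)) & 1U);
}

static uint8_t hamming74_encode(uint8_t nibble)
{
    uint8_t d0 = nibble & 1U, d1 = (nibble >> 1) & 1U;
    uint8_t d2 = (nibble >> 2) & 1U, d3 = (nibble >> 3) & 1U;
    uint8_t p1 = d0 ^ d1 ^ d3;  /* covers positions 1, 3, 5, 7 */
    uint8_t p2 = d0 ^ d2 ^ d3;  /* covers positions 2, 3, 6, 7 */
    uint8_t p4 = d1 ^ d2 ^ d3;  /* covers positions 4, 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Decodes a codeword, correcting at most one flipped bit.
   *error_pos is 0 if no error was detected, else the corrected position. */
static uint8_t hamming74_decode(uint8_t cw, unsigned *error_pos)
{
    uint8_t fixed = cw;
    unsigned s1 = bit_at(cw, 1) ^ bit_at(cw, 3) ^ bit_at(cw, 5) ^ bit_at(cw, 7);
    unsigned s2 = bit_at(cw, 2) ^ bit_at(cw, 3) ^ bit_at(cw, 6) ^ bit_at(cw, 7);
    unsigned s4 = bit_at(cw, 4) ^ bit_at(cw, 5) ^ bit_at(cw, 6) ^ bit_at(cw, 7);
    unsigned syndrome = s1 | (s2 << 1) | (s4 << 2);
    if (syndrome != 0U) {
        fixed ^= (uint8_t)(1U << (syndrome - 1U));  /* flip the upset bit back */
    }
    *error_pos = syndrome;
    return (uint8_t)(bit_at(fixed, 3) | (bit_at(fixed, 5) << 1) |
                     (bit_at(fixed, 6) << 2) | (bit_at(fixed, 7) << 3));
}
```

For example, the nibble 0xB encodes to 0x55; flipping position 6 yields syndrome 6, and decoding restores 0xB. Two flipped bits would be miscorrected, which is acceptable only under the single-upset assumption stated on the slide.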
SEU on the Airbus 380
  • Uses PowerPC 750CXe
    • Off-the-shelf
    • RAM has ECC
    • L2 has ECC but L1 does not
    • No protection against SEU in processor core
  • Options:
    • Do not use the L1 cache, and make a best effort to “code against” SEUs
    • Use EDDI: error detection by duplicated instructions
      • But who wants to pay the overhead?
    • Selective use of fault (SEU) resilient development techniques
      • Pure software or hybrid (minimal HW support + SW)
      • Protection only where needed in code
  • Rigorous methodology and tools needed!
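EDDI is normally applied by a compiler at the instruction level: every computation is duplicated into separate registers and memory, and the two results are compared before a value is stored or leaves the protected region. The C sketch below only approximates this at source level; `checked_sum`, the `inject_fault_at` hook that emulates an SEU, and the use of `volatile` to keep the compiler from merging the duplicated operations are illustrative assumptions. It also shows where the overhead comes from: every operation is performed twice, plus a comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test hook: if >= 0, flip one bit of the shadow copy after
   this loop iteration to emulate a single event upset. */
static int inject_fault_at = -1;

/* Sums n values with a master and a shadow accumulator. Returns true and
   writes *out only if both copies agree; false signals a detected error.
   Simplification: inputs are loaded once; full EDDI would also duplicate
   the loads from a shadow copy of memory. */
static bool checked_sum(const int32_t *in, size_t n, int32_t *out)
{
    volatile int32_t sum = 0;      /* master copy */
    volatile int32_t sum_dup = 0;  /* shadow copy, computed by duplicated operations */
    size_t i;
    bool ok;
    for (i = 0U; i < n; i++) {
        sum += in[i];
        sum_dup += in[i];
        if ((int)i == inject_fault_at) {
            sum_dup ^= 0x4;        /* simulated bit flip in the shadow register */
        }
    }
    ok = (sum == sum_dup);         /* compare before the result leaves the region */
    if (ok) {
        *out = sum;
    }
    return ok;
}
```

Selective protection, as proposed on the slide, would apply this duplication only to computations whose corruption could lead to an unsafe output, rather than paying the doubling cost for the whole program.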
Conclusion
  • Off-the-shelf processors everywhere
    • Airbus 380, Boeing 787
    • Automotive industry (waking up!)
  • Lack of predictability and protection
  • New methods for timing analysis
    • Increasing complexity gap
    • Promote a hybrid HW/SW solution
      • Timings on the actual processor in a special execution mode
      • Steer execution through SW → realistic! (ARM)
  • New methods for soft error protection
    • Either pure software or hybrid (minimal HW + SW)
    • Selective fault (SEU) resilient software development
  • Missing in action: methods and tools are needed today (yesterday!)