Fault-tolerant Typed Assembly Language



Frances Perry, Lester Mackey, George A. Reis, Jay Ligatti, David I. August, and David Walker

Princeton University

Transient Faults

TALFT: An Idealized Fault-tolerant System

Formalizing the TALFT System

Performance Results

  • Operational semantics $\Sigma \to \Sigma'$ specifies how a program executes.

    • $\Sigma$ is a machine state (R,C,M,Q,ir).

    • Special operational rules randomly introduce faults by arbitrarily modifying a value in the register file R or the queue Q.

  • Judgment $\vdash_Z \Sigma$ states that machine state $\Sigma$ is well-typed under zap tag Z.

  • Z is either empty or a color c, representing the color of the computation that has been corrupted.

    • If Z is empty then all standard typing invariants must hold.

    • If Z is a color c, then values colored c do not need to conform to their given types.
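The role of the zap tag can be illustrated with a toy check. This is only a sketch under simplifying assumptions, not the TALFT type system: the function name `well_typed`, the color names "G" and "B", and the use of Python types as stand-ins for TALFT types are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical typing context: register name -> (color, expected Python type).
Typing = Dict[str, Tuple[str, type]]


def well_typed(registers: Dict[str, object],
               typing: Typing,
               zap: Optional[str] = None) -> bool:
    """Check registers against their types under zap tag `zap`.

    zap=None models the empty tag: every register must conform.
    zap="G" or zap="B" exempts registers of that (corrupted) color.
    """
    for name, (color, expected_type) in typing.items():
        if color == zap:
            continue  # values of the zapped color need not conform
        if not isinstance(registers[name], expected_type):
            return False
    return True


typing = {"r1": ("G", int), "r3": ("B", int)}
registers = {"r1": 4, "r3": "garbage"}  # the blue value has been corrupted

print(well_typed(registers, typing))           # empty zap tag
print(well_typed(registers, typing, zap="B"))  # blue is known to be corrupted
```

With the empty tag the corrupted blue value makes the check fail; once the tag records that blue is the corrupted color, the same state is accepted.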

  • We simulated the TALFT hardware using the Velocity Research Compiler.

  • Despite essentially doubling the number of instructions, TALFT is only 34% slower than the equivalent fault-intolerant code.

  • Goal: Detect all faults that change a program’s observable behavior and prove that well-typed programs always detect a single fault.

  • General Strategy: Run two redundant, independent copies of each computation (labeled green and blue) and compare them before stores and control flow transfers.

  • Transient faults (also known as soft fails) occur when an energetic particle strikes a transistor, causing it to change state.

    • Does not permanently damage hardware.

    • May corrupt computation by altering stored values and signal transfers.




1 + 1 = 18
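One way to read the "1 + 1 = 18" illustration is as a single flipped bit: the correct result 2 is 00010 in binary, and inverting bit 4 gives 10010, which is 18. The slide does not say which bit is hit, so the bit position below is an assumption, and `flip_bit` is a hypothetical helper.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return `value` with one bit inverted, as a particle strike might do."""
    return value ^ (1 << bit)


correct = 1 + 1                    # 2, binary 00010
corrupted = flip_bit(correct, 4)   # assumed strike on bit 4

print(f"{correct:05b} -> {corrupted:05b}")
print(f"1 + 1 = {corrupted}")
```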

  • In 2004, a typical laptop with 1GB RAM had 1 soft failure per year.

    • Fault rates increase with altitude.

    • Fault rates on a trans-pacific flight increase 300x to nearly 1 fault per roundtrip.
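The "nearly 1 fault per roundtrip" figure can be sanity-checked with a short calculation. The baseline rate and the 300x multiplier come from the bullets above; the 24 hours of total flight time for a roundtrip is an assumption made here for illustration.

```python
BASELINE_FAULTS_PER_YEAR = 1.0   # from the slide: 1 soft failure per year
ALTITUDE_MULTIPLIER = 300        # from the slide: 300x increase in flight
ASSUMED_ROUNDTRIP_HOURS = 24.0   # assumption: about 12 hours each way
HOURS_PER_YEAR = 365 * 24


def expected_faults(hours: float, faults_per_year: float,
                    multiplier: float = 1.0) -> float:
    """Expected number of faults over `hours` at a constant fault rate."""
    return faults_per_year * multiplier * hours / HOURS_PER_YEAR


in_flight = expected_faults(ASSUMED_ROUNDTRIP_HOURS,
                            BASELINE_FAULTS_PER_YEAR,
                            ALTITUDE_MULTIPLIER)
print(f"expected faults per roundtrip: {in_flight:.2f}")
```

Under these assumptions the expectation comes out a little above 0.8, which is consistent with "nearly 1".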

The TALFT Machine

  • A machine state consists of a code memory C, a value memory M, a register file R, a current instruction ir, and a store queue Q.
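The five components can be written down as a plain record. This is a minimal sketch: the field names and the concrete representations (dictionaries, a list of address/value pairs, instructions as strings) are assumptions chosen for readability, and colors on values are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MachineState:
    """Toy record mirroring the five components of a machine state."""
    code: Dict[int, str] = field(default_factory=dict)       # C: code memory
    memory: Dict[int, int] = field(default_factory=dict)     # M: value memory
    registers: Dict[str, int] = field(default_factory=dict)  # R: register file
    current: Optional[str] = None                            # ir: current instruction
    store_queue: List[Tuple[int, int]] = field(default_factory=list)  # Q


state = MachineState(
    code={0x0393: "mov r1 4", 0x0394: "mov r3 4"},
    registers={"r1": 0, "r3": 0},
    current="mov r1 4",
)
print(state)
```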


  • Sun, HP, and Cypress Semiconductor have admitted that transient faults caused crashes and problems at eBay, AOL, Los Alamos, and other major sites.

  • Faster clock rates, increasing transistor density, decreasing voltages, and smaller feature size all contribute to fault rates that increase by approximately 8% per generation.

    • Transient faults are already a significant concern, and their impact is increasing.

    • TALFT formalizes idealized hardware for a hybrid hardware-software fault-tolerance scheme.

    • TALFT’s type system captures the invariants necessary to reason about the system.

    • We formally proved that all well-typed programs are fault-tolerant relative to the fault model.




    0x0393 mov r1 4

    0x0394 mov r3 4

    0x0395 stG r2 r1

    0x0396 stB r4 r3





    The Fault-tolerance Theorem

    • A program is fault tolerant when all the faulty executions simulate fault-free executions of the program.

    • The relationship $\Sigma_1 \;\mathrm{sim}_c\; \Sigma_2$ says that $\Sigma_1$ and $\Sigma_2$ are identical modulo values colored c.

    • (Simplified) Fault-Tolerance Theorem: Either a faulty execution is indistinguishable from the equivalent fault-free execution, or it terminates in the fault state.

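    • If $\vdash \Sigma$ and $\Sigma \;\mathrm{sim}_c\; \Sigma_1$ and $\Sigma \to \Sigma'$

    • then either $\Sigma_1 \to \Sigma_2$ and $\Sigma' \;\mathrm{sim}_c\; \Sigma_2$

    • or $\Sigma_1 \to$ fault.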

    • A value is observed when it is stored to memory.

    • Q buffers a store from the green computation until the corresponding blue store is performed.

    • Special hardware is used by memory and control flow instructions to detect mismatches between the computations and trigger the special fault state.
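The store-queue behavior described above can be sketched as a small simulation. This is an illustrative model, not the formal TALFT semantics: the class and function names are hypothetical, the queue is assumed to be first-in first-out, `stG`/`stB` are assumed to take the address register first and the value register second, and the address 0x100 is made up.

```python
from collections import deque


class FaultDetected(Exception):
    """Stands in for the machine entering the special fault state."""


class StoreUnit:
    """Toy model of value memory M plus store queue Q."""

    def __init__(self):
        self.memory = {}       # M: address -> value
        self.queue = deque()   # Q: pending green stores as (address, value)

    def st_green(self, address, value):
        # A green store is only buffered; nothing is observable yet.
        self.queue.append((address, value))

    def st_blue(self, address, value):
        # A blue store must match the oldest buffered green store.
        if not self.queue:
            raise FaultDetected("blue store without a pending green store")
        pending = self.queue.popleft()
        if pending != (address, value):
            raise FaultDetected(f"green {pending} != blue {(address, value)}")
        self.memory[address] = value   # both copies agree, so commit


def run(r1, r2, r3, r4):
    """Mimic `stG r2 r1` followed by `stB r4 r3` on a fresh unit."""
    unit = StoreUnit()
    unit.st_green(r2, r1)
    unit.st_blue(r4, r3)
    return unit.memory


ADDRESS = 0x100  # hypothetical address held in r2 and r4
fault_free = run(r1=4, r2=ADDRESS, r3=4, r4=ADDRESS)
print(fault_free)
```

Calling `run(r1=20, r2=ADDRESS, r3=4, r4=ADDRESS)`, which models a fault in the green value, raises `FaultDetected` before anything reaches memory.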

    For More Information…

    • Transient faults are a well-known problem in the architecture community.

    • There are many existing solutions, but do these solutions actually work?

      • Solutions generally consist of algorithms stated in English, and it is left to the audience to judge correctness.

    • See our upcoming paper in PLDI ’07

    • http://www.cs.princeton.edu/~frances/tal_ft-pldi07.pdf

    • Visit the Project Zap Homepage

    • http://www.cs.princeton.edu/sip/proj/zap

    Frances Perry. CRA-W/CDC Programming Languages Summer School. May 9-11, 2007.