
Nihan Özman - 2005700452

SWIFT: Software Implemented Fault Tolerance. George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August. Princeton University. International Symposium on Code Generation and Optimization (CGO’05).


Presentation Transcript


  1. SWIFT: Software Implemented Fault Tolerance • George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August • Princeton University • International Symposium on Code Generation and Optimization (CGO’05) • Nihan Özman - 2005700452

  2. Outline • Introduction • Prior Work • Software Fault Detection • Control Flow Checking • SWIFT • Implementation Details • Evaluation • Conclusion

  3. Introduction • In recent decades, microprocessor performance has been increasing exponentially due to: • smaller and faster transistors with low threshold voltages • tighter noise margins enabled by improved fabrication technology • While these devices yield performance enhancements, they will • be less reliable • make processors that use them more susceptible to transient faults

  4. Properties of Transient Faults • Also known as soft errors • Unlike manufacturing or design faults, they do not occur consistently • Caused by external events, such as energetic particles striking the chip • Do not cause permanent physical damage to the processor • Can alter signal transfers or stored values and thus cause incorrect program execution

  5. Hardware Solutions for Transient Faults • To counter transient faults, designers typically introduce redundant hardware: • Some storage structures, such as caches and memory, include error correcting codes (ECC) and parity bits: the redundant bits can be used to detect or correct a fault. • Combinational logic within the processor can be protected by duplication: the outputs of the duplicated combinational logic blocks can be compared to detect faults.

  6. Examples of Advanced Hardware Solutions • High-availability systems need more redundant hardware than ECC and parity bits provide. Examples: • IBM has added additional logic within its mainframe processors for fault tolerance. • During the design of the S/390 G5, IBM fully replicated the processor’s execution units to avoid various performance pitfalls of its previous fault tolerance approach. • Fujitsu used a form of error protection that includes ALU parity generation and a mul/divide residue check. • Boeing designed its 777 aircraft system with three different processors and data buses, using a majority voting scheme to achieve both fault detection and recovery.

  7. Disadvantages of Hardware Solutions • Too expensive for many processor markets, including the highly price-competitive desktop and laptop markets. • Systems in these markets may have ECC or parity in the memory subsystem, but certainly do not possess double or triple redundant execution cores. • Transient faults in both memory and combinational logic will need to be addressed in all aggressive processor designs, not only in high-availability applications.

  8. Proposed Software Solution • To achieve redundancy and fault tolerance, a software-based, single-threaded approach, SWIFT, is proposed. • It performs fault detection in a manner compatible with most reporting and recovery mechanisms (can be easily extended to incorporate complete fault tolerance) • It is a compiler-based transformation that • duplicates the instructions in a program • inserts comparison instructions at strategic points during code generation.

  9. Desirable Features of Software Solution • The technique does not require any hardware changes. • The compiler is free to make use of slack in a program’s schedule to minimize performance degradation. • Programmers are free to vary the transient fault policy within a program. • A compiler-orchestrated relationship between the duplicated instructions allows for simple methods to deal with exception handling, interrupt handling and shared memory.

  10. Improvements of SWIFT • Requires no hardware beyond ECC in the memory subsystem • Eliminates the need to double the memory requirement by acknowledging the use of ECC in caches and memory • Increases protection at no additional performance cost by introducing a new control-flow checking mechanism • Reduces performance overhead by eliminating branch validation code made unnecessary by this enhanced control-flow mechanism • Performs better than all known single-threaded full software detection techniques • Deployable in both uniprocessor and multiprocessor environments (methods to deal with exception handling, interrupt handling and shared memory programs)

  11. Implementation of SWIFT • SWIFT can be implemented on any architecture and can protect individual code segments to varying degrees. • A full program implementation running on Itanium 2 is evaluated. • In experiments, SWIFT demonstrates • exceptional fault-coverage with a reasonable performance cost • a 14% average speedup compared to the best known single-threaded approach utilizing an ECC memory system

  12. Prior Work – Hardware-Based Redundancy • Mahmood and McCluskey proposed using a watchdog processor to compare and validate the outputs against the main running processor. • Austin proposed DIVA, which uses a main, high-performance, out-of-order processor core to execute instructions and a second, simpler core to validate the execution. • Compaq NonStop Himalaya is a real system implementation that replicates part or all of the processor and uses checkers to validate the redundant computations. • Rotenberg expanded the SMT (Simultaneous MultiThreading) redundancy concept with AR-SMT (Active Stream/Redundant Stream Simultaneous Multithreading).

  13. Prior Work – Hardware-Based Redundancy • Reinhardt and Mukherjee proposed simultaneous Redundant MultiThreading (RMT), which improves on the performance of AR-SMT and compares redundant streams before data is stored to memory. • Mukherjee proposed a Chip-level Redundantly Threaded multiprocessor (CRT). • Gomaa expanded the CRT approach with CRTR to enable recovery. • Ray proposed modifying an out-of-order superscalar processor’s microarchitectural components to implement redundancy. • All hardware-based approaches require the addition of new hardware logic to meet redundancy requirements.

  14. Comparison of Various Redundancy Approaches

  15. Prior Work – Software-Based Redundancy • Software-only approaches to redundancy come free of hardware cost • Oh and McCluskey proposed a novel software redundancy approach (EDDI: Error Detection by Duplicating Instructions) wherein all instructions are duplicated and appropriate “check” instructions are inserted to validate the results • Oh et al. developed a pure Software Control-Flow Checking Scheme (CFCSS) wherein each control transfer generates a run-time signature that is validated by error checking code generated by the compiler for every block • Venkatasubramanian et al. proposed Assertions for Control Flow Checking (ACFC), which assigns an execution parity to each basic block and detects faults based on parity errors.

  16. Prior Work – Software-Based Redundancy • A sphere of replication (SoR) is the logical domain of redundant execution. • SWIFT • makes several key refinements to EDDI • incorporates a software only signature-based control-flow checking scheme to achieve exceptional fault-coverage • The main difference between EDDI and SWIFT is • EDDI’s SoR includes entire processor core and the memory subsystem • SWIFT moves memory out of the SoR (memory structures are already well-protected by hardware schemes like parity and ECC, with or without scrubbing)

  17. Comparison of Various Redundancy Approaches

  18. Software Fault Detection • In this section, the following will be explained: • the foundation of SWIFT • extending EDDI with control-flow checking based on software signatures • the novel extensions that comprise SWIFT • The following assumptions are made: • a Single-Event Upset (SEU) fault model, in which exactly one bit is flipped throughout the entire program • the memory subsystem, including processor caches, is already adequately protected using techniques like parity and ECC • the transformations are used to detect faults (efficacious and cost-effective fault detection is of primary concern)

  19. EDDI • Software-only fault detection system • Operates by duplicating program instructions and using this redundant execution to achieve fault tolerance. • Program instructions are: • duplicated by the compiler • intertwined with the original program instructions • Each copy of the program uses different registers and different memory locations so that the copies do not interfere with one another. • Check instructions are inserted at certain synchronization points by the compiler: • they verify that the original instructions and their redundant copies agree on the computed values.

  20. EDDI • Program correctness is defined by the output of a program • Assuming memory-mapped I/O, a program has executed correctly if all stores in the program have executed correctly. • Two types of instructions should be used as synchronization points for comparing redundant values: • Store instructions • Branch instructions (misdirected branches can cause stores to be skipped, incorrect stores to be executed, or incorrect values to ultimately feed a store)

  21. EDDI Fault Detection • 1: The load from a global constant address is duplicated • 2: The add instruction is duplicated (to create a redundant chain of computation) • 3 & 4: The store’s operands are compared to their redundant copies • 5: If any difference is detected, an error is reported • 6: If no difference is detected, the original and duplicate stores are executed to non-conflicting addresses
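
The six steps above can be sketched as a small Python simulation. This is only an illustration under stated assumptions: memory is modeled as a dictionary, and `GLOBAL`, `SHADOW_OFFSET`, `run_eddi`, `base` and the `flip_bit` fault-injection parameter are hypothetical names that do not come from the paper.

```python
SHADOW_OFFSET = 1000  # hypothetical distance between a location and its shadow copy
GLOBAL = 0            # hypothetical global address read by the load


class FaultDetected(Exception):
    """Raised when an original value and its redundant copy disagree."""


def run_eddi(memory, base=8, flip_bit=None):
    # 1: the load is duplicated; the copy reads the shadow location.
    value = memory[GLOBAL]
    value_dup = memory[GLOBAL + SHADOW_OFFSET]
    # 2: the add is duplicated, giving a redundant chain of computation.
    addr = value + base
    addr_dup = value_dup + base
    if flip_bit is not None:
        addr ^= 1 << flip_bit  # simulated transient fault in the original chain
    # 3 & 4: compare the store's operands with their redundant copies.
    # 5: report an error on any difference.
    if addr != addr_dup or value != value_dup:
        raise FaultDetected("store operands disagree")
    # 6: both stores execute, to non-conflicting addresses.
    memory[addr] = value
    memory[addr_dup + SHADOW_OFFSET] = value_dup
    return memory
```

Calling `run_eddi({GLOBAL: 10, GLOBAL + SHADOW_OFFSET: 10})` writes the value twice, once to the original address and once to its shadow. Passing `flip_bit=2` corrupts the original address and the check raises `FaultDetected` before either store runs.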

  22. EDDI Fault Detection • An optimizing compiler (or dynamic hardware scheduler) is free to schedule the instructions to use additional available ILP (minimizing the performance penalty of the transformation). • Two different types of redundancy are exploited: • Temporal Redundancy: • The redundant duplicates are executed sequentially • Computes the same data value at two different times, usually on the same hardware • Spatial Redundancy: • The redundant duplicates are executed in parallel • Computes the same data value in two different pieces of hardware, usually at the same time

  23. Eliminating the Memory Penalty • EDDI is able to effectively detect transient faults at the cost of significant memory overhead. • Each memory location needs a shadow memory location for use by the redundant duplicate. This duplication incurs: • a significant hardware cost • a significant performance cost (since cache sizes are effectively halved and additional memory traffic is created) • The paper proposes using a single memory location for each memory value, which eliminates the duplicate store instructions. (Duplicate load instructions are still necessary.) • These modifications do not reduce the fault detection coverage of the system, but make the protected code execute more efficiently and require less memory.

  24. Eliminating the Memory Penalty • EDDI with the memory penalty eliminated is referred to as: EDDI + ECC
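
A minimal sketch of the EDDI + ECC idea, under the same modeling assumptions as a dictionary-backed memory: both loads read the same ECC-protected location, the operands are still checked, and only one store leaves the sphere of replication. The names `GLOBAL`, `run_eddi_ecc`, `base` and `flip_bit` are hypothetical and chosen for illustration only.

```python
GLOBAL = 0  # hypothetical global address read by the load


class FaultDetected(Exception):
    """Raised when an original value and its redundant copy disagree."""


def run_eddi_ecc(memory, base=8, flip_bit=None):
    # Both loads read the same location; there is no shadow copy in memory.
    value = memory[GLOBAL]
    value_dup = memory[GLOBAL]
    # The computation is still duplicated in registers.
    addr = value + base
    addr_dup = value_dup + base
    if flip_bit is not None:
        addr_dup ^= 1 << flip_bit  # simulated fault in the redundant chain
    # The store's operands are validated exactly as before.
    if addr != addr_dup or value != value_dup:
        raise FaultDetected("store operands disagree")
    # A single store is executed; memory is assumed to be protected by ECC.
    memory[addr] = value
    return memory
```

Compared with the shadow-memory version, the memory footprint after a fault-free run holds one copy of the stored value instead of two.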

  25. Control Flow Checking • EDDI also suffers from incomplete protection against control flow faults. • A program’s control flow can be erroneously misdirected without detection. The corruption can happen • during the execution of the branch • during register corruption after the branch check instructions • due to a fault in the instruction pointer update logic • To make EDDI more robust, additional checks can be inserted to ensure that control flow is being transferred properly

  26. Control Flow Checking • EDDI + ECC with control flow validation is referred to as: EDDI + ECC + CF

  27. Control Flow Checking • Each block is assigned a signature in order to verify that control is transferred to the appropriate block. • GSR (General Signature Register), a designated general purpose register, holds the signatures and is used to detect faults. • The procedure is as follows: • GSR contains the signature of the currently executing block • Upon entry to any block, GSR is xor’ed with a statically determined constant to transform the previous block’s signature into the current block’s signature • After the transformation, GSR is compared to the statically assigned signature for the block to ensure that a legal control transfer has occurred

  28. Control Flow Checking • Using a statically determined constant forces two blocks which both jump to a common block (a control flow merge) to share the same signature • this is undesirable, since faults which transfer control to or from blocks that share the same signature will go undetected. • A run-time adjusting signature can be used instead • it is assigned to another designated register • at the entry of a block, this signature, GSR and the predetermined constant are all xor’ed together to form the new GSR • It can differ depending on the source of the control transfer, so it can be used to compensate for differences in signatures between source blocks
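
The signature scheme described above can be sketched in Python as follows. The block names, the signature values, the choice of a "base" predecessor per block, and the helper `transfer` are all assumptions made for illustration; only the xor structure (GSR, a static constant, and a run-time adjusting signature) is taken from the slides.

```python
# Hypothetical control flow graph: A -> C, B -> C (C is a merge), C -> D.
SIG = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}

# For each block, one predecessor is picked as the base for the static constant.
BASE_PRED = {"C": "A", "D": "C"}
DIFF = {blk: SIG[pred] ^ SIG[blk] for blk, pred in BASE_PRED.items()}


class ControlFlowFault(Exception):
    """Raised when GSR does not match the signature of the block entered."""


def transfer(gsr, src, intended, actual):
    """Leave block `src` heading for `intended`; a fault may land in `actual`."""
    # The source sets the run-time adjusting signature for its intended target.
    # It is zero when the source is the target's base predecessor.
    adjust = SIG[src] ^ SIG[BASE_PRED[intended]]
    # At entry of the block actually reached: GSR, constant and adjustment are xor'ed.
    gsr ^= DIFF[actual] ^ adjust
    # The result must equal the statically assigned signature of that block.
    if gsr != SIG[actual]:
        raise ControlFlowFault(f"illegal transfer {src} -> {actual}")
    return gsr
```

Both legal edges into the merge block, `transfer(SIG["A"], "A", "C", "C")` and `transfer(SIG["B"], "B", "C", "C")`, produce the signature of `C` even though `A` and `B` have different signatures. A misdirected transfer such as `transfer(SIG["A"], "A", "C", "D")` raises `ControlFlowFault`.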

  29. Control Flow Checking • 1 & 2: Redundant duplicates for the add and compare instructions • 3 to 7: Compare the predicate p11 to its redundant duplicate p21 and branch to error code if a fault is detected • 8: Transforms the GSR from the previous block’s signature into the signature for this block (the control flow additions begin here) • 9 & 10: Ensure that the signature is correct (otherwise error code is invoked) • 11 to 13: Handle the synchronization point induced by the later store instruction

  30. Control Flow Checking • The transformation • detects any fault that causes a control transfer between two blocks that should not jump to one another (which yields incorrect signatures even if the erroneous transfer jumps to the middle of a basic block) • ensures only that the control flow is diverted to the taken or untaken path • does not ensure that the correct direction of the conditional branch is taken • The base EDDI transformation • provides reasonable guarantees (the branch’s input operands are verified prior to its execution) • does not detect faults that occur during the execution of a branch instruction and influence the branch direction

  31. Enhanced Control Flow Checking • To extend fault detection coverage to cases where branch instruction execution is concerned, an enhanced control flow checking transformation is proposed: • EDDI + ECC + CFE • similar to the handling of blocks that use run-time adjusting signatures • increases the reliability of the control flow checking • The enhanced mechanism uses a dynamic equivalent of a run-time adjusting signature for all blocks, even those that are not control flow merges • Each block asserts its target using this signature and each target confirms the transfer by checking GSR. • This signature, combined with the GSR, serves as a redundant duplicate of the program counter.

  32. Enhanced Control Flow Checking • 1 & 2: Redundant duplicates for the add and compare instructions; the synchronization check before the branch instruction is omitted • 3: Computes the run-time signature (RTS) for the target of the branch by xor’ing the signature of the current block with the signature of the target block; the branch is predicated, so the assignment to RTS is predicated using the redundant duplicate of the predicate register • 4: Equivalent of 3 for the fall-through control transfer

  33. Enhanced Control Flow Checking • 5: Xors RTS with the GSR to compute the signature of the new block, at the target of a control transfer • 6: Compares the signature from 5 with the statically assigned signature • 7: Error code is invoked if there is a mismatch in 6 • 8 & 9: Implement the synchronization checks for the store instruction • 10: Error code is invoked if there is a mismatch in 8 or 9

  34. Enhanced Control Flow Checking • Even if a branch is incorrectly executed, the fault will be detected, since the RTS register will hold a value that does not match the block reached: • this protects more robustly against transient faults • The EDDI + ECC + CF control flow checking • ensures that execution is transferred to a valid control block • does not ensure that the correct conditional control path is taken • The enhanced control flow checking detects this case by: • Dynamically updating the target signature based on the redundant conditional instructions (3) • Checking at the beginning of each control block (5, 6, 7)
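
A minimal sketch of the enhanced check for a single conditional branch. The signature values and the names `SIG` and `branch` are hypothetical; the sketch assumes the hardware branch follows the original predicate `p` while RTS is computed from the redundant predicate `p_dup`, as the slides describe.

```python
# Hypothetical signatures for a block ending in a conditional branch
# and for its two successors.
SIG = {"entry": 0x3, "taken": 0x5, "fall": 0x6}


class ControlFlowFault(Exception):
    """Raised when GSR does not match the signature of the block entered."""


def branch(p, p_dup):
    """Execute the branch of block 'entry' and validate the transfer."""
    gsr = SIG["entry"]
    # Steps 3 and 4: RTS asserts the intended target, chosen by the REDUNDANT predicate.
    intended = "taken" if p_dup else "fall"
    rts = SIG["entry"] ^ SIG[intended]
    # The branch itself follows the ORIGINAL predicate, which a fault may have flipped.
    actual = "taken" if p else "fall"
    # Step 5: the target xors RTS into GSR.
    gsr ^= rts
    # Steps 6 and 7: compare with the target's static signature.
    if gsr != SIG[actual]:
        raise ControlFlowFault(f"reached {actual}, expected {intended}")
    return actual
```

When both predicates agree, `branch(True, True)` and `branch(False, False)` pass the check. If a fault flips the direction actually taken, as in `branch(False, True)`, GSR holds the signature of the intended block rather than the block reached, and the check fails.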

  35. SWIFT • The following optimizations applied to the EDDI + ECC + CFE transformation comprise SWIFT: • Control flow checking at blocks with stores • Redundancy in branch/control flow checking

  36. Control Flow Checking at Blocks with Stores • Only store instructions ultimately send data out of the SoR. They: • should execute only if they are meant to • should write the correct data to the correct address • This observation can be used to restrict enhanced control flow checking to blocks which have stores in them. • the updates to GSR and RTS are still performed in all blocks • signature comparisons are restricted to blocks with stores (any deviation from the valid control flow path up to that point will be detected before memory and output are corrupted) • signature check instructions are removed from blocks without stores (SCFOpti) • With this optimization, performance is increased and static size is reduced with no reduction in reliability.
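
The effect of checking only in blocks with stores can be sketched as below. The block names, signatures, `HAS_STORE` table and `run` helper are hypothetical; the sketch only illustrates that GSR keeps being updated everywhere, so an earlier misdirection still surfaces at the next block that contains a store.

```python
SIG = {"A": 1, "B": 2, "C": 4, "D": 8}
HAS_STORE = {"A": False, "B": False, "C": True, "D": False}


class ControlFlowFault(Exception):
    """Raised when GSR does not match the signature of a store block."""


def run(start, transfers):
    """transfers: list of (intended, actual) targets, one pair per control transfer.

    Returns the number of signature comparisons that were executed.
    """
    gsr = SIG[start]
    current = start
    checks = 0
    for intended, actual in transfers:
        rts = SIG[current] ^ SIG[intended]  # asserted by the source block
        gsr ^= rts                          # GSR is updated in every block
        current = actual
        if HAS_STORE[current]:              # compare only where a store follows
            checks += 1
            if gsr != SIG[current]:
                raise ControlFlowFault(f"bad path detected at {current}")
    return checks
```

On the legal path `run("A", [("B", "B"), ("C", "C")])` only one comparison executes, in `C`. If a fault sends control to `D` instead of `B`, as in `run("A", [("B", "D"), ("C", "C")])`, nothing is checked in `D`, but the mismatch is caught on entry to `C`, before its store.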

  37. Redundancy in Branch/Control Flow Checking • Branch Checking: branches are taken in proper direction • Enhanced Control Flow Checking: all control transfers are made to the proper address • Verifying all control flow subsumes the notion of branching in the right direction • By removing branch checking (BROpti) • reduction in performance and static size overhead • no reduction in reliability

  38. Undetected Errors – Points of Failure • Redundancy is introduced solely via software instructions • there is a delay between the validation and the use of the validated register values • any strikes during this gap might corrupt state • bit flips in store address or data registers in this gap go uncaught • incorrect store values or addresses -> incorrect writes going outside the SoR -> incorrect program execution • When an instruction opcode is changed to a store instruction by a transient fault: • The compiler never saw this store, so it is unprotected • The store is free to execute and its value will leave the SoR

  39. Multibit Errors • Code transformations are less effective at detecting multibit faults, which can cause problems: • when the same bit is flipped in both the original and redundant computation (Case 1) • when a bit is flipped in either the original or redundant computation and the comparison is also flipped such that it does not branch to the error code (Case 2) • These patterns of multibit errors are unlikely enough to be safely ignored • A dual-upset fault model, wherein two faults are injected into each program with a uniformly random distribution, is used to calculate the probabilities

  40. Probability of Multibit Errors – Case 1 • “The same bit is flipped in both the original and redundant computation” • Assumption: • The same fault must occur in the same bit of the same instruction for the fault to go undetected • Probability of Error: • Probability of that particular instruction being chosen (average SPEC benchmark has on the order of 10^9 to 10^11 dynamic instr.) times • Probability of a particular bit being chosen (64-bit registers)

  41. Probability of Multibit Errors – Case 2 • “A bit is flipped in either the original or redundant computation and the comparison is also flipped such that it does not branch to the error code” • Assumption: • There is only one comparison for every possible fault • Probability of Error: • P(error_comparison | error_original) = 1 / #instructions • This is a gross overestimate because in reality, there may be many checks on a faulty value.
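
The two estimates can be evaluated with a few lines of Python. This reads the slides literally: Case 1 is the product of the chance of hitting one particular instruction and one particular bit of a 64-bit register, and Case 2 is one over the number of instructions. The function names are hypothetical, and the instruction counts are the order-of-magnitude figures quoted above, not measurements.

```python
from fractions import Fraction


def case1_probability(dynamic_instructions, register_bits=64):
    """Second fault hits the same bit of the matching redundant instruction."""
    return Fraction(1, dynamic_instructions) * Fraction(1, register_bits)


def case2_probability(dynamic_instructions):
    """Second fault hits the single comparison that guards the first fault."""
    return Fraction(1, dynamic_instructions)


if __name__ == "__main__":
    for n in (10**9, 10**11):
        print(n, float(case1_probability(n)), float(case2_probability(n)))
```

For 10^9 dynamic instructions, Case 1 evaluates to exactly 1 / (64 × 10^9), and Case 2 to 1 / 10^9; both shrink further as the instruction count grows.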

  42. Implementation Details • Details specific to the implementation and deployment of SWIFT: • Different options for the calling convention • Implementations on multiprocessor systems • The effects of using an ISA with predication (IA64)

  43. Function Calls • Function calls can be treated as synchronization points: • Before a function call, all input operands are checked against their redundant copies • if there is a mismatch, a fault is detected • otherwise the original versions are passed as parameters to the function • At the beginning of the function, the parameters must be duplicated • into redundant and original versions • Only one version of the return value is passed back (it must be duplicated into a redundant version for the remaining redundant code of the caller) • This adds performance overhead and introduces a vulnerability: • Faults that strike the parameters after the checks by the caller and before the duplication by the callee will not be caught

  44. Function Calls • The calling convention should be altered • to pass multiple sets of computed arguments to a function • to return multiple return values from a function • Only arguments passed in registers need to be duplicated, not the ones in memory (memory is outside the SoR) • Multiple return values require that an extra register be reserved for the replicated return value: • additional pressure from twice as many input and output registers • fault detection is preserved across function calls

  45. Shared Memory, Interrupts, and Exceptions • When multiple processes communicate with each other using shared memory, the compiler cannot enforce an ordering of reads and writes across processes. • There is always the possibility of intervening writes from other processes, so the two loads of a duplicated pair are not guaranteed to return the same value. This: • does not reduce the fault coverage of the system in any way • increases the detected fault count by adding detections of faults that would not have caused a failure • The situation is similar when an interrupt or exception occurs between the two loads of a duplicated pair and the interrupt or exception handler changes the contents at the load address

  46. Shared Memory, Interrupts, and Exceptions • Hardware Solutions: • “Safe” hardware-based load value duplication techniques (the Active Load Address Buffer (ALAB) or the Load Value Queue in RMT machines) adapted to a SWIFT system (costly) • No Duplication for Loads: • Compiler does only one load (instead of two) and duplicates the loaded value for original & redundant version consumers • Removes redundancy from the load execution
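
The difference between a duplicated pair of loads and the single-load alternative can be sketched as follows. The function names and the `intervening_write` parameter, which stands in for a write by another process or a handler, are assumptions made for illustration.

```python
def duplicated_loads(memory, addr, intervening_write=None):
    """Two loads of the same address; another writer may run in between."""
    original = memory[addr]
    if intervening_write is not None:
        memory[addr] = intervening_write  # write by another process or handler
    redundant = memory[addr]
    return original, redundant


def single_load(memory, addr, intervening_write=None):
    """One load; the redundant consumer gets a register copy of the value."""
    original = memory[addr]
    redundant = original  # duplicated in a register, not re-read from memory
    if intervening_write is not None:
        memory[addr] = intervening_write
    return original, redundant
```

With an intervening write, `duplicated_loads({0: 1}, 0, intervening_write=2)` yields two different values, which a later check would report as a fault even though none occurred. `single_load` always yields matching values, at the cost of no longer executing the load redundantly.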

  47. Shared Memory, Interrupts, and Exceptions • Dealing with Potentially-Excepting Instructions: • The compiler knows a priori that certain instructions may cause faults, and enforces a schedule in which pairs of loads are not split across them • Prevents most exceptions from being raised between the two versions of a load instruction • Keeps redundancy in load execution • Asynchronous signals and interrupts cannot be handled this way; they require: • the hardware solution, or • the single-load solution

  48. Logical Masking from Predication • Consider the branch br r1 != r2: • if, in the absence of a fault, the branch were to be taken, and the condition still holds true after a strike to either r1 or r2 • then the error can be safely ignored • Logical masking • allows the fault detection mechanism to be less conservative in detecting errors • reduces the overall count of falsely detected unrecoverable faults • Special checks are needed to exploit logical masking • Predicated architectures naturally provide logical masking • IA64 ISA: conditional branches are executed based on a predicate value, computed by prior predicate-defining instructions (no validation before them)
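
Logical masking for the branch `br r1 != r2` can be illustrated by contrasting two checks. Both helper names are hypothetical: `operand_check` models validating each operand against its redundant copy, while `predicate_check` models comparing only the outcomes of the original and redundant comparisons.

```python
def operand_check(r1, r2, r1_dup, r2_dup):
    """Conservative check: flag a fault if any operand differs from its copy."""
    return r1 != r1_dup or r2 != r2_dup


def predicate_check(r1, r2, r1_dup, r2_dup):
    """Masked check: flag a fault only if the branch outcome itself differs."""
    predicate = r1 != r2
    predicate_dup = r1_dup != r2_dup
    return predicate != predicate_dup
```

Suppose `r1` should be 5 and `r2` is 7, and a strike flips the low bit of `r1` to give 4. The branch condition still holds, so `predicate_check(4, 7, 5, 7)` reports no fault while `operand_check(4, 7, 5, 7)` does. If the strike changes the outcome, for example 6 becoming 7 so that `r1 == r2`, both checks report a fault.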

  49. Evaluation - Performance • A pre-release version of the OpenIMPACT compiler (modified to add redundancy) targeted at Intel Itanium 2 processors running RedHat Advanced Workstation 2.1 with 4 GB of memory • A version of each benchmark was created for each of the • EDDI + ECC + CFE • SWIFT techniques • Versions were also created with each of the specific optimizations removed, to see their effects individually • SWIFT-SCFopti: to analyze control-flow checking only at blocks with stores • SWIFT-BRopti: to analyze the branch checking optimization

  50. Evaluation - Performance • The techniques were evaluated on benchmarks from: • SPEC CINT2000, SPEC FP2000, SPEC CINT95, Media Bench • Executions were compared against binaries generated by the original OpenIMPACT compiler (which have no fault detection) • The fault detection code was inserted into the low-level code immediately before register allocation and scheduling • Optimizations that would have interfered with the duplication and detection code, such as Common Subexpression Elimination, were modified to respect the fault detecting code
