Complex Upset Mitigation Applied to a Re-Configurable Embedded Processor

EEL 6935

Lu Hao

Wenqian Wu



Outline

  • Issues of SRAM-based FPGAs in space applications

  • Upset mitigation solutions

  • Resource usage and performance analysis

  • Summary



System on Programmable Chip

  • A soft-core processor implemented in an SRAM-based FPGA is very attractive to spacecraft designers: a complete computer system can be created on a single FPGA chip.



MicroBlaze core

  • MicroBlaze is a soft processor core designed for Xilinx FPGAs.

  • Many aspects of the MicroBlaze can be user-configured: cache size, pipeline depth (3-stage or 5-stage), embedded peripherals, memory management unit, and bus interfaces.

[Figure: MicroBlaze core block diagram with the On-chip Peripheral Bus (OPB) and the Local Memory Bus (LMB)]



Space application issues

  • Radiation environment

    In space, high-energy ionizing particles exist as part of the natural background.

    In addition, there are solar particle events and high-energy protons trapped in the Earth's magnetosphere (the Van Allen radiation belts).

    This radiation poses a potential threat to electronic devices.

  • Single Event Upset (SEU)

    An SEU is a change of state caused by ions or electromagnetic radiation striking a sensitive node in a microelectronic device, such as a microprocessor, a semiconductor memory, or a power transistor. The state change results from the free charge created by ionization in or close to an important node of a logic element (e.g., a memory bit).

  • FPGAs are susceptible to SEUs in:

    data and instructions stored in block RAM (BRAM)

    configuration bits stored in SRAM cells distributed across the device (the configuration memory)

  • Upset mitigation is therefore one of the key issues in SRAM-based FPGA design for space applications
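To make the SEU mechanism concrete, the following minimal Python sketch models an upset by its effect on stored state: the inversion of a single bit in a memory word. The 32-bit word width and the helper names `inject_seu` and `random_seu` are illustrative assumptions, not part of the original study; the physical charge-collection process is not modelled.

```python
import random

WORD_BITS = 32  # assumption: 32-bit words (MicroBlaze is a 32-bit processor)


def inject_seu(word: int, bit: int) -> int:
    """Model a single event upset as the inversion of one stored bit."""
    if not 0 <= bit < WORD_BITS:
        raise ValueError(f"bit index {bit} outside 0..{WORD_BITS - 1}")
    return word ^ (1 << bit)


def random_seu(memory: list, rng: random.Random) -> tuple:
    """Flip one randomly chosen bit of one randomly chosen word, in place.

    Returns the (address, bit) that was hit, so a test harness can log it.
    """
    addr = rng.randrange(len(memory))
    bit = rng.randrange(WORD_BITS)
    memory[addr] = inject_seu(memory[addr], bit)
    return addr, bit


if __name__ == "__main__":
    bram = [0x00000000] * 8  # hypothetical 8-word memory, all zeros
    addr, bit = random_seu(bram, random.Random(1))
    print(f"upset at word {addr}, bit {bit}: {bram[addr]:#010x}")
```

Applying the same flip twice restores the original word, which is why a single upset is always repairable if a correct copy of the data exists elsewhere.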



Proposed upset mitigation

  • To ensure reliable space applications based on SRAM-based FPGAs, the authors investigate three levels of upset mitigation:

    • Functional-block design triplication

    • Continuous external configuration scrubbing

    • Independent internal BRAM scrubbing (also triplicated)



Tool, device and environment

  • Tools:

    Xilinx TMR Tool: lets the designer easily trade off maximum immunity to radiation effects against area, pinout, and board-layout considerations.

  • Device:

    Xilinx Virtex-II XQR2V6000 FPGA

  • Program running on the MicroBlaze:

    Integer-based FFT

  • Test environment:

    Crocker Nuclear Laboratory, University of California, Davis, using a 63.3 MeV proton beam.

  • Test board

    Two FPGAs: the device under test (DUT) and the service FPGA



DUT and Service FPGA

  • The service FPGA performs two functions:

    1) configuration readback of the DUT, and scrubbing whenever a readback error is detected

    2) control and monitoring of the functional operation of the MicroBlaze running the FFT program

  • The FFT program is loaded into internal BRAM each time the DUT is configured

  • The input data is sent to the DUT's internal BRAM by the service FPGA

  • The FFT results are returned to the service FPGA and compared with the expected results

[Figure: test setup showing the service FPGA and the DUT, which contains the MicroBlaze (uBlaze) and its BRAM]
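The functional check described above (send data, wait for the DUT to answer, compare with the expected FFT result) can be sketched as follows. This is a hypothetical software model, not the service-FPGA logic used in the study: the DUT is represented as a Python iterator that yields `None` while it is busy and then its FFT output, and the watchdog is a simple tick limit. The names `monitor_run` and `fake_dut` are illustrative.

```python
from typing import Iterable, Optional, Sequence


def monitor_run(dut_steps: Iterable[Optional[Sequence[int]]],
                expected: Sequence[int],
                watchdog_limit: int) -> str:
    """Judge one FFT run: 'ok', 'wrong_output' or 'timeout'.

    dut_steps yields one value per monitor tick: None while the DUT is still
    computing, then its output vector (hypothetical interface).
    """
    for tick, value in enumerate(dut_steps):
        if tick >= watchdog_limit:
            return "timeout"  # watchdog expired: the DUT needs a reset
        if value is not None:
            return "ok" if list(value) == list(expected) else "wrong_output"
    return "timeout"  # the DUT stopped without ever answering


def fake_dut(delay: int, output: Sequence[int]):
    """Stand-in DUT that is busy for `delay` ticks, then answers."""
    for _ in range(delay):
        yield None
    yield output
```

In terms of the test campaign's error types, a `"timeout"` corresponds to a nonresponsive DUT and `"wrong_output"` to a wrong FFT result.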



Upset Mitigation

  • Mitigation solution

    • Functional-block design triplication

    • Continuous external configuration scrubbing

    • Independent internal BRAM scrubbing (also triplicated)



TMR

  • Triple Module Redundancy

    Three modules perform the same task, and the voter selects the majority value as the output.

    If any one of the three modules fails, the other two mask and correct the fault. If the voter fails, however, the complete system fails, so in a good TMR system the voter is a critical component that should be much more reliable than the other components.

[Figure: TMR]
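A bitwise two-out-of-three majority voter, the core of the TMR scheme above, fits in a few lines of Python. This is a generic illustration of majority voting, not the netlist produced by the Xilinx TMR Tool; the function names are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value that at
    least two of the three replicas agree on."""
    return (a & b) | (a & c) | (b & c)


def faulty_replicas(a: int, b: int, c: int) -> list:
    """Indices of replicas that disagree with the voted value (useful for
    logging which module was hit; the voted output is still correct)."""
    voted = majority_vote(a, b, c)
    return [i for i, v in enumerate((a, b, c)) if v != voted]


if __name__ == "__main__":
    good = 0x1234
    upset = good ^ (1 << 5)  # one replica suffers a single bit flip
    print(hex(majority_vote(good, upset, good)))  # prints 0x1234: the fault is masked
```

Note that the function itself plays the role of the single voter: if it were corrupted, every output would be wrong, which is the single point of failure the slide warns about.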



Xilinx TMR



Upset mitigation

  • Mitigation solution

    • Functional-block design triplication

    • Continuous external configuration scrubbing

    • Independent internal BRAM scrubbing (also triplicated)



External Configuration Scrubbing

  • Configuration scrubbing is the process of rewriting the configuration memory of an FPGA for the purpose of correcting any errors that may have accumulated since the device was last configured.

  • The service FPGA detects readback errors and scrubs the configuration by reloading the bitstream to correct upsets.

  • Transparent process

    normal device operation runs concurrently and without interruption

  • Configuration scrubbing clock: 16 MHz, i.e., 4 scrub cycles per second
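The readback-compare-rewrite loop can be modelled as below. This is a simplified sketch under stated assumptions: configuration memory is treated as a list of frames, each compared against a golden copy of the bitstream, and only mismatching frames are rewritten. The frame granularity and the names are illustrative assumptions; the slide says only that the service FPGA reloads the bitstream after a readback error. The constants show what the quoted figures imply: a 16 MHz clock and 4 scrub cycles per second leave about 4 million clock cycles per readback-and-scrub pass.

```python
CONFIG_CLOCK_HZ = 16_000_000  # configuration clock quoted on the slide
SCRUB_CYCLES_PER_S = 4        # scrub rate quoted on the slide
CLOCKS_PER_SCRUB = CONFIG_CLOCK_HZ // SCRUB_CYCLES_PER_S  # 4_000_000


def scrub_cycle(config: list, golden: list) -> list:
    """One readback-and-scrub pass over configuration memory.

    Each frame read back from the device is compared with the golden copy;
    frames that differ are rewritten. Returns the indices of repaired frames.
    """
    if len(config) != len(golden):
        raise ValueError("readback and golden images differ in frame count")
    repaired = []
    for i, (frame, ref) in enumerate(zip(config, golden)):
        if frame != ref:
            config[i] = ref  # rewrite the corrupted frame from the golden copy
            repaired.append(i)
    return repaired
```

A second pass over an already repaired image returns an empty list, which mirrors the "continuous" nature of the scrubbing: most passes find nothing to fix.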



Upset mitigation

  • Mitigation solution

    • Continuous external configuration scrubbing

    • Functional-block design triplication

    • Independent internal BRAM scrubbing (also triplicated)



Independent internal BRAM scrubbing



BRAM Triplication

Port A: used by the MicroBlaze processor

Port B: connected to the counter; used for error detection and correction



BRAM Triplication

  • TMR counter

    • Allows continuous refreshing of the BRAM contents

    • Cycles through the memory addresses by incrementing the BRAM address on the second port

    • Whenever the first port of the BRAM is not in use, rewrites the BRAM content at the current address with the voted value from the associated voter (TRV16)

  • BRAM

    • Conventional BRAM

  • Associated voter (TRV16)

    • Compares the three values at the same address of the three BRAMs, selects the majority, and writes it back to the corresponding address
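The triplicated BRAM with its port-B scrubber can be approximated by the behavioural model below. It is a hypothetical sketch, not the authors' HDL: the class and method names are invented for illustration, processor reads are assumed to see the voted value, and the scrub counter is assumed to advance every cycle, skipping the write-back whenever port A is busy.

```python
from dataclasses import dataclass


def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority, standing in for the TRV16 voter."""
    return (a & b) | (a & c) | (b & c)


@dataclass
class TriplicatedBram:
    copies: list         # three equal-length lists of words (the three BRAMs)
    scrub_addr: int = 0  # port-B address counter

    def voted(self, addr: int) -> int:
        """Majority value at addr across the three copies."""
        return vote(*(m[addr] for m in self.copies))

    def write(self, addr: int, value: int) -> None:
        """Processor (port A) write: all three copies are updated together."""
        for m in self.copies:
            m[addr] = value

    def scrub_step(self, port_a_busy: bool) -> None:
        """One clock of the port-B scrubber.

        When port A is idle, the voted word at the counter address is written
        back to all three copies, which repairs a single corrupted copy.
        The counter then advances and wraps around (assumed behaviour).
        """
        if not port_a_busy:
            value = self.voted(self.scrub_addr)
            for m in self.copies:
                m[self.scrub_addr] = value
        self.scrub_addr = (self.scrub_addr + 1) % len(self.copies[0])


if __name__ == "__main__":
    bram = TriplicatedBram([[0] * 4 for _ in range(3)])  # three 4-word BRAMs
    bram.write(2, 0x1234)
    bram.copies[1][2] ^= 0x10  # single-bit upset in one copy
    for _ in range(4):         # one full sweep of the address counter
        bram.scrub_step(port_a_busy=False)
    print([hex(m[2]) for m in bram.copies])  # all three copies hold 0x1234 again
```

Because the write-back only happens while port A is idle, the scrubber never competes with the processor for the memory; the price is that a port A that is busy for long stretches delays the repair, during which a second upset in another copy of the same word could defeat the vote.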



Testing

  • Two mitigated versions of the MicroBlaze design were implemented and tested:

    • with the BRAM scrubber

    • without the BRAM scrubber

  • Error types:

    • Type 1 errors: wrong FFT outputs

      • Type 1a: corrected after a configuration scrub cycle

      • Type 1b: not corrected after a scrub cycle, even after a reset of the DUT design

    • Type 2 errors: nonresponsiveness of the DUT, requiring a reset and synchronization

      • Type 2a: corrected by scrubbing, hence referred to as a recovering reset

      • Type 2b: not corrected by scrubbing, referred to as a runaway reset

        • A runaway reset is an uncorrected error condition that makes the functional monitor repeatedly try to reset the MicroBlaze processor each time the watchdog timer for the handshake between the two FPGAs reaches its limit

    • Type 3 errors: an exception or interrupt is detected

This is what we focus on
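The error taxonomy above can be summarised as a small decision function. The boolean inputs are hypothetical observations a test harness might record for one run, and the source does not describe such a function. As a simplification, the "even after a reset" condition of Type 1b is folded into the single `fixed_by_scrub` flag. Type 3 is reported alongside the other types because exceptions can accompany runaway resets.

```python
def classify_error(responsive: bool, output_correct: bool,
                   fixed_by_scrub: bool, exception_seen: bool) -> set:
    """Map one run's observations onto the error types used in the tests.

    responsive:     the DUT answered the handshake before the watchdog expired
    output_correct: the FFT output matched the expected result
    fixed_by_scrub: normal operation resumed after a configuration scrub cycle
    exception_seen: the MicroBlaze raised an exception or interrupt
    """
    types = set()
    if not responsive:
        # Type 2: the DUT must be reset; scrubbing decides recovering vs runaway
        types.add("2a" if fixed_by_scrub else "2b")
    elif not output_correct:
        # Type 1: wrong FFT output
        types.add("1a" if fixed_by_scrub else "1b")
    if exception_seen:
        types.add("3")
    return types
```

Returning a set rather than a single label keeps a runaway reset that was also flagged by an exception visible as both `"2b"` and `"3"`.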



[Figure: test results for the design without the BRAM scrubber and for the design with the BRAM scrubber]

Is BRAM code corruption the main cause of runaway resets?



Standalone test

  • To confirm that BRAM code corruption is a likely cause of these runaway resets, the BRAM mitigation design was implemented in standalone mode and tested under proton beams at similar fluxes at the same facility.



Runaway Resets Caused by BRAM Corruption

  • At a flux of 1.70×10⁸, at least 17% (1.21×10⁻¹¹ / 6.82×10⁻¹¹) of the runaway resets are due to errors in the BRAM code, while at a flux of 1.70×10⁹, 23% of them are caused by code corruption.
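The 17% figure is the ratio of the two quoted values. The quick check below assumes, as the slide's notation suggests, that the numerator is the rate measured for the standalone BRAM design and the denominator the runaway-reset rate of the full MicroBlaze design; the variable names are illustrative.

```python
standalone_bram = 1.21e-11  # value quoted for the standalone BRAM test (assumed role)
full_design = 6.82e-11      # value quoted for runaway resets in the full design (assumed role)

fraction = standalone_bram / full_design
print(f"{fraction:.1%}")    # prints 17.7%, i.e. "at least 17%"
```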



Exceptions Caused by BRAM Runaway Resets

  • Design 1: an average of 64% of the unrecovered resets (due to BRAM code corruption) were detected by exceptions (64% at flux 1 and 80% at flux 2).

  • Design 2: exceptions were observed only after the flux was increased by two orders of magnitude (1.70×10⁹), and only 25% of the runaway resets were detected.

  • Not all illegal states are detected by the exception mechanism.

    • At the lower flux (1.70×10⁸), seven resets were observed but no exceptions were detected.

  • The MicroBlaze was optimized to fit in Xilinx FPGAs, and its exception circuitry was designed to detect only major illegal operations.



Conclusion

  • Issues of SRAM-based FPGAs in space applications

    • Single event upsets (SEUs) can be caused by the radiation environment

    • So a fault-tolerant system is needed

  • A complete upset mitigation solution implemented on a Xilinx Virtex-II FPGA

    • Continuous external configuration scrubbing

    • Functional-block design triplication

    • Independent internal BRAM scrubbing (also triplicated)

  • Testing results

    • BRAM code corruption is the main cause of runaway resets





Thanks!

Questions?

