Scrubbing Optimization via Availability Prediction (SOAP) for Reconfigurable Space Computing


Presentation Transcript


  1. Scrubbing Optimization via Availability Prediction (SOAP) for Reconfigurable Space Computing • Quinn Martin • Alan George

  2. SOAP • Background • FPGAs and Radiation in Space • Traditional Scrubbing Methods • SOAP Approach • Mission Parameters • Markov Models • Mission Case Studies • Results • Conclusions

  3. FPGAs • Field-Programmable Gate Arrays (FPGAs) • Implement custom digital logic hardware with fabric of logic resources and interconnect • Lookup tables (LUTs) implement combinational logic • User flip flops (FFs) implement sequential logic • Switch and connection boxes route among resources • Many are reconfigurable • Allows update of routing and logic state • Partial reconfiguration can update partition of device • E.g., Virtex from Xilinx and Stratix from Altera

  4. Reconfigurable FPGAs in Space • Advantages • Very high performance/power ratio • Reconfigurable (fully and partially) • Adaptable to changing environments and mission requirements • Can update design after launch • Disadvantages • Relatively difficult to design/test applications • Configuration memory vulnerable to radiation • Can change application processor architecture in unpredictable way • Must repair upsets via configuration scrubbing

  5. Radiation Effects on FPGAs • Single-event Effects (SEE) • Single-event Latchup (SEL) – Causes current spike that may damage device • Single-event Upset (SEU) – Changes state of bit(s), e.g. from logic ‘0’ to ‘1’ • Can be single-bit upset (SBU) or multi-bit upset (MBU) • Single-event Functional Interrupt (SEFI) – Like SEU, but affecting critical device resource • Total Ionizing Dose • Degrades performance over time leading to eventual device failure

  6. Xilinx V-5/V-6 Configuration • Programmed via SelectMAP interface • Runtime configuration interface • Also allows readback of existing configuration • 32 bits per configuration word • Parallel bus width of 8, 16, or 32 bits • Max clock frequency 100 MHz • Configuration memory arranged in frames • Minimum unit of access to config. memory • Virtex-5 – 41 words per frame • Virtex-6 – 81 words per frame
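
The frame sizes and bus limits on this slide fix a lower bound on how quickly a single frame can cross the configuration interface. The sketch below works through that arithmetic. It assumes an ideal transfer with no command or addressing overhead, and the function names are illustrative rather than part of any Xilinx tool.

```python
# Ideal per-frame transfer time over the configuration bus.
# Assumption: one bus-width transfer per clock cycle, no protocol overhead.

WORD_BITS = 32                                   # bits per configuration word (from the slide)
WORDS_PER_FRAME = {"Virtex-5": 41, "Virtex-6": 81}  # words per frame (from the slide)


def frame_bits(device):
    """Number of configuration bits in one frame of the given device family."""
    return WORDS_PER_FRAME[device] * WORD_BITS


def frame_transfer_time(device, bus_width_bits, cclk_hz):
    """Seconds needed to move one frame across the bus under ideal conditions."""
    cycles = frame_bits(device) / bus_width_bits
    return cycles / cclk_hz


if __name__ == "__main__":
    for device in WORDS_PER_FRAME:
        t = frame_transfer_time(device, bus_width_bits=32, cclk_hz=100e6)
        print(f"{device}: {frame_bits(device)} bits per frame, "
              f"{t * 1e9:.0f} ns per frame at 32 bits x 100 MHz")
```

Because the frame is the minimum unit of access, this per-frame time is also the finest granularity at which a scrubber can read back or rewrite configuration memory.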

  7. FPGA Scrubbing • FPGA Configuration Scrubbing • Quickly repairs SEUs before accumulation • Accumulation defeats redundancy strategies (e.g., TMR) • Fast repair can prevent SEUs from manifesting as errors • Can be decomposed into basic scrubbing techniques • Correction techniques repair upsets • Detection techniques discover and locate upsets

  8. FPGA Scrubbing Techniques • Correction Techniques • Golden Copy – Repairs configuration based on a known “golden” copy (e.g., in rad-hard PROM) • Frame ECC – Repairs based on per-frame error syndrome code stored on-chip • Detection Techniques • Frame ECC – Detects based on per-frame SECDED Hamming code • CRC-32 – Detects using device-wide CRC-32
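
To illustrate how CRC-based detection pairs with golden-copy correction, the sketch below treats configuration memory as a plain byte string. It uses Python's `zlib.crc32` as a stand-in for the device's CRC-32 (the on-chip polynomial and data ordering may differ), and the memory contents and helper names are made up for the example.

```python
import zlib


def make_golden(n_bytes=1024):
    """Build a stand-in 'golden' configuration image (arbitrary contents)."""
    return bytes((i * 37) % 256 for i in range(n_bytes))


def flip_bit(data, bit_index):
    """Return a copy of data with one bit inverted, emulating a single-bit upset."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def upset_detected(readback, golden_crc):
    """Detection step: compare the CRC of the readback against the stored value."""
    return zlib.crc32(readback) != golden_crc


def scrub(readback, golden, golden_crc):
    """Readback scrubbing: rewrite from the golden copy only when the CRC mismatches."""
    return golden if upset_detected(readback, golden_crc) else readback


if __name__ == "__main__":
    golden = make_golden()
    golden_crc = zlib.crc32(golden)
    upset = flip_bit(golden, 4321)
    print("clean image flagged:", upset_detected(golden, golden_crc))
    print("upset image flagged:", upset_detected(upset, golden_crc))
    print("repaired after scrub:", scrub(upset, golden, golden_crc) == golden)
```

A device-wide CRC only says that some upset exists; it does not locate it. Frame ECC, by contrast, yields a per-frame syndrome that can point to the affected bit.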

  9. FPGA Scrubbing Strategies • Scrubbing Strategies • Any combination of detection and correction techniques with controller to implement algorithm • Blind Scrubbing – Golden copy correction only • Readback Scrubbing – Some detection technique used

  10. FPGA Scrubbing Strategies

  11. SOAP Approach • Scrubbing Optimization via Availability Prediction (SOAP) • Uses system availability as primary metric for scrubbing efficacy • Models scrubbing strategies as Markov diagrams • Varies free parameters to find optimal scrubbing system • Environmental parameters λ and α (orbits) • System parameters B and fCCLK (memory and pin constraints) • Scrubbing parameters μ and γ (device configuration capability)

  12. SOAP Approach

  13. Environmental Parameters • λ – SEU rates for devices in various orbits of interest • Calculated per-bit and per-device using CREME96 • α – Correction factors for single-bit and multi-bit upsets (SBU/MBU) • From beam tests on Virtex-5 devices

  14. System Parameters • Factors chosen by the system designer based on available memories, power budget, etc. • Affect scrubbing detection and correction rates (see equations on next slide) • B – Configuration bus width in bits • fCCLK – Configuration clock speed in Hz
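
The equations relating B and fCCLK to scrubbing rates did not survive in this transcript. As a rough stand-in, the sketch below assumes that one full pass over configuration memory takes `config_bits / (B * fCCLK)` seconds and that the scrub rate is the reciprocal of that period. This relation, the function names, and the example configuration size are assumptions for illustration, not the authors' equations.

```python
# Rough relation between bus parameters and full-device scrub rate.
# Assumption: ideal streaming, one B-bit transfer per configuration clock cycle.


def bus_throughput_bps(bus_width_bits, cclk_hz):
    """Peak configuration-bus throughput in bits per second."""
    return bus_width_bits * cclk_hz


def full_scrub_period_s(config_bits, bus_width_bits, cclk_hz):
    """Seconds for one complete pass over configuration memory."""
    return config_bits / bus_throughput_bps(bus_width_bits, cclk_hz)


def scrub_rate_per_s(config_bits, bus_width_bits, cclk_hz):
    """Full-device scrub passes per second (reciprocal of the scrub period)."""
    return 1.0 / full_scrub_period_s(config_bits, bus_width_bits, cclk_hz)


if __name__ == "__main__":
    HYPOTHETICAL_CONFIG_BITS = 50e6  # placeholder size, not a datasheet value
    for width, clock in [(8, 33e6), (32, 100e6)]:
        period = full_scrub_period_s(HYPOTHETICAL_CONFIG_BITS, width, clock)
        print(f"B={width:2d} bits, fCCLK={clock / 1e6:.0f} MHz: "
              f"{period * 1e3:.1f} ms per pass")
```

Under this assumption, widening the bus or raising the clock shortens the scrub period proportionally, which is why both appear as free parameters in the optimization.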

  15. Scrubbing Parameters • μ – Repair rate for scrubbing technique (per second) • γ – Detection rate for scrubbing technique (per second)
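
The Markov diagrams themselves are not reproduced in this transcript, so the following is a simplified sketch of how λ, γ, and μ can combine into a steady-state availability. It assumes a cyclic chain in which the system is up until an upset (rate λ), the upset is then detected (rate γ), and then repaired (rate μ). For a cycle like this, the steady-state probability of each state is proportional to its mean dwell time. The models in the presentation are likely more detailed (for example, they include the α correction factors).

```python
# Steady-state availability of simple cyclic Markov models.
# Assumption: exponential rates; states visited in a fixed cycle.


def availability_blind(lam, mu):
    """Two-state model (up <-> upset): no detection, repair at rate mu."""
    return mu / (lam + mu)


def availability_readback(lam, gamma, mu):
    """Three-state cycle: up -> undetected upset -> detected upset -> up."""
    t_up = 1.0 / lam          # mean time until an upset occurs
    t_undetected = 1.0 / gamma  # mean time until the upset is detected
    t_repair = 1.0 / mu       # mean time until the upset is repaired
    return t_up / (t_up + t_undetected + t_repair)


if __name__ == "__main__":
    # Hypothetical rates in events per second, chosen only to exercise the formulas.
    lam, gamma, mu = 1e-6, 0.5, 2.0
    print("blind:   ", availability_blind(lam, mu))
    print("readback:", availability_readback(lam, gamma, mu))
```

As γ grows large, the readback model converges to the two-state result, which matches the intuition that instant detection leaves repair time as the only source of downtime.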

  16. Markov Algorithm Models • Blind • No detection • Built-in CRC-32 • Basic detection • Frame ECC with CRC-32 • CRC acts as “safety net” for upsets undetected by Frame ECC • Frame ECC with CRC-32 and Essential Bits (EB) • Only scrubs errors that may be critical

  17. Blind Scrubbing

  18. Readback CRC-32 Scrubbing

  19. CRC-32 w/ Frame ECC Scrubbing

  20. Case Study • Applies SOAP method to hypothetical systems with realistic parameters • Devices • Xilinx Virtex-5 • Xilinx Virtex-6 • Orbits • ISS low earth orbit (LEO) • Molniya highly elliptical orbit (HEO) • 8-bit SelectMAP bus at 33 MHz • Accounts for access speed of slow rad-hard PROM

  21. Case Study • Two mission types • Non upset critical (non-UC) – System continues to run upon detection and correction of upset • Only count critical upsets as system “unavailable” • Upset critical (UC) – System requires reset upon detection of upset to ensure state integrity • Requires detection • All detected upsets render system unavailable for reset period • Will benefit from essential bits mask used in detection
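
To see why an essential-bits mask helps upset-critical missions, consider a minimal sketch in which every detected upset forces a reset of fixed length. It assumes the system alternates between an exponentially distributed up period and a constant reset period, and that the mask simply scales the rate of upsets that trigger a reset. The parameter names and numbers are hypothetical and are not taken from the case study.

```python
# Upset-critical (UC) availability with a fixed reset penalty per detected upset.
# Assumption: alternating up/reset periods; the mask scales the detected-upset rate.


def uc_availability(upset_rate_per_s, reset_time_s, essential_fraction=1.0):
    """Long-run fraction of time the system is up.

    essential_fraction is the share of upsets that land on essential bits and
    therefore still trigger a reset when the mask is applied (1.0 = no mask).
    """
    detected_rate = upset_rate_per_s * essential_fraction
    mean_up_s = 1.0 / detected_rate
    return mean_up_s / (mean_up_s + reset_time_s)


if __name__ == "__main__":
    rate = 1e-5      # hypothetical detected upsets per second
    reset = 10.0     # hypothetical reset duration in seconds
    print("no mask:  ", uc_availability(rate, reset))
    print("with mask:", uc_availability(rate, reset, essential_fraction=0.2))
```

In this simplified view, masking out non-essential bits lengthens the mean time between resets, so the same reset duration costs proportionally less availability.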

  22. Non-UC Results • Continuous blind scrubbing offers highest availability • CRC-32 offers similar availability with low implementation complexity • Frame ECC suffers because TBUs can be falsely corrected, resulting in further errors

  23. UC Results

  24. UC Results

  25. Results • Frame ECC with CRC-32 and Essential Bits mask offers highest availability • Roughly one extra nine over other methods • Xilinx-provided soft-error mitigation (SEM) core implements similar strategy • Other strategies still competitive • Complex state machine or software and additional memory required for Frame ECC/EB • Model does not account for vulnerability associated with internal scrubbing
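
The phrase "one extra nine" refers to the usual shorthand of counting the nines in an availability figure. The short sketch below shows that conversion and the corresponding downtime per year; the availability values are illustrative and are not results from the presentation.

```python
import math


def nines(availability):
    """Number of nines in an availability figure, e.g. 0.999 -> about 3."""
    return -math.log10(1.0 - availability)


def downtime_per_year_s(availability):
    """Expected unavailable seconds in a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 3600


if __name__ == "__main__":
    for a in (0.999, 0.9999):  # illustrative values only
        print(f"A={a}: {nines(a):.2f} nines, "
              f"{downtime_per_year_s(a) / 60:.1f} minutes of downtime per year")
```

Gaining one nine means cutting unavailability by a factor of ten, which is the size of the advantage the slide attributes to Frame ECC with CRC-32 and the Essential Bits mask.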

  26. Conclusions • Predicts availability for various FPGA scrubbing strategies on real and hypothetical platforms • Uses analytical models rather than experimentation • Markov availability modeling with parametric approach • Allows optimization of scrubbing strategy during design phase • In case study, blind scrubbing best for non-UC and Frame ECC with EB mask best for UC
