
Single Event Upset (SEU) Mitigating Techniques in a Space Radiation Environment for the FPGA-Based Iterative Repair Processor



    1. Single Event Upset (SEU) Mitigating Techniques in a Space Radiation Environment for the FPGA-Based Iterative Repair Processor
        Group Presentation (11/30/2007), Jeffrey M. Carver

    2. Outline
        Introduction; Background; Fault Tolerant Techniques; Configuration Frames; DMRH and Fan-out Design; Iterative Repair Processor (Fault Protected); SEU Simulator; Current Results; Conclusions and Program of Study; Publications

    4. Space Applications
        FPGAs are being used in space applications because of their lower cost compared with ASICs, their ability to be reconfigured, and the fact that they can be optimized for a specific application.
        Problems that occur in space: Single Event Upsets (SEUs) occur when a memory cell changes value because of radiation in the environment. Radiation also affects combinational logic by causing temporary glitches, which have been measured to last from 0.3 ns to 1.3 ns. For FPGAs, this means that fault-tolerant techniques need to be applied to protect the storage memory, the configuration memory, and the combinational logic.

    5. Research Goal
        To find and apply fault-tolerant techniques for a system designed for space applications (the Iterative Repair Processor). Once the techniques to apply have been identified, an SEU simulator for testing their robustness will be developed and used. The techniques will then be applied and tested.

    7. Triple Modular Redundancy (TMR)
        TMR is the triplication of a module, with a voting circuit that selects the correct output of the device. Variants of this concept are used: an analog component as the voting circuit; two or three voting circuits with tri-state buffers; and TMR in time.
        (Picture in the upper-right) An example from FPGA Editor showing a tri-state buffer used on an output port; it implements the voting circuit with a tri-state buffer described on this slide. (Picture in the bottom-right) An example using 3 voting circuits with tri-state buffers.
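The voting step of TMR can be sketched in a few lines. This is a minimal software model, not the tri-state-buffer or analog voters mentioned on the slide: it assumes each module output is an integer bit vector and computes the bitwise 2-of-3 majority. `majority_vote` and `disagreeing_copies` are hypothetical names.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant module outputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_copies(a: int, b: int, c: int) -> list:
    """Indices of the copies whose output differs from the voted result."""
    voted = majority_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]


good = 0b1011
upset = good ^ 0b0100  # third copy suffers a single-bit upset
print(bin(majority_vote(good, good, upset)))  # 0b1011
print(disagreeing_copies(good, good, upset))  # [2]
```

An upset in one copy is masked, but if two copies are upset in the same bit position they outvote the correct one.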

    8. Hamming Codes
        A Hamming code inserts check bits throughout the word. The Improved Hamming Code can require an extra check bit, but it appends the check bits to the end of the word. Both can correct a single error in a word. The Hamming relationship gives the number of check bits k required for m data bits: 2^k ≥ m + k + 1.
        Hamming codes can also be implemented so that they perform Double Error Detection (DED). This means a Multiple Event Upset (MEU) that flips two bits in a word can be detected, but not corrected.
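To make the single-error-correct, double-error-detect behavior concrete, here is a minimal Python sketch of an extended Hamming (SEC-DED) code in which the check bits sit at the power-of-two positions, i.e. inserted throughout the word as in the first variant described above. It is an illustrative software model, not the encoder/decoder used in the IR Processor; the function names are hypothetical.

```python
def num_check_bits(m: int) -> int:
    """Smallest k satisfying the Hamming relationship 2**k >= m + k + 1."""
    k = 0
    while (1 << k) < m + k + 1:
        k += 1
    return k


def encode(data: list) -> list:
    """SEC-DED codeword. code[0] is an overall parity bit; positions 1..n use
    the classic layout where powers of two hold check bits."""
    m = len(data)
    n = m + num_check_bits(m)
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):            # not a power of two: data position
            code[pos] = next(bits)
    p = 1
    while p <= n:                      # check bit p covers positions with bit p set
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p and pos != p) % 2
        p <<= 1
    code[0] = sum(code[1:]) % 2        # overall parity enables double-error detection
    return code


def decode(code: list) -> tuple:
    """Return (data, status) with status 'ok', 'corrected' or 'uncorrectable'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        fixed[syndrome] ^= 1           # syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:
        status = "uncorrectable"       # e.g. two bits flipped: detected, not fixed
    data = [fixed[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status


word = [1, 0, 1, 1, 0, 0, 1, 0]
cw = encode(word)   # 8 data + 4 check + 1 overall parity = 13 bits
cw[6] ^= 1          # single upset
print(decode(cw))   # ([1, 0, 1, 1, 0, 0, 1, 0], 'corrected')
```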

    9. TMR vs. Hamming
        TMR: requires at least a 200 percent increase in area; good for small memories and state machines. Hamming codes: good for large memories; require check bits, a Hamming encoder, and a Hamming decoder; observed to increase timing delay compared with TMR.
        Based on the paper this information came from, we decided to use TMR on small memory elements and Hamming codes on the larger memory elements. This is because Hamming codes require resources to implement the Hamming encoder and decoder, which can be a large overhead on small memory elements.
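The storage side of this trade-off can be quantified with a short calculation. The sketch below counts only the extra memory bits per word, assuming a SEC-DED Hamming code (check bits from the Hamming relationship plus one overall parity bit). It deliberately ignores the voter and encoder/decoder logic, which is exactly the cost that makes Hamming codes unattractive for small memories.

```python
def hamming_check_bits(m: int) -> int:
    """Smallest k with 2**k >= m + k + 1."""
    k = 0
    while (1 << k) < m + k + 1:
        k += 1
    return k


def storage_overhead(m: int) -> tuple:
    """Extra storage bits, as a fraction of m data bits, for TMR and SEC-DED."""
    tmr_extra = 2 * m                          # two additional copies of the word
    secded_extra = hamming_check_bits(m) + 1   # check bits plus overall parity bit
    return tmr_extra / m, secded_extra / m


for m in (8, 16, 32, 64):
    tmr, secded = storage_overhead(m)
    print(f"{m:2d}-bit word: TMR +{tmr:.0%}, SEC-DED +{secded:.1%}")
```

The check-bit overhead shrinks as words get wider, while TMR stays at 200%, which matches the slide's advice to reserve Hamming codes for the larger memories.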

    10. DWC-CED
        Duplication With Comparison combined with Concurrent Error Detection (DWC-CED): two modules perform the same operation and their outputs are compared (saving area compared with TMR). If the outputs do not match, it takes one more clock cycle to run the concurrent error detection method that determines which module is correct. The problem is finding a test that detects all possible errors that can occur in a module.
        We did not use this method because of a paper showing that it can be difficult to find a CED technique that detects all possible errors. For a few modules the authors found a technique with 100% error coverage, but for others they did not show one with 100% coverage.
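A minimal sketch of the DWC-CED idea, using a duplicated adder. The CED technique here is a mod-3 residue check, chosen only for illustration (the slide does not say which CED method the cited paper used); the function names and the XOR fault masks are hypothetical. The second call shows the coverage problem described above: an error that changes the sum by a multiple of 3 passes the check, so the faulty copy cannot be identified.

```python
def residue3_ok(a: int, b: int, s: int) -> bool:
    """CED check for an adder: the mod-3 residue of the sum must equal the
    sum of the operand residues (computed on small 2-bit values)."""
    return (a % 3 + b % 3) % 3 == s % 3


def dwc_ced_add(a: int, b: int, fault_a: int = 0, fault_b: int = 0) -> tuple:
    """Duplicated adder with comparison. fault_a / fault_b are XOR masks that
    model upsets in each copy's output (a hypothetical fault-injection hook)."""
    out_a = (a + b) ^ fault_a
    out_b = (a + b) ^ fault_b
    if out_a == out_b:
        return out_a, "match"
    # Mismatch: spend one more cycle running the CED check on each copy.
    ok_a, ok_b = residue3_ok(a, b, out_a), residue3_ok(a, b, out_b)
    if ok_a and not ok_b:
        return out_a, "copy B faulty"
    if ok_b and not ok_a:
        return out_b, "copy A faulty"
    return None, "undecided"  # the CED check did not cover this error


print(dwc_ced_add(5, 9, fault_b=0b100))  # (14, 'copy B faulty')
print(dwc_ced_add(0, 0, fault_a=0b11))   # (None, 'undecided')
```

A single-bit output error changes the value by a power of two, which is never a multiple of 3, so this particular check catches every single-bit output error but misses some multi-bit ones.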

    11. Other Techniques
        Other techniques exist for SEUs and even Multiple Event Upsets (MEUs) in memory: Cross Parity, Reed-Muller, Reed-Solomon, and Reed-Solomon with Hamming codes. The problem is the resources required to implement these techniques.
        These methods are more complex than the more popular ones discussed, such as TMR and Hamming codes. Because they are more complex, their encoders and decoders require more resources to implement. That is why we did not use these methods.

    13. Configuration Frames
        Frames are 1 bit wide, span an HCLK row (16 CLBs in height), and are 41 32-bit words in size. Block types: CLBs/CLKs/DSPs/IOBs, BRAM Interconnect, and BRAM Contents. There are multiple minor frames per major column.
        A frame address on the Virtex-4 is composed of the following fields: Block Type, Top/Bottom Bit, HCLK Row, Major Column, and Minor Column. To understand how the frames are laid out on the FPGA, we briefly discuss how a frame address is composed. For example, to access the configuration frame in the upper-left part of the FPGA, you would give the following values for the frame address fields: BlockType=CLBs/CLKs/DSPs/IOBs, top/bottom bit=0, HCLKrow=2, MajorFrame=0, MinorFrame=0.
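The frame address fields listed above can be packed into a single Frame Address Register (FAR) value. The sketch below is illustrative only: the bit positions (top/bottom at bit 22, block type at 21:19, row at 18:14, major column at 13:6, minor at 5:0) and the block-type codes are our assumption from reading the Virtex-4 configuration guide (ug071) and should be checked against ug071 and xhwicap_i.h before use. `pack_far` and `unpack_far` are hypothetical helpers.

```python
from enum import IntEnum


class BlockType(IntEnum):
    # Assumed Virtex-4 block-type codes; check against ug071 / xhwicap_i.h.
    CLB_IO_CLK = 0          # CLBs/CLKs/DSPs/IOBs columns
    BRAM_CONTENT = 1
    BRAM_INTERCONNECT = 2


# Assumed (lsb, width) of each Frame Address Register field.
FAR_FIELDS = {
    "top_bottom": (22, 1),
    "block_type": (19, 3),
    "row": (14, 5),
    "major": (6, 8),
    "minor": (0, 6),
}


def pack_far(block_type: BlockType, top_bottom: int, row: int, major: int, minor: int) -> int:
    values = {"top_bottom": top_bottom, "block_type": int(block_type),
              "row": row, "major": major, "minor": minor}
    far = 0
    for name, (lsb, width) in FAR_FIELDS.items():
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        far |= value << lsb
    return far


def unpack_far(far: int) -> dict:
    return {name: (far >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in FAR_FIELDS.items()}


# Upper-left example from the slide: top half, HCLK row 2, major 0, minor 0.
far = pack_far(BlockType.CLB_IO_CLK, top_bottom=0, row=2, major=0, minor=0)
print(hex(far))  # 0x8000 under the assumed field layout
```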

    14. Major Frames
        Numbering starts from 0 on the left and increases going to the right. SX35 example:
        CLBs/CLKs/DSPs/IOBs: CLBs: 1-6, 8-15, 17-30, 32-39, 41-46; CLKs: 24; DSPs: 7, 16, 31, 40; IOBs: 0, 23, 47
        BRAM Interconnect: 0-7
        BRAM Content: 0-7

    15. Minor Frames per Major Frame
        There are multiple minor frames per major frame. The number of minor frames depends on the type of major frame being written to (from the file xhwicap_i.h): CLBs, 22; DSPs, 21; IOBs, 30; CLKs, 3; BRAM Interconnect, 20; BRAM Content, 64. Minor frames are numbered from 0 to totalMinorFrames-1.
        To read every minor frame of the IOB column in the upper-left part of the FPGA, you would read with the following fields: BlockType=CLBs/CLKs/DSPs/IOBs, top/bottom bit=0, HCLKrow=2, MajorFrame=0, MinorFrame=0-29 (increment the minor frame number to get the next frame, and continue until every possible minor frame has been read).
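The read loop described above can be sketched as a generator over minor-frame numbers. The counts come from the slide; the tuple layout and the name `column_frames` are hypothetical, and the block type is left implicit (it is the CLBs/CLKs/DSPs/IOBs type for every column type except the two BRAM ones). Each tuple would then be turned into a frame address and read through the configuration port.

```python
# Minor frames per major-column type, as listed on the slide (from xhwicap_i.h).
MINOR_FRAMES = {
    "CLB": 22,
    "DSP": 21,
    "IOB": 30,
    "CLK": 3,
    "BRAM_INTERCONNECT": 20,
    "BRAM_CONTENT": 64,
}


def column_frames(column_type: str, top_bottom: int, hclk_row: int, major: int):
    """Yield (top_bottom, hclk_row, major, minor) for every minor frame of one
    major column; minors run from 0 to MINOR_FRAMES[column_type] - 1."""
    for minor in range(MINOR_FRAMES[column_type]):
        yield top_bottom, hclk_row, major, minor


# Every minor frame of the IOB column in the upper-left of the FPGA.
iob_frames = list(column_frames("IOB", top_bottom=0, hclk_row=2, major=0))
print(len(iob_frames), iob_frames[0], iob_frames[-1])  # 30 (0, 2, 0, 0) (0, 2, 0, 29)
```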

    16. Frame Layout
        Frames are 41 32-bit words (1312 bits total). Frames in the bottom half are mirror images of those in the top half, with the exception of the vertical HCLK rows that contain the global and regional clocks (ug071.pdf, Xilinx). Top half: bits 1311 to 0 (word 40 to word 0). Bottom half: bits 0 to 1311 (word 0 to word 40).
        The point of understanding the frames is that we can lay out the circuit under test in frames that are not shared with the rest of the circuits on the FPGA. That way, we corrupt only the configuration frames of the circuit in which we want to simulate SEUs. Without understanding the frames we could not simulate an SEU in the configuration frames, because we would not know where the SEU was being simulated. This information is also useful for avoiding corruption of the configuration frames belonging to the simulator circuit.
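A small sketch of the bookkeeping this layout implies: a frame is 41 32-bit words (1312 bits), and because top-half frames are mirror images of bottom-half frames, the same resource position maps to index p in the bottom half and 1311 - p in the top half. The bit ordering inside each word (bit 0 taken as the LSB here) and the function names are assumptions for illustration, and the sketch ignores the clock-row exception noted on the slide.

```python
WORDS_PER_FRAME = 41
BITS_PER_WORD = 32
BITS_PER_FRAME = WORDS_PER_FRAME * BITS_PER_WORD  # 1312


def frame_bit_index(position: int, top_half: bool) -> int:
    """Map a position along the frame (0..1311, counted in the bottom-half
    order) to the frame bit index, mirroring it for frames in the top half."""
    if not 0 <= position < BITS_PER_FRAME:
        raise ValueError(f"position {position} outside 0..{BITS_PER_FRAME - 1}")
    return BITS_PER_FRAME - 1 - position if top_half else position


def flip_frame_bit(frame: list, bit_index: int) -> list:
    """Return a copy of a 41-word frame with one bit inverted (a simulated SEU).
    Assumes bit_index // 32 selects the word and bit 0 is the word's LSB."""
    word, bit = divmod(bit_index, BITS_PER_WORD)
    corrupted = list(frame)
    corrupted[word] ^= 1 << bit
    return corrupted


frame = [0] * WORDS_PER_FRAME
idx = frame_bit_index(0, top_half=True)          # 1311: last bit of word 40
print(idx, hex(flip_frame_bit(frame, idx)[40]))  # 1311 0x80000000
```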

    17. Fault Correction Techniques
        Techniques for repairing faults in the configuration frames of the FPGA: Scrubbing, which simply reloads the configuration data from a device such as an SEU-immune EEPROM; and Error Checking and Correcting (ECC) frames, which embed Hamming codes inside the configuration frame and are available in Virtex-4 devices. For ECC frames to be used, a design must not use resources that use the configuration frames as memory (e.g., shift registers).
        With ECC frames, some circuit must read out the configuration frames and write back the corrected frame when an error is detected. Xilinx provides a design that does this automatically, discussed in xapp714.pdf. Shift registers use part of the configuration frame as memory, so the configuration frame is constantly changing, which makes it difficult to detect an SEU just by reading the configuration frame.
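The readback-and-repair option can be sketched as a loop that reads each frame, compares it against a golden copy, and writes the golden contents back on a mismatch. Note that this golden-copy comparison is a swapped-in technique for illustration: the xapp714 design instead relies on the ECC bits embedded in the frames. The port class and its method names are hypothetical; the `dynamic` mask shows why shift registers are a problem, since their bits must be excluded from the comparison.

```python
from typing import Dict, List, Optional, Tuple

FrameAddr = Tuple[int, int, int, int, int]  # hypothetical (block, top, row, major, minor)


class FakeConfigPort:
    """Stand-in for an ICAP-style port; read_frame/write_frame are hypothetical
    names, not the Xilinx HWICAP driver API."""

    def __init__(self, frames: Dict[FrameAddr, List[int]]):
        self.frames = {addr: list(words) for addr, words in frames.items()}

    def read_frame(self, addr: FrameAddr) -> List[int]:
        return list(self.frames[addr])

    def write_frame(self, addr: FrameAddr, words: List[int]) -> None:
        self.frames[addr] = list(words)


def scrub(port, golden: Dict[FrameAddr, List[int]],
          dynamic: Optional[Dict[FrameAddr, List[int]]] = None) -> List[FrameAddr]:
    """Read back each frame, compare it with the golden copy, and rewrite it on
    a mismatch. `dynamic` masks bits that legitimately change at run time
    (e.g. shift-register contents); they are neither compared nor overwritten."""
    dynamic = dynamic or {}
    repaired = []
    for addr, good in golden.items():
        mask = dynamic.get(addr, [0] * len(good))
        live = port.read_frame(addr)
        if any((c ^ g) & ~m for c, g, m in zip(live, good, mask)):
            fixed = [(c & m) | (g & ~m) for c, g, m in zip(live, good, mask)]
            port.write_frame(addr, fixed)
            repaired.append(addr)
    return repaired


addr = (0, 0, 2, 1, 0)
golden = {addr: [0] * 41}
port = FakeConfigPort(golden)
port.frames[addr][7] ^= 1 << 3   # simulated SEU
print(scrub(port, golden))       # [(0, 0, 2, 1, 0)]
```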

    19. DMRH
        Double Modular Redundancy with Hold: when the two copies disagree, a signal is sent to the ICAP controller, which scans and fixes up errors in the modules' areas. The disagreement signal is also sent to the controller so that it pauses at the current iteration. If the error is transient, it will disappear within 1 clock cycle. Best for combinational logic and parallel designs. The problem is the time it takes to fix up the frame(s).
        This method saves a lot of area compared with TMR. Its problem is the time required to fix up the frames after an error is detected. Its advantage over DWC-CED is that it can detect and fix up 100% of the errors in the configuration frames; it just takes much longer to do so than DWC-CED.
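The DMRH control flow can be sketched as follows. On a disagreement the controller holds for one cycle; if the copies still disagree, the error is treated as a configuration upset and a frame fix-up is charged. The 3000-cycle fix-up cost in the usage line is the HWICAP read/write measurement from slide 28; the `Module` class, the fault hooks, and `run_dmrh` are hypothetical, and the model assumes the fix-up repairs the upset in one pass.

```python
class Module:
    """Toy module copy with hooks for a persistent configuration upset and for
    transient glitches on given cycles (both hypothetical fault models)."""

    def __init__(self, fn):
        self.fn = fn
        self.config_upset = False
        self.glitch_cycles = set()

    def __call__(self, x: int, cycle: int) -> int:
        y = self.fn(x)
        if self.config_upset or cycle in self.glitch_cycles:
            y ^= 1
        return y


def run_dmrh(a: Module, b: Module, inputs, fixup_cycles: int):
    """Return (outputs, total_cycles, fixups) for DMRH-protected execution."""
    cycle, outputs, fixups = 0, [], 0
    for x in inputs:
        held = False
        while True:
            ya, yb = a(x, cycle), b(x, cycle)
            cycle += 1
            if ya == yb:
                outputs.append(ya)
                break
            if not held:
                held = True   # hold one cycle: a transient glitch should be gone
                continue
            # Persistent disagreement: the ICAP controller scans and fixes up frames.
            cycle += fixup_cycles
            fixups += 1
            a.config_upset = b.config_upset = False


    return outputs, cycle, fixups


def double(x: int) -> int:
    return 2 * x


a, b = Module(double), Module(double)
b.config_upset = True
print(run_dmrh(a, b, [1, 2, 3], fixup_cycles=3000))  # ([2, 4, 6], 3005, 1)
```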

    20. Fan-out Design
        Used in some of the multiplexers in the design. It can tolerate an SEU in the LUTs or in one of the lines after it is fanned out to the slices, because the words being selected are Hamming-code protected. This reduces the need for redundancy. The problem is an upset that occurs before the line is fanned out to the different slices.
        We are not protecting the muxes used in the design, in order to see how large a factor routing is in designs. Since the words being selected in the mux are Hamming-code protected, a corruption of 1 bit in the word can be tolerated. So if a line is corrupted after the select line is fanned out to the muxes, the design should still work. A problem occurs only if the line is corrupted before it is fanned out to the slices. The picture below shows a select line being fanned out to 2 different slices.

    22. Iterative Repair (IR) Processor Design
        The far-left picture shows an overview of the IR Processor with the proposed techniques applied to it. The upper-right picture shows the simulator circuit and how it interacts with the IR Processor. Note that MicroBlaze communicates with the IR Processor over the OPB bus. The Error Detector detects whether the IR Processor's behavior changed from the last run. The Memory of Best Scores holds the data from a run with no faults injected. The Continue controller is used to stop the IR Processor at different iterations.

    23. Copy Processor
        Note that DMRH was applied to the combinational circuitry, while TMR was applied to the control circuit.

    24. Alter Processor
        As before, DMRH was applied to the combinational circuitry, while TMR was applied to the control circuit and to the small memory element. The Random Number Generator has no fault protection because it does not matter if an upset alters the generated numbers slightly; the problem would be an upset that stopped it from generating new numbers.

    25. Evaluate Processor
        The Evaluate processor is composed of three sub-processors: Dependency Graph Violation, Total Schedule Length, and Resource Over-utilization.

    26. Dependency Graph Violation Sub-Processor
        Note the DMRH on the combinational elements and the TMR on the small memory elements. TMR is also applied to the control circuit.

    27. Total Schedule Length Sub-Processor
        Note the DMRH on the combinational elements and the TMR on the small memory elements. TMR is also applied to the control circuit.

    28. Resource Over-utilization Sub-Processor
        Originally we thought that we could detect and fix up a frame in a short period of time. Recent testing using the HWICAP on the OPB bus showed that it takes 18 µs (1800 clock cycles) to write a configuration frame and 30 µs (3000 clock cycles) to read and write a configuration frame. The maximum latency of the IR Processor for one iteration is 235 clock cycles.
        Since all the other processors complete before this one, we originally thought there would be time to fix up errors in the other stages before this processor finished. So we planned to apply TMR to this entire processor so that it would not hold up iteration completion, while the other stages used DMRH because they had time to spare to wait for a fix-up. With these new measurements, the other stages will also hold up iteration completion, so DMRH poses more overhead than originally thought. Since the other stages will cause a hold anyway, we might simply apply the same techniques to this processor as to the other stages.
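The numbers on this slide can be checked with a quick calculation. Both measurements imply a 100 MHz clock (1800 cycles in 18 µs, 3000 cycles in 30 µs), and a single read/write fix-up costs roughly 13 iterations' worth of time. The sketch assumes one frame is repaired per error and that the iteration simply stalls during the fix-up.

```python
import math

WRITE_US, WRITE_CYCLES = 18, 1800    # write one configuration frame
RW_US, RW_CYCLES = 30, 3000          # read and write one configuration frame
ITERATION_CYCLES = 235               # maximum IR Processor latency per iteration

clock_mhz = WRITE_CYCLES / WRITE_US                 # cycles per microsecond
stalled_iterations = RW_CYCLES / ITERATION_CYCLES   # iterations lost per fix-up
slowdown = (ITERATION_CYCLES + RW_CYCLES) / ITERATION_CYCLES

print(f"clock: {clock_mhz:.0f} MHz")
print(f"one frame fix-up ~ {math.ceil(stalled_iterations)} iterations")
print(f"an iteration that needs a fix-up takes {slowdown:.1f}x as long")
```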

    29. Accept Processor
        Note the DMRH on the combinational elements and the TMR on the small memory elements. TMR is also applied to the control circuit.

    30. Adjust Temperature Processor
        Note the DMRH on the combinational elements and the TMR on the small memory elements. TMR is also applied to the control circuit.

    32. BYU SEU Simulator
        Requires 3 Virtex 1000 FPGAs. Does not directly corrupt flip-flops; it corrupts bits in the bitstream. Its advantage is that the design under test is on a separate FPGA board.

    33. Xilinx SEU Simulator (xapp714)
        Requires 1 Virtex-4 FPGA. Does not directly corrupt flip-flops. The user cannot see which frame address and configuration bit are being corrupted (it is stated to start from the first bit in configuration memory). Clunky interface for simulating SEUs. Uses the embedded ECC frames. Corrupts every configuration frame on the device. It is unknown how, or whether, it actually corrupts the BRAM Interconnect and Content frames.
        The xapp714 design was made for autonomous detection and correction of SEUs in the configuration frames. Error injection was an added feature that was not a development priority, so it lacks the features we wanted for simulating SEUs.

    34. USU SEU Simulator (Tool Flow)
        There are a few steps the user has to perform before the simulator will work correctly. The user has to specify which output to observe for changes. The user has to provide some way in the design to pause it; "paused" means the clock is running but the FFs are not changing. Without the pause feature, the only flip-flop test that can be run is the Stuck-At test. The user has to specify in the code which frames to corrupt, which should take only a minute; the problem is that this relies on the user understanding how frames are laid out on the FPGA. Without specifying which frames to corrupt, the only test that can be run is the one that corrupts specific elements on the FPGA (LUTs, SRINV mux, FFs).
        Currently the user specifies how long, in clock cycles, the design takes to run; this could be automated in the future. The run length is needed to implement a timeout in case the design never finishes. The user also has to specify which major frames correspond to DSPs, IOBs, and GCLKs on the board the tests run on. This has to be done both in the simulator and in the output-imaging code.

    35. USU SEU Simulator
        Uses 1 FPGA (the tester circuit and the Design to Test are on the same device). Corrupts all bits in the configuration frames of the Design to Test area. Tests corrupting FFs with 3 techniques: GCAPTURE/GRESTORE, Intermediate Corruption, and Stuck-At Tests.
        The Design to Test does not share configuration frames with the simulator circuit, so we can simulate corrupting frames only in the Design to Test area and not in the simulator circuit. We go sequentially through each bit in the configuration frames and change it to the opposite value. If a change in the IR Processor's behavior is observed, we mark that configuration bit as sensitive.
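The test loop described in the notes can be sketched as below. It is a software model of the campaign, not the simulator's MicroBlaze code: `run_design` stands in for loading the frames, running the Design to Test, and reading the observed output, and both it and the frame dictionary are hypothetical. Restoring only the changed frame mirrors the fix-up strategy from the conclusions; side effects that survive that fix-up (such as the wiped ROM on slide 45) are not modeled.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Frames = Dict[Hashable, List[int]]  # frame address -> 41 words (Design to Test area only)


def injection_campaign(frames: Frames,
                       run_design: Callable[[Frames, int], Optional[tuple]],
                       max_cycles: int,
                       bits_per_frame: int = 41 * 32) -> List[Tuple[Hashable, int]]:
    """Flip every configuration bit of the Design to Test frames one at a time,
    rerun the design, and report the bits whose flip changes the observed output.
    run_design returns the observed output, or None if the design did not
    finish within max_cycles (the timeout from slide 34)."""
    golden = run_design(frames, max_cycles)
    sensitive = []
    for addr in list(frames):
        original = frames[addr]
        for bit in range(bits_per_frame):
            word, pos = divmod(bit, 32)
            corrupted = list(original)
            corrupted[word] ^= 1 << pos
            frames[addr] = corrupted          # inject the simulated SEU
            result = run_design(frames, max_cycles)
            frames[addr] = original           # fix up only the frame we changed
            if result != golden:
                sensitive.append((addr, bit))
    return sensitive


def toy_run(frames: Frames, max_cycles: int) -> Optional[tuple]:
    w = frames["f0"]
    if (w[3] >> 4) & 1:
        return None                           # this bit makes the toy design hang
    return (w[0] >> 5) & 1, (w[2] >> 1) & 1


print(injection_campaign({"f0": [0] * 41}, toy_run, max_cycles=235))
# [('f0', 5), ('f0', 65), ('f0', 100)]
```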

    36. Flip-Flop Architecture
        The FFs in a slice share all lines except the D (data) input and the XQ/YQ outputs. The SRINV mux controls the reset line given to the FFs. The SRMODE configuration bit determines what value the FF is set to on reset. The INIT bit is the value of the FF when the bitstream is first loaded onto the FPGA.
        If radiation causes the SRINV mux to select the other input, a reset is sent to both flip-flops (assuming they were not already being reset). This is a problem because it can upset both flip-flops, resulting in a Multiple Event Upset in the overall design.

    37. GCAPTURE/GRESTORE Method
        GCAPTURE loads the INIT bits of all FFs and Input/Output Buffer (IOB) registers with the current value of the register. GRESTORE sets all registers to their INIT bit values. Put the device into a paused state (FFs not changing, the SR input to the FFs low, and the clock still active). Then issue a GCAPTURE, change the INIT bit of the desired FF, and follow with a GRESTORE.
        A GCAPTURE command can be issued by writing a sequence of instructions through the ICAP port, or by instantiating the CAPTURE primitive in a design and driving its input high. GCAPTURE is how we get access to the current value in the FF, which enables us to simulate an upset in the flip-flop. The problem with using GRESTORE in our simulator is that it also restores the FFs in the simulator circuit, which means the simulator circuit is restored to a state that was not intended.
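A toy model of the GCAPTURE/GRESTORE sequence shows both why it works and the problem with it. The register names, the `Device` class, and the simulator hook are hypothetical; in the real flow the INIT bit is changed by rewriting a configuration frame through the ICAP, and GCAPTURE/GRESTORE act on every register in the device.

```python
class Device:
    """Toy register model: each register has a current value and an INIT bit."""

    def __init__(self, registers: dict):
        self.value = dict(registers)
        self.init = dict(registers)

    def gcapture(self) -> None:
        self.init = dict(self.value)      # INIT <- current value, for every register

    def grestore(self) -> None:
        self.value = dict(self.init)      # current value <- INIT, for every register


def upset_via_capture(dev: Device, target: str, simulator_step=None) -> None:
    """Simulate an SEU in `target` with GCAPTURE, an INIT-bit change, and GRESTORE.
    simulator_step models simulator-circuit registers that keep changing while
    the INIT frame is being rewritten (hypothetical hook)."""
    dev.gcapture()
    if simulator_step:
        simulator_step(dev)
    dev.init[target] ^= 1
    dev.grestore()


def tick_counter(dev: Device) -> None:
    dev.value["sim_counter"] += 1


dev = Device({"dut_ff": 0, "sim_counter": 7})
upset_via_capture(dev, "dut_ff", simulator_step=tick_counter)
print(dev.value)  # {'dut_ff': 1, 'sim_counter': 7}: the counter's update was lost
```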

    38. Intermediate Corruption Method
        Put the device into a paused state and issue a GCAPTURE command. Based on the INIT bits, set the SRMODE of the 2 FFs in the slice: set the FF to be upset so that on reset it goes to the opposite of its current value, and set the other FF to reset to its current value. Change the SRINV multiplexer to select the other input (this resets the FFs). Fix up the SRINV multiplexer and SRMODE bits. The device can then be resumed.
        This method changes the value in the FF by resetting it. We first get the current value of the FF. We then change the SRMODE configuration bits so that the FF resets to the desired value when a reset occurs, and cause a reset by changing the SRINV mux to select the other input. Before resuming the device, we undo these changes so that we simulate an SEU only in that iteration; if we did not fix up the changes, the design would keep experiencing a simulated SEU in the FFs. The problem with this method is getting the device into a paused state on an arbitrary clock cycle.
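The steps of the Intermediate Corruption method can be traced on a toy slice model. The `Slice` class and its fields are hypothetical stand-ins for the SRMODE, SRINV, and FF state that the real simulator changes by rewriting configuration frames; the sketch assumes the design is already paused (SR low, FFs holding), which is the hard part noted above, and models the reset as synchronous.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slice:
    """Toy model of the two FFs in a slice sharing one SR line. The reset is
    modeled as synchronous (taking effect on the next clock edge); this is a
    simplification, not a bit-accurate Virtex-4 model."""
    ff: List[int]        # current values of the two FFs
    srmode: List[int]    # value each FF takes when SR is asserted
    srinv: int = 0       # SRINV mux select: 1 inverts the SR input
    sr_in: int = 0       # SR from routing; low while the design is paused

    def clock(self) -> None:
        if self.sr_in ^ self.srinv:
            self.ff = list(self.srmode)


def intermediate_corruption(s: Slice, target: int) -> None:
    captured = list(s.ff)                           # GCAPTURE: current FF values
    saved_srmode, saved_srinv = list(s.srmode), s.srinv
    s.srmode = list(captured)                       # other FF resets to its own value
    s.srmode[target] = 1 - captured[target]         # target FF resets to the opposite
    s.srinv ^= 1                                    # select the other input: SR asserted
    s.clock()                                       # the reset applies the upset
    s.srmode, s.srinv = saved_srmode, saved_srinv   # fix-up: simulate the SEU only once


s = Slice(ff=[1, 0], srmode=[0, 0])
intermediate_corruption(s, target=1)
print(s.ff, s.srmode, s.srinv)  # [1, 1] [0, 0] 0
```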

    39. Stuck-At Method
        The device does not need to be paused (though it can be). In this method, FFs are configured to be stuck at a desired value during operation of the device: configure the SRMODE bits to the desired stuck-at values (possible combinations for the two FFs: {00, 01, 10, 11}) and change the SRINV mux to select the opposite line. After the device has run, fix up the changes. This works best if the device never resets the FFs during operation. It helps reveal the SEU sensitivity of specific FFs on any clock cycle.
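A toy trace of the Stuck-At method: with SRINV flipped, the SR line is asserted on every clock edge, so both FFs sit at their SRMODE values regardless of D, for each of the four combinations. The `run_slice` helper is hypothetical, assumes a synchronous set/reset, and ignores real timing.

```python
from itertools import product


def run_slice(d_sequence, srmode, srinv_flipped, sr_in=0):
    """Clock a slice's two FFs through a sequence of (dx, dy) inputs and return
    the FF states after each edge. While SR is asserted the FFs load their
    SRMODE values instead of D (toy model; synchronous set/reset assumed)."""
    trace = []
    for d in d_sequence:
        sr_active = sr_in ^ srinv_flipped
        trace.append(tuple(srmode) if sr_active else tuple(d))
    return trace


d_seq = [(1, 0), (0, 1), (1, 1)]
print(run_slice(d_seq, srmode=(0, 0), srinv_flipped=0))  # [(1, 0), (0, 1), (1, 1)]
for pattern in product((0, 1), repeat=2):                 # 00, 01, 10, 11
    print(pattern, run_slice(d_seq, srmode=pattern, srinv_flipped=1))
```

Calling it with `sr_in=1` shows the caveat on the slide: if the design drives its own reset, the flipped SRINV deasserts it and the stuck-at value no longer holds.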

    41. Design Mapped from PlanAhead
        This shows part of the IR Processor mapped to the Virtex-4 board. The image in the bottom-right helps describe which resources PlanAhead shows as mapped. Note that if a routing line is mapped through a LUT, PlanAhead does not always show it as mapped.

    42. Bit Markup of Sensitive Resources
        This is the output image format that BYU used. It shows the sensitive areas of the design based on position in the configuration frame, so you get a general idea of the sensitive areas of the FPGA, but not exact information.

    43. Map of Sensitive Resources
        This shows the results of the flip-flop tests that we ran. It also shows tests of some specific resources in the slice, such as LUTs, the SRINV mux, and configuration bits for some of the resources. This gives the user specific information about which resources are sensitive to an SEU. Knowing which FFs are sensitive to SEUs is important because FFs are used in state machines. This output identifies the exact slices known to be sensitive to SEUs, instead of the approximation from the bit markup technique. The problem is that we have to rely on the developers to tell us which bits correspond to which resources, which is why we could show results only for those elements in the slice and not for every element. So the Bit Markup method is good for a general picture of every possible configuration bit, while this display format gives specific details on what is sensitive, including the FFs in the design.

    44. CLBs Tested The 2 SEUs come from taking 42 × 4.9%.
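The arithmetic behind the figure of 2 SEUs is shown below. The product is not a whole number, so it is rounded to the nearest whole upset:

```latex
N_{\text{SEU}} \approx 42 \times 4.9\% = 42 \times 0.049 = 2.058 \approx 2
```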

    45. DSPs, BRAMs Tested See the included images for the markup showing the DSPs, the BRAM interconnect, and the BRAM content. Note that sensitive bits appear even in areas where no DSPs are mapped in the design. This is because routing still passes through the DSPs, and that routing is sensitive to an SEU. Another interesting observation is that we were able to wipe out a ROM in the design by changing one configuration bit in the BRAM interconnect. This means that an SEU in a configuration bit can have side effects on other resources in the design. We are not sure why these bits cannot be written, but we identified them through testing. If you ever want to do partial reconfiguration on the BRAM content, do not write a '1' to these locations.
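One way to honor that warning in a partial reconfiguration flow is to clear the problem bits before any BRAM-content frame is written. The sketch below is hypothetical. It assumes a frame is handled as a list of 32-bit words and that the locations found by testing are recorded as a per-word bit mask. The actual locations are not listed in this presentation, so the mask in the example is made up.

```python
def mask_forbidden_bits(frame_words, forbidden_masks):
    """Return a copy of one configuration frame with every forbidden bit forced to 0.

    frame_words: list of 32-bit ints making up the frame.
    forbidden_masks: dict mapping word index -> mask of bits that must never be written as '1'.
    """
    cleaned = list(frame_words)  # leave the caller's frame untouched
    for index, mask in forbidden_masks.items():
        # ~mask is negative in Python; "& 0xFFFFFFFF" turns it into a 32-bit keep-mask
        cleaned[index] &= ~mask & 0xFFFFFFFF
    return cleaned


# Example with a made-up mask: bits 0-3 of word 0 and bit 0 of word 1 are forbidden.
frame = [0xFFFFFFFF, 0x00000001]
safe = mask_forbidden_bits(frame, {0: 0x0000000F, 1: 0x00000001})
```

The function returns a new list so that the original frame data (for example, the golden copy used for repair) is never modified by accident.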

    47. Conclusions Simulator Tool status: Simulates SEUs in CLBs, FFs, DSPs, BRAM interconnects, and BRAM content. Needs a method to reload the entire device when a permanent change in the pattern is detected. Need to test a full TMR design. Need to test the proposed fault tolerant design. Have fault techniques applied automatically when the IR Processor is being generated. Thesis defense in August? Ram is going to help me put the circuit under test in a partial reconfiguration region. That way, when a side effect is detected, as with the BRAM content, we will just reload the design. We did not reload the entire design every time because the tests would take too long to complete. Currently it takes around a day to run all the tests. So we only want to reload the entire bitstream for the circuit under test when necessary. Normally we fix only the configuration frame we changed, which keeps the simulator running fast. Building on Jonathan's work, we will take the automatically generated IR Processor and have it generate the circuit with the proposed fault techniques applied. The techniques it will apply are TMR, DMRH, and Hamming codes. We will analyze the graphs to identify which elements are memory elements and which are combinational logic, so that we know which technique to apply to each.
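The reload policy described above can be sketched as a campaign loop. For each bit, the loop injects one flip and records whether the output changed. It then repairs only the frame that was touched. It reloads the whole partial reconfiguration region only if the output is still wrong after that repair. This is a minimal sketch, not the simulator's code. The callback names (`flip_bit`, `output_ok`, `repair_frame`, `reload_region`) and the `ToyDevice` stand-in are hypothetical.

```python
def run_campaign(bits, flip_bit, output_ok, repair_frame, reload_region):
    """Inject one simulated SEU per (frame, bit); return (sensitive bits, full reloads)."""
    sensitive = []
    reloads = 0
    for frame, bit in bits:
        flip_bit(frame, bit)          # inject a single configuration-bit upset
        if not output_ok():
            sensitive.append((frame, bit))
        repair_frame(frame)           # fast path: rewrite only the frame that was changed
        if not output_ok():           # a side effect survived the frame repair
            reload_region()           # slow path: reload the whole partial region
            reloads += 1
    return sensitive, reloads


class ToyDevice:
    """Hypothetical stand-in for the circuit under test.

    Bits in `sensitive` break the output while the upset is present. Bits in
    `latching` also corrupt state that a frame rewrite cannot restore
    (like the ROM wiped through the BRAM interconnect).
    """

    def __init__(self, sensitive, latching):
        self.sensitive, self.latching = set(sensitive), set(latching)
        self.upset = None
        self.corrupted = False

    def flip_bit(self, frame, bit):
        self.upset = (frame, bit)
        if (frame, bit) in self.latching:
            self.corrupted = True

    def output_ok(self):
        return not self.corrupted and self.upset not in self.sensitive

    def repair_frame(self, frame):
        self.upset = None             # the frame now matches the golden bitstream again

    def reload_region(self):
        self.corrupted = False        # a full reload also restores the corrupted state


dev = ToyDevice(sensitive={(0, 3)}, latching={(1, 7)})
found, reloads = run_campaign([(0, 3), (0, 4), (1, 7), (2, 0)],
                              dev.flip_bit, dev.output_ok,
                              dev.repair_frame, dev.reload_region)
```

In this toy run, both (0, 3) and (1, 7) are reported as sensitive. Only the latching bit (1, 7) triggers a full reload, so the expensive path runs once rather than after every injection.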

    49. Publications Journal Articles under review IET Transactions on Computers and Digital Techniques Phillips, J., Sudarsanam, A., Kallam, R., Carver, J., and Dasu, A., “Methodology to Derive Polymorphic Soft-IP Cores for FPGAs”

    50. Publications Conference Papers under review DAC 2008 Carver, J., Phillips, J., and Dasu, A., “Improved SEU Simulator for Virtex 4 FPGAs”

    51. Publications Planned Journal Papers IEEE Design & Test of Computers or IEEE Transactions on Reliability Carver, J., Phillips, J., and Dasu, A., “SEU Mitigating Techniques for a FPGA based Iterative Repair Processor”