
Evolvable Hardware Techniques for Autonomous Repair of FPGAs

Evolvable Hardware Techniques for Autonomous Repair of FPGAs. 5 October 2003. Ronald F. DeMara, Department of Electrical and Computer Engineering, University of Central Florida; Jason D. Lohn and Gregory A. Larchev, Computational Sciences Division, NASA Ames Research Center.


Presentation Transcript


  1. Evolvable Hardware Techniques for Autonomous Repair of FPGAs. 5 October 2003. Ronald F. DeMara, Department of Electrical and Computer Engineering, University of Central Florida; Jason D. Lohn and Gregory A. Larchev, Computational Sciences Division, NASA Ames Research Center

  2. What is Evolvable Hardware? Intelligent Search + Hardware Design: combining two fields to enable complex and dynamic electronics applications. • Automated Construction: develop Electronic Circuits by Intelligent Search • Applications: support Design, Optimization, or Failure Recovery phases • Research Focus: configuration of Field Programmable Gate Arrays (FPGAs) using Genetic Algorithms (GAs), with applications to Autonomous Repair of permanent faults. [Diagram: Intelligent Search (Bayesian, Simulated Annealing, Genetic Algorithms, Nearest Neighbor) and Hardware Design (Amplifiers, Filters, FPGAs, Antennas) combine into Evolvable Hardware Applications]

  3. Evolvable Hardware (EHW). Conceptual Inspiration: Biological Models of Genetic Representations and Evolutionary Principles. Powerful technique for multi-objective optimization problems: • power consumption, weight, size, cost, speed, or reliability. Faster design cycle: can be used to optimize or repair human-generated designs. Excellent for difficult-to-design systems: • adaptive systems and dynamic devices in unpredictable environments. [Figure labels: CACM, Science]

  4. EHW in the Big Picture. [Taxonomy diagram labels: Intelligent Search; Machine Intelligence; Techniques; other sub-disciplines; Adaptive/Soft Computing; Evolutionary Computation; Fuzzy Systems; Neural Networks; Genetic Algorithms; Cellular Automata; Simulated Annealing; Application Domains; Numerical Optimization; Mechanical Design; Evolvable FPGAs]

  5. GA Operational Flow of EHW Techniques. 1. Objective for EHW procedure is specified • realize an 8-bit adder circuit, or program a digital chip to perform a function such as tone discrimination • relative ranking, called the Fitness Function, is defined 2. Population of alternative designs is created • completely at random, or seeded with hand-designed circuits 3. Genetic Algorithm invoked to evolve each alternative • Fitness evaluated for alternatives using FPGA • FPGA contains programmable logic and interconnect resources to realize an arbitrary number of circuits • Genetic Operators used to increase fitness 4. Fitness Exit Criteria checked • If max(fitness) < threshold then repeat Step 3 5. Best design represents desired hardware configuration • FPGA final configuration implements the circuit. Example: GA running on a PC platform configures a reprogrammable Static RAM based FPGA. [Diagram: PC sends config to the FPGA and reads back results; FPGA resources (AND, OR, XOR, NOR, Buses, Muxes, Pass Transistors) sit between CIRCUIT INPUT and CIRCUIT OUTPUT]
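The five steps on this slide can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the function name `evolve`, the keep-the-better-half selection scheme, and the software fitness callback are all assumptions, and the callback stands in for the fitness evaluation that the slide performs on the FPGA.

```python
import random

def evolve(fitness, genome_len, pop_size=40, threshold=1.0, max_gens=200, seed=0):
    """Toy version of the five-step flow; `fitness` maps a bit list to [0, 1]."""
    rng = random.Random(seed)
    # Step 2: random initial population (seeding with hand designs is omitted).
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    history = []  # best fitness seen at each evaluated generation
    for _ in range(max_gens):
        # Step 3a: evaluate and rank every candidate (hypothetical software stand-in
        # for configuring the FPGA and measuring its outputs).
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        # Step 4: exit once the best candidate meets the threshold.
        if history[-1] >= threshold:
            break
        # Step 3b: keep the better half unchanged, refill with mutated crossovers.
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)
            child = a[:cut] + b[cut:]               # one-point crossover
            child[rng.randrange(genome_len)] ^= 1   # single-bit mutation
            children.append(child)
        pop = parents + children
    # Step 5: the best individual is the configuration to deploy.
    return max(pop, key=fitness), history
```

Called with a fitness such as "fraction of bits that match a target pattern", the function returns the best bit list found and the per-generation best fitness. Because the better half survives unchanged, that history never decreases.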

  6. Genetic Algorithms (GAs). Mechanism coarsely modeled after neo-Darwinism (natural selection + genetics). [Flowchart: start → population of candidate solutions → evaluate fitness of individuals (Fitness function) → Goal reached, or else selection of parents → parents → crossover and mutation → offspring → replacement → population]

  7. Genetic Mechanisms • Guided trial-and-error search techniques using principles of Darwinian evolution • iterative selection, “survival of the fittest” • genetic operators: mutation, crossover, … • implementor must define fitness function • GAs frequently use strings of 1s and 0s to represent candidate solutions • if 100101 is better than 010001 it will have more chance to breed and influence future population • GAs “cast a net” over entire solution space to find regions of high fitness • Can invoke Elitism Operator (E=1, E=2, …) • guarantees that the fitness of the best individual never decreases from one generation to the next

  8. GA Success Stories. Commercial Applications: • Nextel: frequency allocation for cellular phone networks, $15M predicted savings in NY market • Pratt & Whitney: turbine engine design; engineer: 8 weeks; GA: 2 days w/3x improvement • International Truck: production scheduling improved by 90% in 5 plants. NASA: superior Jupiter trajectory optimization, antennas, FPGAs. Koza: 25 instances showing human-competitive performance such as analog circuit design, amplifiers, filters

  9. Representing Candidate Solutions • An individual can be represented using discrete values (binary, integer, or any other system with a discrete set of values) • Example of Binary DNA Encoding: [Figure labels: Individual (Chromosome), GENE]

  10. Genetic Operators. [Diagram: generation t to generation t+1 via selection and reproduction, with mutation and recombination (crossover) as the operators]

  11. Crossover Operator. Population: parents 1 1 1 1 1 1 1 and 0 0 0 0 0 0 0, each cut at the same point; offspring 1 1 1 0 0 0 0 and 0 0 0 1 1 1 1
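The crossover on this slide is consistent with a single cut after the third bit. A minimal sketch, assuming one-point crossover on bit strings (the function name `one_point_crossover` is illustrative, not taken from the presentation):

```python
def one_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two equal-length bit strings at position `cut`."""
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

# The slide's parents, cut after the third bit.
offspring = one_point_crossover("1111111", "0000000", 3)
```

With those inputs the two offspring are the slide's 1110000 and 0001111.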

  12. Mutation Operator. Boolean representation: before 1 1 1 1 1 1 1, after 1 1 1 0 1 1 1 (one mutated gene). [Figure also shows a Biology representation with genes a b c d and a mutated gene z]
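The Boolean mutation shown here inverts one bit. A small sketch follows. The helper names `flip_bit` and `mutate` are assumptions, and the per-bit probability model in `mutate` is one common choice rather than something this slide specifies.

```python
import random

def flip_bit(bits, index):
    """Return a copy of the bit string with the bit at `index` inverted."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate` (assumed model)."""
    out = bits
    for i in range(len(bits)):
        if rng.random() < rate:
            out = flip_bit(out, i)
    return out

# The slide's example: the fourth bit of 1111111 is the mutated gene.
after = flip_bit("1111111", 3)
```

Here `after` is the slide's 1110111. Passing a seeded `random.Random` to `mutate` keeps a run reproducible.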

  13. Visualizing GA Operation: roadmap to the animation on the next slide

  14. Visualizing GA Operation. [Animation: current population → new population] • 2 parent individuals potentially undergo crossover • Individual is potentially mutated

  15. EHW Environments • Evolvable Hardware (EHW) Environments enable experimental methods to research soft computing intelligent search techniques • EHW operates by repetitive reprogramming of real-world physical devices using an iterative refinement process. Two modes of Evolvable Hardware: Extrinsic Evolution (Genetic Algorithm with simulation in the loop: software model, Done?, Build it; device “design-time” refinement) or Intrinsic Evolution (Genetic Algorithm with hardware in the loop; device “run-time” refinement; new approach to Autonomous Repair of failed devices). Application: Stardust Satellite • >100 FPGAs onboard • hostile environment: radiation, thermal stress • How to achieve reliability to avoid mission failure?

  16. Our Goal: Autonomous FPGA Repair. An alternative to redundancy for increased reliability without carrying spare hardware. Redundancy (everyday example: automobile spare tire) versus Repair (everyday example: can of fix-a-flat): • Overhead from Unutilized Spares (weight, size, power): Redundancy increases with amount of spare capacity; Repair is independent of number of viable spares • Granularity of Fault Coverage (resolution where fault handled): Redundancy restricted at design-time; Repair variable at recovery-time • Fault-Resolution Latency (availability or downtime required to handle fault): Redundancy based on time required to select spare resource; Repair based on time required to find suitable repair • Quality of Repair (likelihood and completeness): Redundancy determined by adequacy of spares available (?); Repair affected by multiple characteristics (+ or -) • Autonomous Operation (fix without outside intervention): Redundancy yes; Repair yes

  17. Autonomous Repair • UCF has developed an evolutionary fault-recovery system for FPGAs • Employs a genetic representation that can accommodate both logic and interconnect failures • Experiments were run using Xilinx Virtex FPGA • Demonstrate that a complete repair of some combinational and sequential circuits is realizable • Contribution of new evolutionary procedures for repair and novel insights to fault occlusion, resource recycling, and parameter optimization. [Caption: new approach to Autonomous Repair of failed reprogrammable devices]

  18. Related Work. Evolutionary Design Techniques for FPGA Fault-Tolerance: evolve redundancy into design before the anticipated failure occurs. Messy Gate Approach [Miller 2001] • logic functions contain redundant terms as functional boundaries change and overlap. Fault-tolerant Oscillator Design [Canham and Tyrrell 2002] • designs evolved under a range of faults during fitness assessment • population-based approach with fitness function corresponding to operation without faults • additional pass evaluates tolerance to a range of faults. Design with Potentially Faulty Components [Thompson 1997] • evolution of designs with redundant capabilities • range of fault cases introduced • individuals able to exploit whatever component behaviors exist, even faulty ones. Evolutionary Fault Recovery for FPGA Fault Handling: evolve recovery from a specific failure after (and if) it actually occurs. Evolutionary Repair of 4x4 Multiplier [Vigander 2001] • attempts to restore functionality after random faults injected into FPGA CLBs • completely correct repair not achieved although excellent partial repairs • voting mechanism proposed using alternative partially repaired circuits

  19. Fault-Handling Techniques for SRAM-based FPGAs Device Failure Characteristics Duration: Transient: SEU Permanent: SEL, Oxide Breakdown, Electron Migration Device Configuration Processing Datapath Device Configuration Processing Datapath Target: BIST Evolutionary Repetitive Readback Approach: TMR STARS CED Vigander UCF Methods Supplementary Testbench Duplex Output Comparison Duplex Output Comparison Detection: (not addressed) Cartesian Intersection Isolation: (not addressed) Bitwise Comparison Majority Vote unnecessary Fast Run-time Location Worst-case Clock Period Dilation Diagnosis: unnecessary unnecessary Population-based GA using Extrinsic Fitness Evaluation Evolutionary Algorithm using Intrinsic Fitness Evaluation Recovery: Replicate in Spare Resource Select Spare Resource Invert Bit Value Ignore Discrepancy

  20. Quadrature Decoder • Applications requiring determination of angular translation (or speed) • Example: in a DC-motor drive system for a mobile robot, we may wish to move forward (or reverse) by a fixed distance • Decoder determines rotation direction

  21. Quadrature Decoder • Finite state machine • Input and State traces • State transition table
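The state transition table itself did not survive in this transcript. As a stand-in, here is a minimal sketch of a textbook quadrature decoder, assuming the two inputs follow the usual Gray-code sequence; the names `SEQUENCE` and `decode`, and the choice of which direction counts as forward, are illustrative assumptions rather than the presenters' table.

```python
# Gray-code order of the (A, B) input pair for one rotation direction.
# Which direction is called "forward" is an assumed convention.
SEQUENCE = [(0, 0), (0, 1), (1, 1), (1, 0)]

# Position change (mod 4) mapped to an output: +1 forward, -1 reverse,
# 0 for no movement or for an invalid two-step jump.
STEP_TO_DIRECTION = {0: 0, 1: +1, 2: 0, 3: -1}

def decode(samples):
    """Return one direction value per transition between consecutive samples."""
    directions = []
    prev = samples[0]
    for cur in samples[1:]:
        step = (SEQUENCE.index(cur) - SEQUENCE.index(prev)) % 4
        directions.append(STEP_TO_DIRECTION[step])
        prev = cur
    return directions

forward = decode([(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)])
```

The state is simply the previous input pair, so the finite state machine needs only two state bits in addition to the two inputs.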

  22. Genetic Representation • Representation = how we represent FPGA configurations in the GA • Goals: • Allow all possible LUT configurations • Allow all possible CLB interconnections given constraints of routing support • Disallow illegal FPGA configurations • Make it “easy” for crossover to combine good configurations • Minimize non-coding introns (junk DNA) • Bitstring representation is natural choice, though may not scale well (investigating generative reps) • Representation is specific to Xilinx Virtex FPGA

  23. Genetic Representation • Logic bits in the LUTs • Routing bits specify how to connect LUT outputs to LUT inputs. [Figure: CLB 0, CLB 1 through CLB n, each containing LUT 0, LUT 1, LUT 2, and LUT 3]

  24. Experimental Setup • Software and Hardware Testbeds: • ECJ • Xilinx JBits • Xilinx Virtex DS simulator • JBuilder Java SDK • Evaluation: • Input stream of 100 bit pairs • Output stream of 110 bits sampled across 4 CLBs • Stuck-at-zero fault on CLB2 F1 slice 0 • Fitness: percentage of correct output bits, taking the max: • across 100-bit sliding windows • across CLBs. [Diagram labels: ECJ + Our Code, JBits, JBuilder, Virtex DS, simulated fault, evaluate, FPGA output]
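One way to read the fitness rule on this slide is sketched below. It assumes the expected output is slid across each CLB's longer sampled stream and the best-matching window is kept; that reading, and the names `window_fitness` and `fitness`, are assumptions, not confirmed details of the authors' code.

```python
def window_fitness(expected, observed):
    """Best fraction of matching bits over all windows of len(expected)."""
    n = len(expected)
    best = 0.0
    for offset in range(len(observed) - n + 1):
        window = observed[offset:offset + n]
        matches = sum(e == o for e, o in zip(expected, window))
        best = max(best, matches / n)
    return best

def fitness(expected, outputs_per_clb):
    """Take the maximum window score across the sampled CLB outputs."""
    return max(window_fitness(expected, out) for out in outputs_per_clb)

# Tiny stand-in for the slide's 100-bit expectation and 110-bit output streams.
expected = [1, 0, 1, 1]
score = fitness(expected, [[0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 1, 1]])
```

With 100 expected bits and a 110-bit stream this scans 11 window offsets per CLB, which would tolerate a circuit whose correct output appears a few clock cycles late.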

  25. FPGA with Fault Injected

  26. GA Parameters • Generational GA • Popsize: 40 • Crossover: 80% • Mutation: up to 0.2% per bit • Elitism: 2 individuals • Gen 0 Seeding: 20 individuals seeded with the hand-designed Quad Decoder
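These settings can be collected in a small configuration object. The sketch below is illustrative only: the class `GAParams`, its field names, and the helper `initial_population` are assumptions, as is filling the 20 unseeded slots of generation 0 with random genomes, which the slide does not state.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class GAParams:
    pop_size: int = 40                 # Popsize
    crossover_rate: float = 0.80       # Crossover: 80%
    max_mutation_rate: float = 0.002   # Mutation: up to 0.2% per bit
    elites: int = 2                    # Elitism: 2 individuals
    seeded: int = 20                   # Gen 0 individuals copied from the hand design

def initial_population(seed_genome, params, rng):
    """Build generation 0: copies of the hand design plus random genomes (assumed)."""
    copies = [list(seed_genome) for _ in range(params.seeded)]
    randoms = [
        [rng.randint(0, 1) for _ in seed_genome]
        for _ in range(params.pop_size - params.seeded)
    ]
    return copies + randoms
```

Freezing the dataclass keeps the parameters from being changed by accident in the middle of a run.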

  27. Temperature map of FPGA logic cells during evolution. HW: Xilinx Virtex XCV1000 FPGA. Ckt: Quadrature Decoder – Exp 3

  28. Evolving a Complete Repair. [Plot: Fitness versus generation, with elitist and average curves]

  29. Results • Genetic algorithm is able to consistently find quad decoders operating at 100% accuracy with a single injected stuck-at fault • Out-of-sample test yields 97% accuracy (expected to rise as fitness test case length increases) • The stuck-at fault is used in the solutions found (GA is exploiting the fault) • Most runs converge after 1500-2000 circuit evaluations • Average population fitness increases until convergence (useful search)

  30. Recent Publications. Evolvable Hardware – Technical Papers: • J. D. Lohn, G. Larchev, and R. F. DeMara, “Evolutionary Fault Recovery in a Virtex FPGA Using a Representation That Incorporates Routing,” In Proceedings of the 10th Reconfigurable Architectures Workshop (RAW 2003), Nice, France, April 22, 2003. • J. D. Lohn, G. Larchev, and R. F. DeMara, “A Genetic Representation for Evolutionary Fault Recovery in Virtex FPGAs,” In Proceedings of the 5th International Conference on Evolvable Systems (ICES), Trondheim, Norway, March 17 - 20, 2003. • J. D. Lohn and R. F. DeMara, “A Co-evolutionary Genetic Algorithm for Autonomous Fault-Handling in FPGAs,” accepted to International Conference on Military and Aerospace Programmable Logic Devices, Laurel, MD, September 10 - 12, 2002. Machine Learning (EHW subcomponent) – Curriculum and Educational: • M. Georgiopoulos, J. Castro, A. Wu, R. DeMara, E. Gelenbe, A. Gonzalez, M. Kysilka, M. Mollaghasemi, “CRCD in Machine Learning at the University of Central Florida Preliminary Experiences,” In Proceedings of 8th Annual Conference on Innovation and Technology in Computer Science Education, University of Macedonia, Thessaloniki, Greece, June 30 - July 2, 2003. • M. Georgiopoulos, I. Russell, J. Castro, A. Wu, M. Kysilka, R. DeMara, A. Gonzalez, E. Gelenbe, M. Mollaghasemi, “A CRCD Experience: Integrating Machine Learning Concepts into Introductory Engineering and Science Programming Courses,” In Proceedings of 2003 American Society for Engineering Education (ASEE) Annual Conference and Exposition, Nashville, Tennessee, June 22 - 25, 2003.

  31. GA Advantages • Widely applicable • Low development costs (“engineering ready”) • Creativity - surprising solutions • Can be run interactively, accommodate user-proposed solutions • Provide many alternative solutions … design-time fault tolerance • Abundant intrinsic parallelism • Scales with Moore’s Law :-) “10x in 5”

  32. Conclusion • One of the first studies to look at evolving interconnect for fault-recovery in FPGAs • Output results encouraging • Current work: • Reducing execution time for autonomous recovery • Scaling to complex problems • Robustness of evolved solutions • On-line experiments that can safeguard the FPGA • Integrating Machine Learning & EHW into UCF curriculum • EHW in EEL4851, EEL4972, EEL6763 • Subpart of multi-year NSF CRCD Award (Georgiopoulos, DeMara, Gelenbe, Gonzalez, Kysilka, Wu)
