
Redundancy-Aware, Fault-Tolerant Clustering

Presentation Transcript


  1. Redundancy-Aware, Fault-Tolerant Clustering
     Jason Cong and Brian Tagiku
     VLSI CAD Lab, Computer Science Department
     University of California, Los Angeles
     {cong,btagiku}@cs.ucla.edu
     http://cadlab.cs.ucla.edu/

  2. Overview of IC-DFN Efforts at UCLA
     • Synthesis for higher level of abstraction
     • Architecture and synthesis for nanoFPGAs (jointly with Prof. Tim Cheng, Evelyn Hu, and Kang Wang)
     • Synthesis for error-resilient designs
     UCLA VLSICAD LAB

  3. xPilot: Platform-Based Synthesis System
     • Uniqueness of xPilot
         • Platform-based synthesis and optimization
         • Communication-centric synthesis
     • Recent progress on xPilot
         • Refined MMM-to-SSDM translation
         • Efficient and versatile scheduling engine based on a system of difference constraints (DAC’06)
         • Communication-centric binding based on a distributed register file μ-arch (ICCAD’06)
         • Behavior-and-communication co-optimization for interface synthesis (DAC’06)
     • Design drivers
         • Motion-JPEG
         • MPEG4 simple profile video decoder
     • Hybrid approach on Xilinx XUP board
         • Microblaze (or PowerPC) + HW synthesized blocks
     [Figure: xPilot flow. Labels: SystemC/C/MMM; Platform Description & Constraints; xPilot Front End; Profiling; SSDM (System-Level Synthesis Data Model); Analysis; Mapping; Processor & Architecture Synthesis; Interface Synthesis; Behavioral Synthesis; Custom Logic; Drivers + Glue Logic; Processor Cores + Executables; FPSoC]

  4. MPEG-4 Simple Profile Decoder: Synthesis Results
     • Complexity of synthesized RTLs

  5. Updated Results on Motion-JPEG Example
     • Model #1: 5 Microblazes
     • Model #2: 4 Microblazes + DCT on FPGA fabrics
     • FSL-based communication is a major performance overhead
     • Platform: Xilinx XUP board
     [Figure: Motion-JPEG pipeline. Labels: RAW Images; Preprocess; DCT or HW-DCT; Quant; Huffman; Table Modification; FSL-based communication; Encoded JPEG Images]

  6. Overview of IC-DFN Efforts at UCLA
     • Synthesis for higher level of abstraction
     • Architecture and synthesis for nanoFPGAs (jointly with Prof. Tim Cheng, Evelyn Hu, and Kang Wang)
     • Synthesis for error-resilient designs
         • Redundancy-aware, fault-tolerant clustering

  7. Hierarchical FPGAs
     • Two-level, hierarchical circuit logic
         • Level 1 – LUTs
         • Level 2 – Clusters of LUTs
         • Higher levels (clusters of clusters) also possible
     • Uses locality of interconnections to improve circuit performance

  8. Redundancy in FPGAs
     • LUTs can fail with some probability
     • Allocate extra components (e.g. LUTs) into the system
     • Re-route inputs and outputs to a spare LUT
     • Ideally, the spare LUT should be close to the failure so that delay does not increase

  13. Motivational Example
      • 4 LUTs (each of delay 1)
      • 2 clusters of 3 LUTs each
      • Inter-cluster edges have delay 3
      • Target delay 6
      • LUTs fail with probability 0.1
      [Figure: network of four LUTs labeled A, B, C, D]

  14. Motivational Example
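
As a rough illustration of why spare placement matters in the example above (4 LUTs, 2 clusters of 3 LUT slots, failure probability 0.1), the sketch below compares two ways of splitting the LUTs across clusters. It assumes independent failures and assumes that a failed LUT is repaired at no delay cost whenever a working spare slot exists in the same cluster. The network topology is not recoverable from the slide text, so delay is ignored here, and the function names are hypothetical.

```python
from math import comb


def p_at_least_working(slots: int, needed: int, p_fail: float) -> float:
    """P(at least `needed` of `slots` LUT slots work), failures independent."""
    p_ok = 1.0 - p_fail
    return sum(
        comb(slots, w) * p_ok**w * p_fail ** (slots - w)
        for w in range(needed, slots + 1)
    )


def p_all_repaired_locally(luts_per_cluster, slots: int, p_fail: float) -> float:
    """P(every cluster has enough working slots for the LUTs mapped to it).

    Assumption: a cluster "survives" if its working slots can host all of its
    LUTs, i.e. spares are only ever used inside their own cluster.
    """
    prob = 1.0
    for needed in luts_per_cluster:
        prob *= p_at_least_working(slots, needed, p_fail)
    return prob


# Two LUTs per cluster leaves one spare in each cluster.
balanced = p_all_repaired_locally([2, 2], slots=3, p_fail=0.1)
# Three LUTs in one cluster leaves it with no spare at all.
skewed = p_all_repaired_locally([3, 1], slots=3, p_fail=0.1)

print(f"2+2 split: {balanced:.6f}")  # 0.944784
print(f"3+1 split: {skewed:.6f}")    # 0.728271
```

Under these assumptions the balanced split survives with probability 0.972² ≈ 0.945, while the 3+1 split drops to 0.729 × 0.999 ≈ 0.728, because the full cluster has no local spare.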

  15. The Problem
      • Inputs
          • A network G of n LUTs (acyclic)
          • An FPGA with C clusters, each with M LUTs
          • Inter-cluster interconnect delay d
          • Target circuit delay D
          • Probability p of LUT failure
      • Objective
          • Cluster G using no more than C clusters such that the probability of the circuit achieving delay D or faster is maximized
          • LUT duplication is allowed, but at the cost of a spare LUT
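
To make the delay model concrete, here is a minimal sketch of how the delay of one fault-free clustering could be evaluated. It assumes unit LUT delay, zero intra-cluster edge delay, and inter-cluster edge delay d, and it ignores failures and cluster capacity. The function name, the `preds` and `cluster_of` encodings, and the four-LUT example network are all hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def circuit_delay(preds, cluster_of, lut_delay=1, inter_delay=3):
    """Longest arrival time over all LUTs of an acyclic network.

    preds:      dict mapping each LUT to a list of its predecessor LUTs
    cluster_of: dict mapping each LUT to a cluster id
    """
    arrival = {}
    # static_order() yields every LUT after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        latest_input = 0
        for parent in preds.get(node, ()):
            # Edges inside a cluster are free; edges between clusters cost d.
            wire = 0 if cluster_of[parent] == cluster_of[node] else inter_delay
            latest_input = max(latest_input, arrival[parent] + wire)
        arrival[node] = latest_input + lut_delay
    return max(arrival.values())


# Hypothetical network: a and b feed c, and c feeds d.
example_preds = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
example_clusters = {"a": 0, "b": 0, "c": 0, "d": 1}

print(circuit_delay(example_preds, example_clusters))  # 6
```

In this hypothetical instance the only inter-cluster edge is c to d, so the delay is three LUT delays plus one inter-cluster hop.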

  16. Dynamic Programming Heuristic
      • Use a dynamic programming matrix A
      • A is an n × n × D matrix
      • Each entry A[i,j,k] stores a clustering solution of LUT i and its predecessors such that
          • Exactly j clusters are used
          • The minimum arrival time at the output of i is k
          • The probability of the circuit achieving delay k is maximized
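
One way to picture the matrix A is as a sparse table keyed by (i, j, k) that keeps only the most probable clustering seen so far for each key. The sketch below is an assumption about a convenient representation, not the authors' implementation; `Entry`, `DPTable`, and `offer` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    prob: float     # estimated probability of achieving arrival time k
    clusters: list  # the clustering itself, e.g. a list of frozensets of LUTs


class DPTable:
    """Sparse stand-in for the n x n x D matrix A."""

    def __init__(self):
        self._best = {}

    def offer(self, i, j, k, prob, clusters) -> bool:
        """Store a candidate for A[i, j, k] if it beats the current entry."""
        key = (i, j, k)
        current = self._best.get(key)
        if current is None or prob > current.prob:
            self._best[key] = Entry(prob, clusters)
            return True
        return False

    def get(self, i, j, k):
        """Return the stored Entry for A[i, j, k], or None if it is empty."""
        return self._best.get((i, j, k))


table = DPTable()
table.offer("c", 1, 2, 0.70, [frozenset({"a", "b", "c"})])
table.offer("c", 1, 2, 0.60, [frozenset({"a", "c"})])  # rejected: lower probability
```

A dictionary avoids allocating all n · n · D cells up front, since many (i, j, k) combinations are never reachable.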

  17. Dynamic Programming Heuristic
      • Filling out the matrix
          • Traverse graph in topological order
          • For PI, form its own cluster
          • For all others
              • Select subset of parents
              • Select clusters of parents and merge
              • Place resulting clustering in A if probability of achieving k is largest so far
              • Repeat for all possible subsets of parents and clusterings
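
The "select subset of parents" step can be sketched as a plain enumeration. The version below is a simplification: it counts only the parents themselves against the cluster capacity M, whereas the heuristic merges whole parent clusters, which may be larger. The function name `parent_subsets` is hypothetical.

```python
from itertools import combinations


def parent_subsets(parents, max_cluster_size):
    """Yield each subset of parents that could share a cluster with the node.

    Simplifying assumption: only the parents are counted against capacity,
    and one slot is reserved for the node itself.
    """
    room = max_cluster_size - 1
    for size in range(0, min(room, len(parents)) + 1):
        for subset in combinations(parents, size):
            yield frozenset(subset)


# A node with three parents and clusters of M = 3 LUTs.
for subset in parent_subsets(["a", "b", "c"], max_cluster_size=3):
    print(sorted(subset))
```

With three parents and M = 3 this yields 7 subsets (the empty set, three singletons, three pairs). The count grows quickly with fan-in, which is one reason the procedure is described as a heuristic.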

  21. DP Heuristic Performance
      • All LUTs have weight 1
      • 10% failure rate
      • Intra-cluster edge delay 0
      • Inter-cluster edge delay 3
      • 8 clusters, each of 3 LUTs
      • Target delay of 7

  22. DP Heuristic Performance
      • DP clustering: achieves delay 7 with probability ≈ 0.39
      • Min-delay clustering: achieves delay 7 with probability ≈ 0.28

  23. Difficulties
      • Best known algorithm for calculating probability distribution of delays is exponential
      • Reconvergent fan-out introduces dependencies in probabilities
      • Can’t use exact probabilities to guide algorithms/heuristics
      • Hard to evaluate the performance of algorithms/heuristics
      • Difficult to assess quality of a sub-clustering of a node and its fan-in cone
      • Global knowledge (e.g. placement of spares) of the clustering is needed
      • Makes dynamic programming a harder approach
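
The dependency introduced by shared fan-out can be seen on a tiny hypothetical network in which LUT a fans out to both b and c. The sketch below computes exact probabilities by brute force over all 2^n failure patterns, which is exactly the exponential cost noted above. It assumes independent LUT failures with probability 0.1; `exact_probability` and the three-LUT network are illustrative names, not from the slides.

```python
from itertools import product


def exact_probability(luts, p_fail, event):
    """Sum the probability of every failure pattern in which event(working) holds.

    Enumerates all 2**len(luts) patterns, so it is only usable on tiny networks.
    """
    total = 0.0
    for pattern in product([True, False], repeat=len(luts)):
        working = {lut for lut, ok in zip(luts, pattern) if ok}
        weight = 1.0
        for ok in pattern:
            weight *= (1.0 - p_fail) if ok else p_fail
        if event(working):
            total += weight
    return total


luts = ["a", "b", "c"]


def path_ab(working):
    return {"a", "b"} <= working  # path through a and b is intact


def path_ac(working):
    return {"a", "c"} <= working  # path through a and c is intact


p_ab = exact_probability(luts, 0.1, path_ab)
p_ac = exact_probability(luts, 0.1, path_ac)
p_both = exact_probability(luts, 0.1, lambda w: path_ab(w) and path_ac(w))

print(f"product of marginals: {p_ab * p_ac:.4f}")  # 0.6561
print(f"true joint:           {p_both:.4f}")       # 0.7290
```

Multiplying the two path probabilities gives 0.6561, but the true joint probability is 0.729, because both paths depend on the same LUT a. Treating the paths as independent therefore misestimates the result.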

  24. Future Work
      • Study the tractability of the problem
      • Propose exact or approximation algorithms or better heuristics
      • Generalize the interconnect delays so the problem addresses LUT placement
      • Study the problem of assigning failures to spares so as to minimize delay
