
I2CRF: Incremental Interconnect Customization for Embedded Reconfigurable Fabrics. Jonghee W. Yoon, Jongeun Lee*, Jaewan Jung, Sanghyun Park, Yongjoo Kim , Yunheung Paek and Doosan Cho** Seoul National University, Korea *UNIST, Korea



  1. I2CRF: Incremental Interconnect Customization for Embedded Reconfigurable Fabrics Jonghee W. Yoon, Jongeun Lee*, Jaewan Jung, Sanghyun Park, Yongjoo Kim, Yunheung Paek and Doosan Cho** Seoul National University, Korea *UNIST, Korea **Sunchon National University, Korea

  2. Outline • CGRA & Augmentation • Overall Design Flow • Our Approach (I2CRF) • Problem definition (Inexact graph matching) • Mapping with A* search • Experiment • Conclusion

  3. Reconfigurable Architecture • Reconfiguration is emerging • Increasing need for flexible, high-speed computing fabrics • CGRAs (Coarse-Grained Reconfigurable Architectures) • Operation-level granularity • High performance • S/W development is easy • [Figure: example CGRAs, MorphoSys and ADRES]

  4. Augmentation • General CGRA - Mapping • CGRA Arch. + Applications → Configurations • Application-specific CGRAs - Synthesis • Applications → New Arch. + Configurations • Augmentation • Base CGRA + Applications → New Arch. + Configurations • Customizable Features • The number of PEs • The set of PE operations • Heterogeneity or Homogeneity • Memory subsystem architectures • Interconnection network • Energy consumption: 14% (130nm) → 30% (45nm) (Interconnect Exploration for Energy Versus Performance Tradeoffs for Coarse Grained Reconfigurable Architectures, TVLSI 2009)

  5. Overall design flow - I2CRF • [Flow diagram] Inputs: Kernel, Base CGRA • I2CRF (Incremental Interconnect Customization for Reconfigurable Fabrics): Vertex Clustering → Mapping (A* Search for Minimum-Cost Edit Path) → Arch Extension + (Accum.) Interconnections → Evaluation • If Not Satisfied, the flow repeats • Output: Application-Specific Reconfigurable Architecture

  6. I2CRF • Incremental architecture change by adding interconnections to the base architecture • Strengths • Regularity is maintained through the base architecture • But it still provides specialization for the target applications • Fast specialization and no limitation on the design space • The architecture change occurs while the kernel is mapped.

  7. The difference compared with general mapping • Existing application mapping for CGRA • Find a graph X ⊆ C that is isomorphic to K • Augmentation and Mapping • Find a graph Y that is isomorphic to K and a subgraph of C′, where C′ is the graph most similar to C • [Figure: Kernel graph, K (vertices 1 to 6); Base CGRA graph, C (PE 1 to PE 6); General Mapping vs. Augmentation and Mapping]
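The contrast above can be made concrete with a small sketch. It assumes a 3x2 mesh of PEs numbered row by row (as in the slide's figure labels PE 1 to PE 6) and a fixed placement of kernel vertices; the kernel edges, the placement, and the helper names `mesh_links` and `missing_links` are hypothetical and not taken from the presentation. Kernel edges that land on unconnected PEs are the ones general mapping cannot realize directly and augmentation would have to serve.

```python
def mesh_links(rows, cols):
    """Undirected mesh links between PEs numbered 1..rows*cols, row by row."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            pe = r * cols + c + 1
            if c + 1 < cols:            # link to the right-hand neighbour
                links.add(frozenset((pe, pe + 1)))
            if r + 1 < rows:            # link to the neighbour below
                links.add(frozenset((pe, pe + cols)))
    return links


def missing_links(kernel_edges, placement, links):
    """Kernel edges whose endpoints are placed on PEs with no direct link."""
    return [(u, v) for (u, v) in kernel_edges
            if frozenset((placement[u], placement[v])) not in links]


# Hypothetical kernel: a square 1-2-4-3 plus one diagonal edge (1, 4).
kernel_edges = [(1, 2), (1, 3), (2, 4), (3, 4), (1, 4)]
placement = {1: 1, 2: 2, 3: 3, 4: 4}    # kernel vertex -> PE
print(missing_links(kernel_edges, placement, mesh_links(3, 2)))
```

Under this placement only the diagonal edge has no mesh link, so it is the single candidate for an added interconnection (or for routing through a spare PE).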

  8. Problem Definition - Inexact Graph Matching Problem • How to find C which is most similar to C0: Inexact graph matching • Similarity between two graphs can be measured by calculating the cost of a graph edit path • An edit path is the set of edit operations that transform a graph G1 into another graph G2 • Edit operations • Node (or edge) substitution: NS, ES (identical or non-identical) • Node (or edge) insertion: NI, EI • Node (or edge) deletion: ND, ED • All the other edit operations are induced by node substitution. • [Figure: example edit path between <G1> and <G2> with NS 1→e, 2→a, 3→h, 4→d, 5→b, 6→g, 7→f; edges annotated as Identical ES, Non-identical ES & NI, ED, EI]

  9. Graph Edit Cost Model • Ce - The cost of Edge deletion • Interconnection insertion cost • Cv - The cost of Node insertion • Routing PE insertion cost • Routing PE can replace interconnection insertion in case there are extra PEs • Do not need augmentation • can reduce the amount of architecture extension • Cv is much cheaper than Ce
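As a minimal sketch of this cost model, the cost of an edit path can be summed from the two priced operations. The values Cv = 1 and Ce = 3 are the ones used in the deck's later example slide; the operation labels (`"identical"`, `"routing_pe"`, `"new_link"`) and the function name are hypothetical, chosen only for illustration.

```python
C_V = 1  # routing PE insertion cost (value from the example slide)
C_E = 3  # interconnection insertion cost (value from the example slide)


def edit_path_cost(ops, cv=C_V, ce=C_E):
    """Sum the cost of an edit path given as a list of operation labels."""
    cost = 0
    for op in ops:
        if op == "identical":       # edge maps onto an existing link: free
            continue
        elif op == "routing_pe":    # edge routed through a spare PE
            cost += cv
        elif op == "new_link":      # edge needs a custom interconnection
            cost += ce
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return cost


print(edit_path_cost(["identical", "routing_pe", "new_link"]))
```

Because Cv is much cheaper than Ce, a minimum-cost search prefers routing through spare PEs and only adds a link when no spare PE can carry the edge.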

  10. A* Search for Min Cost Edit Path • Inexact graph matching problem is NP-complete → How to search the mapping space for the min cost path: A* Search algorithm • Root: Kernel graph • Leaf: Sub-CGRA graph • s: current mapping state • g(s): the sum of the costs (Ce, Cv) of the graph edit operations from the root to the current state s • h(s): the estimated cost from the current state s to a leaf state • Assessment of the partial mapping s: g(s) + h(s)
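The search described above can be sketched as a best-first search over partial placements, ordered by g(s) + h(s). This is a simplified illustration, not the authors' implementation: it prices only added links (Ce) and ignores routing PEs, it places kernel vertices in a fixed order, and its default h(s) = 0 is a trivially admissible placeholder that reduces A* to uniform-cost search. The function name `a_star_map`, its parameters, and the 2x2 mesh are assumptions.

```python
import heapq
from itertools import count


def a_star_map(vertices, kernel_edges, pes, links, ce=3, h=lambda placement: 0):
    """Return (cost, placement) minimising ce * (# kernel edges needing a new link)."""
    tie = count()  # unique tie-breaker so the heap never compares dicts
    frontier = [(h({}), next(tie), 0, {})]
    while frontier:
        _, _, g, placement = heapq.heappop(frontier)
        if len(placement) == len(vertices):
            return g, placement          # first complete state popped is optimal
        v = vertices[len(placement)]     # next kernel vertex to place
        used = set(placement.values())
        for pe in pes:
            if pe in used:
                continue
            step = 0
            for a, b in kernel_edges:
                other = b if a == v else a if b == v else None
                # Charge ce when an already-placed neighbour has no direct link.
                if other in placement and frozenset((pe, placement[other])) not in links:
                    step += ce
            child = dict(placement)
            child[v] = pe
            heapq.heappush(frontier, (g + step + h(child), next(tie), g + step, child))
    return None


# Hypothetical 2x2 mesh: PE 1-2 on the top row, PE 3-4 on the bottom row.
LINKS = {frozenset(p) for p in [(1, 2), (1, 3), (2, 4), (3, 4)]}
print(a_star_map([1, 2, 3], [(1, 2), (2, 3), (1, 3)], [1, 2, 3, 4], LINKS)[0])
```

A triangle cannot be embedded in a mesh without an extra link, so the search reports a cost of one Ce; a better-informed h(s) would prune more of the tree but must never overestimate the remaining cost if optimality is to be kept.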

  11. Vertex Scattering • Make clusters of vertices and assign each cluster to a row • Strengths of vertex scattering • Search space reduction • Considers shared resource constraints • [Figure: Kernel (vertices 1 to 5); Clustering & Row assignment (Row 1, Row 2); Final mapping onto PE 1 to PE 6]

  12. h(s) & Vertex Scattering • Heuristic function, h(s) … • guides the fast search of the mapping space • needs cost estimation methods • Detecting difficult-to-map edges • After vertex scattering • Forks and over-length edges cannot be mapped to a mesh without a routing PE or a custom interconnection link • h(s) is estimated from the # of forks & over-length edges (= Nr) • An unroutable difficult-to-map edge (c1) has more cost than a routable one (c2) • [Figure: kernel graph with vertices 1 to 7]
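One way to detect over-length edges after row assignment is sketched below. The definition used here (an edge whose endpoints sit more than one row apart) is an interpretation, since the slides do not define the term precisely; fork detection is omitted, and the kernel edges, row assignment, and function name are hypothetical.

```python
def over_length_edges(kernel_edges, row_of):
    """Edges whose endpoints are assigned to rows more than one apart.

    Such an edge cannot lie on a single mesh link, whatever columns are chosen.
    """
    return [(u, v) for (u, v) in kernel_edges
            if abs(row_of[u] - row_of[v]) > 1]


# Hypothetical kernel and row assignment produced by vertex scattering.
kernel_edges = [(1, 2), (1, 3), (2, 5), (1, 5)]
row_of = {1: 0, 2: 1, 3: 1, 5: 2}

n_r = len(over_length_edges(kernel_edges, row_of))
print(n_r)
```

Counting these edges before the search starts gives an estimate of how many routing PEs or custom links a partial mapping will still need.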

  13. Example • c1 = cv = 1, c2 = ce = 3 • Search states, each with its partial mapping and g(s) + h(s): • s=0 { }: 0+1 • s=1 {(11)}: 0+1 • s=2 {(12)}: 0+1 • s=3 {(13)}: 0+1 • s=4 {(42)}: 0+1 • s=5 {(43), ($2)}: 1+1 • s=6 {(26)}: 0+1 • s=7 {(25)}: 0+1 • s=8 {(24)}: 0+1 • s=9 {(33), ($5)}: 4+0 • s=10 {(35), ($4)}: 1+0 • [Figure: kernel graph (vertices 1 to 4), search tree, and CGRA (PE 1 to PE 6)]

  14. Experimental Setup • We test I2CRF on a CGRA called RSPA • Mesh base interconnection • Each row has 2 shared multipliers • Each row can perform 2 loads and 1 store • A PE can be used for routing • Benchmarks from Livermore loops, MultiMedia and DSPStone • Comparison to Mesh, 1-hop, Diagonal, and Mixed

  15. Performance Improvement • IPC of 16 is equivalent to 100% utilization • PE utilization and the IPC are increased by more than 70% on average compared to Mesh or by 41% on average compared to Mixed
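The relation between IPC and utilization stated above is a simple ratio. The sketch below assumes 16 PEs, which is what "IPC of 16 is equivalent to 100% utilization" implies; the sample IPC of 8 is hypothetical and not a measured result.

```python
def pe_utilization(ipc, num_pes=16):
    """Fraction of PEs doing useful work per cycle, given the achieved IPC."""
    return ipc / num_pes


print(pe_utilization(16))  # full utilization
print(pe_utilization(8))   # hypothetical IPC, half of the PEs busy
```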

  16. Customization Overhead • Through our interconnection increment, … • # of new interconnection links is very small • Very marginal increase in the overall Mux complexity

  17. Optimization Time • I2CRF finds a competitive custom interconnection architecture, together with its configuration, in reasonable time.

  18. Conclusion • We presented an interconnection customization method for CGRAs • Our method exploits the similarity between the interconnection customization problem and inexact graph matching • Non-homogeneous extensions to a base interconnection architecture may present some challenges and possibly a penalty in back-end VLSI design • We plan to find out the extent of the difficulty due to the non-homogeneity as well as find novel ways to mitigate any impact if necessary

  19. Thank you for your attention!
