1 / 34

The Era of Many-Module SoC : Revisiting the NoC Mapping Problem

The Era of Many-Module SoC : Revisiting the NoC Mapping Problem. Technion – Israel Institute of Technology. Isask’har ( Zigi ) Walter, Israel Cidon , Avinoam Kolodny , Daniel Sigalov. December, 2009. SoC Revolution. PE1. PE2. PE3. R. R. R. PE1. PE2. PE3. R. R. R. PE4.

rufus
Download Presentation

The Era of Many-Module SoC : Revisiting the NoC Mapping Problem

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Era of Many-Module SoC:Revisiting the NoC Mapping Problem Technion – Israel Institute of Technology Isask’har (Zigi) Walter, Israel Cidon, AvinoamKolodny, Daniel Sigalov December, 2009

  2. SoC Revolution PE1 PE2 PE3 R R R PE1 PE2 PE3 R R R PE4 PE5 PE6 PE4 PE5 PE6 Bus-based system NoC-based system

  3. SoC Evolution R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R

  4. Processor Evolution Single Core CPU Dual Core CPU1 CPU2 Cache Quad Core CPU1 CPU2 Cache Cache Cache Cache CPU3 CPU4 Cache Cache

  5. The Era of Many-Module SoC • How would such chips be like? • Most likely • Power still important • Highly parallel • IP reuse • Ease of design and verification • High Certainty • Large number of modules • NoC Interconnect • Applications • Totally unknown R

  6. Future SoCs - Observation#1 • Special purpose cores replace general purpose processors • Power considerations Task 2 Task 3 Task 2 Task 3 Task 1 Task 4 Task 5 Task 1 Task 4 Task 5 MEM MEM MEM MEM DSP GeneralPurposeCPU Pre. Proc. CPU GPU Memory Memory • Processing pipes are getting longer

  7. Future SoCs - Observation#2 ? R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R • Large diversity • All modules are unique • Highly regular • Classes of Replicated cores • standard modules (DSP, HW accelerators, Cache banks, etc.)

  8. The Era of Many-Module SoC • Increased use of specialized cores • Pipes are getting longer • Replication of processing elements • How is the design flow affected? • This work – mapping of the NoC Observation#1 Observation#2

  9. Outline • The Era of Many Module SoC • Revisiting the Mapping Problem • Cross-Entropy Optimization • Evaluation

  10. NoC Mapping • Given • Traffic pattern(s) • a set (or sets) of pair-wise bandwidth requirements and timing constraints • Routing • Topology • Goal • Find efficient mapping of cores to tiles PE1 PE2 PE3 PE5 PE2 PE8 PE8 PE5 PE7 PE9 PE2 PE6 PE3 PE2 PE1 PE4 PE5 PE6 PE7 PE1 PE6 PE4 PE9 PE1 PE3 PE1 PE8 PE8 PE9 PE6 PE7 PE8 PE9 PE3 PE4 PE9 PE2 PE3 PE6 PE4 PE7 PE5 PE5 PE4 PE7

  11. Mapping Optimization • An important design step • Mapping affects power and performance! • A difficult problem! • Often heuristic algorithms are used • Common optimization goals • Minimize (dynamic) power • Minimize power + maximize performance • Minimize power subject to performance constraints

  12. Modeling • Typical modeling • Power and latency proportional to distance • Cost function:

  13. Calculating Mapping Cost PE1 PE2 PE3 30 100 PE4 PE5 PE6 Mapping π1 Mapping π2

  14. Motivation - Example #1 PE1 PE2 PE3 PE4 1 1 1 1 1 1 1 MEM1 MEM2 PE1 MEM2 PE4 PE2 PE3 MEM1 Optimal mapping (π1):

  15. Motivation - Example #1 (cont.) • Let the mapping algorithm assign the flows! PE1 PE2 PE3 PE4 2*MEM • Optimal mapping (π2): PE1 PE2 MEM2 PE1 MEM1 PE4 MEM1 PE3 PE4 PE2 PE3 MEM2 Cost(π2)=7 Cost(π1)=9

  16. Motivation - Example #1 (cont.) PE1 PE2 PE3 PE4 1 1 1 1 1 1 1 MEM1 MEM2 PE1 PE2 MEM2 MEM1 PE3 PE4 Cost(π2)=7 • The mapping algorithm should be aware of replicated modules!

  17. Classic Performance Constraints • Pair-wise point-to-point requirements • For example, in a 4-module system:

  18. Motivation - Example #2 Stream 1 PE1 PE2 PE3 Stream 2 PE4

  19. Example #2 – Pair-wise req. PE1 PE2 PE3 Req=2 Req=1 Req=1 Req=1 PE4 PE2. PE1 PE3 PE1 PE3 PE1 PE3 PE4. PE2. PE4. PE4 PE2. No feasible mapping!

  20. Application-Level Requirements Stream 1 PE1 PE2 PE3 Req=2 Req=1 Req=1 Req=1 Stream 2 PE4 PE1 PE2 • A feasible mapping does exist! PE3 PE4 • It’s better to work with the application level requirements

  21. This Work • Find efficient mappings by extending the formulation of the mapping problem • Adding degrees of freedom • Degree of freedom #1 • Leverage existence of replicated modules • Degree of freedom #2 • Replace p2p constraints with end-to-end, application-level requirements

  22. Modifying the Formulation (1) • Leverage existence of replicated modules • Allow the mapping algorithm to allocate flows to the best replicated module

  23. Modifying the Formulation (2) • Replace p2p constraints with end-to-end, application-level requirements P2P timing req. E2E timing req. • In this paper, for synthetic task graphs • Did so for a real application too

  24. Outline • The Era of Many Module SoC • Revisiting the Mapping Problem • Cross-Entropy Optimization • Evaluation

  25. Cross Entropy Optimization • Modern optimization heuristic • Good at combinatorial optimization problems • Akin to evolutionary algorithms • Generation of new solutions is based on sampling and estimation • Inherently a global search method • Reduced risk of getting trapped in a local minimum

  26. Cross Entropy Optimization • Given an initial parameter vector v=v0, sample a random population of K solutions x1,x2,…,xk from the distribution given by f(x;v). • Evaluate the costs S(xi),i=1,…,K. • Using the ρK (0<ρ<1) elite (lowest cost) samples, obtain a new density function f(x;v) by calculating a new vector v via Maximum Likelihood (ML) estimation. • Repeat steps 1-3 with the new vector v unless maximum number of iterations is reached or no improvement is obtained for a predefined number of iterations. • For example: • Generate 10 random mappings: π1, π2, …, π10 • Find 3 lowest cost mappings: π2, π5, π7 • Examine those 3 best mappings: • For each tile, calculate the probability core PEi is mapped to that tile • Update probabilities accordingly

  27. CE Example Prob(TileAPE1)=Prob(TileAPE2)= Prob(TileAPE3)=Prob(TileAPE4)=0.25 Prob(TileBPE1)=Prob(TileBPE2)= Prob(TileBPE3)=Prob(TileBPE4)=0.25 Prob(TileCPE1)=Prob(TileCPE2)= Prob(TileCPE3)=Prob(TileCPE4)=0.25 Prob(TileDPE1)=Prob(TileDPE2)= Prob(TileDPE3)=Prob(TileDPE4)=0.25 TileA TileB PE1 PE1 PE1 PE2 PE1 PE4 PE3 PE2 PE4 PE4 PE4 PE2 PE1 PE1 PE2 PE3 PE2 PE1 PE1 PE3 π2 π1 π3 π4 π5 PE4 PE3 PE2 PE1 PE4 PE3 PE1 PE3 PE3 PE3 PE4 PE4 PE3 PE2 PE2 PE2 PE2 PE4 PE3 PE4 TileC TileD π6 π7 π8 π9 π10

  28. Updating Probabilities π2 π5 π3 • Prob(TileAPE1)=1 TileA TileB PE1 PE2 PE1 PE2 TileC TileD PE3 PE4 PE4 PE3 PE1 PE4 • Prob(TileBPE2)=2/3 • Prob(TileBPE4)=1/3 PE3 PE2 • Prob(TileDPE2)=1/3 • Prob(TileDPE3)=1/3 • Prob(TileDPE4)=1/3 • Prob(TileCPE3)=2/3 • Prob(TileCPE4)=1/3 • Following iteration uses these updates probabilities • Gradually, probabilities converge to 0/1

  29. Outline • The Era of Many Module SoC • Revisiting the Mapping Problem • Cross-Entropy Optimization • Evaluation

  30. Evaluation • Scenario • 6x6 mesh NoC • Synthetic, randomized SoC • Task graphs (and task-to-core mapping) • Varying number of replicated modules • Varying timing constraints • (Real application in DATE10 paper) • Compare with best cost of classic mapping • Averaging multiple runs

  31. Accounting for Replication • “Class”: a group of identical PEs • Total number of replicated cores= {Number of classes}*{class size} Cost Reduction [%] 10 20 30 40 50 Total number of Replicated Modules [%]

  32. Application-Level Requirements • SoCs with a pipeline data path and background P2P traffic • Varying pipeline slack • Different amounts of background constraints Cost Reduction [%] Pipeline Slack

  33. Conclusions and Future Work • We are going into the era of “Many module SoC” • Extend the mapping to account for • Classes of replicated modules • Application-level requirements • Meaningful power savings • But mapping is an example • Routing? Task assignment? Link design? Topology selection?

  34. The Era of Many-Module SoC QNoC Research Group Thank you! Questions? zigi@tx.technion.ac.il QNoC Research Group

More Related