Handling Global Traffic in Future CMP NoCs

Ran Manevich, Israel Cidon, and Avinoam Kolodny. Electrical Engineering Department, Technion – Israel Institute of Technology, Haifa, Israel. QNoC Research Group. SLIP 2012.


Presentation Transcript


  1. Handling Global Traffic in Future CMP NoCs. Ran Manevich, Israel Cidon, and Avinoam Kolodny. Electrical Engineering Department, Technion – Israel Institute of Technology, Haifa, Israel. QNoC Research Group. SLIP 2012.

  2. Bandwidth Version of Rent’s Rule: $B = kG^{R}$ • B – Cluster external bandwidth. • k – Average bandwidth per module. • G – Number of modules in a cluster. • R – Rent’s exponent, 0 < R < 1. [Figure: a cluster of G = 16 modules.] Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007.
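The bandwidth form of Rent's rule on this slide can be evaluated directly. The following is a minimal sketch, not taken from the presentation: the function name and the sample values of k, G, and R are illustrative assumptions.

```python
def cluster_external_bandwidth(k: float, g: int, r: float) -> float:
    """Evaluate the bandwidth version of Rent's rule, B = k * G**R.

    k: average bandwidth per module
    g: number of modules in the cluster (G)
    r: Rent's exponent, which the slide restricts to 0 < R < 1
    """
    if g < 1:
        raise ValueError("a cluster needs at least one module")
    if not 0.0 < r < 1.0:
        raise ValueError("Rent's exponent must satisfy 0 < R < 1")
    return k * g ** r


if __name__ == "__main__":
    # Assumed values for illustration only: unit bandwidth per module, R = 0.7.
    for g in (1, 4, 16, 64):
        b = cluster_external_bandwidth(1.0, g, 0.7)
        print(f"G = {g:3d}  B = {b:.2f}")
```

Because R < 1, the external bandwidth grows more slowly than the cluster size. A lower exponent means a larger share of the traffic stays inside the cluster.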

  3. Rent’s Exponent Reflects Traffic Locality

  4. CMP NoC Traffic Follows Rent’s Rule. [Figure: 2D mesh NoC; ~average of CMP parallel programs.*] *Heirman et al., “Rent’s Rule and Parallel Programs: Characterizing Network Traffic Behaviour”, SLIP 2008.

  5. 2D Mesh – Packets Classification by Distance • For illustration purposes, packets are classified according to the distance between source and destination. • Nearest Neighbor (NN) – Dist = 1 • Local – 1 < Dist < 2+K/8 • Global – Dist ≥ 2+K/8 [Figure: examples for K = 8 and K = 16.]
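The three distance classes on this slide map onto a small classifier. This is a minimal sketch under one assumption not stated on the slide: Dist is taken to be the hop (Manhattan) distance between source and destination in a K x K mesh. The function name and coordinate convention are hypothetical.

```python
def classify_packet(src: tuple[int, int], dst: tuple[int, int], k: int) -> str:
    """Classify a packet as 'NN', 'Local', or 'Global' in a K x K mesh.

    Assumption: Dist is the Manhattan (hop) distance between the nodes.
    """
    dist = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    if dist == 0:
        raise ValueError("source and destination must differ")
    if dist == 1:
        return "NN"            # Dist = 1
    if dist < 2 + k / 8:
        return "Local"         # 1 < Dist < 2 + K/8
    return "Global"            # Dist >= 2 + K/8


if __name__ == "__main__":
    # For K = 8 the global threshold is 3 hops; for K = 16 it is 4 hops.
    print(classify_packet((0, 0), (1, 2), 8))
    print(classify_packet((0, 0), (1, 2), 16))
```

The same 3-hop packet is global in the 8x8 mesh but local in the 16x16 mesh, because the threshold 2+K/8 grows with the mesh size.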

  6. Fraction of global packets decreases in large systems. [Figure: Rent’s exponent (R) = 0.7; nearest-neighbor curve labeled.]

  7. Dominance of Global Packets in BW/Router and Light Load Latency • Nearest Neighbor traffic is dominant in small systems.* • In large systems: global packets are a minority, yet they dominate BW/router and average latency. *Zarkesh-Ha et al., “Hybrid Network on Chip (HNoC): local buses with global mesh architecture”, SLIP 2010.

  8. Problem!!! • In large systems, global packets (a minority): • Consume most of the network’s BW. • Significantly increase average light load latency.

  9. Solution – PyraMesh • Hierarchical 2D mesh. • Global packets are routed through higher hierarchy levels. • Overall hop count is reduced. • Average latency is reduced. • Average BW per router is reduced. [Figure: example route from Source to Dest., 8 hops instead of 14!]

  10. PyraMesh – Architecture • K – The size of the base mesh. • NL – Number of levels. • NP – Number of pyramids on top of the base mesh. • α_i – Ratio between the sizes of levels i and i+1. • C_i – Number of routers in level i that are connected to a router in level i+1 along a single dimension. [Figure: three example configurations: (K = 8, NL = 2, NP = 4, α_i = 4, C_i = 1); (K = 8, NL = 3, NP = 1, α_i = 2, C_i = 1); (K = 8, NL = 2, NP = 1, α_i = 4, C_i = 2).]

  11. PyraMesh – Addressing and Routing • Addressing – On each level i, base-mesh node (X,Y) is represented by the nearest router in the North-East quarter. • Routing – XY.

  12. PyraMesh – Packets Classification • Packets are distributed among levels i according to their travel distance (D) in the base mesh. • DTh_i – Distance threshold of level i. • If D > DTh_i, the packet is directed to level i+1. • Example: DTh_i = 6, 12, 20.
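The threshold rule on this slide can be sketched as a level-selection function. Two assumptions are made here that the slide does not state: the base mesh is numbered level 1, and the thresholds are given in ascending order starting with the base level. The function name is hypothetical.

```python
from typing import Sequence


def select_level(d: int, thresholds: Sequence[int]) -> int:
    """Return the hierarchy level that carries a packet of travel distance d.

    thresholds[i - 1] plays the role of DTh_i. A packet moves from level i
    to level i + 1 while d > DTh_i.
    Assumptions: the base mesh is level 1; thresholds are ascending.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    level = 1
    for dth in thresholds:
        if d > dth:
            level += 1   # D > DTh_i: direct the packet one level up
        else:
            break
    return level


if __name__ == "__main__":
    # Thresholds from the slide's example: DTh_i = 6, 12, 20.
    for d in (3, 6, 7, 13, 21):
        print(d, select_level(d, (6, 12, 20)))
```

With the example thresholds, packets with D ≤ 6 stay in the base mesh, and only packets with D > 20 reach the top level.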

  13. PyraMesh – Optimization: (Area overhead, Wiring overhead, Maximum bandwidth per router*, Average light-load latency*) = F(K, NL, NP, α_i, C_i, DTh_i*, R*). These outputs serve as the optimization objectives and constraints.

  14. Optimization Results: Example of a 16x16 System, R = 0.7 • Light load latency optimized PyraMesh, packet distance thresholds: D ≤ 5, 5 < D ≤ 8, D > 8. • Throughput optimized PyraMesh, packet distance thresholds: D ≤ 6, 6 < D ≤ 18, D > 18.

  15. Light Load Latency Performance • BMesh – The baseline mesh. • HNoC – Hybrid Network on Chip (Zarkesh-Ha et al., SLIP 2010). • Scaled Mesh (SMesh) – Links wider than in BMesh by the PyraMesh area overhead factor.

  16. Throughput Results, R = 0.7

  17. Our Contributions • Characterization of Rentian traffic in large NoCs. • The observation that global packets limit the scalability of large systems. • PyraMesh – A novel framework for hierarchical NoC design.

  18. Conclusions • Global packets limit performance in large (future) CMP systems. • PyraMesh – A novel class of hierarchical 2D mesh topologies. • PyraMesh handles global traffic in future CMP NoCs.

  19. Thank You!

  20. Related Work • CMesh: J. D. Balfour and W. J. Dally. “Design tradeoffs for tiled CMP on-chip networks”. International Conference on Supercomputing, 2006. • GigaNoC: C. Puttmann, J.-C. Niemann, M. Porrmann, and U. Rückert. “GigaNoC – A hierarchical network-on-chip for scalable chip-multiprocessors”. Euromicro DSD 2007. • Hierarchical 2-Levels 2D Mesh: M. Winter, S. Prusseit, and G. P. Fettweis. “Hierarchical routing architectures in clustered 2D-mesh networks-on-chip”. ISOCC 2010. • Hierarchical Rings on a Mesh: S. Bourduas and Z. Zilic. “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing”. ASAP 2007. • Long Range Links: U. Y. Ogras and R. Marculescu. “‘It’s a small world after all’: NoC performance optimization via long-range link insertion”. IEEE Trans. on Very Large Scale Integr. (VLSI) Syst. 2006.
