50th Annual Allerton Conference, 2012: On the Capacity of Bufferless Networks-on-Chip

Alex Shpiner, Erez Kantor, Pu Li, Israel Cidon and Isaac Keslassy. Faculty of Electrical Engineering, Technion, Haifa, Israel.

Presentation Transcript


  1. 50th Annual Allerton Conference, 2012. On the Capacity of Bufferless Networks-on-Chip. Alex Shpiner, Erez Kantor, Pu Li, Israel Cidon and Isaac Keslassy. Faculty of Electrical Engineering, Technion, Haifa, Israel.

  2. Network-on-Chip (NoC) • Packet-based network infrastructure. • Buses and dedicated wires.

  3. Network-on-Chip (NoC)

  4. Collision

  5. Buffering • Drawbacks: • Dynamic and static energy. • Chip area. • Complexity of the design.

  6. Deflecting • Drawbacks: • No latency guarantee. • No bandwidth guarantee. • Not the shortest path.

  7. Scheduling

  8. The Objective: a scheduling algorithm for a bufferless network that maximizes throughput and guarantees QoS.

  9. Complete-Exchange Periodic Traffic. In a period, every node sends one unicast data packet to every other node.

  10. Complete-Exchange Periodic Traffic [Figure: timeline for Core 0 to Core 3, alternating computation and communication steps] • Computation step: autonomous processing. • Communication step: every core sends a unicast data packet to every other core. Applications: • Bulk Synchronous Parallel (BSP) programming. • Numerical parallel computing (FFT, matrix transpose, …). • End-to-end congestion control.

  11. Contributions • Optimal scheduling algorithm for line and ring. • Optimal scheduling algorithm for torus. • Constant approximation and bounds for mesh.

  12. Related Work • Bufferless NoC designs • Deflecting [Moscibroda et al. ’09] • Dropping [Gomez et al. ’08] • TDM-based NoCs • Aethereal [Goossens et al. ’05]: provides architecture, not scheduling. • Nostrum [Millberg et al. ’04]: uses buffers. • Direct Routing • NP-hard for general traffic [Busch et al. ’06]

  13. Problem Definition • Line, ring, torus or mesh network topology. • Complete-exchange periodic traffic pattern. • No buffering, deflecting or dropping of packets. • Equal propagation times and capacity on links. • Equal packet sizes. • Shortest-path routing.

  14. Problem Definition • Find a schedule that maximizes throughput. • Equivalently, one that minimizes the period time.

  15. Degree-Two NoC Scheduling (DTNS) Algorithm [Figure: example flows 1→2, 2→3, 3→4, 1→3, 2→4, 1→4] Each node i, at each time slot t, for each direction: • If it received a packet for retransmission at t-1, it retransmits that packet at t. • Else, it injects the packet with the farthest destination among all packets waiting to be sent from the node.
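The per-node rule on the DTNS slide can be tried out with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a line with nodes numbered 1..n, one time slot per hop, and it models only the left-to-right direction (the opposite direction is symmetric). The names `dtns_line_period`, `waiting` and `relay` are hypothetical.

```python
def dtns_line_period(n):
    """Count the time slots DTNS needs to deliver all left-to-right
    packets of a complete exchange on an n-node line (nodes 1..n)."""
    # waiting[i] holds the destinations node i still has to inject;
    # the list is ascending, so pop() returns the farthest destination.
    waiting = {i: list(range(i + 1, n + 1)) for i in range(1, n)}
    # relay[i] is the destination of a packet that reached node i in the
    # previous slot and still has to travel further.
    relay = {}
    slots = 0
    while relay or any(waiting.values()):
        next_relay = {}
        for i in range(1, n):
            if i in relay:
                dst = relay[i]            # rule 1: retransmit what arrived at t-1
            elif waiting[i]:
                dst = waiting[i].pop()    # rule 2: inject the farthest destination
            else:
                continue                  # nothing to send in this slot
            if dst != i + 1:
                next_relay[i + 1] = dst   # next node must forward it at t+1
        relay = next_relay
        slots += 1
    return slots


if __name__ == "__main__":
    for n in (2, 4, 5):
        print(n, dtns_line_period(n))
```

Each node sends at most one packet per slot in this direction, so no link ever carries two packets at once and no buffering is needed in the sketch.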

  16. DTNS Period Length (in time slots; the slide gives separate expressions for even and odd n) [formulas not preserved] n-Line: • Almost achieves capacity limit. • Impossible to spread traffic uniformly: central link is a bottleneck. n-Ring: • Achieves capacity limit for odd n. • For even n, achieves capacity with overlapping.
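The remark that the central link of a line is a bottleneck can be checked by counting how many packets of a complete exchange cross each link. This is an illustrative sketch under the slide's assumptions (shortest routing, one packet per ordered pair of nodes); the function name `rightward_link_loads` is hypothetical, and only left-to-right traffic is counted.

```python
def rightward_link_loads(n):
    """Return the number of left-to-right packets crossing each link of an
    n-node line; entry k-1 is the link between node k and node k+1."""
    loads = [0] * (n - 1)
    for src in range(1, n + 1):
        for dst in range(src + 1, n + 1):
            # On a line the shortest path uses every link between src and dst.
            for k in range(src, dst):
                loads[k - 1] += 1
    return loads


if __name__ == "__main__":
    print(rightward_link_loads(4))  # the middle link carries the most packets
```

Since a link carries one packet per time slot, the load of the busiest link is a lower bound on the period length in this direction.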

  17. Torus NoC Scheduling (TNS) Algorithm • Inject simultaneously in four directions. • Long-then-short routing. • Dist(x1, x2) = min{|x1-x2|, N-|x1-x2|}
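The Dist function on this slide is the wrap-around distance along one dimension of the torus. A short sketch, assuming coordinates are integers in 0..N-1 (the slide does not state the indexing) and using the hypothetical name `ring_dist`:

```python
def ring_dist(x1, x2, n):
    """Wrap-around distance between coordinates x1 and x2 on a ring of
    n positions: min(|x1 - x2|, n - |x1 - x2|)."""
    d = abs(x1 - x2)
    return min(d, n - d)


if __name__ == "__main__":
    # On a ring of 5 positions, going from 0 to 4 takes one hop backwards.
    print(ring_dist(0, 4, 5))
```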

  18. Torus NoC Scheduling (TNS) Algorithm • Period consists of phases. • Phase consists of epochs. • For a packet from (a,b) to (c,d): • Phase: i = max{Dist(a,c), Dist(b,d)} • Epoch: • for clockwise: j = min{Dist(a,c), Dist(b,d)} • for counter-clockwise: j-i = min{Dist(a,c), Dist(b,d)}

  19. TNS Period Length (in time slots; the slide gives separate expressions for odd and even N) [formulas not preserved] Torus: • Achieves capacity limit for odd N. • For even N, achieves capacity limit with overlapping.

  20. Mesh • Lower bound for period length (in time slots; the slide gives separate expressions for even and odd N): [formulas not preserved]

  21. TNS Algorithm in Mesh [Figure labels: 2N, N, N, 2N] • Upper bound for period length: [formula not preserved]

  22. Bounds for Mesh Scheduling Period Length • Constant approximation. [constant not preserved]

  23. Evaluation • Throughput = number of packets / period length
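The throughput definition above can be written out directly for complete-exchange traffic, where every node sends one packet to every other node, so a period carries n(n-1) packets. This is a small sketch; the function names are hypothetical and the period value in the usage lines is only an illustrative input, not a result from the talk.

```python
def complete_exchange_packets(n_nodes):
    """Packets sent in one period: one per ordered pair of distinct nodes."""
    return n_nodes * (n_nodes - 1)


def throughput(n_nodes, period_slots):
    """Throughput in packets per time slot, as defined on the slide:
    number of packets divided by period length."""
    return complete_exchange_packets(n_nodes) / period_slots


if __name__ == "__main__":
    # Illustrative input: 4 nodes and a period of 4 time slots.
    print(throughput(4, 4))
```

Because the number of packets per period is fixed by the traffic pattern, a shorter period translates directly into higher throughput.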

  24. Summary • Use bufferless NoCs to reduce chip power and area consumption. • Rely on knowledge of periodic traffic for scheduling to increase capacity. • Complete-exchange traffic. • Line, ring: DTNS optimal scheduling. • Torus: TNS optimal scheduling. • Mesh: bounds for TNS application.

  25. Thank you.