1 / 31

Approaching Ideal NoC Latency with Pre-Con fi gured Routes

Approaching Ideal NoC Latency with Pre-Con fi gured Routes. George Michelogiannaki s Master’s Thesis Thesis Advisor: Prof. Manolis Katevenis Computer Science Department University of Crete. The Future is CMPs –> OCINs Critical. 2006. 2007.5. 2009. 2010.5. 2012. 2013.5? 2015?.

everly
Download Presentation

Approaching Ideal NoC Latency with Pre-Con fi gured Routes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Approaching Ideal NoC LatencywithPre-Configured Routes George Michelogiannakis Master’s Thesis Thesis Advisor: Prof. Manolis Katevenis Computer Science Department University of Crete

  2. The Future is CMPs –> OCINs Critical 2006 2007.5 2009 2010.5 2012 2013.5? 2015?

  3. SoCs Grow in Complexity CSD - UOC, Heraklion, Greece

  4. What are NoCs? • On-chip structured communication infrastructure. • i.e. the solution! • Composed of routers (switches), channels (wires), network interface logic. • NoC approach inspired by macro networks success. • Some concepts shared. CSD - UOC, Heraklion, Greece

  5. NoC Cost, Channels & Workload • Resource limitation. • Cost is Si area and power. Wires plentiful. • Many long, wide channels. Buffers limited. • Worrying area & power overhead numbers! • Different constraints motivate some surprising differences in design. • NoCs usually developed for a specific application set. CSD - UOC, Heraklion, Greece

  6. Packet Format • Packet divided into flits (wormhole routing). • First flit address or request. • Data flits may follow. • They contain no address information. CSD - UOC, Heraklion, Greece

  7. Reference NoC Topology • 2D mesh. • Router 5x5. • Relevant issues: • Routing. • Floorplan. • Topology. • Fault-tolerance. • Virtual Channels. CSD - UOC, Heraklion, Greece

  8. Virtual Channel Routers CSD - UOC, Heraklion, Greece

  9. Our Work - Introduction • Problem: Latency NoCs impose. • Motivation: Latency introduced to every communication pair. • Past work: Achieves 1 cycle/hop at 500 MHz. • We extend speculation to routing decisions. • Goal: Approach buffered wire latency. • Fraction of cycle/hop. CSD - UOC, Heraklion, Greece

  10. Our Approach • 400 ps good scenario; 1 cycle otherwise. 130 nm library CSD - UOC, Heraklion, Greece

  11. Preliminary Simulation Results Dynamic multiplexer Pre-configured multiplexer Latency Latency: 2 – 2.5 times lower CSD - UOC, Heraklion, Greece

  12. Preferred Paths • Each output has one preferred input. • This pref. I/O pair is connected by a single pre-enabled tri-state driver. • Pre-enabling is crucial. • Later check if flits correctly forwarded. • Thus, preferred paths are formed. • Reconfigurable at run-time. • Custom routes (shapes) allowed. CSD - UOC, Heraklion, Greece

  13. Opportunities for (Re-)configurability Uniform allocation of memory blocks to processors Non-uniform allocation of memory blocks to processors CSD - UOC, Heraklion, Greece

  14. Switch Architecture - Output Config. & arbitration logic. Stores pref. path config. & arbitrates. 400 ps 1 cycle Pref. path pre-enabled tri-states. Routing logic tri-state. Input FIFOs. Selectable when non-empty, or flit to be enqueued. CSD - UOC, Heraklion, Greece

  15. Switch Architecture - Input • Dead flits: Incorrectly eagerly forwarded. • Terminated at end of preferred path. • Switch resembles a buffered crossbar. Decides if flit needs to be enqueued. CSD - UOC, Heraklion, Greece

  16. Input Queueing Suboptimal • Dead flits enqueued in FIFOs. • Impact non-preferred flits. • Wasted power. • VCs would help. CSD - UOC, Heraklion, Greece

  17. Routing Algorithm • Deterministic routing employed. • Non-preferred paths follow XY routing. • We slightly modify XY routing to handle preferred paths: • Flit correctly eagerly forwarded if it approaches the destination in any axis. • Flit considered dead otherwise. CSD - UOC, Heraklion, Greece

  18. Routing Characteristics • Flits in preferred paths may not follow XY routing. • Duplicate copies of a flit may be delivered. • XY routing. • Pref. paths. D S CSD - UOC, Heraklion, Greece

  19. Routing Characteristics • Out-of-order delivery is disallowed. • By applying new configuration at a “safe” time. • XY routing. • Pref. paths. D S CSD - UOC, Heraklion, Greece

  20. Adaptive Routing • Many benefits to offer. • Extra challenging: dead flit classification. • An output may switch to “adaptive mode”. • According to application-dictated factors. • Then routes according to the adaptive algorithm. • Not previously-decided packets. • Flits routed this way are not dead. CSD - UOC, Heraklion, Greece

  21. Deadlock-Freedom • XY routing deadlock-free. • What we added to it: • Preferred paths. • Provide constraints to prevent circles. • Networks remains functional in any case. • Adaptive routing. • Depends on exact algorithm. CSD - UOC, Heraklion, Greece

  22. RAM Blocks • 4096 x 32 chosen to balance latency & area efficiency: • Larger blocks were disproportionally slower. • Smaller blocks imposed greater area overhead. • SP area efficient. • TP 75% and DP 100% larger. CSD - UOC, Heraklion, Greece

  23. Simple 2D Mesh Topology • 5x5 switches. • One for each PE. • Empty space between PEs. • Equally far away from A/D pins. • Does not take advantage of environment. CSD - UOC, Heraklion, Greece

  24. 2D Mesh with 2 Subnetworks • Divide the network. • Request, reply? • X, Y axis? • Switches in front of pins. • Need to be interconnected. • Still 1 switch/PE. CSD - UOC, Heraklion, Greece

  25. Rotated RAM Blocks • Switches every two X axes. • Half the number! • Switches slightly larger. • Our final topology an optimization of this. CSD - UOC, Heraklion, Greece

  26. NoC Topology – Bar Floorplan • Each switch is 6x6 and serves 4 PEs. CSD - UOC, Heraklion, Greece

  27. Bar Floorplan • Would be 8x12: • Vertical links drive address inputs. • 2 PE data ports served by 1 switch port. CSD - UOC, Heraklion, Greece

  28. Cross Floorplan CSD - UOC, Heraklion, Greece

  29. Switch P&R Results • 130 nm implementation library. Typical case. • Pref. path latency: • 300-420 ps. • 450-500 ps (incl. 1mm). • 1 cycle/node otherwise. • Past work: 1 cycle/node at 500 MHz. CSD - UOC, Heraklion, Greece

  30. Future Work • Synchronization issues – A flit may arrive at any time. • Impose preferred path constraints. • Implement switch asynchronously. • Evaluation in complete system. • Implement fault-tolerance. CSD - UOC, Heraklion, Greece

  31. Conclusion • We approach ideal latency. • By pre-enabled tri-state paths. • Our NoC is a generalized “mad-postman” [C. R. Jesshope et al, 1989]. • Our NoC is easily generalized – topology may need to be changed. • Past NoC research can be applied for further optimizations. CSD - UOC, Heraklion, Greece

More Related