Virtual-Channel Flow Control William J. Dally

Presented by: Nick Kirchem, March 5, 2004.

Presentation Transcript


  1. Virtual-Channel Flow Control • William J. Dally • Presented by: Nick Kirchem • March 5, 2004

  2. Motivation • Interconnection network is critical • Performance sensitive to network latency & throughput • Interconnect = large fraction of cost and power consumption • Interconnect throughput is limited to a fraction of capacity due to coupled resource allocation • Single buffers associated with physical channels • Blocks entire physical channel • True for circuit switching & wormhole routing

  3. Solution: Virtual Channels • Add “lanes” for each physical channel • (lane = virtual channel)

  4. VCs and Flow Control Background • Virtual Channels decouple physical channels from buffer memory • The most costly resources of an interconnection network • Associate multiple virtual channels with single physical channel • Paper analyzes Flow Control • Determines how resources are allocated, and • How collisions over resources are resolved • Most beneficial to flow control strategies that block

  5. Virtual Channels Structure • Each node contains set of buffers and a switch

  6. Virtual Channels Structure • Organize flit buffers into several lanes

  7. Virtual Channels State Logic • Status Register for Transmitting Node • Lane-is-free bit • Number of free flit buffers in lane • (Optionally) Priority of packet in lane • Status Register for Receiving Node • Input & Output pointers for each lane buffer • Channel state (free, waiting, active)
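The per-lane state listed on slide 7 can be pictured as two small records, one at each end of the channel. The following is a minimal sketch, not the paper's hardware design: the class and field names are hypothetical, and the `occupied` counter is an added helper (not on the slide) used only to tell a full lane buffer from an empty one.

```python
from dataclasses import dataclass
from enum import Enum


class ChannelState(Enum):
    """Lane states named on the slide."""
    FREE = "free"
    WAITING = "waiting"
    ACTIVE = "active"


@dataclass
class TxLaneStatus:
    """Per-lane status kept at the transmitting node."""
    lane_is_free: bool = True
    free_flit_buffers: int = 0
    priority: int = 0  # optional field; unused when pri = 0


@dataclass
class RxLaneStatus:
    """Per-lane status kept at the receiving node (hypothetical layout)."""
    depth: int                      # flit buffers in this lane
    in_ptr: int = 0                 # next slot to write
    out_ptr: int = 0                # next slot to read
    state: ChannelState = ChannelState.FREE
    occupied: int = 0               # helper counter, an assumption of this sketch

    def push(self) -> None:
        """Record one arriving flit by advancing the input pointer."""
        if self.occupied == self.depth:
            raise OverflowError("lane buffer full")
        self.in_ptr = (self.in_ptr + 1) % self.depth
        self.occupied += 1

    def pop(self) -> None:
        """Record one departing flit by advancing the output pointer."""
        if self.occupied == 0:
            raise IndexError("lane buffer empty")
        self.out_ptr = (self.out_ptr + 1) % self.depth
        self.occupied -= 1
```

Treating each lane buffer as a ring with an input and an output pointer is one common way to realize "input & output pointers for each lane buffer"; other layouts are possible.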

  8. Virtual Channels State Logic

  9. VC State Logic Storage Overhead • Storage required depends on the number of lanes (l), flit buffers (b), and priority bits (pri) • Typical scenario (b=16, l=4, pri=0) requires: • 36 bits of overhead with virtual channels • 17 bits with no virtual channels • Small compared to total storage of 512 bits
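To put the slide's figures in proportion, the overhead can be expressed as a fraction of the 512 bits of total storage. This is plain arithmetic on the numbers quoted above, not a formula from the paper:

```latex
\frac{36}{512} \approx 7.0\% \quad \text{(with virtual channels)}, \qquad
\frac{17}{512} \approx 3.3\% \quad \text{(without)}, \qquad
\frac{36 - 17}{512} = \frac{19}{512} \approx 3.7\% \quad \text{(added by virtual channels)}
```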

  10. VC Operation • Packet arrives at node • Assigned output channel by routing algorithm • Based on destination and output channel status • Assigned to any free virtual channel (lane) • Blocks if none are available • Flit advanced by flow control • Must gain access to a path through switch, and • Access to the physical channel to input of next node • Lane is deallocated when last flit leaves node
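The lane allocation and deallocation steps on slide 10 can be sketched as follows. This is an illustrative model only: `OutputChannel` and its methods are hypothetical names, routing and switch arbitration are left out, and "first free lane" is just one way to pick among free lanes.

```python
class OutputChannel:
    """One physical output channel with a fixed number of lanes."""

    def __init__(self, num_lanes):
        # None means the lane is free; otherwise it holds the owning packet's id.
        self.lane_owner = [None] * num_lanes

    def allocate(self, packet_id):
        """Assign the packet to any free lane.

        Returns the lane index, or None if no lane is free, in which
        case the packet blocks.
        """
        for lane, owner in enumerate(self.lane_owner):
            if owner is None:
                self.lane_owner[lane] = packet_id
                return lane
        return None

    def flit_departed(self, lane, is_last_flit):
        """Deallocate the lane once the packet's last flit has left the node."""
        if is_last_flit:
            self.lane_owner[lane] = None
```

With two lanes, a third packet blocks until one of the first two has sent its last flit; with a single lane the model degenerates to the coupled case described on the Motivation slide.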

  11. Allocation Policies • Allocate physical channel bandwidth for lanes that: • Have flit ready to transmit • Have room for flit at receiving end • Can use any arbitration algorithm • Random, round-robin, priority • Deadline scheduling (schedule by age)
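As one concrete instance of the policies on slide 11, a round-robin arbiter can be sketched in a few lines. The function name and its arguments are assumptions made for illustration; the eligibility test (flit ready and room at the receiving end) comes from the slide.

```python
def round_robin_pick(ready, has_room, last_granted):
    """Pick the next lane to use the physical channel for one flit time.

    ready[i]     -- lane i has a flit ready to transmit
    has_room[i]  -- the receiving end has a free flit buffer for lane i
    last_granted -- index of the lane granted most recently

    Returns the chosen lane index, or None if no lane is eligible.
    """
    n = len(ready)
    # Start searching just after the last winner so every lane gets a turn.
    for offset in range(1, n + 1):
        lane = (last_granted + offset) % n
        if ready[lane] and has_room[lane]:
            return lane
    return None
```

Swapping the search order for a comparison on a priority or age field would give the priority and deadline variants the slide mentions.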

  12. VC Implementation Issues • Integration design changes • Replace FIFO buffers with multilane buffers • Modify switch for larger # of inputs and outputs • Flow control protocol modification • Switch Complexity • Added complexity to ACK when free buffer space opens up (identify lane = additional bits)
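The "additional bits" needed to identify a lane in each ACK can be estimated with a short calculation. The sketch below assumes the lane number is sent as a plain binary-encoded index, which the slides do not specify; `lane_id_bits` is a hypothetical helper.

```python
def lane_id_bits(num_lanes):
    """Bits needed to name one of num_lanes lanes with a binary index.

    Assumes a binary-encoded lane number; a single lane needs no
    identifier at all.
    """
    if num_lanes < 1:
        raise ValueError("a channel needs at least one lane")
    return (num_lanes - 1).bit_length()
```

Under that assumption, the four-lane configuration from slide 9 would add two bits to each acknowledgment.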

  13. Virtual Channel Analysis • Some assumptions: • Packet destinations uniformly randomly distributed • Arriving packet is consumed without waiting • Single flit buffer for each lane • Packet blocking probabilities are independent • Lots of Math…

  14. VC Analysis Results

  15. Experimental Results • Simulator (C Program) • Various topologies and VC depth • Throughput and Latency Analysis match predicted performance • Better to have more lanes with less depth than vice versa • Scheduling Algorithms show possibilities of performance given priorities or deadlines

  16. Experimental Results

  17. Conclusion and Questions • Network throughput and latency improved by decoupling physical channels from buffers • Is it worth the added complexity? • Under which systems/network topologies would it be useful? Where would it not be so useful?
