
Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks

Presentation Transcript


  1. Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks
  Authors: Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally (Stanford University)
  Presenter: Han Liu (University of California, San Diego)

  2. Background
  • NoCs are becoming very large: hundreds of cores on a single die
  • Current practice: input-queued routers
  • Input buffer resources become a significant cost
  • Input buffer sharing is attractive in NoCs
    • Pros: improves area and power efficiency
    • Cons: facilitates the spread of congestion

  3. Overview
  • Adaptive Backpressure mitigates performance degradation by avoiding unproductive use of buffer space in the presence of congestion
  • Avoids the downsides of buffer sharing while maintaining its benefits in the benign case

  4. Motivation
  • Assumption: buffers are good
    • More flexible routing
    • Lets traffic wait closer to the destination
  • Is this always true?
    • Energy and area efficiency
    • Implementation difficulty

  5. Train Example
  [Figure: train route from San Diego (source) via Denver (buffer) to Boston (destination)]
  • Buffers are good

  6. Motivation
  • Static vs. dynamic buffer management
  [Figure: with static partitioning, VC1 and VC2 each hold a fixed share and unused slots are wasted; with dynamic management, VC1 and VC2 share the full buffer]

  7. Dynamic Buffer Management
  • Buffer space is an expensive resource in NoCs
    • 30-35% of network power (MIT RAW, UT TRIPS)
  • Dynamic management increases utilization by sharing buffer space among multiple VCs
    • Optimizes use of expensive buffer resources
    • Decreases the incremental cost of VCs
  • Improved area and power efficiency
    • 25% more throughput or 34% less power [Nicopoulos '06]
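  The sketch below illustrates the kind of dynamically managed input buffer these slides describe: all VCs of a port draw flit slots from a single shared pool, which raises utilization but lets a blocked VC claim every slot. This is a hypothetical Python model for illustration only (the class and method names are made up), not code from the paper or from a real router.

```python
from collections import deque

class SharedInputBuffer:
    """Illustrative model of a dynamically managed (shared) input buffer."""

    def __init__(self, total_slots, num_vcs):
        self.free_slots = total_slots                 # one pool for the whole port
        self.queues = [deque() for _ in range(num_vcs)]

    def can_accept(self):
        # Any VC may claim a slot as long as the shared pool is not empty.
        return self.free_slots > 0

    def enqueue(self, vc, flit):
        assert self.can_accept()
        self.queues[vc].append(flit)
        self.free_slots -= 1

    def dequeue(self, vc):
        flit = self.queues[vc].popleft()
        self.free_slots += 1                          # slot returns to the shared pool
        return flit

# Monopolization example (see slide 10): if VC 1 is blocked downstream but keeps
# receiving flits, it can absorb every shared slot and starve VC 0.
buf = SharedInputBuffer(total_slots=16, num_vcs=2)
while buf.can_accept():
    buf.enqueue(vc=1, flit="blocked flit")
print(buf.can_accept())   # False: no space left for VC 0, even though VC 0 is idle
```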

  8. Sharing
  • Pros
    • Economic
    • Efficient
  • Cons
    • Inconvenient
    • Trouble

  9. Border Example
  [Figure: US/Mexico border crossing with highways 5 and 805]

  10. Buffer Monopolization
  • Blocked flits from a congested VC accumulate in the buffer
  • Effective buffer size is reduced for other VCs
    • Performance degradation (latency / throughput)
    • Congestion spreads across VCs (flows / apps / VMs / ...)
  [Figure: VC 0 and VC 1 sharing one input buffer]

  11. Adaptive Backpressure
  Goal:
  • Avoid unproductive use of buffer space in dynamic buffer management
  • But allow sharing when beneficial
  Approach:
  • Match arrival and departure rate for each VC by regulating credit availability (backpressure)
  • Derive quota from credit round-trip times

  12. Buffer Monopolization
  • Want a way to regulate the unlimited credit supply to the congested VC1
  • Give VC0 more credits and buffer space
  [Figure: VC 0 and VC 1 sharing one input buffer]

  13. Quota Motivation (1)
  [Figure: flit/credit timelines between Router 0 and Router 1, showing Tcrt,0, a credit stall, and an idle cycle]
  • Without congestion, full throughput requires Tcrt,0 credits
  • An insufficient credit supply causes idle cycles downstream

  14. Quota Motivation (2)
  [Figure: timelines with Tcrt,0 + Tstall credits, showing congestion stalls, queuing stalls, credit stalls, excess flits, and the time to drain the excess]
  • A congestion stall causes unproductive buffer occupancy
  • Matching stalls avoids unproductive buffer occupancy

  15. Quota Algorithm
  • A VC's quota value = throughput * RTTmin
  • The throughput of the upstream router is hard to measure
  • -> Compute quota values based on the observed RTT of individual credits

  16. Quota Heuristic
  • Track the credit RTT for each output VC
  • RTT = RTTmin ⇒ set quota to RTTmin
    • No downstream congestion
    • Allow one flit in each cycle of the RTT interval
  • RTT > RTTmin ⇒ subtract the difference from RTTmin
    • Each congestion and queuing stall adds to the RTT
    • Allow one credit stall per downstream stall

  17. Quota Equation
  • Q = max(Tcrt,base - (Tcrt,obs - Tcrt,base), 1) = max(2 * Tcrt,base - Tcrt,obs, 1)
  • When Tcrt,obs is large, Q is small
  • Qmin = 1 in order to guarantee that quota values can continue to be updated
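  As a worked example of the equation above, here is a hypothetical helper (not from the paper's code) that computes the quota from the design-time minimum credit round-trip time and the most recently observed one:

```python
def update_quota(t_crt_base, t_crt_obs):
    """Quota from slide 17: Q = max(2 * Tcrt,base - Tcrt,obs, 1)."""
    return max(2 * t_crt_base - t_crt_obs, 1)

# Assuming a minimum credit round-trip time of 8 cycles:
print(update_quota(8, 8))    # 8 -> no downstream congestion, quota stays at RTTmin
print(update_quota(8, 12))   # 4 -> 4 cycles of downstream stalls shrink the quota
print(update_quota(8, 40))   # 1 -> heavy congestion, but the quota never drops below 1
```

  Each extra cycle of observed round-trip time removes one credit from the quota, which matches the "one credit stall per downstream stall" heuristic on slide 16.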

  18. Implementation
  • The network design determines RTTmin for each link
  • Track the RTT of a single in-flight credit per VC
    • Update the quota value upon its return
  • The switch allocator masks all VCs that exceed their quota
  • Simple extension to existing flow control logic
    • No additional signaling required
    • < 5% overhead for a 16x64b buffer with 4 VCs
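  The following is a minimal sketch of how this extension could look at the upstream router, assuming credits for a given VC return in order; the class and method names (AbpOutputVC, on_flit_sent, on_credit_returned, eligible) are illustrative and not taken from the paper:

```python
class AbpOutputVC:
    """Per-output-VC state for an Adaptive-Backpressure-style credit quota."""

    def __init__(self, t_crt_base):
        self.t_crt_base = t_crt_base   # minimum credit round-trip time for this link
        self.quota = t_crt_base        # uncongested quota = RTTmin
        self.occupancy = 0             # flits sent but not yet credited back
        self.probe_sent_at = None      # cycle at which the tracked flit was sent
        self.credits_awaited = 0       # credits that must return before the probe's

    def on_flit_sent(self, now):
        self.occupancy += 1
        if self.probe_sent_at is None:
            # Track a single in-flight credit: the one for this flit. With
            # in-order credit return, it is the (occupancy)-th credit to arrive.
            self.probe_sent_at = now
            self.credits_awaited = self.occupancy

    def on_credit_returned(self, now):
        self.occupancy -= 1
        if self.probe_sent_at is not None:
            self.credits_awaited -= 1
            if self.credits_awaited == 0:
                # The tracked credit is back: update the quota from its RTT.
                t_crt_obs = now - self.probe_sent_at
                self.quota = max(2 * self.t_crt_base - t_crt_obs, 1)
                self.probe_sent_at = None

    def eligible(self):
        # The switch allocator masks requests from VCs that have reached their quota.
        return self.occupancy < self.quota
```

  The eligible() check corresponds to the masking step on the slide: a VC whose downstream occupancy has reached its quota simply stops being granted the switch until credits return, which requires no extra signaling between routers.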

  19. Evaluation Methodology
  • BookSim 2.0
  • 8x8 2D mesh, 64-bit channels, DOR
  • 16-slot input buffers, 4 VCs
  • Combined VC and switch allocation
  • Synthetic traffic and application benchmarks
  • Compare ABP to unrestricted sharing

  20. Network Stability (1)
  • For adversarial traffic, throughput in a mesh is unstable at high load
    • Traffic merging causes starvation
    • Tree saturation causes widespread congestion
  • ABP improves stability
    • Throttles sources that inject at very high rates
    • Efficient buffer use reduces tree saturation
    • Faster recovery from transient congestion

  21. Network Stability (2)
  [Figure: results under tornado traffic; annotation: 6.3x]

  22. Network Stability (3)
  [Figure: results with foreground traffic at 50% injection rate; annotations: 3.3x, -13% saturation rate]

  23. Performance Isolation (1)
  • Inject two classes of traffic into the network
    • Shared buffer space, separate VCs
  • Sharing causes interference between the classes (leading to higher latency)
  • ABP reduces interference
    • Contains the effects of congestion within a class
    • Better isolation between workloads, VMs, ...

  24. Performance Isolation (2)
  [Figures: uniform random foreground traffic with uniform random background traffic and with hotspot background traffic; annotations: -33%, -38%]

  25. Performance Isolation (3)
  [Figure: results with 50% uniform random background traffic, including a "w/o background" reference; annotation: -31%]

  26. Application Performance
  [Figure: results at 12.5% injection rate for streaming traffic, including a "w/o background" reference; annotation: -31%]

  27. Conclusions
  • Sharing improves buffer utilization, but can lead to undesired interference effects
  • Adaptive Backpressure regulates credit flow to avoid unproductive use of shared buffer space
    • Mitigates performance degradation in the presence of adversarial traffic
    • But maintains the key benefits of buffer sharing under benign conditions

  28. Questions?
  Thank you for your attention!
  The End
