
CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees






Presentation Transcript


  1. CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees Zefu Dai, Mark Jarvin and Jianwen Zhu University of Toronto

  2. Background • Consumer electronics are part of everyday life! [Diagram: SoC → memory controller → DRAM]

  3. Background • A portable media player SoC example

  4. Background • A portable media player SoC example

  5. Background • A portable media player SoC example [Figure: the SoC diagram annotated with per-block bandwidth requirements, ranging from 0.09 MB/s to 164.8 MB/s]

  6. Background • A portable media player SoC example [Figure: the bandwidth requirements span roughly 1000x, from 0.09 MB/s up to hundreds of MB/s]

  7. Background • A portable media player SoC example [Figure: one IP block asks the memory controller, "Give me 10 KB in 1 us, please."]

  8. Background • A portable media player SoC example [Figure: another block insists, "I want the data NOW!!!"]

  9. Background • A portable media player SoC example [Figure: the memory side replies, "I can only supply a maximum of 6.4 GB every second."]

  10. Challenges • Simultaneously satisfy: • Bandwidth requirements • Latency requirements

  11. Previous Work • QoS-aware schedulers: bandwidth or latency is improved heuristically • QoS-guaranteed schedulers: minimum bandwidth and/or latency is guaranteed

  12. Main Ideas • Start with the Bandwidth Guaranteed Prioritized Queuing (BGPQ) algorithm, which provides a bandwidth guarantee • Improve it with the Credit Borrow and Repay (CBR) mechanism, which adds a minimum latency guarantee

  13. Bandwidth Guaranteed Prioritized Queuing • Combines the benefits of Priority Queuing (PQ) and Weighted Fair Queuing (WFQ) • Credit-based Weighted Fair Queuing • Prioritized service for residual bandwidth allocation • Residual bandwidth: the bandwidth assigned to a user that goes unused at a given point in time
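
A minimal C sketch of the per-queue state such a scheduler needs, reconstructed from the slides rather than taken from the authors' RTL; the struct fields, names and the helper below are illustrative assumptions. It also encodes the slide's definition of residual bandwidth: a share exists but its owner has nothing to send this cycle.

```c
#include <stdbool.h>

#define NUM_QUEUES 3

/* Illustrative per-queue state for a BGPQ-style scheduler. */
typedef struct {
    double weight;   /* guaranteed bandwidth share, e.g. 0.5 for 50%        */
    int    priority; /* smaller value = higher priority (Q0 > Q1 > Q2)      */
    double credit;   /* running credit; the scheduler keeps the sum at zero */
    int    pending;  /* number of requests currently waiting in this queue  */
} bgpq_queue_t;

/* Residual bandwidth exists when a queue with a nonzero share is empty. */
bool has_residual_bandwidth(const bgpq_queue_t q[], int n) {
    for (int i = 0; i < n; i++)
        if (q[i].pending == 0 && q[i].weight > 0.0)
            return true;
    return false;
}
```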

  14. BGPQ Algorithm • Case 1: all queues are busy • No residual bandwidth • Act as WFQ. Initial state: every queue has a credit of zero. [Diagram: BGPQ scheduler multiplexing Q0 (50%), Q1 (20%) and Q2 (30%) onto the shared resource; credits Q0 = Q1 = Q2 = 0.0]

  15. BGPQ Algorithm • Case 1: all queues are busy • No residual bandwidth • Act as WFQ. Step 1: calculate a dynamic credit for each queue (each queue's credit increases by its weight). [Diagram: credits become Q0 = 0.5, Q1 = 0.2, Q2 = 0.3]

  16. BGPQ Algorithm • Case 1: all queues are busy • No residual bandwidth • Act as WFQ. Step 2: turn on the switch box and transfer data from the granted queue (here Q0, the queue with the highest credit). [Diagram: credits Q0 = 0.5, Q1 = 0.2, Q2 = 0.3]

  17. BGPQ Algorithm • Case 1: all queues are busy • No residual bandwidth • Act as WFQ. Step 3: subtract 1 from the credit of the granted queue. One scheduling cycle is done! Sum of credits = 0! [Diagram: credits Q0 = -0.5, Q1 = 0.2, Q2 = 0.3]
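
The Case 1 walkthrough above condenses into one scheduling step: add each queue's weight to its credit, grant the queue with the largest credit, then charge that queue one credit so the credits still sum to zero. The C sketch below is a reconstruction of that behaviour from the slide numbers (0.5/0.2/0.3, then -0.5 for Q0), not the authors' implementation; the names and the demo loop are mine.

```c
#include <stdio.h>

#define NUM_QUEUES 3

/* One BGPQ scheduling cycle when every queue is busy (Case 1):
 * behaves like credit-based WFQ. Returns the index of the granted queue. */
int bgpq_step_all_busy(double credit[], const double weight[]) {
    int winner = 0;
    for (int i = 0; i < NUM_QUEUES; i++) {
        credit[i] += weight[i];          /* Step 1: dynamic credit            */
        if (credit[i] > credit[winner])
            winner = i;                  /* candidate with the highest credit */
    }
    credit[winner] -= 1.0;               /* Step 3: charge the granted queue  */
    return winner;                       /* Step 2: this queue transfers data */
}

int main(void) {
    double weight[NUM_QUEUES] = {0.5, 0.2, 0.3};  /* Q0 50%, Q1 20%, Q2 30% */
    double credit[NUM_QUEUES] = {0.0, 0.0, 0.0};
    for (int cycle = 1; cycle <= 10; cycle++)
        printf("cycle %2d: grant Q%d\n", cycle, bgpq_step_all_busy(credit, weight));
    /* Over 10 cycles the grants track the 50/20/30 shares: about 5, 2 and 3. */
    return 0;
}
```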

  18. BGPQ Algorithm • Case 2: some queues are empty • Has residual bandwidth • Prioritized service on residual bandwidth. Before the new scheduling cycle: Q1 is empty. Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = -0.5, Q1 = 0.2, Q2 = 0.3]

  19. BGPQ Algorithm • Case 2: some queues are empty • Has residual bandwidth • Prioritized service on residual bandwidth. Step 1: calculate a dynamic credit for each busy queue; the credit of the empty queue remains unchanged. Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.0, Q1 = 0.2, Q2 = 0.6]

  20. BGPQ Algorithm • Case 2: some queues are empty • Has residual bandwidth • Prioritized service on residual bandwidth. Step 2: allocate the residual bandwidth to the non-empty queue with the highest priority (Q0 receives Q1's unused 20% as extra credit). Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.2, Q1 = 0.2, Q2 = 0.6]

  21. BGPQ Algorithm • Case 2: some queues are empty • Has residual bandwidth • Prioritized service on residual bandwidth. Step 3: transfer data from the granted queue (here Q2, the queue with the highest credit). Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.2, Q1 = 0.2, Q2 = 0.6]

  22. BGPQ Algorithm • Case 2: some queues are empty • Has residual bandwidth • Prioritized service on residual bandwidth. Step 4: subtract 1 from the credit of the granted queue. One scheduling cycle is done! Sum of credits = 0! Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.2, Q1 = 0.2, Q2 = -0.4]
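
Case 2 extends the same step: empty queues keep their credit, their unused share is added as extra credit to the highest-priority busy queue, and the grant again goes to the largest credit (which is why Q2, at 0.6, is still the one granted and ends at -0.4). The C sketch below is my reconstruction of that rule under the same assumptions as the previous one; it is not code from the paper.

```c
#define NUM_QUEUES 3

/* One BGPQ scheduling cycle that also handles empty queues (Case 2).
 * priority[i]: smaller value = higher priority; pending[i]: waiting requests.
 * Returns the granted queue, or -1 if every queue is empty. */
int bgpq_step(double credit[], const double weight[],
              const int priority[], const int pending[]) {
    double residual = 0.0;
    int hp = -1;                               /* highest-priority busy queue */

    /* Step 1: dynamic credit for busy queues; empty queues are skipped. */
    for (int i = 0; i < NUM_QUEUES; i++) {
        if (pending[i] > 0) {
            credit[i] += weight[i];
            if (hp < 0 || priority[i] < priority[hp])
                hp = i;
        } else {
            residual += weight[i];             /* share left unused this cycle */
        }
    }
    if (hp < 0)
        return -1;

    /* Step 2: the residual bandwidth becomes extra credit for the
     * highest-priority busy queue. */
    credit[hp] += residual;

    /* Steps 3-4: grant the busy queue with the largest credit and charge it. */
    int winner = hp;
    for (int i = 0; i < NUM_QUEUES; i++)
        if (pending[i] > 0 && credit[i] > credit[winner])
            winner = i;
    credit[winner] -= 1.0;
    return winner;
}
```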

  23. BGPQ Advantages • BGPQ = WFQ + PQ: a bandwidth guarantee plus prioritized access to residual bandwidth • Low implementation cost: 3 adders for the credit calculation and 1 comparator tree to find the highest dynamic credit

  24. BGPQ Disadvantage • A class with a low-latency but low-bandwidth requirement gets no minimum latency guarantee • Minimum latency means a request never has to wait for any request of lower priority

  25. Latency Problem of BGPQ • Example: [figure] • Optimal scheduling: [figure]

  26. Credit Borrow and Repay Mechanism • Borrow: allow the low-latency class to borrow a scheduling opportunity from the other classes • Repay: return the credit later, when convenient

  27. CBR Mechanism • Case 3: Credit Borrow and Repay • Maintain a debt queue (DebtQ) for Q0: a FIFO of borrowed queue IDs. Step 1: calculate the dynamic credits and allocate the residual bandwidth. Priority: Q0 > Q1 > Q2. [Diagram: CBR scheduler with Q0 (10%), Q1 (20%), Q2 (70%); Q1 is empty; credits become Q0 = 0.3, Q1 = 0.0, Q2 = 0.7; DebtQ empty]

  28. CBR Mechanism • Case 3: Credit Borrow and Repay • Maintain a debt queue for Q0. Step 2: re-assign the scheduling opportunity to Q0 and record the ID of the queue it was borrowed from (Q2, which had the highest credit). Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.3, Q1 = 0.0, Q2 = 0.7; DebtQ now holds Q2]

  29. CBR Mechanism • Case 3: Credit Borrow and Repay • Maintain a debt queue for Q0. Step 3: transfer data (from Q0, which borrowed the opportunity). Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.3, Q1 = 0.0, Q2 = 0.7; DebtQ holds Q2]

  30. CBR Mechanism • Case 3: Credit Borrow • Maintain a debt queue for Q0. Step 4: subtract 1 from the credit of the originally scheduled queue (Q2). One scheduling cycle is done! Sum of credits = 0! Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.3, Q1 = 0.0, Q2 = -0.3; DebtQ holds Q2]
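
Putting the borrow case into code: after the plain BGPQ selection has picked a winner (but before it is charged), Q0 may take over the data transfer if it has a pending request; the winner's ID is pushed into the debt FIFO and the winner still pays the credit, exactly as on the slide (Q2 drops to -0.3 while the DebtQ records Q2). The ring-buffer DebtQ and all names below are my own illustrative choices, not the authors' hardware.

```c
#include <stdbool.h>

#define DEBTQ_DEPTH   16
#define LOW_LATENCY_Q 0                  /* Q0 is the low-latency class */

/* FIFO of queue IDs that Q0 has borrowed scheduling opportunities from. */
typedef struct {
    int ids[DEBTQ_DEPTH];
    int head, count;
} debtq_t;

bool debtq_full(const debtq_t *d) { return d->count == DEBTQ_DEPTH; }

void debtq_push(debtq_t *d, int id) {
    d->ids[(d->head + d->count) % DEBTQ_DEPTH] = id;
    d->count++;
}

/* CBR borrow (Case 3). 'winner' is the queue chosen by BGPQ selection and
 * not yet charged. Returns the queue that actually transfers data. */
int cbr_borrow(int winner, double credit[], const int pending[], debtq_t *debt) {
    if (winner != LOW_LATENCY_Q &&
        pending[LOW_LATENCY_Q] > 0 &&
        !debtq_full(debt)) {
        debtq_push(debt, winner);        /* remember who we borrowed from  */
        credit[winner] -= 1.0;           /* the original winner is charged */
        return LOW_LATENCY_Q;            /* Q0 transfers its data now      */
    }
    credit[winner] -= 1.0;               /* otherwise: plain BGPQ behaviour */
    return winner;
}
```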

  31. CBR Mechanism • Case 4: Credit Repay • It is time to repay the credit. Initial state: Q0 is empty but has debt, so it will 'appear' to be non-empty. Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.3, Q1 = 0.0, Q2 = -0.3; DebtQ holds Q2]

  32. CBR Mechanism • Case 4: Credit Repay • It is time to repay the credit. Step 1: calculate the dynamic credits and allocate the residual bandwidth (Q0 is treated as non-empty). Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.6, Q1 = 0.0, Q2 = 0.4]

  33. CBR Mechanism • Case 4: Credit Repay • It is time to repay the credit. Step 2: Q0 wins the grant, returns the scheduling opportunity to the creditor at the head of the DebtQ (Q2), and that DebtQ entry is cleared. Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.6, Q1 = 0.0, Q2 = 0.4; DebtQ now empty]

  34. CBR Mechanism • Case 4: Credit Repay • It is time to repay the credit. Step 3: transfer data (from Q2, which got its opportunity back). Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = 0.6, Q1 = 0.0, Q2 = 0.4; DebtQ empty]

  35. CBR Mechanism • Case 4: Credit Repay • It is time to repay the credit. Step 4: subtract 1 from the credit of the nominally scheduled queue (Q0). One scheduling cycle is done! Sum of credits = 0! Priority: Q0 > Q1 > Q2. [Diagram: credits Q0 = -0.4, Q1 = 0.0, Q2 = 0.4; DebtQ empty]
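
The repay case mirrors the borrow: while Q0 owes opportunities it is made to look non-empty, and once it wins a grant it has no request for, the grant is handed back to the creditor at the head of the debt FIFO, which transfers its data while Q0 pays the credit (hence Q0 ends at -0.4 on the slide). Again a reconstruction with assumed names; the DebtQ layout repeats the borrow sketch above.

```c
#define DEBTQ_DEPTH   16
#define LOW_LATENCY_Q 0

typedef struct {
    int ids[DEBTQ_DEPTH];
    int head, count;
} debtq_t;

int debtq_pop(debtq_t *d) {
    int id = d->ids[d->head];
    d->head = (d->head + 1) % DEBTQ_DEPTH;
    d->count--;
    return id;
}

/* CBR repay (Case 4). 'winner' is the queue picked by BGPQ selection, with
 * Q0 treated as non-empty whenever debt->count > 0, and not yet charged.
 * Returns the queue that actually transfers data. */
int cbr_repay(int winner, double credit[], const int pending[], debtq_t *debt) {
    if (winner == LOW_LATENCY_Q &&
        pending[LOW_LATENCY_Q] == 0 &&
        debt->count > 0) {
        int creditor = debtq_pop(debt);  /* hand the opportunity back       */
        credit[LOW_LATENCY_Q] -= 1.0;    /* Q0 repays the credit it owes    */
        return creditor;                 /* the creditor transfers its data */
    }
    credit[winner] -= 1.0;               /* otherwise: plain BGPQ behaviour */
    return winner;
}
```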

  36. CBR Mechanism • Minimum latency guarantee using CBR: requests in Q0 never need to wait for requests in other queues • Worst case: Q0 is not empty while the DebtQ is full; no minimum latency guarantee holds in that case

  37. Implementation in FPGA • CBR multi-port memory controller (MPMC) top-level diagram • Number of ports is configurable at instantiation time • Priority and bandwidth are programmable at run time

  38. Implementation in FPGA • Credit calculation circuit • Sorting network and CBR [block diagrams]

  39. Implementation Cost • 8-port CBR-MPMC with a 16-deep DebtQ • Target device: Xilinx Virtex-5 XC5VLX50T • Speedy DDR backend memory controller

  40. Evaluation • Simulation framework: • Cycle-accurate C model of the MPMC • Simple close-page DDR memory model • Trace capturing and converting method (see the sketch below)
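
The slide only names a trace capturing and converting step, so purely as an illustration, here is one plausible shape for a captured memory-trace record that such a cycle-accurate C model could replay; none of these fields or names come from the paper.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical record format for one captured memory transaction. */
typedef struct {
    uint64_t cycle;      /* issue time of the request, in clock cycles */
    uint8_t  master_id;  /* which port / IP block issued the request   */
    bool     is_write;   /* read or write transaction                  */
    uint32_t address;    /* target DRAM address                        */
    uint8_t  burst_len;  /* burst length in data beats                 */
} trace_record_t;
```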

  41. Evaluation • CPU workload trace file (from B. Jacob) • Cache simulation on the standard SPEC2000 integer benchmarks • Irregular access pattern with a low bandwidth requirement: 0.4 memory transactions per 1k instructions

  42. Evaluation • Accelerator Workload • ALPBench suite of parallel multimedia applications

  43. Evaluation • Accelerator Workload • ALPBench suite of parallel multimedia applications • Periodically repeated access pattern with a high bandwidth requirement: 18.3 memory transactions per 1k instructions

  44. Results • BGPQ scheduler • Latency: measured in clock cycles • Bandwidth: measured in memory transactions per 1k clock cycles
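
For clarity, the two metrics as defined on this slide, written out as small helpers (the function names are mine): per-request latency is the difference between completion and issue cycle, and bandwidth is the number of completed transactions normalized to a 1000-cycle window.

```c
#include <stdint.h>

/* Latency of one request, in clock cycles. */
uint64_t latency_cycles(uint64_t issue_cycle, uint64_t complete_cycle) {
    return complete_cycle - issue_cycle;
}

/* Bandwidth as memory transactions per 1k clock cycles over a window. */
double bandwidth_per_1k_cycles(uint64_t transactions, uint64_t window_cycles) {
    return (double)transactions * 1000.0 / (double)window_cycles;
}
```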

  45. Results • CBR scheduler with a 16-deep DebtQ

  46. Impact of DebtQ Size • Repay conditions: the DebtQ is full, or Q0 is empty • When the DebtQ is full, remaining requests in Q0 are no longer served with the minimum latency guarantee! [Diagram: CBR scheduler with Q0 (10%), Q1 (20%), Q2 (70%); priority Q0 > Q1 > Q2]

  47. Impact of DebtQ Size • How big does the DebtQ need to be? • Determined by the instantaneous bandwidth requirement of the low-latency class • An irregular access pattern means a large range of required DebtQ sizes • Tradeoff: resource efficiency vs. performance (a rough sizing sketch follows)
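
A rough back-of-envelope way to read "determined by the instantaneous bandwidth requirement" (my own reasoning, not a formula from the slides): if the low-latency class issues a burst of requests faster than its weighted share delivers grants, every request beyond that share must borrow, and each borrow occupies one DebtQ entry until it is repaid.

```c
/* Estimate of the DebtQ depth needed to ride out a burst without losing the
 * latency guarantee. Illustrative reasoning only:
 *   burst_reqs    - requests issued by the low-latency class in the window
 *   window_cycles - scheduling cycles spanned by the burst
 *   weight        - the class's guaranteed bandwidth share (e.g. 0.1)       */
int debtq_depth_estimate(int burst_reqs, int window_cycles, double weight) {
    double own_grants = weight * (double)window_cycles; /* served from own share */
    int extra = burst_reqs - (int)own_grants;           /* these must borrow     */
    return extra > 0 ? extra : 0;
}

/* Example: a burst of 8 requests over 20 cycles with a 10% share leaves
 * 8 - 2 = 6 requests that must borrow, i.e. a DebtQ of about 6 entries. */
```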

  48. Results • Impact of debt queue size

  49. Conclusions • The CBR scheduler provides minimum bandwidth and latency guarantees • Low implementation cost and power consumption • We expect it to be useful in a wide range of multimedia applications

  50. Questions? [Diagram: CBR scheduler with Q0 (10%), Q1 (20%), Q2 (70%) and a DebtQ feeding the shared resource; priority Q0 > Q1 > Q2]
