Techniques for Fast Packet Buffers

Techniques for Fast Packet Buffers. Sundar Iyer, Ramana Rao, Nick McKeown (sundaes,ramana, nickm)@stanford.edu Departments of Electrical Engineering & Computer Science, Stanford University. Problem Statement. Motivation: To design an extremely high speed packet buffer.

Presentation Transcript


  1. Techniques for Fast Packet Buffers Sundar Iyer, Ramana Rao, Nick McKeown (sundaes,ramana, nickm)@stanford.edu Departments of Electrical Engineering & Computer Science, Stanford University

  2. Problem Statement Motivation: To design an extremely high speed packet buffer.

  3. Problem Statement ….. 1 [Figure: a buffer memory written at rate R and read at rate R, 1 cell in 12.8 ns in each direction] OC768 = 40 Gb/sec, 64-byte cells • How to design a buffer with an access time of 6.4 ns?

  4. Problem Statement ….. 2 • R = 40 Gb/sec • RTT = 0.25 s • RTT * R = 10 Gbit [Figure: a buffer memory holding cells 1 through 9 and beyond] How to create a buffer of size 10 Gbit?
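
The numbers on the two problem-statement slides can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up here, and it assumes that the buffer must serve one write and one read per cell time, which is why the access time is half the cell time.

```python
# Figures quoted on the slides: OC768 line rate, 64-byte cells, 0.25 s RTT.
LINE_RATE_BPS = 40e9
CELL_BYTES = 64
RTT_S = 0.25


def cell_time_ns(line_rate_bps: float, cell_bytes: int) -> float:
    """Time to receive (or send) one cell at the line rate, in nanoseconds."""
    return cell_bytes * 8 / line_rate_bps * 1e9


def access_time_ns(cell_ns: float) -> float:
    """Assumption: one write plus one read must fit in each cell time."""
    return cell_ns / 2


def buffer_bits(line_rate_bps: float, rtt_s: float) -> float:
    """Buffer size from the R * RTT rule on the slide."""
    return line_rate_bps * rtt_s


cell_ns = cell_time_ns(LINE_RATE_BPS, CELL_BYTES)
print(f"cell time   : {cell_ns:.1f} ns")
print(f"access time : {access_time_ns(cell_ns):.1f} ns")
print(f"buffer size : {buffer_bits(LINE_RATE_BPS, RTT_S) / 1e9:.0f} Gbit")
```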

  5. Buffer Architecture - Demand and Supply • Buffer architecture requires: fast access time, large size • SRAM: fast access time, small size (low density) • DRAM: slow access time, large size (high density)

  6. Problem Statement Redefined Motivation: To design an extremely high speed packet buffer architecture with fast access time and large size. This talk is about the analysis of one such well-known approach.

  7. Some Thoughts • We believe that this architecture and many such equivalent designs already exist in many router line cards • These results may already be known and might exist in proprietary form • One would like to be able to give deterministic guarantees in the architecture

  8. Characteristics of Packet Buffer Architectures • The total throughput needed is at least 2 × (ingress rate) • The size of the buffer is at least R * RTT • The buffers have one or more FIFOs • The sequence in which the FIFOs are accessed is determined by an arbiter and is unknown a priori

  9. Memory Hierarchy of Packet Buffer [Figure: a large DRAM memory with access time T’ holds FIFOs 1 to Q and is written and read b cells at a time; write access time = T = 2T’, read access time = T = 2T’. A memory management algorithm sits between the DRAM and two SRAMs: the ingress SRAM (cache of FIFO tails, queues 1 to Q) takes arriving packets at rate R, and the egress SRAM (cache of FIFO heads, queues 1 to Q) sends departing packets at rate R in response to arbiter grants.]

  10. System Design Parameters • Main parameters: SRAM size; latency faced by a cell • System parameters: I/O bandwidth; number of addresses (use a single address on every DRAM, or use different addresses on every DRAM); use or non-use of DRAM burst mode; existence or non-existence of bank conflicts

  11. Packet Buffer Design [Figure: a row of DRAMs accessed b cells at a time by the memory management algorithm, which sits between the ingress SRAM buffer and the egress SRAM buffer; packets arrive and depart at rate R. Labels: SRAM buffer area = A, number of queues = Q, C]

  12. Today’s Talk… • Optimize main parameters: minimize latency at the cost of SRAM size; (later) minimize SRAM size at the cost of latency • Assumptions on system parameters: no speedup on I/O (I/O = 2R); simple address architecture (use a single address from every DRAM)

  13. More Assumptions .. • We shall assume that we have only cells of size “C” which arrive in the system • No use of DRAM Burst Mode • No bank conflicts

  14. Symmetry Argument • The analysis and working of the ingress and egress buffer architectures are similar • We shall analyze only the egress buffer architecture

  15. System Parameters in Packet Buffer Design • Access Time of a DRAM = T’ = 50 ns • DRAM Access Time as seen by the E(In)gress = T = 2T’ • Cell Time of System = Ts = 6.4 ns • Cell Time of E(In)gress = Tc = 12.8 ns • Min. width of DRAMs = T/Tc = b
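
As a quick check of the last bullet, the sketch below plugs in the slide's values for T’ and Tc. Rounding T/Tc up to a whole number of cells is an assumption made here (the slide only states b = T/Tc), and the function name is hypothetical.

```python
import math

DRAM_ACCESS_NS = 50.0   # T' from the slide
CELL_TIME_NS = 12.8     # Tc from the slide


def dram_width_cells(dram_access_ns: float, cell_time_ns: float) -> int:
    """Minimum DRAM width b, in cells, so one access of b cells keeps up.

    T = 2 * T' is the access time seen by the egress (or ingress) side.
    Rounding up to an integer is an assumption of this sketch.
    """
    t_ns = 2 * dram_access_ns
    return math.ceil(t_ns / cell_time_ns)


b = dram_width_cells(DRAM_ACCESS_NS, CELL_TIME_NS)
print(f"T/Tc = {2 * DRAM_ACCESS_NS / CELL_TIME_NS:.4f}, so b = {b} cells")
```

With these values T/Tc is a little under 8, so the sketch rounds b up to 8 cells.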

  16. Packet Buffer Design Questions • Can we give deterministic guarantees? • Why not keep all cells in the DRAM? • Would an SRAM of size a little more than qb not suffice?

  17. A Bad Case for the Queues … 1 [Figure: snapshots of the queues, labeled with width w, at t = 0 through t = 7]

  18. A Bad Case for the Queues … 2 [Figure: snapshots of the queues at t = 8 through t = 14, and at t = 17]

  19. Observation • There exists some value of “w” for which the buffer does not overflow • w = qb is one such sufficient value • The threshold value “Ti” governs “w” [Figure labels: Q, b - 1, Ti, w]

  20. Definitions • Occupancy: the number of cells in the SRAM for a particular queue • Active queue: a queue which has cells present for it in the DRAM

  21. One More Definition • Deficit: the difference between the threshold “Ti” and the occupancy of an active queue • For a queue which is not active, the deficit is zero [Figure labels: Ti, b - 1, deficit, occupancy]
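
The definitions of occupancy, active queue and deficit translate directly into code. The sketch below is a minimal illustration, not the authors' implementation: the `Queue` class, its field names and the tie-breaking rule (lowest index wins) are assumptions. The selection function picks the queue with the largest deficit, which is the policy the talk analyzes later.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Queue:
    sram_cells: int  # occupancy: cells held in the SRAM for this queue
    dram_cells: int  # cells still waiting in the DRAM for this queue

    @property
    def active(self) -> bool:
        """A queue is active when it has cells present in the DRAM."""
        return self.dram_cells > 0


def deficit(queue: Queue, threshold: int) -> int:
    """Threshold minus occupancy for an active queue, zero otherwise."""
    return threshold - queue.sram_cells if queue.active else 0


def largest_deficit_queue(queues: List[Queue], threshold: int) -> int:
    """Index of the queue with the largest deficit (ties go to the lowest index)."""
    return max(range(len(queues)), key=lambda i: deficit(queues[i], threshold))


queues = [Queue(sram_cells=3, dram_cells=5),
          Queue(sram_cells=1, dram_cells=0),   # not active, so deficit is 0
          Queue(sram_cells=0, dram_cells=2)]
print([deficit(q, threshold=7) for q in queues])
print(largest_deficit_queue(queues, threshold=7))
```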

  22. Can we Bound the Maximum Value of the Deficit? • Define f(i,q): the maximum deficit that a set of “i” queues can have in a system of “q” queues • We are interested in f(1,q) • f(q,q) < qb …. trivially

  23. Largest Deficit Queue First: Recurrence Equations • f(2,q) >= f(1,q) - b + [f(1,q) - b] • f(3,q) >= f(2,q) - b + [f(2,q) - b]/2 • f(4,q) >= f(3,q) - b + [f(3,q) - b]/3 • …… • f(q,q) >= f(q-1,q) - b + [f(q-1,q) - b]/(q-1)

  24. Dirty Math.. • qb > f(q,q) … trivially
       >= [f(q-1,q) - b] + [f(q-1,q) - b]/(q-1)
       = f(q-1,q) * q/(q-1) - b * q/(q-1)
       >= {f(q-2,q) * (q-1)/(q-2) - b * (q-1)/(q-2)} * q/(q-1) - b * q/(q-1)
       = f(q-2,q) * q/(q-2) - b * q/(q-2) - b * q/(q-1)
       >= f(q-3,q) * q/(q-3) - b * q/(q-3) - b * q/(q-2) - b * q/(q-1)
       …..
       >= f(1,q) * q/1 - b * q * sigma[1/i], with i running from 1 to q-1
     • Dividing by q and approximating sigma[1/i] by ln q, this gives f(1,q) <= b[1 + ln q]
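
To see how rough the "dirty math" is, the sketch below evaluates both forms of the bound. It assumes the chain of inequalities ends in a harmonic sum over i = 1 to q-1, so that dividing by q gives f(1,q) < b(1 + H), where H is that sum; the slide's closed form replaces H with ln q. The function names and the sample values q = 32, b = 8 are choices made for this illustration.

```python
import math


def harmonic(n: int) -> float:
    """Sum of 1/i for i = 1..n."""
    return sum(1.0 / i for i in range(1, n + 1))


def deficit_bound_harmonic(q: int, b: int) -> float:
    """b * (1 + H_{q-1}): the bound before the sum is approximated."""
    return b * (1 + harmonic(q - 1))


def deficit_bound_log(q: int, b: int) -> float:
    """b * (1 + ln q): the closed form quoted on the slide."""
    return b * (1 + math.log(q))


q, b = 32, 8  # sample values, chosen only for illustration
print(f"harmonic form : {deficit_bound_harmonic(q, b):.1f} cells")
print(f"log form      : {deficit_bound_log(q, b):.1f} cells")
```

Because the harmonic sum up to q-1 lies strictly between ln q and 1 + ln q for q >= 2, the two forms differ by less than b cells.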

  25. Results • If the MMA (memory management algorithm) services the queue with the largest deficit, has a simple address architecture, and has no I/O speedup, then a latency of zero can be guaranteed when the width of the SRAM is b[1 + ln q] + b = b[2 + ln q] and the size of the SRAM is [2 + ln q]Qb • Necessity vs. Sufficiency?

  26. A Dose of Reality • Typical values: “b” is typically <= 10; Q = Np, where N = # of ports (for VOQ) and p = number of classes per port • Implementations • VOQ: N = 32, p = 1, Q = 2^5, b = 2^3, SRAM = 700 kb • Diffserv: N = 32, p = 16, Q = 2^9, b = 2^3, SRAM = 17 Mb • Intserv: Let's not think about it!
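
The quoted SRAM sizes can be checked against the [2 + ln q]Qb result. The sketch below assumes Q = Np (32 queues for VOQ, 512 for Diffserv), b = 8 cells, and the 64-byte cells from the problem statement; the function name is hypothetical and the result is in bits.

```python
import math

CELL_BITS = 64 * 8  # 64-byte cells, as in the problem statement


def sram_bits(num_queues: int, b: int, cell_bits: int = CELL_BITS) -> float:
    """SRAM size in bits from the (2 + ln Q) * Q * b cells result."""
    return (2 + math.log(num_queues)) * num_queues * b * cell_bits


for name, ports, classes in (("VOQ", 32, 1), ("Diffserv", 32, 16)):
    q = ports * classes  # Q = Np
    print(f"{name}: Q = {q}, SRAM = {sram_bits(q, b=8) / 1e6:.2f} Mb")
```

Under these assumptions the formula gives roughly 0.7 Mb for VOQ and roughly 17 Mb for Diffserv, in line with the figures on the slide.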

  27. Future Work • Discussion on trading off latency for SRAM size • Analysis of other parameters • Relaxing I/O, address constraints • Implementation Pain • …. Still a long way to go
