
Statistical Analysis of Packet Buffer Architectures



Presentation Transcript


  1. Statistical Analysis of Packet Buffer Architectures Gireesh Shrimali, Isaac Keslassy, Nick McKeown E-mail: gireesh@stanford.edu

  2. Packet Buffering
  • Big: For TCP to work well, the buffers need to hold one RTT (about 0.25 s) of data.
  • Fast: The buffer needs to store (retrieve) packets as fast as they arrive (depart).
  [Figure: memories 1 through N, each written and read at line rate R under a scheduler; shown for an input or output line card and for a shared memory buffer.]

  3. An Example: Packet Buffers for a 40 Gb/s Line Card
  • Write rate R: one 40B packet every 8 ns.
  • Read rate R: one 40B packet every 8 ns; scheduler requests cause random accesses.
  • Buffer size: 10 Gbits (one RTT of data at 40 Gb/s).
  • The problem is solved if a memory can be randomly accessed every 4 ns and can store 10 Gb of data.
  [Figure: buffer manager in front of a 10 Gbit buffer memory, with write rate R, read rate R, and scheduler requests.]
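  A quick arithmetic check of these numbers, as a Python sketch (constant names are ours; the values come from the slides):

```python
# Back-of-the-envelope check of the 40 Gb/s line-card numbers on the slide.
LINE_RATE_BPS = 40e9      # 40 Gb/s line rate
PACKET_BITS = 40 * 8      # minimum-size 40-byte packet
RTT_S = 0.25              # round-trip time assumed for TCP buffering

packet_time_ns = PACKET_BITS / LINE_RATE_BPS * 1e9
buffer_bits = LINE_RATE_BPS * RTT_S

print(f"One 40B packet every {packet_time_ns:.0f} ns")                      # 8 ns
print(f"Required access time (one read + one write): {packet_time_ns/2:.0f} ns")  # 4 ns
print(f"Required buffer size: {buffer_bits/1e9:.0f} Gbits")                 # 10 Gbits
```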

  4. Key Question
  • How can we design high-speed packet buffers from commodity memories?

  5. Available Memory Technology
  • Use SRAM? + Fast enough random access time, but too low density to store 10 Gbits of data.
  • Use DRAM? + High density means we can store the data, but it can't meet the random access time.

  6. Can't We Just Use Lots of DRAMs in Parallel?
  • Write rate R and read rate R: one 40B packet every 8 ns, plus scheduler requests on the read side.
  [Figure: buffer manager striping 40B packets as 320B words across eight parallel buffer memories.]

  7. Works Fine If There Is Only One FIFO Queue
  • Aggregate 320B for the queue in fast SRAM, then read and write to all DRAMs in parallel.
  [Figure: buffer manager (on-chip SRAM) collecting 40B packets into 320B blocks on the write side and returning 40B packets on the read side, one packet every 8 ns in each direction, driven by scheduler requests.]
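  To make the single-FIFO case concrete, here is a minimal Python sketch, assuming fixed 40B cells and 8-way parallelism (320B blocks) as on the slide; the class and its fields are illustrative, not the authors' design:

```python
from collections import deque

CELL = 40   # bytes per fixed-size cell (one minimum-size packet)
B = 8       # cells aggregated per DRAM block (8 x 40B = 320B)

class SingleFifoBuffer:
    """Single-FIFO hybrid buffer: aggregate B cells in SRAM, then move one
    B-cell block to or from DRAM in a single wide parallel access."""
    def __init__(self):
        self.tail_sram = deque()   # arriving cells waiting to form a block
        self.dram = deque()        # body of the FIFO, stored as B-cell blocks
        self.head_sram = deque()   # cells staged for departure

    def write(self, cell):
        self.tail_sram.append(cell)
        if len(self.tail_sram) == B:                    # full 320B block ready
            block = [self.tail_sram.popleft() for _ in range(B)]
            self.dram.append(block)                     # one wide DRAM write

    def read(self):
        if not self.head_sram and self.dram:
            self.head_sram.extend(self.dram.popleft())  # one wide DRAM read
        return self.head_sram.popleft() if self.head_sram else None
```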

  8. In Practice, the Buffer Holds Many FIFOs
  • e.g., in an IP router, Q might be 200; in an ATM switch, Q might be 10^6.
  • We don't know which head-of-line packet the scheduler will request next.
  [Figure: Q FIFOs of 320B blocks behind the buffer manager (on-chip SRAM), with unknown ("?B") read sizes on the scheduler side.]

  9. Parallel Packet Buffer: Hybrid Memory Hierarchy
  • Small tail SRAM: cache for FIFO tails. Small head SRAM: cache for FIFO heads.
  • Large DRAM memory holds the body of the FIFOs; B = degree of parallelism (B cells written or read per DRAM access).
  [Figure: arriving packets at rate R enter the buffer manager (ASIC with on-chip SRAM); batches of B cells are written to and read from the Q FIFOs in DRAM; departing packets leave at rate R in response to scheduler requests.]

  10. Objective
  • We would like to minimize the size of the SRAM while providing reasonable guarantees.
  • So we ask: if the designer is willing to tolerate a certain drop probability, how small can the SRAM get?

  11. Memory Management Algorithm
  • Algorithm: at every service opportunity, serve a FIFO from the set of FIFOs with occupancy greater than or equal to B (a round-robin sketch follows below).
  • B-work-conserving, and thus minimizes the SRAM size.
  • Round-robin performs as well as largest-FIFO-first.
  • Definitions: FIFO occupancy counter L(i,t); sum of occupancies L(t).
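  A minimal sketch of the round-robin variant of this policy; the function name and data layout are ours, not the paper's:

```python
# At each DRAM service opportunity, serve (in round-robin order) some FIFO
# whose SRAM occupancy L(i,t) is at least B, i.e. a full batch is waiting.

def next_fifo_to_serve(occupancy, B, rr_pointer):
    """occupancy[i] = L(i,t) for FIFO i; returns (chosen FIFO or None,
    updated round-robin pointer)."""
    Q = len(occupancy)
    for step in range(Q):
        i = (rr_pointer + step) % Q
        if occupancy[i] >= B:        # eligible: a full B-cell batch waits
            return i, (i + 1) % Q    # serve it and advance the pointer
    return None, rr_pointer          # no FIFO currently holds B cells

# Example: Q = 4 FIFOs, B = 8 cells per DRAM batch.
occ = [3, 9, 8, 1]
fifo, ptr = next_fifo_to_serve(occ, B=8, rr_pointer=0)
print(fifo)   # -> 1, the first eligible FIFO at or after the pointer
```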

  12. Model
  • Model the SRAM as a queue with occupancy L(t), arrival process A(t), and departure process D(t).
  • The arrival process A(t) is the superposition of Q sources A(i,t), each with its own rate.
  • Deterministic service at rate 1.
  • The queue is stable, i.e., the aggregate arrival rate is less than the service rate.
  • Approach: assume the A(i,t) are independent of each other. Step 1: analyze IID sources. Step 2: show that the IID case is the worst case.
  • Tools used: analysis in the continuous-time domain. (A simulation sketch of this model follows below.)
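  The following discrete-time simulation sketch ties this model to the policy of the previous slide. The slotting, Bernoulli arrivals, and function name are our simplifications (the slide's analysis is in continuous time); Q, B, and the load are taken from the later slides.

```python
import random

# Q sources feed the tail SRAM; one B-cell batch is drained from some FIFO
# with occupancy >= B once every B cell slots (B-work-conserving policy).
# This is only an illustration, not the authors' simulator.

def simulate_sram_occupancy(Q=1024, B=4, load=0.9, slots=100_000, seed=1):
    rng = random.Random(seed)
    occupancy = [0] * Q          # L(i,t): cells of FIFO i held in SRAM
    total, ptr = 0, 0            # L(t) and the round-robin pointer
    samples = []
    for t in range(slots):
        if rng.random() < load:                 # one arriving cell w.p. load
            occupancy[rng.randrange(Q)] += 1
            total += 1
        if t % B == 0:                          # one DRAM batch write per B slots
            for step in range(Q):
                i = (ptr + step) % Q
                if occupancy[i] >= B:           # eligible: a full batch waits
                    occupancy[i] -= B
                    total -= B
                    ptr = (i + 1) % Q
                    break
        samples.append(total)
    return samples

samples = simulate_sram_occupancy()
print(sum(samples) / len(samples))   # compare with Q(B-1)/2 = 1536 from slide 18
```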

  13. Fixed Batch Decomposition
  [Figure: each arrival process A(i,t) is split into a quotient part B*MA(i,t) and a remainder part R(i,t); the occupancy L(t) likewise splits into the quotient workload B*ML(t), drained by batch departures B*MD(t), and the remainder workload R(t).]
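  Written out, the decomposition suggested by the figure labels would read as follows (notation reconstructed from the slide, so treat it as a sketch rather than the paper's exact statement):

```latex
A(i,t) = B\,M_A(i,t) + R(i,t), \qquad 0 \le R(i,t) < B,
\qquad
L(t) = B\,M_L(t) + R(t), \qquad R(t) = \sum_{i=1}^{Q} R(i,t).
```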

  14. Assumptions
  The A(i,t) are:
  • independent of each other
  • stationary and ergodic
  • simple point processes

  15. PDF of SRAM Occupancy
  • Theorem: the quotient workload and the remainder workload are independent of each other.
  • Thus the distribution of SRAM occupancy is the convolution of the distributions of the quotient and remainder workloads.
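  To make the convolution step concrete, a small sketch with placeholder PMFs (the actual quotient and remainder distributions come from the next two slides):

```python
import numpy as np

# If the quotient and remainder workloads are independent, the PMF of the
# total SRAM occupancy is the convolution of their PMFs. Both inputs below
# are made-up placeholders, used only to show the mechanics.

quotient_pmf = np.array([0.6, 0.25, 0.1, 0.05])       # P(quotient workload = k)
remainder_pmf = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # P(remainder workload = k)

occupancy_pmf = np.convolve(quotient_pmf, remainder_pmf)
print(occupancy_pmf.sum())   # still sums to 1
print(occupancy_pmf)         # P(SRAM occupancy = k) for k = 0, 1, ...
```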

  16. PDF of Remainder Workload
  • Theorem: for large Q, the PDF of the remainder workload approaches a Gaussian distribution with mean Q(B-1)/2 and variance Q(B^2-1)/12.
  • Intuition: application of the central limit theorem.
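  These are exactly the central-limit-theorem values one gets if each per-queue remainder is treated as an independent uniform random variable on {0, ..., B-1}; that reading is our inference from the stated mean and variance:

```latex
\mathbb{E}[R(i,t)] = \frac{B-1}{2}, \qquad
\operatorname{Var}[R(i,t)] = \frac{B^2-1}{12}
\;\Longrightarrow\;
R(t) = \sum_{i=1}^{Q} R(i,t) \;\approx\; \mathcal{N}\!\left(\frac{Q(B-1)}{2},\; \frac{Q(B^2-1)}{12}\right).
```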

  17. PDF of Quotient Workload
  • Theorem [Cao, Ramanan, INFOCOM 2002]: for large Q, the behavior of the quotient FIFO approaches that of an M/D/1 queue with the same load.
  • Numerical solution through recurrence relations (a sketch follows below).
  • Depends only on the load; independent of Q and B; close to an impulse at low loads.
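  One standard way to carry out such a recurrence is the M/G/1 embedded-chain recursion specialized to M/D/1; the sketch below is our illustration and not necessarily the recurrence the authors used:

```python
import math

def md1_queue_pmf(load, n_max):
    """Stationary queue-length PMF of an M/D/1 queue (deterministic service
    time 1, Poisson arrivals at rate `load` < 1), computed recursively."""
    # a[j] = P(j Poisson arrivals during one deterministic service time)
    a = [math.exp(-load) * load**j / math.factorial(j) for j in range(n_max + 1)]
    p = [1.0 - load]                         # P(queue empty) = 1 - load
    for n in range(n_max):
        s = p[0] * a[n] + sum(p[j] * a[n - j + 1] for j in range(1, n + 1))
        p.append((p[n] - s) / a[0])
    return p

# Example: at low load the distribution is close to an impulse at 0, and it
# depends only on the load (not on Q or B), as the slide states.
print(md1_queue_pmf(load=0.2, n_max=5))
```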

  18. PDF of Buffer Occupancy • Q = 1024; B = 4; Q(B-1)/2 = 1536

  19. Simulations (load = 0.9)
  • Complementary CDF for Q = 1024, B = 4, load = 0.9.
  • The theoretical curve upper-bounds the simulation results.

  20. Conclusions
  • Established exact bounds relating the drop probability to the SRAM size.
  • The model may be applicable to many queueing systems with batch service.
  • Compared to deterministic guarantees ([Iyer, McKeown, HPSR 2001]), an improvement of at most a factor of two.
  • O(QB) is a hard lower bound on SRAM size for this architecture.
