Designing Packet Buffers for Internet Routers • Friday, March 7, 2014 • Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University • nickm@stanford.edu • www.stanford.edu/~nickm
Contents • Motivation • A 100 Tb/s router • 160 Gb/s packet buffer • Theory • Generic Packet Buffer Problem • Optimal Memory Management • Implementation
Motivating Design: 100 Tb/s Optical Router • An optical switch interconnects 625 electronic linecards, each running at 160–320 Gb/s (100 Tb/s = 625 * 160 Gb/s) • Each linecard performs line termination, IP packet processing, and packet buffering on its 40 Gb/s external lines • Linecards exchange 40 Gb/s request and grant messages with a central arbiter
Load-Balanced Switch: Three stages on a linecard • 1st stage: segmentation / frame building • 2nd stage: main buffering • 3rd stage: reassembly • Each of the N linecards runs at rate R and spreads its traffic over all N paths at rate R/N
Advantages • Load-balanced switch • 100% throughput • No switch scheduling • Hybrid Optical-Electrical Switch Fabric • Low (almost zero) power • Can use an optical mesh • No reconfiguration of internal switch (MEMS)
160 Gb/s Linecard (fixed-size packets, rate R throughout) • 1st stage (lookup/processing, segmentation, load balancing): 0.4 Gbit of memory at 3.2 ns • 2nd stage (switching, with VOQs 1…N): 40 Gbit at 3.2 ns • 3rd stage (reassembly): 0.4 Gbit at 3.2 ns
Contents • Motivation • A 100 Tb/s router • 160 Gb/s packet buffer • Theory • Generic Packet Buffer Problem • Optimal Memory Management • Implementation
Packet Buffering Problem: packet buffers for a 160 Gb/s router linecard • Write rate R: one 128 B packet every 6.4 ns • Read rate R: one 128 B packet every 6.4 ns, driven by scheduler requests • A buffer manager fronts 40 Gbits of buffer memory • The problem is solved if a single memory can be randomly accessed every 3.2 ns and can store 40 Gb of data
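The numbers on this slide can be checked with a short script (a sketch; the 0.25 s round-trip-time rule of thumb used for the 40 Gbit figure is a common buffer-sizing assumption, not stated on the slide itself):

```python
# Sketch: derive the packet-buffer requirements for a 160 Gb/s linecard.
LINE_RATE_BPS = 160e9           # 160 Gb/s line rate R
PACKET_BITS = 128 * 8           # fixed 128-byte packets

# One packet arrives (and one departs) every PACKET_BITS / LINE_RATE_BPS.
packet_time_ns = PACKET_BITS / LINE_RATE_BPS * 1e9
print(packet_time_ns)           # 6.4 ns

# A single memory must service one write AND one read per packet time,
# so it needs a random-access time of half the packet time.
access_time_ns = packet_time_ns / 2
print(access_time_ns)           # 3.2 ns

# Buffer size via the classic B = RTT * C rule of thumb (RTT ~ 0.25 s is
# an assumption, commonly used for router buffer sizing).
buffer_gbits = LINE_RATE_BPS * 0.25 / 1e9
print(buffer_gbits)             # 40.0 Gbits
```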
Memory Technology • Use SRAM? + Fast enough random access time, but – too low density to store 40 Gbits of data • Use DRAM? + High density means we can store the 40 Gbits, but – can't meet the random access time
Can't we just use lots of DRAMs in parallel? • Ten buffer memories side by side, each holding one 128 B slice of a 1280 B block (bytes 0–127, 128–255, …, 1152–1279) • The buffer manager reads/writes 1280 B every 32 ns • Write rate R: one 128 B packet every 6.4 ns • Read rate R: one 128 B packet every 6.4 ns, driven by scheduler requests
Works fine if there is only one FIFO • Aggregate 1280 B for the queue in fast on-chip SRAM in the buffer manager, then write to (and read from) all ten DRAMs in parallel, 128 B per DRAM • Write rate R: one 128 B packet every 6.4 ns • Read rate R: one 128 B packet every 6.4 ns, driven by scheduler requests
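The single-FIFO scheme can be sketched as follows (helper names are hypothetical; the 10-way split and the 128 B / 1280 B sizes are from the slides):

```python
# Sketch: aggregate packets into a 1280 B block in on-chip SRAM, then
# stripe the block across 10 DRAMs, 128 B to each (bytes 0-127 to the
# first DRAM, 128-255 to the second, ..., 1152-1279 to the tenth).
NUM_DRAMS = 10
PACKET_BYTES = 128
BLOCK_BYTES = NUM_DRAMS * PACKET_BYTES    # 1280 B moved every 32 ns

sram_block = bytearray()                  # on-chip aggregation buffer
drams = [bytearray() for _ in range(NUM_DRAMS)]

def write_packet(packet: bytes) -> None:
    """Buffer one 128 B packet; flush to all DRAMs once 1280 B accumulate."""
    assert len(packet) == PACKET_BYTES
    sram_block.extend(packet)
    if len(sram_block) == BLOCK_BYTES:
        # One parallel write: each DRAM receives its 128 B slice.
        for i, dram in enumerate(drams):
            dram.extend(sram_block[i * PACKET_BYTES:(i + 1) * PACKET_BYTES])
        sram_block.clear()

for n in range(10):                       # ten packets fill exactly one block
    write_packet(bytes([n]) * PACKET_BYTES)

# After 10 packets the SRAM block has been flushed and each DRAM holds 128 B.
print(len(sram_block), [len(d) for d in drams])
```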
In practice, the buffer holds many FIFOs (queues 1, 2, …, Q), e.g. in an IP router Q might be 200; in an ATM switch Q might be 10^6 • Arriving packets belong to different queues, so at any moment a queue may hold only a fraction of a 1280 B block • How can we write multiple packets into different queues? • Write rate R and read rate R: one 128 B packet every 6.4 ns, reads driven by scheduler requests
Parallel Packet Buffer: Hybrid Memory Hierarchy • A small tail SRAM caches the tails of the Q FIFOs for arriving packets, and a small head SRAM caches their heads for departing packets (both on the buffer-manager ASIC, with scheduler requests driving reads) • A large DRAM holds the bodies of the FIFOs • b = degree of parallelism: b bytes are written to, or read from, DRAM in each access • Arriving and departing packets both flow at rate R
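A minimal sketch of the tail-cache side of this hierarchy (class and method names are hypothetical; flushing a queue's tail to DRAM as soon as it reaches b bytes is a simplification of the optimal memory management analyzed on the following slides):

```python
# Sketch: per-queue tail caching in SRAM with b-byte parallel DRAM writes.
from collections import defaultdict

B = 1280  # degree of parallelism b: bytes moved per DRAM access

class HybridBuffer:
    def __init__(self, num_queues: int):
        self.num_queues = num_queues
        self.tails = defaultdict(bytearray)   # small SRAM: FIFO tails
        self.dram = defaultdict(bytearray)    # large DRAM: FIFO bodies

    def write(self, q: int, packet: bytes) -> None:
        """Append a packet to queue q; spill b bytes to DRAM when full."""
        assert 0 <= q < self.num_queues
        tail = self.tails[q]
        tail.extend(packet)
        if len(tail) >= B:                    # one wide parallel DRAM write
            self.dram[q].extend(tail[:B])
            del tail[:B]

buf = HybridBuffer(num_queues=625)
for _ in range(15):                           # 15 x 128 B packets to queue 7
    buf.write(7, b"\x00" * 128)
print(len(buf.tails[7]), len(buf.dram[7]))    # 640 1280
```

After ten packets (1280 B) the tail spills to DRAM in one wide write; the remaining five packets (640 B) wait in SRAM.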
Problem • What is the minimum size of the SRAM needed so that every packet is available immediately, or within a fixed latency? • Solutions: Qb(2 + ln Q) bytes for zero latency; Q(b – 1) bytes for a latency of Q(b – 1) + 1 time slots • Examples (160 Gb/s line card, b = 1280, Q = 625): SRAM = 52 Mbits for zero latency; SRAM = 6.1 Mbits when the latency is 40 ms
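Plugging the slide's parameters into these bounds gives roughly the quoted figures (a sketch; the small differences from the slide's numbers presumably come from rounding or constant-factor details in the exact bound):

```python
# Sketch: evaluate the two SRAM-size bounds for b = 1280, Q = 625.
import math

Q, b = 625, 1280                        # queues, bytes per DRAM access

# Zero-latency bound: Qb(2 + ln Q) bytes of SRAM.
zero_latency_mbits = Q * b * (2 + math.log(Q)) * 8 / 1e6
print(round(zero_latency_mbits, 1))     # 54.0 Mbits (slide quotes 52 Mbits)

# With a pipeline delay of Q(b - 1) + 1 time slots, Q(b - 1) bytes suffice.
delayed_mbits = Q * (b - 1) * 8 / 1e6
print(round(delayed_mbits, 1))          # 6.4 Mbits (slide quotes 6.1 Mbits)
```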
Discussion • [Plot] SRAM size vs. pipeline latency x, for Q = 1000, b = 10: the required queue length falls from its zero-latency value down to its value at the maximum latency
Contents • Motivation • A 100 Tb/s router • 160 Gb/s packet buffer • Theory • Generic Packet Buffer Problem • Optimal Memory Management • Implementation
Technology Assumptions in 2005 • DRAM Technology • Access Time ~ 40 ns • Size ~ 1 Gbits • Memory Bandwidth ~ 16 Gbps (16 data pins) • On-chip SRAM Technology • Access Time ~ 2.5 ns • Size ~ 64 Mbits • Serial Link Technology • Bandwidth ~ 10 Gb/s • 100 serial links per chip
Packet Buffer Chip (x4): Details and Status • Buffer manager with on-chip SRAMs and external DRAMs, each of the four chips handling R/4 • Incoming: 4x10 Gb/s • Outgoing: 4x10 Gb/s • 35 pins/DRAM x 10 DRAMs = 350 pins • SRAM memory: 3.1 Mbits with 3.2 ns access • Implementation starts Fall 2003