
Fast Buffer Memory with Deterministic Packet Departures

Mayank Kabra, Siddhartha Saha, Bill Lin. University of California, San Diego.

Presentation Transcript


  1. Fast Buffer Memory with Deterministic Packet Departures Mayank Kabra, Siddhartha Saha, Bill Lin University of California, San Diego

  2. Packet Buffer in Routers • Incoming linecards have 40 bytes @ 40 Gb/s = 8 ns to read and write a packet. • Routers need to store packets to deal with congestion. • Bandwidth × RTT = 40 Gb/s × 250 ms = 10 Gb of buffer. • Too big to store in SRAM, hence the need to use DRAM. • Problem: DRAM access time is ~40 ns, while a read and a write must both fit in each 8 ns slot, so there is roughly a 10x speed difference. [Figure: linecards (In/Out) around a router core containing the scheduler and packet buffers]
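The sizing arithmetic on this slide can be checked with a short sketch. The constant and function names below are illustrative assumptions, not part of the presentation; the numbers are the ones quoted on the slide. Note that 40 Gb/s × 250 ms works out to 10 Gb.

```python
# Sizing arithmetic for the packet buffer; all names are illustrative.
LINE_RATE_BPS = 40e9    # 40 Gb/s line rate (from the slide)
PACKET_BYTES = 40       # minimum-size packet (from the slide)
RTT_S = 0.250           # 250 ms round-trip time (from the slide)


def slot_time_s(packet_bytes, line_rate_bps):
    """Time available per packet at line rate, in seconds."""
    return packet_bytes * 8 / line_rate_bps


def buffer_bits(line_rate_bps, rtt_s):
    """Bandwidth-delay product: buffer size in bits."""
    return line_rate_bps * rtt_s


slot = slot_time_s(PACKET_BYTES, LINE_RATE_BPS)           # 8 ns per packet
size_bits = buffer_bits(LINE_RATE_BPS, RTT_S)             # 10 Gb of buffering
packets_in_buffer = size_bits / (PACKET_BYTES * 8)        # about 3.1e7 packets
```

The packet count derived here is the same order of magnitude as the buffer capacity T used later for the bitmap size.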

  3. Parallel and Interleaved DRAM banks • Assume the speed difference is 3x. [Figure: packets (P) moving between SRAM and interleaved DRAM banks]

  4. Problems with Parallelism • The access pattern can create problems. • If we access packets 3, 6, 9 and 11 one after another, we can issue interleaved read requests and read those packets out at line speed. [Figure: packets 1-14 laid out across the DRAM banks]

  5. Problems with Parallelism • But accessing packets 2 & 3 or 10 & 11 in succession is problematic. • This is an example of a bank conflict. [Figure: packets 1-14 laid out across the DRAM banks]

  6. Use The Packet Departure Time • In a wide class of routers (e.g., crossbar routers), packet departures are determined by the scheduler on the fly. • Packet buffers that cater to these routers exist but are complex. • In other high-performance routers, such as Switch-Memory-Switch and Load-Balanced Routers, the packet departure time can be calculated when the packet is inserted into the buffer. Solution idea: use the known departure times of the packets to assign them to different DRAM banks such that there won’t be any conflicts.

  7. Packet Buffer Abstraction • Fixed-size packets; time is slotted (example: 40 Gb/s, 40-byte packets => 8 ns slots). • The buffer may contain an arbitrarily large number of logical queues, but with deterministic access. • Single-write, single-read, time-deterministic packet buffer model.

  8. Packet Buffer Architecture • Interleaved memory architecture with K slower DRAM banks. • A single memory read or write operation takes b time slots. • b consecutive time slots form a frame. • Time slot t belongs to frame ⌈t/b⌉.
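A minimal sketch of the slot-to-frame mapping, assuming time slots are numbered from 1 and reading the bracket as a ceiling. That reading matches the later examples, where slots 1-3 fall in frame 1 and slots 58-60 fall in frame 20 for b = 3. The function name is hypothetical.

```python
def frame_of(t, b):
    """Frame index of time slot t (slots numbered from 1), i.e. ceil(t / b)."""
    if t < 1 or b < 1:
        raise ValueError("t and b must be positive")
    return (t + b - 1) // b  # integer ceiling division, no floating point
```

Integer ceiling division avoids rounding problems for very large slot numbers.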

  9. Packet Buffer Operation [Figure: arriving packets are aggregated b at a time and written to DRAM banks 1 to K; packets read from DRAM are de-aggregated into departing packets; an SRAM bypass buffer sits alongside the DRAM path]

  10. Packet Arrival [Frame 1] • Assume b = 3. • Packets P1, P2 & P3 arrive in time slots 1, 2 and 3 respectively. • They are aggregated before being written to the DRAM. [Figure: P1-P3 held in the aggregation buffer in front of DRAM banks 1-5]

  11. Packet Arrival [Frame 2] • Packets P1, P2 & P3 are written to the DRAM banks (1, 2 & 3) during frame 2. • New packets P4, P5 & P6 arrive and are stored in the buffer. [Figure: P1-P3 in DRAM banks, P4-P6 in the aggregation buffer]

  12. Packet Departure [Frame 19] • Packets P58, P59 & P60 are scheduled to depart at time slots 58, 59 and 60 respectively (frame 20). • They are read from the DRAM banks one frame before their departure frame, i.e. in frame 19. [Figure: P58-P60 being read from DRAM banks 1-5]

  13. Packet Departure [Frame 20] • Packets P58, P59 & P60 are read from the buffer and output from the switch at time slots 58, 59 and 60 respectively. [Figure: P58-P60 leaving the buffer in departure order]

  14. SRAM Bypass Buffer • The operational model dictates that the minimum round trip latency to write and read a packet from one of the DRAM banks is 4 frames. • Thus, a packet with a departure time less than 4b-1 time slots away cannot be stored into DRAM. • A small amount of SRAM (size 4b) is used as a bypass buffer.
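The bypass rule can be written as a one-line predicate. This is only a sketch of the condition as stated on the slide; the function name is hypothetical, and the exact boundary depends on how slots are counted, which the slide does not spell out.

```python
def needs_bypass(current_slot, departure_slot, b):
    """True if the packet departs too soon to make the DRAM round trip.

    Follows the slide's rule: a departure less than 4b - 1 time slots
    away cannot go through DRAM and is held in the SRAM bypass buffer.
    """
    return departure_slot - current_slot < 4 * b - 1
```

With b = 3 the threshold is 11 slots, so a packet arriving at slot 1 and departing at slot 5 stays in SRAM, while one departing at slot 12 can go to DRAM.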

  15. Number of DRAM banks • Arrival Write Conflicts: in any current frame f, at most b packets will be written to the DRAM banks (including the current packet). Hence, for each packet, there are at most b-1 “Arrival Write Conflicts”. [Figure: arriving packets (P) assigned to distinct DRAM banks]

  16. Number of DRAM banks • Arrival Read Conflicts: in any current frame f, at most b packets will be read from the DRAM banks. Those b banks are busy during the current frame and are unavailable. Hence, for each packet, there are at most b “Arrival Read Conflicts”. [Figure: packets (P) being read from DRAM banks in the current frame]

  17. Number of DRAM banks • Departure Read Conflicts: any packet written in the current frame f will eventually need to be read in a future frame d for departure. In that future frame d, there are at most b-1 other departing packets. Hence, for each packet, there are at most b-1 “Departure Read Conflicts”. [Figure: packets (P) already stored in DRAM banks that depart in frame d]

  18. How Many DRAM Banks? • Total Conflicts: • Arrival Write: (b-1) • Arrival Read: b • Departure Read: (b-1) • Hence, at most (3b-2) conflicts in total. • If the number of banks is more than (3b-2), i.e. at least (3b-1), there is always a free bank for every packet.
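The conflict count above translates directly into a bank requirement. The sketch below just adds up the three bounds from the slides; the function names are illustrative.

```python
def max_conflicts(b):
    """Worst-case number of banks a new packet must avoid."""
    arrival_write = b - 1   # other packets written in the same frame
    arrival_read = b        # banks busy with reads in the current frame
    departure_read = b - 1  # other packets departing in the same future frame
    return arrival_write + arrival_read + departure_read  # 3b - 2


def min_banks(b):
    """Smallest bank count that always leaves one conflict-free bank."""
    return max_conflicts(b) + 1  # 3b - 1
```

For b = 3 this gives 8 banks, and for b = 10 it gives 29 banks.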

  19. DRAM Bank Selection • To find a compatible memory, maintain a two-dimensional read-transaction bitmap R. • Each row corresponds to a frame slot. • Each column corresponds to a DRAM bank (hence 3b – 1 columns). • R(f, m) denotes whether the m-th DRAM bank holds an already stored packet that must be read in the f-th frame slot.

  20. DRAM Bank Selection • Write-reservation bitmap W of size (3b – 1). • W(m) denotes that, in the current frame, the m-th memory bank has been assigned an arriving packet.

  21. DRAM Bank Selection Logic

  22. DRAM Bank Selection • Approach: Greedy solution avoiding the three types of conflicts. • To check if a memory bank is compatible for a packet p arriving at timeframe f, and having a departure timeframe d: • Check NOT(W(m) | R(f,m) | R(d, m)) • Instead of checking one memory bank at a time, we can check all of them at once: • V = NOT(W | R(f) | R(d)), where R(f) and R(d) are the row vectors. • From V, get the index of the first compatible memory. • If n is the bank selected for p, then set W(n) = 1 and R(d,n) = 1.
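The greedy selection can be sketched with Python integers used as bit vectors (bit m stands for bank m). This is an illustration of the steps listed on the slide, not the authors' hardware logic: the class name, the sparse dictionary used for the rows of R, and the `start_frame` helper are all assumptions made for the example.

```python
class BankSelector:
    """Greedy conflict-free DRAM bank selection with bitmask vectors."""

    def __init__(self, b):
        self.num_banks = 3 * b - 1                  # columns of R and width of W
        self.full_mask = (1 << self.num_banks) - 1  # keeps NOT within num_banks bits
        self.W = 0                                  # write reservations, current frame
        self.R = {}                                 # frame -> read-transaction row

    def start_frame(self):
        """Clear the write reservations when a new frame begins."""
        self.W = 0

    def select(self, f, d):
        """Pick a bank for a packet handled in frame f that departs in frame d."""
        busy = self.W | self.R.get(f, 0) | self.R.get(d, 0)
        V = ~busy & self.full_mask                  # V = NOT(W | R(f) | R(d))
        if V == 0:
            raise RuntimeError("no compatible bank")
        n = (V & -V).bit_length() - 1               # index of the first compatible bank
        self.W |= 1 << n                            # set W(n) = 1
        self.R[d] = self.R.get(d, 0) | (1 << n)     # set R(d, n) = 1
        return n
```

Isolating the lowest set bit with `V & -V` plays the role of a priority encoder: it returns the first compatible bank in a single step instead of scanning banks one at a time.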

  23. Size of the Bitmap • The size of the packet buffer is T packets, i.e., T is the farthest departure time slot relative to the current time slot. • Farthest departure frame: ⌈T/b⌉. • Each row in the bitmap is (3b – 1) bits, so the size of the bitmap is (3b – 1) × ⌈T/b⌉ bits. • Assuming an RTT of 250 ms and a line rate of 40 Gb/s, the packet buffer would correspond to a memory requirement of T = 3 × 10^7 packets, which makes the bitmap size close to 11 MB.
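A quick check of the bitmap size, taking one row of (3b – 1) bits per frame and ⌈T/b⌉ frames. The value b = 10 is an assumption chosen to match the roughly 10x speed gap mentioned earlier; the slide itself does not state which b it used. The function name is illustrative.

```python
def bitmap_size_bits(T, b):
    """Bits needed for the read-transaction bitmap R."""
    rows = -(-T // b)   # ceil(T / b): number of frames up to the farthest departure
    cols = 3 * b - 1    # one column per DRAM bank
    return rows * cols


bits = bitmap_size_bits(30_000_000, 10)   # T = 3 x 10^7 packets, assumed b = 10
megabytes = bits / 8 / 1e6                # decimal megabytes
```

Under that assumption the result is 87,000,000 bits, or about 10.9 MB, which agrees with the "close to 11 MB" figure on the slide.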

  24. Additional Details • Location of a packet in the DRAM: • Once a bank has been selected, we need a way to assign the actual memory location at which to write, and later read, the packet. • Determine the memory location from the departure frame, using circular indexing to map a frame to a packet location in the memory. • How to reorder/de-aggregate the packets? • Store the timestamp in the DRAM with the packet.
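One way to read the circular indexing idea is a modulo mapping from departure frame to a location inside the selected bank. The slide gives no formula, so the mapping below, the function name, and the `frames_per_bank` parameter are assumptions for illustration only.

```python
def packet_location(departure_frame, frames_per_bank):
    """Map a departure frame to a packet-sized location within one bank.

    Assumes each bank has frames_per_bank packet locations and that all
    outstanding departure frames span fewer than frames_per_bank frames,
    so live packets never wrap around onto each other.
    """
    return departure_frame % frames_per_bank
```

Because bank selection stores at most one packet per (departure frame, bank) pair, this index is unique within a bank under the stated assumption, and the reader can recompute it from the departure frame without keeping a separate address table.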

  25. Conclusion • Developed a simple packet buffer architecture for the case where packet departure times are known, e.g., Switch-Memory-Switch and Load-Balanced Routers. • Can support an arbitrarily large number of logical queues. • The number of DRAM banks and the size of the SRAM bypass buffer depend only on the physical parameters.

  26. Thank You
