
NetFPGA Project: 4-Port Layer 2/3 Switch

Ankur Singla (asingla@stanford.edu), Gene Juknevicius (genej@stanford.edu)


Presentation Transcript


  1. NetFPGA Project: 4-Port Layer 2/3 Switch. Ankur Singla (asingla@stanford.edu), Gene Juknevicius (genej@stanford.edu)

  2. Agenda • NetFPGA Development Board • Project Introduction • Design Analysis • Bandwidth Analysis • Top Level Architecture • Data Path Design Overview • Control Path Design Overview • Verification and Synthesis Update • Conclusion

  3. NetFPGA Development Board

  4. Project Introduction • 4 Port Layer-2/3 Output Queued Switch Design • Ethernet (Layer-2), IPv4, ICMP, and ARP • Programmable Routing Tables – Longest Prefix Match, Exact Match • Register support for Switch Fwd On/Off, Statistics, Queue Status, etc. • Layer-2 Broadcast, and limited Layer-3 Multicast support • Limited support for Access Control • Highly Modular Design for future expandability

  5. Bandwidth Analysis
     • Available Data Bandwidth
       ◦ Memory bandwidth: 32 bits * 25 MHz = 800 Mbits/sec
       ◦ CFPGA to Ingress FIFO/Control Block bandwidth: 32 bits * 25 MHz / 4 = 200 Mbits/sec
       ◦ Packet Queue to Egress bandwidth: 32 bits * 25 MHz / 4 = 200 Mbits/sec
     • Packet Processing Requirements
       ◦ 4 ports operating at 10 Mbits/sec => 40 Mbits/sec
       ◦ Minimum size packet 64 Byte => 512 bits
       ◦ 512 bits / 40 Mbits/sec = 12.8 us
       ◦ Internal clock is 25 MHz
       ◦ 12.8 us * 25 MHz = 320 clocks to process one packet
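The arithmetic on this slide can be checked with a few lines of Python. This is only a worked restatement of the slide's figures; the constant and function names (`CLOCK_HZ`, `cycle_budget`, and so on) are illustrative and do not come from the project.

```python
# Worked version of the slide's bandwidth arithmetic.
# All input figures are taken from the slide; the names are illustrative.
CLOCK_HZ = 25_000_000          # internal clock, 25 MHz
BUS_WIDTH_BITS = 32            # memory data width
PORTS = 4
PORT_RATE_BPS = 10_000_000     # 10 Mbit/s per port
MIN_PACKET_BYTES = 64


def cycle_budget(clock_hz, ports, port_rate_bps, packet_bytes):
    """Clock cycles available per packet when all ports run at line rate."""
    aggregate_bps = ports * port_rate_bps
    packet_bits = packet_bytes * 8
    # cycles = (packet_bits / aggregate_bps) * clock_hz, kept in integers
    return packet_bits * clock_hz // aggregate_bps


memory_bw_bps = BUS_WIDTH_BITS * CLOCK_HZ      # 800 Mbit/s
shared_bw_bps = memory_bw_bps // 4             # 200 Mbit/s per shared path
packets_per_sec = PORTS * PORT_RATE_BPS // (MIN_PACKET_BYTES * 8)
budget = cycle_budget(CLOCK_HZ, PORTS, PORT_RATE_BPS, MIN_PACKET_BYTES)

print(memory_bw_bps, shared_bw_bps, packets_per_sec, budget)
```

The packet rate computed here (78,125 packets/sec) corresponds to the 78Kpps requirement quoted later in the deck.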

  6. Top Level Architecture

  7. Data Flow Diagram • Output Queued Shared Memory Switch • Round Robin Scheduling • Packet Processing Engine provides L2/L3 functionality • Coarse Pipelined Arch. at the Block Level

  8. Master Arbiter • Round Robin Scheduling of service to Each Input and Output • Interfaces Rest of the Design with Control FPGA • Co-ordinates activities of all high level blocks • Maintains Queue Status for each Output
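As a rough software model of round-robin service, the sketch below grants one requester per call and starts the next search just after the last winner. The class name `RoundRobinArbiter` and its interface are hypothetical; the slides do not describe how the actual arbiter is built in hardware.

```python
class RoundRobinArbiter:
    """Software model of a round-robin arbiter (hypothetical interface)."""

    def __init__(self, num_requesters):
        self.num_requesters = num_requesters
        # Start so that requester 0 has the highest priority on the first call.
        self.last_granted = num_requesters - 1

    def grant(self, requests):
        """Return the index of the granted requester, or None if none is asking.

        `requests` is a sequence of booleans, one per requester.
        """
        for offset in range(1, self.num_requesters + 1):
            index = (self.last_granted + offset) % self.num_requesters
            if requests[index]:
                self.last_granted = index
                return index
        return None


arbiter = RoundRobinArbiter(4)
pending = [True, True, False, True]   # ports 0, 1 and 3 are requesting
grants = [arbiter.grant(pending) for _ in range(4)]
print(grants)
```

Because the search always resumes after the previous winner, a port that requests continuously cannot starve the others.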

  9. Ingress FIFO Control Block
     • Interfaces three blocks
       ◦ Control FPGA
       ◦ Forwarding Engine
       ◦ Packet Buffer Controller
     • Dual Packet Memories for coarse pipelining
     • Responsible for Packet Replication for Broadcast

  10. Packet Processing Engine Overview
      • Goals
        ◦ Features: L3/L2/ICMP/ARP Processing
        ◦ Performance Requirements: 78Kpps
        ◦ Fit within 60% of Single User FPGA Block
        ◦ Modularity / Scalability
        ◦ Verification / Design Ease
      • Actual
        ◦ Support for all required features + L2 broadcast, L3 multicast, LPM, Statistics and Policing (coarse access control)
        ◦ Performance Achieved: 234Kpps (worst case 69Kpps for ICMP echo requests of 1500 bytes)
        ◦ Requires only 12% of Single UFPGA resources
        ◦ Highly Modular Design for design/verification/scalability ease

  11. Pkt Processing Engine Block Diagram [Figure. Blocks: First Level Parsing, Statistics and Policing, ARP Processing, L3 Processing, ICMP Processing, L2 Processing, Forwarding Master State Machine. Labels: From CFPGA, Packet Memory0, Native Packet, Packet Memory1, To Packet Buffer]

  12. Forwarding Master State Machine • Responsible for controlling individual processing blocks • Request/Grant Scheme for future expandability • Initiates a Request for Packet to Ingress FIFO and then assigns to responsible agents based on packet contents • Replication of MSM to provide more throughput
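One way to picture the "assign to responsible agents based on packet contents" step is a classifier over the Ethernet and IPv4 headers. The sketch below is an assumption about how such a dispatch could look, not the project's state machine: the function `classify`, the helper `make_frame`, and the router address 10.0.0.1 are all hypothetical, and untagged Ethernet II frames are assumed. The EtherType values (0x0800 for IPv4, 0x0806 for ARP) and the ICMP protocol number (1) are the standard ones.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
IPPROTO_ICMP = 1
ETH_HEADER_LEN = 14            # untagged Ethernet II header (assumed)


def classify(frame, router_ips):
    """Pick the processing block for a frame: 'ARP', 'ICMP', 'L3' or 'L2'."""
    if len(frame) < ETH_HEADER_LEN:
        return "L2"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_ARP:
        return "ARP"
    if ethertype == ETHERTYPE_IPV4 and len(frame) >= ETH_HEADER_LEN + 20:
        protocol = frame[ETH_HEADER_LEN + 9]                      # IPv4 protocol field
        dst_ip = frame[ETH_HEADER_LEN + 16:ETH_HEADER_LEN + 20]   # IPv4 destination
        if protocol == IPPROTO_ICMP and dst_ip in router_ips:
            return "ICMP"
        return "L3"
    return "L2"                 # anything else falls back to L2 switching


def make_frame(ethertype, protocol=0, dst_ip=bytes(4)):
    """Build a minimal test frame (hypothetical helper, payload left as zeros)."""
    eth = b"\xff" * 6 + b"\x02" * 6 + struct.pack("!H", ethertype)
    ip = bytearray(20)
    ip[0] = 0x45                # version 4, header length 5 words
    ip[9] = protocol
    ip[16:20] = dst_ip
    return eth + bytes(ip)


ROUTER_IPS = {bytes([10, 0, 0, 1])}   # made-up router address
print(classify(make_frame(ETHERTYPE_IPV4, IPPROTO_ICMP, bytes([10, 0, 0, 1])), ROUTER_IPS))
```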

  13. L3 Processing Engine
      • Parsing of the L3 Information:
        ◦ Src/Dest Addr, Protocol Type, Checksum, Length, TTL
      • Longest Prefix Match Engine
        ◦ Mask Bits represent the prefix. Lookup Key is Dest Addr
        ◦ Associated Info Table (AIT) indexed using the entry hit
        ◦ AIT provides Destination Port Map, Destination L2 Addr, Statistics Bucket Index
        ◦ Request/Done scheme to allow for expandability (e.g. future m-way Trie implementation project)
      • ICMP Support Engine Request (if Dest Addr is the Router's IP Address and Protocol Type is ICMP)
      • Total 85 cycles for Packet Processing, with 80% of the cycles spent on Table Lookup. With a 4-way trie, total processing time can be reduced to less than 30 cycles.
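The mask-based lookup and the Associated Info Table can be modelled in software as two parallel arrays: the index of the matching (value, mask) entry selects the AIT record. The sketch below assumes a simple linear scan, which the slides do not confirm; `LpmTable`, `AitEntry`, and the sample routes are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class AitEntry:
    """Associated info for one table entry (field names are illustrative)."""
    port_map: int       # bitmap of destination ports
    dest_mac: str       # destination L2 address
    stats_index: int    # statistics bucket index


class LpmTable:
    """Longest prefix match by linear scan over (value, mask) pairs."""

    def __init__(self):
        self.entries = []   # (value, mask); same index as self.ait
        self.ait = []

    def add(self, prefix, prefix_len, info):
        mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
        self.entries.append((int(IPv4Address(prefix)) & mask, mask))
        self.ait.append(info)

    def lookup(self, dest_addr):
        key = int(IPv4Address(dest_addr))
        best_index, best_mask = None, -1
        for index, (value, mask) in enumerate(self.entries):
            # A longer contiguous prefix has a numerically larger mask.
            if (key & mask) == value and mask > best_mask:
                best_index, best_mask = index, mask
        return None if best_index is None else self.ait[best_index]


table = LpmTable()
table.add("0.0.0.0", 0, AitEntry(0b1000, "00:00:00:00:00:04", 0))
table.add("10.0.0.0", 8, AitEntry(0b0001, "00:00:00:00:00:01", 1))
table.add("10.1.0.0", 16, AitEntry(0b0010, "00:00:00:00:00:02", 2))
print(table.lookup("10.1.2.3"))
```

A scan like this costs time proportional to the table size, which is one reason a trie-based lookup would cut the cycle count.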

  14. L2 Processing Engine
      • If there are any processing problems with ARP, ICMP, and/or L3, then L2 switching is done
      • Exact Match Engine
        ◦ Re-use of the LPM match engine but with Mask Bits set to all 1’s
        ◦ Associated Info Table (AIT) indexed using the entry hit
        ◦ AIT provides Destination Port Map and Statistics Bucket Index
        ◦ Request/Done scheme to allow for expandability (e.g. future Hash implementation project)
      • Learning Engine removed because of Switch/Router Hardware Verification problems (HP Switch bug)
      • Total 76 cycles for Packet Processing, with over 80% of the cycles spent on Table Lookup. With a Hashing Function, total processing time can be reduced to less than 20 cycles.
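Setting every mask bit to 1 turns a masked match into an exact match, which is why the same engine can serve both tables. The sketch below shows that idea on 48-bit MAC addresses and, next to it, a dictionary lookup as a stand-in for the hashing approach mentioned as future work. The function names and table contents are hypothetical, and only the port map is returned to keep the example short.

```python
MAC_MASK_ALL_ONES = (1 << 48) - 1     # every bit of the 48-bit key must match


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def masked_lookup(entries, key):
    """Scan (value, mask, port_map) entries and return the first hit's port map."""
    for value, mask, port_map in entries:
        if (key & mask) == value:
            return port_map
    return None


def hashed_lookup(table, key):
    """Hash-table alternative: one probe instead of a scan."""
    return table.get(key)


l2_entries = [
    (mac_to_int("00:11:22:33:44:55"), MAC_MASK_ALL_ONES, 0b0001),
    (mac_to_int("00:11:22:33:44:66"), MAC_MASK_ALL_ONES, 0b0100),
]
l2_hash = {value: port_map for value, _mask, port_map in l2_entries}

key = mac_to_int("00:11:22:33:44:66")
print(masked_lookup(l2_entries, key), hashed_lookup(l2_hash, key))
```

Both lookups return the same port map; the difference is that the scan visits entries one by one while the hash lookup does not depend on table size.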

  15. Packet Buffer Interface • Interfaces with Master Arbiter and Forward Engine • Output Queued Switch • Statically Assigned • Single Queue per port • Off-chip ZBT SRAM on NetFPGA board

  16. Control Block
      • Typical Register Rd/Wr Functionality
        ◦ Status Register
        ◦ Control Register (forwarding disable, reset)
        ◦ Router’s IP Addresses (port 1-4)
        ◦ Queue Size Registers
        ◦ Statistics Registers
        ◦ Layer-2 Table Programming Registers
        ◦ Layer-3 Table Programming Registers

  17. Verification
      • Three Levels of Verification Performed
      • Simulations:
        ◦ Module Level: to verify the module design intent and bus functional model
        ◦ System Level: using the NetFPGA verification environment for packet level simulations
      • Hardware Verification
        ◦ Ported System Level tests to create tcpdump files for NetFPGA traffic server
        ◦ Very good success on Hardware with all System Level tests passing
        ◦ Only one modification required (reset generation) after Hardware Porting
      • Demo: Greg can provide lab access to anyone interested

  18. Synthesis Overview • Design was ported to Altera EP20K400 Device • Logic Elements Utilized – 5833 (35% of Total LEs) • RAM ESBs Used – 46848 (21% of Total ESBs) • Max Design Clock Frequency ~ 31MHz • No Timing Violations

  19. Conclusion • Easy to achieve “required” performance in an OQ Shared Memory Switch in NetFPGA • Modularity of the design allows more interesting and challenging future projects • Design/Verification Environment was essential to meet schedule • NetFPGA is an excellent design exploration platform
