This overview explores the evolution and challenges of network processing systems. It analyzes the inefficiencies inherent in first, second, and third-generation architectures, highlighting scalability, flexibility, and the need for innovation in fourth-generation network processor technology. The discussion includes the limitations of traditional processing solutions such as ASICs, the importance of instruction set optimization, and the role of parallel architectures. Additionally, it emphasizes the significance of memory architecture and application mapping for improved performance.
ECE 526 – Network Processing Systems Design • Network Processor Introduction • Chapters 11, 12: D. E. Comer
Goal • Understanding the inefficiency of 1st, 2nd, and 3rd generation network processing systems • Scalability plus flexibility • Recognizing the necessity of a new solution: 4th generation (network processor technology) • Learning • courage to appreciate the challenges • skill to characterize the “real” problem • art to propose an engineering solution • Be aware that “network processor” is currently a conceptual and general term
Recall: 1st Generation • 1st generation network processing system • Feasibility study • Design a software router • Data rate: 10 Gbps • Assuming small packets (64 B) • Assuming each packet needs 10,000 instructions to process • Can an Intel 80986@2007 do the job? • CPU: 24 GHz • MIPS: 125,000 (million instructions per second) • 1 billion transistors … • Conclusion: not feasible • What is the real problem here?
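The "not feasible" conclusion follows from a short calculation, sketched below using only the figures on the slide. The function name `required_mips` is made up for illustration, and the sketch assumes the full 10 Gbps carries back-to-back 64-byte packets, ignoring framing overhead such as preamble and inter-frame gap.

```python
# Feasibility check for the software router, using the slide's numbers.
# Assumption: the link is filled with back-to-back 64-byte packets and
# framing overhead (preamble, inter-frame gap) is ignored.

LINK_RATE_BPS = 10e9        # 10 Gbps data rate
PACKET_BYTES = 64           # small packets
INSTR_PER_PACKET = 10_000   # instructions needed per packet
CPU_MIPS = 125_000          # rating of the CPU on the slide


def required_mips(link_rate_bps, packet_bytes, instr_per_packet):
    """Return the instruction rate (in MIPS) needed to keep up with the link."""
    packets_per_second = link_rate_bps / (packet_bytes * 8)
    return packets_per_second * instr_per_packet / 1e6


needed = required_mips(LINK_RATE_BPS, PACKET_BYTES, INSTR_PER_PACKET)
print(f"required: {needed:,.1f} MIPS, available: {CPU_MIPS:,} MIPS")
print("feasible" if needed <= CPU_MIPS else "not feasible")
```

Under these assumptions the link delivers about 19.5 million packets per second, which needs about 195,000 MIPS, roughly 1.6 times what the CPU offers.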
The Real Problem • Technology push: uneven • Link bandwidth is scaling much faster than CPU and memory technology • Transistor scaling and VLSI technology help, but not enough • Application pull: harder • More complex applications are required • Processing complexity is defined as the number of instructions and the number of memory accesses needed to process one packet
What is the ideal platform? • Structured ASIC • Reconfigurable Co-processors • Network Processor • FPGA
2nd and 3rd Generations • 2nd generation: offloading and decentralization • 3rd generation: further offloading and use of specialized devices (ASIC + embedded processors) • Problems: loss of flexibility and very high cost. Why?
Why not ASIC? • High cost to develop • Network processing is a moderate-quantity market • Long time to market • Network processing services change quickly • Difficult to simulate • Complex protocols • Expensive and time-consuming to change • Little reuse across products • Limited reuse across versions • No consensus on framework or supporting chips • Requires expertise
Network Processors • Question: where does an NP gain its higher performance, compared with a conventional processor?
Instruction Set: minimality • Not as general as RISC and CISC processors • E.g. no floating point instructions • Optimized for packet processing functions only • Not specific to a protocol or part of a protocol • Seek a minimal set of instructions sufficient to handle an arbitrary protocol, • plus specific instructions for protocol processing • Example: atomic operation • A hard problem; will be covered later
Architecture: multiprocessor • Parallelism • The network processing workload is highly parallel by nature • Flow-level • Queue-level • Packet-level • Protocol-level • Pipelining • Pipelining improves system throughput at the cost of longer delay • Is this acceptable? • System-on-chip • Processing: RISC core • Memory: register, cache, instruction store, scratch pad, SRAM and SDRAM • I/O: network / switch fabric interfaces • Question: how hard is it to build and use these NPs?
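The pipelining trade-off on this slide can be made concrete with a small model. This is a minimal sketch: the stage names follow the Verify / Lookup-1 / Lookup-2 / Transmit stages used later in the deck, but the cycle counts and the per-stage hand-off overhead are invented for illustration, and `pipeline_metrics` is a hypothetical helper.

```python
# Throughput versus delay for a packet-processing pipeline.
# Assumption: every pipeline slot is as long as the slowest stage plus a
# fixed hand-off overhead; all cycle counts below are hypothetical.

def pipeline_metrics(stage_cycles, handoff_overhead=0):
    """Compare one processor running all stages with a one-stage-per-PE pipeline."""
    total = sum(stage_cycles)
    slot = max(stage_cycles) + handoff_overhead
    return {
        "single_latency": total,    # cycles for one packet on a single processor
        "single_interval": total,   # cycles between completed packets
        "pipe_latency": slot * len(stage_cycles),  # cycles through all stages
        "pipe_interval": slot,      # a packet completes every slot
    }


# Hypothetical costs for Verify, Lookup-1, Lookup-2, Transmit.
m = pipeline_metrics([20, 60, 60, 30], handoff_overhead=5)
speedup = m["single_interval"] / m["pipe_interval"]
print(m)
print(f"throughput gain: {speedup:.2f}x")
```

With these made-up numbers a packet completes every 65 cycles instead of every 170, but each packet spends 260 cycles in the system instead of 170. That is the longer delay the slide asks about.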
Typical Processing
Case Study: IPv4 Packet Forwarding • 2-port router (2 Gbps) • Xilinx Virtex-II Pro FPGA (2VP30) • IP Lookup: • longest prefix match • (trie lookup algorithm) • [Figure: application elements From (0), From (1), Lookup, IPRoute, To (0), To (1); trie lookup example with a prefix table (hex : binary), next-hop labels a to e, and numbered memory accesses]
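Longest prefix match with a trie can be sketched as follows. This is a simplified illustration rather than the case study's implementation: it uses a one-bit-per-level binary trie (the slide's figure appears to use wider strides), and the class names, routes, and next-hop labels are hypothetical. Each node visited corresponds to one memory access in a hardware or NP implementation.

```python
import ipaddress


class TrieNode:
    """One trie node: two children (bit 0 / bit 1) and an optional next hop."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


class BinaryTrie:
    """Binary (1-bit stride) trie for IPv4 longest prefix match."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1   # walk from the most significant bit
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        addr = int(ipaddress.ip_address(address))
        node = self.root
        best = node.next_hop               # default route, if one was inserted
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break                      # no longer prefix exists
            if node.next_hop is not None:
                best = node.next_hop       # remember the longest match so far
        return best


# Hypothetical routing table.
table = BinaryTrie()
table.insert("0.0.0.0/0", "a")
table.insert("10.0.0.0/8", "b")
table.insert("10.1.0.0/16", "c")

for dst in ("10.1.2.3", "10.2.0.1", "192.168.0.1"):
    print(dst, "->", table.lookup(dst))
```

A one-bit stride needs up to 32 node visits per lookup. Wider strides reduce the number of memory accesses at the cost of larger nodes, which is why the memory architecture matters so much for this workload.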
Multiprocessor for Header Processing • FIFO queues • [Figure: block diagram with Packet Reception and Packet Transmission, repeated Verify, Lookup-1, Lookup-2, and Transmit stages, and labels FSL, BRAM, OPB, RS232, Timer, LEDs]
Typical Use of NPs
System Implementation Space
Memory Architecture • Memory access bottleneck • Memory is area consuming • Limited on-chip memory • Limited bandwidth to off-chip memory: pin and package cost • Off-chip memory access is slow: 100 cycles • Possible solutions • Profiling application memory access patterns • Propose heterogeneous memory architecture • Memory-aware mapping • Transactional memory (project topic)
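A back-of-the-envelope model shows why off-chip access dominates. The sketch below takes the 100-cycle latency and the 2 Gbps router from the slides; the instruction count, number of memory accesses, and 100 MHz clock are hypothetical, and it assumes one instruction per cycle with a processing element that stalls on every access (no multithreading or overlap).

```python
import math

# Assumptions (hypothetical unless noted): one instruction per cycle, every
# memory access goes off chip and stalls the processing element (PE).
MEM_LATENCY_CYCLES = 100   # from the slide
INSTRUCTIONS = 200         # hypothetical per-packet instruction count
MEM_ACCESSES = 6           # hypothetical per-packet off-chip accesses
CLOCK_HZ = 100e6           # hypothetical PE clock
LINK_RATE_BPS = 2e9        # 2-port router, 2 Gbps total (case study)
PACKET_BYTES = 64          # small packets


def cycles_per_packet(instructions, mem_accesses, mem_latency=MEM_LATENCY_CYCLES):
    """Cycles one PE spends on a packet when it blocks on every access."""
    return instructions + mem_accesses * mem_latency


cycles = cycles_per_packet(INSTRUCTIONS, MEM_ACCESSES)
stall = MEM_ACCESSES * MEM_LATENCY_CYCLES / cycles
packets_per_second = LINK_RATE_BPS / (PACKET_BYTES * 8)
pes = math.ceil(packets_per_second * cycles / CLOCK_HZ)

print(f"{cycles} cycles per packet, {stall:.0%} of them waiting on memory")
print(f"PEs needed to sustain the link: {pes}")
```

With these numbers, three quarters of each packet's 800 cycles are memory stalls and 32 blocking PEs would be needed. This is the motivation for the profiling, heterogeneous memory, and memory-aware mapping listed as possible solutions.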
Application Mapping • Current approach: fixed topology, assembly coding & hand-tuning
Basic Steps for Mapping • Application description • High-level optimizations • Task graph • (platform specific) • Profile • Architecture configuration • HW / SW partitioning • Task allocation • Data layout • Communication assignment • Compilation / Synthesis • [Figure: application elements From (0), From (1), Lookup, IPRoute, To (0), To (1) alongside platform blocks labeled PE, FPGA, and MEM]
Summary • Network Processor • Special purpose, programmable hardware device • Optimized for network processing • Building block of network processing systems • Fundamental ideas • Flexibility through programmability • Scalability with parallelism and pipelining • Here, NP is a concept • We will study examples of network processors soon
For Next Class & Announcement • Read Comer: chapters 13 and 14 • Lab 1 total grade reduced to 82 • HW 1 due Wed. • Project topic will be announced after Wed.