
CSE 58x: Networking Practicum




  1. Instructor: Wu-chang Feng TA: Francis Chang CSE 58x: Networking Practicum

  2. About the course • Prerequisite: CSE 524 or the equivalent • Implementation-focused course • Intel's IXA network processor platform • Contents • Brief lecture material on network processors and the IXP • 5 weeks of designed laboratories • 3 weeks of final projects

  3. Modern router architectures • Split into a fast path and a slow path • Control plane • High-complexity functions • Route table management • Network control and configuration • Exception handling • Data plane • Low complexity functions • Fast-path forwarding

  4. Router functions • RFC 1812 plus... • Error detection and correction • Traffic measurement and policing • Frame and protocol demultiplexing • Address lookup and packet forwarding • Segmentation, fragmentation, reassembly • Packet classification • Traffic shaping • Timing and scheduling • Queuing • Security
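Several of these functions — error detection, header modification, checksum computation — reduce to the RFC 1071 Internet checksum. A minimal C sketch (a straightforward software version; real fast-path code would use hardware assists):

```c
#include <stdint.h>
#include <stddef.h>

/* RFC 1071 Internet checksum over an IPv4 header.
 * hdr: pointer to the header bytes; len: header length in bytes.
 * Sums 16-bit big-endian words, folds the carries, and returns the
 * one's-complement result in host byte order. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    if (len & 1)                  /* odd trailing byte, zero-padded */
        sum += (uint32_t)hdr[len - 1] << 8;
    while (sum >> 16)             /* fold carries back into low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

To verify a received header, sum all words including the checksum field; a valid header folds to 0xffff.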

  5. Design choices for network products • General purpose processors • Embedded RISC processors • Network processors • Field-programmable gate arrays (FPGAs) • Application-specific integrated circuits (ASICs)

  6. General purpose processors (GPP) • Programmable • Mature development environment • Typically used to implement the control plane • Too slow to run the data plane effectively • Sequential execution • CPU/network speeds: 50x increase over the last decade • Memory latency: only 2x decrease over the last decade • Gigabit Ethernet: 333-nanosecond per-packet budget • Cache miss: ~150-200 nanoseconds
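One plausible derivation of the 333 ns figure (an assumption on my part, not stated on the slide): minimum-size 64-byte Ethernet frames carry 8 bytes of preamble and a 12-byte inter-frame gap, and a full-duplex port can see frames in both directions at once:

```c
/* Back-of-the-envelope per-packet budget on gigabit Ethernet.
 * Assumes minimum 64-byte frames plus 8-byte preamble and 12-byte
 * inter-frame gap; halved for full duplex since a router port may
 * have to handle arrivals in both directions concurrently. */
double gige_packet_budget_ns(int full_duplex)
{
    const double bits = (64 + 8 + 12) * 8.0;   /* 672 bits on the wire */
    const double rate = 1e9;                   /* 1 Gbps */
    double budget = bits / rate * 1e9;         /* ns per frame, one direction */
    return full_duplex ? budget / 2.0 : budget;
}
```

This gives 672 ns per direction, or about 336 ns full duplex — close to the slide's 333 ns, and only a handful of cache misses' worth of time.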

  7. Embedded RISC processors (ERP) • Same as GPP, but • Slower • Cheaper • Smaller (require less board space) • Designed specifically for network applications • Typically used for control plane functions

  8. Application-specific integrated circuits (ASIC) • Custom hardware • Long time to market • Expensive • Difficult to develop and simulate • Not programmable • Not reusable • But, the fastest of the bunch • Suitable for data plane

  9. Field Programmable Gate Arrays (FPGA) • Flexible re-programmable hardware • Less dense and slower than ASICs • Cheaper than ASICs • Good for providing fast custom functionality • Suitable for data plane

  10. Network processors • The speed of ASICs/FPGAs • The programmability and cost of GPPs/ERPs • Flexible • Re-usable components • Lower cost • Suitable for data plane

  11. Network processors • Common features • Small, fast, on-chip instruction stores (no caching) • Custom network-specific instruction set programmed at assembler level • What instructions are needed for NPs? Open question. • Minimality, Generality • Multiple processing elements • Multiple thread contexts per element • Multiple memory interfaces to mask latency • Fast on-chip memory (headers) and slow off-chip memory (payloads) • No OS, hardware-based scheduling and thread switching

  12. Why network processors? • The propaganda • Take the current vertical network device market • Commoditize horizontal slices of it • PC market • Initially, an IBM custom vertical • Now, a commodity market with Intel providing the chip-set • Network device market • Draw your own conclusions

  13. Network processing approaches • (Figure: speed vs. programming/development ease) • ASICs and FPGAs: highest speed, hardest to develop for • GPPs and embedded RISC processors: easiest to program, slowest • Network processors: in between on both axes

  14. Network processor architectures • Packet path • Store and forward • Packet payload completely stored in and forwarded from off-chip memory • Allows for large packet buffers • Re-ordering problems with multiple processing elements • Intel IXP, Motorola C5 • Cut-through • Packet held in an on-chip FIFO and forwarded through directly • Small packet buffers • Built-in packet ordering • AMCC

  15. Network processor architectures • Processing architecture • Parallel • Each element independently performs entire processing function • Packet re-ordering problems • Larger instruction store needed per element • Pipelined • Each element performs one part of larger processing function • Communicates result to next processing element in pipeline • Smaller code space • Packet ordering retained • Deterministic behavior (no memory thrashing) • Hybrid
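The re-ordering problem with parallel elements can be seen in a toy model: each packet is handled start-to-finish by its own element, so a packet with a short service time can finish before an earlier arrival. All times below are invented for illustration.

```c
#include <stddef.h>

/* Model of parallel processing elements: packet i arrives at
 * arrive[i] and needs service[i] time on its own element, so it
 * completes at arrive[i] + service[i].  Writes packet ids into
 * out[] in completion order (supports up to 16 packets) and
 * returns 1 if that order differs from arrival order. */
int parallel_completion_order(const int *arrive, const int *service,
                              int *out, size_t n)
{
    int finish[16];
    for (size_t i = 0; i < n; i++) {
        finish[i] = arrive[i] + service[i];
        out[i] = (int)i;
    }
    /* sort ids by finish time (insertion sort; ties keep arrival order) */
    for (size_t i = 1; i < n; i++) {
        int id = out[i];
        size_t j = i;
        while (j > 0 && finish[out[j - 1]] > finish[id]) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = id;
    }
    int reordered = 0;
    for (size_t i = 0; i < n; i++)
        if (out[i] != (int)i) reordered = 1;
    return reordered;
}
```

A pipeline avoids this by construction: every packet visits the stages in the same order, so uniform per-stage work preserves arrival order.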

  16. Network processor architectures • Processing hierarchy • ASICs • Embedded RISC processors • Specialized co-processors • See figure 13.7 in book

  17. Network processor architectures • Memory hierarchy • Small on-chip memory • Control/Instruction store • Registers • Cache • RAM • Large off-chip memory • Cache • Static RAM • Dynamic RAM

  18. Network processor architectures • Internal interconnect • Bus • Cross-bar • FIFO • Transfer registers

  19. Network processor architectures • Concurrency • Hardware support for multiple thread contexts • Operating system support for multiple thread contexts • Pre-emptiveness • Migration support

  20. Increasing network processor performance • Processing hierarchy • Increase clock speed • Increase number of elements • Memory hierarchy • Increase size • Decrease latency • Pipelining • Add hierarchies • Add memory bandwidth (parallel stores) • Add functional memory (CAMs)
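A CAM ("functional memory") compares the key against every entry in parallel and returns the matching index in a single cycle. A software model makes the interface clear, though here the parallel compare is necessarily just a scan:

```c
#include <stdint.h>

/* Software model of a content-addressable memory: present a key,
 * get back the index of the first matching entry, or -1 on a miss.
 * In hardware all n comparisons happen at once, which is why CAMs
 * accelerate route, flow-table, and classification lookups. */
int cam_lookup(const uint32_t *entries, int n, uint32_t key)
{
    for (int i = 0; i < n; i++)
        if (entries[i] == key)
            return i;
    return -1;
}
```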

  21. Focus of this class... • Network processors • Intel IXA

  22. IXP 1200 features • One embedded RISC processor (StrongARM) • Runs control plane (Linux) • 6 programmable packet processors (microengines) • Run data plane (microengine assembler or microengine C) • Central hash unit • Multiple bus interconnects • IX Bus (4.4 Gbps) to overcome PCI's 2.2 Gbps limit • Small on-board memory • Serial interface for control • External interfaces for memory

  23. IXP12xx microengine

  24. IXP2xxx microengine

  25. Microengine functions • Packet ingress from physical layer interface • Checksum verification • Header processing and classification • Packet buffering in memory • Table lookup and forwarding • Header modification • Checksum computation • Packet egress to physical layer interface
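The "table lookup and forwarding" step is typically a longest-prefix match on the destination address. The sketch below uses a linear scan over a hypothetical three-entry table; real microengine code would consult SRAM-resident trie or hash structures instead.

```c
#include <stdint.h>

/* Longest-prefix match over a tiny route table.
 * prefix is the network address, plen the prefix length in bits,
 * port the egress port.  Returns the port of the longest matching
 * prefix, or -1 if no entry (not even a default route) matches. */
struct route { uint32_t prefix; int plen; int port; };

int lpm_lookup(const struct route *tbl, int n, uint32_t dst)
{
    int best_len = -1, best_port = -1;
    for (int i = 0; i < n; i++) {
        uint32_t mask = tbl[i].plen ? 0xffffffffu << (32 - tbl[i].plen) : 0;
        if ((dst & mask) == tbl[i].prefix && tbl[i].plen > best_len) {
            best_len = tbl[i].plen;
            best_port = tbl[i].port;
        }
    }
    return best_port;
}
```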

  26. Microengine characteristics • Programmable microcontroller • Custom RISC instruction set • Private 2048-instruction store per microengine (loaded by the StrongARM) • 5-stage execution pipeline • Hardware support for 4 threads and context switching • Each microengine has 4 hardware contexts (mask memory latency)

  27. Microengine characteristics • 128 general purpose registers • Can be partitioned or shared • Absolute or context-relative • 128 transfer registers • Staging registers for memory transfers • 4 blocks of 32 registers • SDRAM or SRAM • Read or Write • Local Control and Status Registers (CSRs) • USTORE instructions, CTX, etc. (p. 315)

  28. Microengine characteristics • FBI unit • Scratchpad memory • Hash unit • FBI CSRs • IX Bus control • IX Bus FIFOs • Transmit and receive FIFOs to external line cards

  29. 32 microengine opcodes • ALU instructions • ALU, ALU_SHF, DBL_SHIFT • Branch/Jump instructions • BR, BR=0, BR!=0, BR_BSET, BR=BYTE, BR=CTX, BR_INP_STATE, BR_!SIGNAL, JUMP, RTN, etc. • Reference instructions • CSR, FAST_WR, LOCAL_CSR_RD, R_FIFO_RD, PCI_DMA, SCRATCH, SDRAM, SRAM, T_FIFO_WR, etc. • Local register instructions • FIND_BSET, IMMED, LD_FIELD, LOAD_ADDR, LOAD_BSET_RESULT1, etc.

  30. 32 microengine opcodes • Miscellaneous • CTX_ARB • NOP • HASH1_48, HASH1_64, etc.

  31. Packet flow through the IXP: 1. Packet received on physical interface (MAC) 2. Ready-bus sequencer polls MAC for mpacket; updates receive-ready upon a full mpacket 3. Microengine polls for receive-ready 4. Microengine instructs FBI to move mpacket from MAC to RFIFO 5. Microengine moves mpacket directly from RFIFO to SDRAM 6. Repeat 1-5 until full packet received 7. Microengine or StrongARM processing 8. Packet header read from SDRAM or RFIFO into microengine and classified (via SRAM tables) 9. Packet headers modified 10. mpackets sent to interface 11. Poll for space on MAC; update transmit-ready if room for mpacket 12. mpackets transferred to MAC

  32. Programming the IXP • Focus of this course is on steps 7, 8, and 9 • 2 programming frameworks • Command-line IXA Active Computing Engine (ACE) framework • Graphical microengine C development environment

  33. Programming the IXP • Command-line IXA Active Computing Engine (ACE) framework • Re-usable function blocks chained together to build an application (Chapters 22-24) • New functions implemented as new blocks in the chain • Core ACEs (StrongARM) • Written in C • Microblock ACEs (microengines) • Written in assembler
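The chaining idea can be sketched in C: each block processes the packet and names the next block. The block names, ids, and dispatch convention below are invented for illustration; they are not the real IXA ACE API.

```c
#include <stdint.h>

/* Hypothetical sketch of chained packet-processing blocks: each
 * block inspects/modifies the packet and returns the id of the next
 * block, DROP to discard, or TX_BASE+port to queue for transmit. */
struct pkt { uint32_t dst; int ttl; };

enum { BLK_DECAP = 0, BLK_IPV4 = 1, NBLOCKS = 2, DROP = -1, TX_BASE = 100 };

static int blk_decap(struct pkt *p) { (void)p; return BLK_IPV4; }

static int blk_ipv4(struct pkt *p)
{
    if (--p->ttl <= 0) return DROP;        /* TTL expired: drop */
    return TX_BASE + (int)(p->dst & 1);    /* toy 2-port "lookup" */
}

/* Dispatch loop: run blocks until the packet is dropped or queued.
 * Adding a new function means adding a new block to the chain,
 * without touching the existing ones. */
int dispatch(struct pkt *p)
{
    int (*blocks[NBLOCKS])(struct pkt *) = { blk_decap, blk_ipv4 };
    int next = BLK_DECAP;
    while (next >= 0 && next < NBLOCKS)
        next = blocks[next](p);
    return next;   /* DROP, or TX_BASE+port */
}
```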

  34. Programming the IXP • Graphical microengine C development environment • Monolithic microengine C code (cannot be used on IXP1200 hardware) • Demos forthcoming
