
Paper Review


travis

Presentation Transcript


  1. Paper Review: Building a Robust Software-based Router Using Network Processors

  2. ABSTRACT
     Need for more services → software-based routers.
     Router:
     • IXP1200 network processor development board
     • PC
     Forwards 3.47 Mpps (minimum-size packets), or 1.77 Gbps of aggregate bandwidth.
     Hierarchical architecture:
     • Guarantees line speed for forwarding of simple packets
     • Extra capacity for exceptional packets on the P3 (310 Kpps, with 1510 cycles for each packet)

  3. INTRODUCTION
     Most network processors rely on parallelism.
     IXP1200: 6 MicroEngines (MEs), each supporting up to 4 hardware contexts.
     Router with a data plane (MEs) and a control plane (P3).
     Processor hierarchy, from top to bottom:
     • OSPF, updating routing tables, … [more cycles]
     • Packets that miss in the cache
     • Minimum packet processing, forwarding, … [fewer cycles]

  4. ARCHITECTURE-Software
     • Classifier
     • Forwarder
     • Scheduler
     • Two default forwarders:
       • Minimal IP forwarding (fast path)
       • Full IP protocol (IP options)
     Input queue
     Two main attributes:
     • Explicit support for adding new forwarders at run time
     • Does not specify where in the processor hierarchy a forwarder runs

  5. ARCHITECTURE-Hardware
     IXP Evaluation System (200 MHz):
     • 32 MB DRAM (64-bit, 100 MHz)
     • 2 MB SRAM (32-bit, 100 MHz)
     • 4 KB on-chip scratch
     • 64-bit, 66 MHz IX bus
     • Ethernet ports (8 × 100 Mbps + 2 × 1 Gbps)
     • 32-bit, 100 MHz PCI bus
     • 4 KB ISTORE for each ME
     • 4 KB I-cache for the StrongARM
     • A pair of FIFOs (16 slots × 64 bytes each)
     DRAM rate = 6.4 Gbps
     Send/receive bandwidth = 2 × (8 × 100 Mbps + 2 × 1 Gbps) = 5.6 Gbps
     Capacity of the IX bus = 4 Gbps
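The three bandwidth figures on this slide follow from bus width times clock rate and from the port counts. A minimal Python sketch of that arithmetic, treating the numbers as raw peak rates (an assumption: arbitration and protocol overhead are ignored); the helper name `peak_mbps` is hypothetical and not from the slides:

```python
def peak_mbps(width_bits: int, clock_mhz: int) -> int:
    """Raw peak rate of a bus in Mbps: one transfer of width_bits per clock."""
    return width_bits * clock_mhz

dram_mbps = peak_mbps(64, 100)        # 64-bit DRAM interface at 100 MHz
ix_bus_mbps = peak_mbps(64, 66)       # 64-bit IX bus at 66 MHz
ports_mbps = 8 * 100 + 2 * 1000       # 8 x 100 Mbps + 2 x 1 Gbps Ethernet ports
send_receive_mbps = 2 * ports_mbps    # every port both sends and receives

print(f"DRAM:         {dram_mbps / 1000} Gbps")
print(f"Send/receive: {send_receive_mbps / 1000} Gbps")
print(f"IX bus:       {ix_bus_mbps / 1000} Gbps")
```

Under these assumptions the IX bus peak (64 × 66 = 4224 Mbps, which the slide rounds to 4 Gbps) is lower than the 5.6 Gbps the ports could offer in aggregate.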

  6. Forwarding Pipeline
     The common unit is the 64-byte MAC-packet (MP).
     The MAC breaks each packet into MPs and tags each one as the first, intermediate, last, or only MP of the packet.
     Slots are allocated to the MACs; the input FIFO is drained and the output FIFO is filled.
     Can the MEs move data from the input FIFO to the output FIFO in a single step?
     2-stage pipeline:
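The MP tagging described on this slide can be illustrated in a few lines of Python. This is only a sketch of the idea: on the real hardware the MAC does the splitting, and the function name `split_into_mps` and the string tags are assumptions made for the example.

```python
MP_SIZE = 64  # bytes per MAC-packet (MP), as stated on the slide

def split_into_mps(packet: bytes):
    """Split a packet into 64-byte MPs and tag each MP with its position."""
    if not packet:
        raise ValueError("packet must not be empty")
    chunks = [packet[i:i + MP_SIZE] for i in range(0, len(packet), MP_SIZE)]
    if len(chunks) == 1:
        return [("only", chunks[0])]
    # First and last MPs get their own tags; everything between is intermediate.
    tags = ["first"] + ["intermediate"] * (len(chunks) - 2) + ["last"]
    return list(zip(tags, chunks))

for tag, mp in split_into_mps(bytes(150)):
    print(tag, len(mp))
```

A minimum-sized 64-byte packet yields a single MP tagged "only", which is the case the fast path is sized for.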

  7. Input Processing
     For IP:
     • Validating header
     • Updating TTL
     • Re-computing checksum
     Set source and destination MACs
     Destination queue

     INPUT_LOOP:
         acquire_input_mutex()
         if (!port_rdy(p)) {
             release_input_mutex()      // do not retry while still holding the mutex
             goto INPUT_LOOP
         }
         load IN_FIFO[c]
         release_input_mutex()
         mp_addr = calculate_mp_addr()
         copy reg_mp_data ← IN_FIFO[c]
         state = protocol_processing(reg_mp_data)
         copy reg_mp_data → DRAM[mp_addr]
         if (at_start_of_packet(state))
             enqueue(state, state.queue)
         goto INPUT_LOOP

     Minimum forwarder: one-cycle hardware hash
     Strict FIFO slots and context binding
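The IP steps listed on this slide (validate the header, update the TTL, recompute the checksum) can be sketched in Python as follows. This is not the MicroEngine code from the paper: the function names are hypothetical, the checksum is simply recomputed from scratch in the style of RFC 1071, and MAC rewriting and queue selection are left out.

```python
import struct

def checksum16(data: bytes) -> int:
    """One's-complement Internet checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_ipv4_header(header: bytes) -> bytes:
    """Validate an IPv4 header, decrement its TTL, and refresh the checksum."""
    if len(header) < 20 or header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    ihl = (header[0] & 0x0F) * 4             # header length in bytes
    if ihl < 20 or len(header) < ihl:
        raise ValueError("bad header length")
    hdr = bytearray(header[:ihl])
    if checksum16(bytes(hdr)) != 0:          # a valid header sums to zero
        raise ValueError("bad header checksum")
    if hdr[8] <= 1:
        raise ValueError("TTL expired")
    hdr[8] -= 1                              # byte 8 is the TTL
    hdr[10:12] = b"\x00\x00"                 # clear checksum before recomputing
    struct.pack_into("!H", hdr, 10, checksum16(bytes(hdr)))
    return bytes(hdr)
```

A production fast path would more likely update the checksum incrementally, since only the TTL byte changes; the full recomputation here is chosen for clarity.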

  8. Scheduling & Buffering
     • A queue that is serviced by the StrongARM
     • A set of contexts is statically allocated to run the input loop: 16 input contexts
     • Token passing (a hardware signaling mechanism) serializes DMA access
     Buffer scheduling:
     • 16 MB of DRAM (8192 buffers of 2 KB), consumed in a circular fashion
     • A shared state variable
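The circular buffer scheme can be sketched as a single shared index that wraps around. The class below is a minimal illustration, assuming the buffers are laid out contiguously from a base address and that a buffer is simply reused once the index wraps; the class and attribute names are hypothetical.

```python
BUFFER_SIZE = 2 * 1024   # 2 KB per buffer
NUM_BUFFERS = 8192       # 8192 buffers x 2 KB = 16 MB of DRAM

class CircularBufferPool:
    """Hands out fixed-size buffers in strict circular order."""

    def __init__(self, base_addr: int = 0):
        self.base_addr = base_addr
        self.next_index = 0          # the shared state variable

    def allocate(self) -> int:
        """Return the address of the next buffer and advance the index."""
        addr = self.base_addr + self.next_index * BUFFER_SIZE
        self.next_index = (self.next_index + 1) % NUM_BUFFERS
        return addr

pool = CircularBufferPool()
print(pool.allocate(), pool.allocate())
```

There is no explicit free operation in this sketch: after 8192 allocations the first buffer is handed out again, so it only works if a packet has left the router before its buffer comes around.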

  9. Output Processing
     OUTPUT_LOOP:
         acquire_output_mutex()
         release_output_mutex()
         if (finished_last_packet) {
             qid = select_queue()
             state = dequeue(qid)
             mp_addr = first_mp(state)
         } else
             mp_addr = next_mp(state)
         fifo_addr = calculate_fifo_addr()
         copy DRAM[mp_addr] → OUT_FIFO[fifo_addr]
         enable OUT_FIFO[fifo_addr]
         finished_last_packet = at_end_of_packet(state)
         goto OUTPUT_LOOP

     select_queue(): selects a non-empty queue from that port's queues (scheduling).

  10. Queuing
      Queues: circular arrays of 32-bit entries in SRAM.
      Queues are assigned statically to output contexts: an output context keeps its queues in 16 registers rather than in scratch memory.
      Multiple queues: which one is served next? Decided by prioritizing the queues.
      Contention:
      • Use mutexes.
      • Have a queue for each input at each output → single priority level
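A circular-array queue with priority-based selection might look like the sketch below. It is an illustration under stated assumptions, not the paper's implementation: the queue lives in a Python list rather than SRAM, a lower queue index means higher priority, and the names `CircularQueue` and `select_queue` are chosen to echo the output loop's pseudocode.

```python
class CircularQueue:
    """Fixed-capacity FIFO stored in a circular array of 32-bit entries."""

    def __init__(self, capacity: int):
        self.slots = [0] * capacity
        self.head = 0     # index of the oldest entry
        self.count = 0    # number of valid entries

    def enqueue(self, entry: int) -> bool:
        if not 0 <= entry <= 0xFFFFFFFF:
            raise ValueError("entry must fit in 32 bits")
        if self.count == len(self.slots):
            return False                      # queue full; caller drops the packet
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = entry
        self.count += 1
        return True

    def dequeue(self) -> int:
        if self.count == 0:
            raise IndexError("queue is empty")
        entry = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return entry

def select_queue(queues):
    """Return the index of the highest-priority non-empty queue, or None."""
    for qid, queue in enumerate(queues):
        if queue.count:
            return qid
    return None
```

Strict priority like this can starve lower-priority queues under load; the sketch shows the selection step only, not a fairness policy.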

  11. Queuing [cont]
      • I.2 + O.1
      • I.2 + O.3: maximum flexibility
      • I.1 + O.3: slower rate

  12. Evaluation
      For one MP:
      • 280 cycles for register operations
      • 180 (DRAM) + 90 (SRAM) + 160 (scratch) = 430 cycles for memory
      • Sum = 710 cycles = 3550 ns (at 200 MHz)
      3.47 Mpps → each packet is processed in 288 ns.
      Result: the system can forward 12 packets in parallel (3550 ns / 288 ns ≈ 12).
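The numbers on this slide can be checked with a short calculation. The sketch below only restates the slide's arithmetic; the variable names are made up for the example.

```python
CLOCK_MHZ = 200                          # IXP1200 clock rate
register_cycles = 280                    # register operations per MP
memory_cycles = 180 + 90 + 160           # DRAM + SRAM + scratch accesses per MP

total_cycles = register_cycles + memory_cycles
ns_per_cycle = 1000 / CLOCK_MHZ          # 5 ns per cycle at 200 MHz
latency_ns = total_cycles * ns_per_cycle # time one MP spends in the pipeline

rate_pps = 3.47e6                        # measured forwarding rate
interval_ns = 1e9 / rate_pps             # time between packet completions

packets_in_flight = latency_ns / interval_ns
print(total_cycles, latency_ns, round(interval_ns), round(packets_in_flight, 1))
```

The ratio of per-packet latency to the interval between packets is how many packets must be in progress at once to sustain the rate, which is where the figure of about 12 comes from.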

  13. Switching Paths
      • Path C: forwards packets at 534 Kpps (500 cpp). The StrongARM is involved too. No additional tasks for the MEs.
      • Path B: forwards packets at 526 Kpps.
      • Path A: forwards packets at the maximum rate of 3.47 Mpps.
      PRIORITY

  14. StrongARM
      Deciding which forwarders to run on it is complicated:
      • It supports the Pentium
      • It shares resources with the MEs and can act like them
      OS on the StrongARM:
      • Acts as a bridge that forwards packets to the P3
      • Supports a small collection of local forwarders
      Simple priority scheme: gives priority to packets being passed to the P3 over packets that are to be processed locally.

  15. Virtual Router Processor
      • The MEs statically have 2 tasks:
        • A router infrastructure (RI) that is able to forward minimum-sized packets
        • A virtual router processor (VRP) that runs additional code on behalf of each packet
      • protocol_processing runs on this abstract machine.

  16. Interfacing & Implementation
      The StrongARM interacts with the MEs:
      • fid = install(key, fwdr, size, where)
      • remove(fid)
      • data = getdata(fid)
      • setdata(fid, data)
      install: installs the forwarder fwdr for packets that match key, with the specified flow size; where indicates the processor.
      Key: (src addr, src port, dst addr, dst port)
      Where:
      • ME: loaded from the StrongARM into the ME's ISTORE
      • SA: loaded into DRAM
      • PE: loaded into the Pentium jump table
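To make the four calls concrete, here is a toy in-memory model of the interface in Python. It is a sketch under assumptions, not the router's actual API: `size` is interpreted as the number of bytes of per-flow state (the slide only says "flow size"), `fwdr` is treated as an opaque value, and the class name `ForwarderTable` is hypothetical.

```python
from itertools import count

VALID_WHERE = {"ME", "SA", "PE"}   # MicroEngine, StrongARM, Pentium

class ForwarderTable:
    """Toy model of install / remove / getdata / setdata."""

    def __init__(self):
        self._next_fid = count(1)
        self._entries = {}

    def install(self, key, fwdr, size, where):
        """Register fwdr for flows matching key; return a forwarder id."""
        if where not in VALID_WHERE:
            raise ValueError(f"unknown processor level: {where!r}")
        fid = next(self._next_fid)
        # Assumption: size is the length in bytes of the flow's state block.
        self._entries[fid] = {"key": key, "fwdr": fwdr, "where": where,
                              "data": bytearray(size)}
        return fid

    def remove(self, fid):
        del self._entries[fid]

    def getdata(self, fid):
        return bytes(self._entries[fid]["data"])

    def setdata(self, fid, data):
        entry = self._entries[fid]
        if len(data) != len(entry["data"]):
            raise ValueError("data length must match the installed size")
        entry["data"][:] = data

table = ForwarderTable()
key = ("10.0.0.1", 1234, "10.0.0.2", 80)   # (src addr, src port, dst addr, dst port)
fid = table.install(key, "hypothetical_forwarder", 4, "ME")
table.setdata(fid, b"\x00\x00\x00\x01")
print(table.getdata(fid))
```

The point of the handle-based design is that the control side can read and update a forwarder's state through `fid` without knowing which processor level the forwarder was loaded on.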

  17. Interfacing
      Some data forwarders:

  18. Conclusions
      • Shows how to program the processor hierarchy with a fixed forwarding infrastructure that fully exploits the parallelism available on the IXP1200 MicroEngines.
      • Demonstrates how new functionality can be injected into all three levels of the processor hierarchy.
      • Statically partitions the processing capacity of the MicroEngines into a fixed routing infrastructure and a programmable VRP.
      • Can be used in many designs.
