1 / 18

Parallelizing Network Packet Processing in Software Router

Parallelizing Network Packet Processing in Software Router. Parallel Systems and Architecture CS6399 Term Project. Team Members: Amriteshwar Singh, Bibin Thomas, Divya Sasidharan, Hars Vardhan , and Pritesh Baviskar. Overview. Introduction Project Description

haru
Download Presentation

Parallelizing Network Packet Processing in Software Router

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallelizing Network Packet Processing in Software Router Parallel Systems and Architecture CS6399 Term Project Team Members: AmriteshwarSingh, Bibin Thomas, Divya Sasidharan, HarsVardhan, and PriteshBaviskar

  2. Overview • Introduction • Project Description • Software Router v/s Hardware Router • Parallelizing Across Servers • Parallelizing Within Server • Project Methodology • Experiment Results • Conclusion • References

  3. Introduction • What is a Software Router. • How to go about developing one. • Across Server Parallelism. • With-in Server Parallelism. • Route-Bricks Architecture.

  4. Project Description • Objective • Understanding and evaluating the clustered software router architecture. • Exploiting parallelism • Challenges • High Performance • Re-programmability

  5. Software Router v/s Hardware Router • Single Server Software Router • Cluster Router Architecture • Key features • Each server processes packets at a rate proportional to R.(cR, c=constant independent of N) • Decentralization -Load Balanced interconnects • Extensible – Increase in number of ports proportional to N , with the addition of new servers. External Lines R bps 1 Inter Server Connection 2 R bps R bps N Internal Switching Fabric Traditional Router Architecture Software Router Architecture

  6. Parallelizing Across Servers • Valiant Load Balancing Routing.(VLB) • Full Mesh Interconnection. • Two Phase Routing • Phase I - Source sends input packet to a random intermediate node. • Phase II – Intermediate node relays to Destination node. • Node Roles - Source, Intermediate, Destination. • Drawbacks • Additional cost of forwarding packets(3R as opposed to 2R still << NR) • Direct VLB • Observation of uniform traffic across nodes. • Avoiding Phase I overhead in VLB for randomizing the inflow. • Avoids the VLB’s per-server processing additional cost.(Max=2R)

  7. Valiant Load Balanced Mesh External Lines • Topology: k-ary n-fly • k=per server fanout ; n=logk N • Configurations (Feasibility Study) • Current Servers – 1 router port and 5NIC’s per server (Max N=32 ports) • Additional NIC’s – 1 router port and 20 NIC’s (Max N=128 ports) • Faster Servers – 2 router ports and 20 NIC’s (Max N=2048 ports) 1 1’ 1’’ 2 2’ 2’’ 3 3’ 3’’ Internal Lines R/8 bps 8 8’ 8’’ Logical View:8-port VLB Mesh

  8. Parallelism within server • Exploring the design considerations • Server architecture • Workload balancing across cores • Single queue or multiple queues • Batch processing

  9. Server Architecture (Traditional)

  10. Server Architecture (NUMA)

  11. Load balancing across cores • Accessing the queue to transmit/receive the packet • Rule : Each network queue must be accessed by single core • Processing of the packets • Rule: Parallel approach must be adopted

  12. Queuing technique Single queue or Multiple Queue

  13. Parallelism within server • Exploring the design considerations • Server architecture • Division of work among cores • Single queue or multiple queue • Batch processing

  14. Experimental Results • para1 – para8.utdallas.edu machine used • Mesh interconnect up to 16 nodes. • Valiant Load Balancing • MPI with multi-threading • Threads on each node: • Packet Generator (data-in) • Packet receiver within System • Table lookup and Sender (data-out) • MPI Source Code • Results

  15. Experiment Results

  16. Conclusion • Performance • Competitive performance when compared to a hardware router. • Cost • 20-port 10Gbps ~ 13K • Programmability • Use existing MPI Lib for implementation • Relaxed Performance Guarantee

  17. References • RouteBricks: Exploiting Parallelism To Scale Software Routers, MihaiDobrescu and Norbert Egi, KaterinaArgyraki, Byung-Gon Chun, Kevin Fall, GianlucaIannaccone, Allan Knies, MaziarManesh, Sylvia Ratnasamy, “ 22nd ACM Symposium on Operating Systems Principles (SOSP), October 2009”. • Can Software Routers Scale? , KaterinaArgyraki, SalmanBaset, Byung-Gon Chun, Kevin Fall, GianlucaIannaccone, Allan Knies, Eddie Kohler, MaziarManesh, SergiuNedveschi, Sylvia Ratnasamy, “ ACM Sigcomm Workshop - PRESTO, August 2008”. • Next Generation Intel Microarchitecture (Nehalem), White Paper, Intel Corp., 2008.

  18. ???

More Related