Parallelizing Network Packet Processing in Software Router

Parallelizing Network Packet Processing in Software Router Parallel Systems and Architecture CS6399 Term Project Team Members: AmriteshwarSingh, Bibin Thomas, Divya Sasidharan, HarsVardhan, and PriteshBaviskar

Overview • Introduction • Project Description • Software Router v/s Hardware Router • Parallelizing Across Servers • Parallelizing Within Server • Project Methodology • Experiment Results • Conclusion • References

Introduction • What is a Software Router. • How to go about developing one. • Across Server Parallelism. • With-in Server Parallelism. • Route-Bricks Architecture.

Project Description • Objective • Understanding and evaluating the clustered software router architecture. • Exploiting parallelism • Challenges • High Performance • Re-programmability

Software Router v/s Hardware Router • Single Server Software Router • Cluster Router Architecture • Key features • Each server processes packets at a rate proportional to R.(cR, c=constant independent of N) • Decentralization -Load Balanced interconnects • Extensible – Increase in number of ports proportional to N , with the addition of new servers. External Lines R bps 1 Inter Server Connection 2 R bps R bps N Internal Switching Fabric Traditional Router Architecture Software Router Architecture

Parallelizing Across Servers • Valiant Load Balancing Routing.(VLB) • Full Mesh Interconnection. • Two Phase Routing • Phase I - Source sends input packet to a random intermediate node. • Phase II – Intermediate node relays to Destination node. • Node Roles - Source, Intermediate, Destination. • Drawbacks • Additional cost of forwarding packets(3R as opposed to 2R still << NR) • Direct VLB • Observation of uniform traffic across nodes. • Avoiding Phase I overhead in VLB for randomizing the inflow. • Avoids the VLB’s per-server processing additional cost.(Max=2R)

Valiant Load Balanced Mesh External Lines • Topology: k-ary n-fly • k=per server fanout ; n=logk N • Configurations (Feasibility Study) • Current Servers – 1 router port and 5NIC’s per server (Max N=32 ports) • Additional NIC’s – 1 router port and 20 NIC’s (Max N=128 ports) • Faster Servers – 2 router ports and 20 NIC’s (Max N=2048 ports) 1 1’ 1’’ 2 2’ 2’’ 3 3’ 3’’ Internal Lines R/8 bps 8 8’ 8’’ Logical View:8-port VLB Mesh

Parallelism within server • Exploring the design considerations • Server architecture • Workload balancing across cores • Single queue or multiple queues • Batch processing

Server Architecture (Traditional)

Server Architecture (NUMA)

Load balancing across cores • Accessing the queue to transmit/receive the packet • Rule : Each network queue must be accessed by single core • Processing of the packets • Rule: Parallel approach must be adopted

Queuing technique Single queue or Multiple Queue

Parallelism within server • Exploring the design considerations • Server architecture • Division of work among cores • Single queue or multiple queue • Batch processing

Experimental Results • para1 – para8.utdallas.edu machine used • Mesh interconnect up to 16 nodes. • Valiant Load Balancing • MPI with multi-threading • Threads on each node: • Packet Generator (data-in) • Packet receiver within System • Table lookup and Sender (data-out) • MPI Source Code • Results

Experiment Results

Conclusion • Performance • Competitive performance when compared to a hardware router. • Cost • 20-port 10Gbps ~ 13K • Programmability • Use existing MPI Lib for implementation • Relaxed Performance Guarantee

References • RouteBricks: Exploiting Parallelism To Scale Software Routers, MihaiDobrescu and Norbert Egi, KaterinaArgyraki, Byung-Gon Chun, Kevin Fall, GianlucaIannaccone, Allan Knies, MaziarManesh, Sylvia Ratnasamy, “ 22nd ACM Symposium on Operating Systems Principles (SOSP), October 2009”. • Can Software Routers Scale? , KaterinaArgyraki, SalmanBaset, Byung-Gon Chun, Kevin Fall, GianlucaIannaccone, Allan Knies, Eddie Kohler, MaziarManesh, SergiuNedveschi, Sylvia Ratnasamy, “ ACM Sigcomm Workshop - PRESTO, August 2008”. • Next Generation Intel Microarchitecture (Nehalem), White Paper, Intel Corp., 2008.

???

Parallelizing Network Packet Processing in Software Router

Parallelizing Network Packet Processing in Software Router

Presentation Transcript

Network Router Setup Network Printer Setups

NETWORK ON CHIP ROUTER

The Case for Hardware Transactional Memory in Software Packet Processing

Network Protocol Packet Analysis

Parallelizing Programs

All-optical Packet Router Employing PPM Header Processing

Router Design and Packet Scheduling

In Network Processing

In-Network Query Processing

Network Layer Packet Forwarding

CVS II: Parallelizing Software Development

Modifying Network Packet Buffering in Network Layer

Packet Switch Network

Parallelizing Computations

Detachment 845 In-processing Packet

FLORIDA LAYERED PACKET NETWORK

In-Network processing

Network Packet Generator

Chapter 6 Packet Processing Functions

NETWORK ON CHIP ROUTER

Designing Packet Buffers for Router Linecards

In-Network Query Processing