
FreeBSD Network Stack Performance

Srinivas Krishnan

University of North Carolina at

Chapel Hill


Outline

  • Introduction

    • Unix network stack improvements

    • Bottlenecks

      • Memory Copies

      • Interrupt Processing

    • Zero Copy Implementation

    • Receive Live Lock Solution


Introduction

[Diagram: receive path of a packet, bottom to top: Packet → NIC → memory copy → IP Queue → soft interrupt → Transport + Network (kernel processing) → memory copy → Socket Queue (user processing)]


Network Stack Reinvented

  • Van Jacobson Net Channels

    • Create a High Speed Channel from NIC to User space

    • Push all processing to the user space

    • Applying E2E “truly”

    • Preserve cache coherency for multi-processor systems

BETTER INTERRUPT PROCESSING


Network Stack Reinvented

  • Ulrich Drepper’s Asynchronous Network I/O

    • Asynchronous sockets

    • True Zero Copy

    • No Locking

    • Event Channels

BETTER MEMORY PROCESSING


Reduce Memory Copies

  • Sending Side

    • Copy from User Buffer to Kernel Buffer

    • Copy from Kernel Buffer to Device Buffer

  • Receive Side

    • Copy from Device Buffer to Kernel Buffer

    • Copy from Kernel Buffer to Socket Buffer


Zero Copy Send

[Diagram: write → userspace pages in RAM (page-sized chunks) → external mbuf → DMA into driver buffer → NIC]


Zero Copy Read

[Diagram: Packet → NIC → DMA → kernel buffer (kernel space) → user buffer (user space), via read(fd, buf, s)]


Zero Copy

  • Allocate an External Mbuf Pool

  • NIC MTU has to be >= 4K

  • Intel Pro1000 NIC with Jumbo Frames

  • 3Com NIC turn on DMA

    • Buffer and stitch the data together

      • Added Overhead


Page Flipping

  • On read(….), check the mbuf length against the page size

    • At least 1 page: use vm_pgmoveco (……) to swap the kernel page with the user page (kernel page <-> user page)

    • Less than 1 page: use copyout
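The decision on this slide can be sketched as a small function. This is only an illustration in Python, not kernel code: `PAGE_SIZE` and `choose_receive_path` are hypothetical names, the 4096-byte page size is an assumption, and the returned strings merely stand in for the page-flip (vm_pgmoveco) and copyout paths.

```python
PAGE_SIZE = 4096  # assumed page size; the slides only say "Page Size"


def choose_receive_path(mbuf_len, page_size=PAGE_SIZE):
    """Pick the delivery path for one received mbuf (hypothetical helper).

    Returns "flip" when the mbuf holds at least one full page, so the
    kernel page could be swapped with the user page, and "copy" when
    the data has to be copied out instead.
    """
    if mbuf_len >= page_size:
        return "flip"
    return "copy"


# A 1500-byte MTU payload never fills a page, so it takes the copy path;
# a jumbo-frame payload of two pages can be flipped.
small = choose_receive_path(1500)
large = choose_receive_path(8192)
```

Under these assumptions, this also illustrates why the zero-copy slide asks for an MTU of at least 4K: with smaller frames every read falls back to copying.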


Preliminary Results

  • 1500-byte MTU (iostat trace) for 10 minutes


Processing Interrupts

  • Main Processing

    • Hard Interrupt from NIC to driver

    • Soft Interrupt from IP Queue to processing

  • Reduce user level and interrupt thread processing

  • Problem: Receive Live Locks


Receive Live Lock

  • Send a large stream of UDP packets that exceeds the receiver's buffer capacity

  • CPU time is spent entirely on processing incoming network packets

  • Goodput = 0
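A toy model can make the collapse to zero goodput concrete. The sketch below is an assumption-laden illustration, not a measurement: `delivered_per_interval`, the CPU budget of 1000 units per interval, and the per-packet costs are all hypothetical. It only encodes the idea that interrupt-level receive processing runs first and the application gets whatever CPU is left.

```python
def delivered_per_interval(arrivals, cpu_budget=1000, rx_cost=1, app_cost=1):
    """Packets the application consumes in one interval (toy model).

    Receive processing has priority and costs rx_cost units per packet;
    the application then spends app_cost units per packet from the
    remaining budget. All numbers are illustrative assumptions.
    """
    received = min(arrivals, cpu_budget // rx_cost)
    remaining = cpu_budget - received * rx_cost
    return min(received, remaining // app_cost)


# Goodput rises with load, peaks, then falls to zero once receive
# processing alone consumes the whole CPU budget.
curve = [delivered_per_interval(rate) for rate in (250, 500, 750, 1000, 2000)]
```

In this model the receiver keeps accepting packets at high load, but none of them reach the application, which is the behaviour the slide summarises as Goodput = 0.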


Implementation Design

[Diagram: Packet → NIC → Driver Queue (drained by the Scheduler) → IP Queue → Transport + Network → Socket Queue]


Components

  • All UDP packets are queued in driver queue

  • Scheduler is triggered with the arrival of first UDP packet

  • Checks the queue every n ms (currently 1-2ms)

  • Schedules packet departure rate based on timestamps
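The periodic check described above can be sketched as a function that is called once per polling interval and releases every packet whose exit time has passed. This is a minimal Python illustration, not the driver code: `poll_driver_queue` is a hypothetical name, times are integer milliseconds for simplicity, and the queue is assumed to be ordered by exit time.

```python
from collections import deque


def poll_driver_queue(queue, now_ms):
    """Release packets whose scheduled exit time has been reached.

    queue is a deque of (exit_time_ms, packet) pairs in exit order
    (an assumed layout). Returns the released packets, oldest first.
    """
    released = []
    while queue and queue[0][0] <= now_ms:
        released.append(queue.popleft()[1])
    return released


# Three queued UDP packets with exit times of 1, 2 and 5 ms,
# polled every 2 ms.
driver_queue = deque([(1, "pkt-a"), (2, "pkt-b"), (5, "pkt-c")])
first_poll = poll_driver_queue(driver_queue, now_ms=2)
second_poll = poll_driver_queue(driver_queue, now_ms=4)
third_poll = poll_driver_queue(driver_queue, now_ms=6)
```

Because packets only leave at polling instants, a packet can wait up to one interval beyond its exit time, which is one plausible source of the added jitter noted later in the deck.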


Driver Queue Algorithm

  • Set the maximum rate and the average rate

  • Driver Queue maintains

    • Average Queue Length (Weighted over time)

    • Current Rate of transfer

    • Time stamp of packets
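One common way to keep a queue length "weighted over time" is an exponentially weighted moving average. The slides do not say which weighting is used, so the sketch below is an assumption: `DriverQueueStats` and its `weight` parameter are hypothetical, and the default of 0.125 is just a conventional smoothing factor.

```python
class DriverQueueStats:
    """Running statistics for a hypothetical driver queue."""

    def __init__(self, weight=0.125):
        # Assumed smoothing factor: higher values react faster to
        # recent samples, lower values smooth out bursts.
        self.weight = weight
        self.avg_queue_len = 0.0

    def observe(self, queue_len):
        """Fold one queue-length sample into the weighted average."""
        self.avg_queue_len = (
            (1.0 - self.weight) * self.avg_queue_len + self.weight * queue_len
        )
        return self.avg_queue_len


# With weight 0.5 the average moves 0 -> 4 -> 6 -> 3 for samples 8, 8, 0.
stats = DriverQueueStats(weight=0.5)
history = [stats.observe(sample) for sample in (8, 8, 0)]
```

The current transfer rate could be tracked the same way from packet timestamps; it is left out here to keep the sketch short.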


Algorithm (cont)

  • If current_rate > average_rate

    • Drop N packets such that current_rate == average_rate

  • If current_rate > max_rate (spike)

    • Drop all packets

  • Reduce wait time in queue

    • If current queue size < threshold

      • Schedule packet exit such that rate == average_rate

      • Append an exit time to each packet
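The drop and pacing rules above can be sketched as two small functions. This is one possible reading, not the author's implementation: `packets_to_drop` and `schedule_exit_times` are hypothetical names, rates are assumed to be integer packets per second, N is interpreted as the share of queued packets above the average rate, and the case where the queue is at or above the threshold is left unspecified because the slides do not describe it.

```python
def packets_to_drop(queued, current_rate, average_rate, max_rate):
    """How many of the queued packets to drop in this interval."""
    if current_rate > max_rate:
        return queued  # spike: drop all packets
    if current_rate > average_rate:
        excess = current_rate - average_rate
        # Ceiling division: drop the share of traffic above the average.
        return min(queued, -(-queued * excess // current_rate))
    return 0


def schedule_exit_times(now, queue_len, average_rate, threshold):
    """Exit times (seconds) that pace the queue at average_rate.

    Returns None when the queue is not below the threshold, since
    the slides do not say what happens in that case.
    """
    if queue_len >= threshold:
        return None
    return [now + (i + 1) / average_rate for i in range(queue_len)]


drops_over_average = packets_to_drop(30, 1500, 1000, 2000)
drops_on_spike = packets_to_drop(30, 2500, 1000, 2000)
exits = schedule_exit_times(0.0, 3, 1000, threshold=8)
```

With an average rate of 1000 packets per second, consecutive exit times are spaced 1 ms apart, so a short queue drains at exactly the average rate.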


Pros and Cons

  • Easy to implement; requires no scheduling changes

  • Reduces CPU utilization in worst case by ~25%

  • Low Overhead

  • Introduces added jitter


Experimental Setup

  • Iostat Trace

  • Netstat trace

  • Custom queue stats

[Diagram: sender (Send UDP Data, Intel Pro1000 NICs) → receiver (Receive UDP Data, Intel Pro1000 NICs)]


Queue Stats

  • At the Receiver

    • Collect Average Queue Size

    • CPU Utilization

    • Packet Drops

    • Total Number of packets processed




Future Work

  • Feedback from the Socket Queue and IP Queue, so that the weighted average is computed over all 3 queues

  • Drop at driver before DMA

    • Driver buffer not large enough to keep weighted queue size

    • Feedback from Driver Queue Scheduler to driver to drop


