FreeBSD Network Stack Performance

Presentation Transcript


  1. FreeBSD Network Stack Performance Srinivas Krishnan University of North Carolina at Chapel Hill

  2. Outline • Introduction • Unix network stack improvements • Bottlenecks • Memory Copies • Interrupt Processing • Zero Copy Implementation • Receive Live Lock Solution

  3. Introduction [Diagram: the receive path through the stack. A packet arrives at the NIC and is memory-copied into the IP queue; a soft interrupt drives transport + network processing in the kernel; a second memory copy delivers the data to the socket queue for user processing.]

  4. Network Stack Reinvented • Van Jacobson Net Channels • Create a high-speed channel from the NIC to user space • Push all processing to user space, applying the end-to-end principle "truly" • Preserve cache coherency for multi-processor systems → BETTER INTERRUPT PROCESSING

  5. Network Stack Reinvented • Ulrich Drepper’s Asynchronous Network I/O • Asynchronous sockets • True zero copy • No locking • Event channels → BETTER MEMORY PROCESSING

  6. Reduce Memory Copies • Sending Side • Copy from User Buffer to Kernel Buffer • Copy from Kernel Buffer to Device Buffer • Receive Side • Copy from Device Buffer to Kernel Buffer • Copy from Kernel Buffer to Socket Buffer
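To make the copies concrete, here is a minimal userland sketch (not taken from the slides) of the conventional send path; the helper name send_all is arbitrary, and the comments mark where the two send-side copies listed above occur.

    #include <sys/types.h>
    #include <unistd.h>

    ssize_t
    send_all(int fd, const char *buf, size_t len)
    {
        size_t off = 0;
        ssize_t n;

        while (off < len) {
            /* copy #1: user buffer -> kernel (socket) buffer */
            n = write(fd, buf + off, len - off);
            if (n <= 0)
                return (-1);
            /* copy #2, kernel buffer -> device buffer, happens later in the driver */
            off += (size_t)n;
        }
        return ((ssize_t)off);
    }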

  7. Zero Copy Send [Diagram: write() hands page-sized chunks of userspace pages (RAM) to the kernel, which wraps them in external mbufs and DMAs them into the driver buffer on the NIC, avoiding a user-to-kernel copy.]

  8. Zero Copy Read [Diagram: a packet arriving at the NIC is DMAed into a kernel buffer in kernel space; read(fd, buf, s) delivers it to the user buffer in user space without an extra copy.]

  9. Zero Copy • Allocate an external mbuf pool • NIC MTU has to be >= 4K • Intel Pro1000 NIC with jumbo frames • 3Com NIC: turn on DMA • Buffer and stitch the data together • Added overhead
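As a concrete illustration of the jumbo-frame requirement, the sketch below raises an interface's MTU with the SIOCSIFMTU ioctl, the programmatic equivalent of "ifconfig em0 mtu 9000"; the interface name and the 9000-byte value are examples, not values taken from the slides.

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/sockio.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <string.h>
    #include <unistd.h>

    /* Raise the MTU of an interface so a whole page fits in one frame. */
    int
    set_jumbo_mtu(const char *ifname, int mtu)
    {
        struct ifreq ifr;
        int s, error;

        s = socket(AF_INET, SOCK_DGRAM, 0);
        if (s < 0)
            return (-1);

        memset(&ifr, 0, sizeof(ifr));
        strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
        ifr.ifr_mtu = mtu;

        error = ioctl(s, SIOCSIFMTU, &ifr);
        close(s);
        return (error);
    }

    /* e.g. set_jumbo_mtu("em0", 9000); */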

  10. Page Flipping [Flowchart: on read(...), check the mbuf length. If the data covers at least one full page, swap the kernel page with the user page using vm_pgmoveco(...); if it is less than one page, fall back to copyout().]
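From the application side, the flowchart implies that the page-flip branch is only available when the read covers at least one full, page-aligned page; otherwise the kernel falls back to copyout(). Below is a minimal receive-loop sketch laid out under that assumption (socket setup omitted, drain_socket is an arbitrary name).

    #include <sys/types.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    drain_socket(int fd)
    {
        long pagesize = sysconf(_SC_PAGESIZE);
        void *buf;
        ssize_t n;

        /* page-aligned buffer, exactly one page long */
        if (posix_memalign(&buf, (size_t)pagesize, (size_t)pagesize) != 0)
            return (-1);

        while ((n = read(fd, buf, (size_t)pagesize)) > 0) {
            /* consume n bytes of data here */
        }

        free(buf);
        return (n < 0 ? -1 : 0);
    }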

  11. Preliminary Results • 1500-byte MTU (iostat trace) for 10 minutes

  12. Processing Interrupts • Main processing: • Hard interrupt from the NIC to the driver • Soft interrupt from the IP queue to protocol processing • Goal: reduce user-level and interrupt-thread processing • Problem: Receive Live Lock

  13. Receive Live Lock • Send a large stream of UDP packets, exceeding the receiver’s buffer capacity • All CPU time is spent processing network packets • Goodput = 0
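For reference, a minimal sketch of the kind of unpaced UDP sender that provokes this condition; the destination address and port are placeholders, not values from the experiments.

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <stdio.h>

    int
    main(void)
    {
        char payload[1400];                 /* stays under a 1500-byte MTU */
        struct sockaddr_in dst;
        int s;

        s = socket(AF_INET, SOCK_DGRAM, 0);
        if (s < 0) {
            perror("socket");
            return (1);
        }

        memset(payload, 0xa5, sizeof(payload));
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9000);                      /* placeholder port */
        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);  /* placeholder host */

        /* no pacing: send as fast as the local stack allows */
        for (;;)
            (void)sendto(s, payload, sizeof(payload), 0,
                (struct sockaddr *)&dst, sizeof(dst));
    }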

  14. Implementation Design [Diagram: a scheduler-controlled driver queue is inserted between the NIC and the IP queue; from there, packets flow through transport + network processing into the socket queue as before.]

  15. Components • All UDP packets are queued in the driver queue • The scheduler is triggered by the arrival of the first UDP packet • It checks the queue every n ms (currently 1-2 ms) • It schedules the packet departure rate based on timestamps
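A sketch, with hypothetical names rather than the authors' code, of how such a periodic check could be armed with FreeBSD's callout(9) facility; since hz is the number of timer ticks per second, hz / 500 approximates the 2 ms interval mentioned above.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/callout.h>

    /* Hypothetical driver-queue state; only the callout is shown. */
    struct drvq_softc {
        struct callout scan_callout;
        /* ... queue head, rates, per-packet timestamps ... */
    };

    static void
    drvq_scan(void *arg)
    {
        struct drvq_softc *sc = arg;

        /* inspect timestamps, release packets whose exit time has passed ... */

        /* re-arm the check, roughly every 2 ms */
        callout_reset(&sc->scan_callout, hz / 500, drvq_scan, sc);
    }

    static void
    drvq_start(struct drvq_softc *sc)
    {
        /* called when the first UDP packet is queued */
        callout_init(&sc->scan_callout, 1);     /* MP-safe callout */
        callout_reset(&sc->scan_callout, hz / 500, drvq_scan, sc);
    }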

  16. Driver Queue Algorithm • Set the maximum and average rates • The driver queue maintains: • Average queue length (weighted over time) • Current rate of transfer • Timestamp of each packet
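The bookkeeping described above might look like the following; the field and macro names are illustrative, not taken from the implementation. The update folds the instantaneous queue length into a time-weighted average, giving new samples a 1/8 weight.

    #include <sys/types.h>

    #define DRVQ_EWMA_SHIFT 3       /* new samples get a 1/8 weight */

    struct drvq_state {
        u_int qlen;        /* instantaneous queue length */
        u_int avg_qlen;    /* queue length, weighted over time */
        u_int cur_rate;    /* packets seen in the current interval */
        u_int avg_rate;    /* configured average rate */
        u_int max_rate;    /* configured maximum rate */
        /* per-packet arrival timestamps travel with each queued packet */
    };

    /* avg += (sample - avg) / 8, written to avoid shifting negative values */
    static void
    drvq_update_avg(struct drvq_state *st)
    {
        if (st->qlen >= st->avg_qlen)
            st->avg_qlen += (st->qlen - st->avg_qlen) >> DRVQ_EWMA_SHIFT;
        else
            st->avg_qlen -= (st->avg_qlen - st->qlen) >> DRVQ_EWMA_SHIFT;
    }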

  17. Algorithm (cont.) • If current_rate > average_rate: drop N packets such that current_rate == average_rate • If current_rate > max_rate (a spike): drop all packets • To reduce time spent waiting in the queue: if the current queue size < threshold, schedule packet exits such that rate == average_rate, appending an exit time to each packet
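Continuing the sketch, the drop decision from this slide can be written as a pure function of the rates and the queue length (all names illustrative):

    #include <sys/types.h>

    static u_int
    drvq_packets_to_drop(u_int cur_rate, u_int avg_rate, u_int max_rate,
        u_int qlen)
    {
        if (cur_rate > max_rate)        /* spike: drop everything queued */
            return (qlen);
        if (cur_rate > avg_rate)        /* shed enough to fall back to the average */
            return (cur_rate - avg_rate);
        return (0);
    }

Packets that survive are stamped with an exit time spaced so that, while the queue stays below the threshold, the departure rate matches the average rate.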

  18. Pros and Cons • Easy to implement: requires no scheduler changes • Reduces worst-case CPU utilization by ~25% • Low overhead • Introduces added jitter

  19. Experimental Setup • iostat trace • netstat trace • Custom queue stats [Diagram: a sender and a receiver, each with an Intel Pro1000 NIC; the sender transmits UDP data to the receiver.]

  20. Queue Stats • Collected at the receiver: • Average queue size • CPU utilization • Packet drops • Total number of packets processed

  21. Receive Live Lock

  22. Receive Live Lock (solution)

  23. Future Work • Feedback from the socket queue and IP queue, so that the weighted average is computed over all three queues • Drop at the driver before DMA • The driver buffer is not large enough to hold the weighted queue, so the driver-queue scheduler feeds back to the driver to trigger drops

  24. Questions ?
