Network Driver Performance


Outline

  • Software features for high performance NICs

    • Some of the top features include:

      • Scatter-Gather DMA

      • Automatic Tuning of resources

      • Task Offloading support for IPv6

  • Hardware features for high performance NICs

    • Some of the top features include:

      • Task Offloading support

      • Receive-Side Scaling (RSS) support

  • Performance Tools

    • NTttcp

    • Kernrate Profiler


Goals

  • This information can be used to optimally tune your network driver to work with your hardware for best networking performance

  • This information can be used to fine-tune your hardware features to operate at their optimal performance

  • How to use NTttcp to isolate Network performance problems

  • How to use Kernrate to identify bottlenecks on hot paths

    Note: The mention of packets is relevant to NDIS 5.x drivers and translates to NetBuffers and NetBufferLists for NDIS 6.0 drivers on Windows codenamed “Longhorn”


Software Optimizations


Network Software Optimizations

  • Scatter Gather DMA

    • SG DMA yields optimum performance with NDIS 6.0 model

    • It is highly recommended to pre-allocate the buffer hosting the SCATTER_GATHER_LIST as part of Transmit Control Block during the initialization phase and reuse it.

    • Use maximum buffer size for MaximumPhysicalMapping parameter in NdisMInitializeScatterGatherDma function to avoid buffer allocation and copy

  • Using Cached Memory to allocate NIC receive buffers

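    • x86, IA64, and x64 hardware guarantees DMA coherency, so there is no need to call IoFlushBuffer; it would be a no-op

      NdisMAllocateSharedMemory( MiniportAdapterHandle, // handle NDIS passed to MiniportInitialize
                                 pMpRxbuf->AllocSize,   // length in bytes
                                 TRUE,                  // CACHED
                                 &pMpRxbuf->AllocVa,    // receives the virtual address
                                 &pMpRxbuf->AllocPa);   // receives the physical address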
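
The pre-allocation advice for the SCATTER_GATHER_LIST buffer can be sketched in portable C. This is a user-space model, not NDIS code: TCB, SG_ELEMENT, MAX_SG_ELEMENTS, and the tcb_* functions are hypothetical names chosen for illustration. The point it shows is that all scatter-gather storage is reserved once at initialization, so the send path only takes and returns blocks.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SG_ELEMENTS 16   /* assumed worst-case fragments per packet */

typedef struct { uint64_t phys_addr; uint32_t length; } SG_ELEMENT;

typedef struct TCB {
    struct TCB *next;                      /* free-list link */
    uint32_t    sg_count;                  /* elements currently in use */
    SG_ELEMENT  sg_list[MAX_SG_ELEMENTS];  /* reserved once, reused per send */
} TCB;

typedef struct { TCB *blocks; TCB *free_list; size_t count; } TCB_POOL;

/* Allocate every TCB, including its SG storage, once at initialization. */
int tcb_pool_init(TCB_POOL *pool, size_t count)
{
    pool->blocks = calloc(count, sizeof(TCB));
    if (pool->blocks == NULL) return -1;
    pool->count = count;
    pool->free_list = NULL;
    for (size_t i = 0; i < count; i++) {
        pool->blocks[i].next = pool->free_list;
        pool->free_list = &pool->blocks[i];
    }
    return 0;
}

/* Send path: take a TCB from the free list; nothing is allocated here. */
TCB *tcb_get(TCB_POOL *pool)
{
    TCB *tcb = pool->free_list;
    if (tcb != NULL) {
        pool->free_list = tcb->next;
        tcb->sg_count = 0;
    }
    return tcb;
}

/* Send-complete path: return the TCB so its SG storage can be reused. */
void tcb_put(TCB_POOL *pool, TCB *tcb)
{
    tcb->next = pool->free_list;
    pool->free_list = tcb;
}

/* Teardown: release the single allocation made at initialization. */
void tcb_pool_destroy(TCB_POOL *pool)
{
    free(pool->blocks);
    pool->blocks = NULL;
    pool->free_list = NULL;
    pool->count = 0;
}
```

When the pool is empty, tcb_get returns NULL; a real driver would then queue the send until a transmit completes rather than allocate on the hot path.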


More Network Software Optimizations

  • NDIS Safe APIs

    • Required for NDIS 6.0 model!

    • They have shown overall TCP/IP improvements of up to 7% in Kernel mode scenarios (e.g. IIS 6.0)

    • They eliminate the need to call into the Kernel for probing and locking buffers

    • Set the NDIS_ATTRIBUTE_USES_SAFE_BUFFER_APIS flag in NdisMSetAttributesEx for NDIS 5.x drivers. The flag does not need to be set for NDIS 6.0 drivers

    • Example: When using NdisQueryBufferSafe, the VirtualAddress parameter should be set to NULL to avoid mapping of buffers sent down by NDIS

  • 64-bit DMA Support

    • Avoid copies for addresses above the 4GB range by setting Dma64Addresses to TRUE in NdisMInitializeScatterGatherDma
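
The cost that 64-bit DMA support avoids can be illustrated with a small decision function. This is a sketch under assumptions: needs_bounce_copy and FOUR_GB are hypothetical names, and the model simply says that a device limited to 32-bit addressing forces a copy into low memory for any buffer that reaches the 4GB boundary or beyond.

```c
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GB 0x100000000ULL

/* Returns true when a buffer would have to be copied to memory below 4GB
 * before the device can DMA it. A 64-bit capable device never needs the copy. */
bool needs_bounce_copy(uint64_t phys_addr, uint32_t length, bool dma64_capable)
{
    if (dma64_capable)
        return false;
    /* The buffer is safe only if its last byte lies below the 4GB line. */
    return phys_addr + length > FOUR_GB;
}
```

With dma64_capable set, every buffer takes the zero-copy path regardless of where it sits in physical memory, which is the effect the Dma64Addresses setting is described as having.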


Locking Mechanisms Optimizations

  • Expensive hit to system performance if not used properly

    • Measurements show that we use approximately 160 cycles for Lock Acquires and 140 cycles for Lock Releases.

    • Spinlocks should be used to protect data and not code.

  • Locking at DPC Level

    • When at DPC level, avoid extra code by using the following:

      • NdisDprAcquireSpinLock

      • NdisDprReleaseSpinLock

  • Read-Write Locks

    • To minimize the number of spinlock acquire and release operations, use the NDIS ReadWriteLock functions for scalability:

      • NdisInitializeReadWriteLock

      • NdisAcquireReadWriteLock

      • NdisReleaseReadWriteLock

    • Read-Write Locks allow multiple concurrent readers to share a single lock and limit write access to a single writer thread. No read access is allowed during a write access. They still behave like a spinlock and raise the IRQL to DISPATCH_LEVEL when acquired.
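
The cycle counts quoted above make the cost of lock traffic easy to estimate. The sketch below is plain arithmetic, not driver code: it takes the measured 160 cycles per acquire and 140 per release and shows how the per-packet overhead falls when several packets are handled under one acquire/release pair. The function name and the batching scenario are assumptions made for illustration.

```c
#define LOCK_ACQUIRE_CYCLES 160u  /* figure quoted on the slide */
#define LOCK_RELEASE_CYCLES 140u  /* figure quoted on the slide */

/* Lock overhead attributed to each packet when `packets_per_lock_hold`
 * packets share one acquire/release pair. Integer division truncates. */
unsigned lock_cycles_per_packet(unsigned packets_per_lock_hold)
{
    if (packets_per_lock_hold == 0)
        return 0;
    return (LOCK_ACQUIRE_CYCLES + LOCK_RELEASE_CYCLES) / packets_per_lock_hold;
}
```

One packet per lock hold costs 300 cycles of pure locking; ten packets per hold brings that to 30 cycles each. The trade-off is a longer hold time, so this only pays off when the protected data is small and the work under the lock is short.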


Auto Tuning Network Drivers

  • Static: Driver and NIC hardware parameters are based on system configuration, such as whether it is a client or server machine, the CPU, the memory, and the capabilities of the NIC.

  • Dynamic: System conditions dictate what type of tuning is necessary for optimum performance. It uses resource utilization and network load as metrics for determining the best operating points for the NIC and driver.

  • Some of the primary auto tuning parameters include:

    • Interrupt moderation

    • Receive Buffers allocation

    • Small buffer coalescing

    • Packets processed per DPC

  • Drivers can obtain current processor utilization by using the NdisGetCurrentProcessorCounts function.
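
A dynamic tuning loop for one of these parameters, interrupt moderation, might look like the sketch below. Everything here is an assumption for illustration: the thresholds, the step size, the interval bounds, and the function name are not from the presentation. In a real driver the CPU figure would be derived from processor counts such as those returned by NdisGetCurrentProcessorCounts; here it is passed in as a plain percentage.

```c
#include <stdint.h>

#define MIN_INTERVAL_US   0u  /* interrupt on every packet */
#define MAX_INTERVAL_US 250u  /* assumed cap on added receive latency */
#define STEP_US          25u  /* assumed adjustment per tuning pass */

/* Returns the next interrupt moderation interval in microseconds.
 * cpu_percent is 0..100; rx_packets_per_sec is the observed receive rate. */
uint32_t tune_moderation_interval(uint32_t current_us,
                                  uint32_t cpu_percent,
                                  uint32_t rx_packets_per_sec)
{
    if (cpu_percent > 80 && rx_packets_per_sec > 50000) {
        /* Busy CPU under heavy load: coalesce more packets per interrupt. */
        uint32_t next = current_us + STEP_US;
        return next > MAX_INTERVAL_US ? MAX_INTERVAL_US : next;
    }
    if (cpu_percent < 30 || rx_packets_per_sec < 5000) {
        /* Idle CPU or light traffic: favor latency over interrupt rate. */
        return current_us >= STEP_US ? current_us - STEP_US : MIN_INTERVAL_US;
    }
    /* In between: leave the current operating point alone. */
    return current_us;
}
```

Calling this once per sampling period moves the interval gradually, which avoids oscillating between extremes when the load hovers near a threshold.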


Hardware Optimizations


Task Offload Support

  • Checksum Offload

    • It has been shown to improve overall TCP/IP performance by up to 20%

    • It improves caching effect and eliminates churning – 8% increase

    • It reduces code path length – 12% improvement

  • TCP Segmentation Offload

    • It has been shown to improve overall TCP/IP performance by up to 11%

    • Reduces sender Cycles per Byte cost by 2x (it goes below 1.5)

    • NDIS 6.0 has support for successor: Giant Send Offload (> 64K)

    • NDIS 6.0 has IPv6 support for TCP Segmentation Offload

  • NDIS 6.0 offers support for IPSec Offload
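
To make concrete what checksum offload takes off the CPU, here is the Internet checksum (RFC 1071) that TCP/IP software otherwise computes over every packet. It is a plain C sketch of the algorithm, not code from the presentation or from any driver; the function name is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes, treating the data as
 * big-endian 16-bit words. Intended for packet-sized buffers (up to 64KB),
 * so the 32-bit accumulator cannot overflow. */
uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    /* Add up all complete 16-bit words. */
    for (; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low byte. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold the carries back into the low 16 bits (one's complement add). */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The loop touches every byte of the payload, which is why moving it into the NIC both shortens the code path and keeps packet data out of the CPU cache.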


Message Signaled Interrupts (MSI)

  • MSI has the following attributes:

    • No acknowledgment is necessary for the message

    • No sharing is usually necessary

    • There is support for many interrupts per PCI function

    • Caveat: It only works on P4 and later chipsets

  • Advantages of MSI

    • With no sharing in place, latency is less with a single ISR running

    • Bus utilization goes down by eliminating some read operations from device

    • Device can target interrupts at designated processors (e.g. RSS)

    • It guarantees data buffer coherency because message follows DMA traffic on bus


Receive Side Scaling (RSS)

  • Existing stack limits receive processing to one CPU

    • Restricts scalability of Web server to the number of short-lived connections a single CPU can process (per NIC)

    • Limits transaction throughput to packet receive processing rate of one CPU

    • Example: A four-processor machine cannot use more than 25% of its overall CPU cycles when hosting a single NIC on the system

  • RSS helps both long and short-lived connections

    • At times when CPU processing is dominated by connection setup, RSS improves performance

    • Connection setup tasks map well to a general purpose CPU

  • RSS gives us parallel receive processing = parallel DPCs

  • Planned availability in Windows Server 2003 Scalable Networking Pack add-on and Longhorn
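
The way RSS spreads receive processing can be sketched as a hash of the connection's addresses and ports followed by a table lookup that picks a CPU. The presentation does not describe the hash; the sketch below uses a Toeplitz hash, which RSS hardware commonly implements, and that choice is an assumption here. The key, the table size, and the function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define RSS_TABLE_SIZE 8  /* assumed indirection table size, a power of two */

/* Toeplitz hash over `len` input bytes. `key` must hold at least len + 4
 * bytes. For each set input bit (most significant first), the current
 * 32-bit window of the key is XORed into the result, then the window
 * slides one bit further along the key. */
uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *input, size_t len)
{
    uint32_t result = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (input[i] & (1u << bit))
                result ^= window;
            /* Slide the window: drop the top bit, pull in the next key bit. */
            window <<= 1;
            if (key[i + 4] & (1u << bit))
                window |= 1u;
        }
    }
    return result;
}

/* Map a hash to a CPU through the indirection table. Every packet of a
 * given connection hashes the same way, so it always lands on one CPU. */
uint8_t rss_select_cpu(const uint8_t table[RSS_TABLE_SIZE], uint32_t hash)
{
    return table[hash & (RSS_TABLE_SIZE - 1)];
}
```

For a TCP/IPv4 flow the input would be the 12 bytes of source address, destination address, source port, and destination port, so the key needs at least 16 bytes. Because the mapping is per connection, packets within a flow stay in order while different flows run their DPCs in parallel.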


Receive Side Scaling

[Figure: two panels. "Today" (one processor per NIC): the NIC's ISR and DPC run on CPU0 below NDIS. "Receive Side Scaling" (multiple processors per NIC): parallel receive packet queues on the NIC feed parallel DPCs on CPU0, CPU1, and CPU2 below NDIS.]


Network Performance Tools

  • NTttcp benchmark

    • Uses Winsock 2.x publicly available APIs

    • Uses Overlapped I/O and Multithreading model

    • Transfers random data from Memory to Memory

    • Provides Throughput, CPU, and Interrupt rate

    • Provides Cycles per Byte metric - key for measuring performance to catch regressions

    • Provides Packet to ACK ratio to detect link condition

    • Provides number of Segment Retransmits and Errors

    • Supports all Windows hardware architectures
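
The Cycles per Byte metric can be reproduced by hand from the other numbers a benchmark run reports. The presentation does not give the formula NTttcp uses, so the definition below is an assumption: total busy CPU cycles across all processors during the run, divided by the bytes transferred. The function and parameter names are illustrative.

```c
#include <stdint.h>

/* Assumed definition: busy cycles across all CPUs divided by bytes moved.
 * cpu_utilization is a fraction (0.0 to 1.0) averaged over all CPUs. */
double cycles_per_byte(double cpu_utilization, unsigned num_cpus,
                       double cpu_hz, double seconds, uint64_t bytes)
{
    if (bytes == 0)
        return 0.0;
    double busy_cycles = cpu_utilization * num_cpus * cpu_hz * seconds;
    return busy_cycles / (double)bytes;
}
```

As a worked example, two 1 GHz processors at 50% average utilization for 10 seconds spend 10 billion cycles; if 1 billion bytes moved in that time, the cost is 10 cycles per byte. Because the metric normalizes away throughput and CPU speed, a rise between driver builds on the same machine points at a regression in the code path.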


NTttcp Output for a Single Thread


NTttcp Output for Multiple Threads


More Network Performance Tools

  • Kernrate Profiling tool

    • General purpose profiler for tracking CPU utilization

    • Samples periodically (programmable) to see what is executing

    • Adjustable granularity

      • Per-processor, per-process, and total

    • Supports all Windows hardware architectures

    • Supports Windows 2000 and beyond

    • Highly customizable (numerous options)

    • The profiling tool and its viewer (KrView) can be downloaded from:

      • http://www.microsoft.com/whdc/system/sysperf/krview.mspx
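
The idea behind a sampling profiler can be shown in a few lines: record the instruction pointer at regular intervals, then attribute each sample to the function whose address range contains it. The sketch below models only the attribution step and is not how Kernrate itself is implemented; the PROFILE_BUCKET type and both functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;   /* function or module name */
    uintptr_t   start;  /* first address, inclusive */
    uintptr_t   end;    /* last address, exclusive */
    unsigned    hits;   /* samples that landed in this range */
} PROFILE_BUCKET;

/* Attribute each sampled instruction pointer to the bucket containing it.
 * Returns the number of samples that matched no bucket. */
size_t profile_accumulate(PROFILE_BUCKET *buckets, size_t nbuckets,
                          const uintptr_t *samples, size_t nsamples)
{
    size_t unmatched = 0;
    for (size_t s = 0; s < nsamples; s++) {
        size_t b;
        for (b = 0; b < nbuckets; b++) {
            if (samples[s] >= buckets[b].start && samples[s] < buckets[b].end) {
                buckets[b].hits++;
                break;
            }
        }
        if (b == nbuckets)
            unmatched++;
    }
    return unmatched;
}

/* Index of the bucket with the most hits, or nbuckets if there are none. */
size_t profile_hottest(const PROFILE_BUCKET *buckets, size_t nbuckets)
{
    size_t best = nbuckets;
    for (size_t b = 0; b < nbuckets; b++) {
        if (best == nbuckets || buckets[b].hits > buckets[best].hits)
            best = b;
    }
    return best;
}
```

The bucket with the most hits is where the CPU spent the most time during the run, which is the starting point for finding a bottleneck on a hot path. Narrower address ranges give finer granularity at the cost of more buckets to search.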


Call To Action

  • NDIS 6.0 driver developers need to implement Task Offloading support for IPv6

  • Fine-tune your hardware so it operates at its optimal performance point

  • Fine-tune your network driver to work optimally with your hardware for best performance

  • For questions, please e-mail ndis6fb @ microsoft.com. Please include your name, company name, and phone number


Additional Resources

  • Email: ndis6fb @ microsoft.com

  • Web Resources:

    • Analyzing Driver Performance: http://www.microsoft.com/whdc/driver/perform/drvperf.mspx

    • High Performing Adapters and Drivers whitepaper: http://www.microsoft.com/whdc/device/network/NetAdapters-Drvs.mspx

    • Kernrate is available for download from the following:

      http://www.microsoft.com/whdc/system/sysperf/krview.mspx


© 2005 Microsoft Corporation. All rights reserved.

This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

