Winsock Kernel Best Practices

Presentation Transcript

  1. Winsock Kernel Best Practices
     Osman N. Ertugay, Software Design Engineer, Windows Network Developer Platform, Microsoft Corporation

  2. Session Outline
     • Brief Winsock Kernel (WSK) refresher
       • Familiarity with the WSK documentation and the WSK sample in the WDK ensures the most benefit from this session
     • WSK programming guidelines and best practices
       • WSK registration and deregistration
       • I/O Request Packet (IRP) handling
       • Buffer ownership and manipulation
       • Using socket callbacks versus socket functions
       • Memory/throughput tradeoff in stream data transfer
       • Transport Address security
       • Dual-family sockets

  3. WSK Refresher
     • Kernel-mode Network Programming Interface
       • WSK replaces the Transport Driver Interface (TDI) for “consumers” of TDI (i.e., TDI clients)
       • WSK is not a “provider” interface for new transport development
     • WSK goals/benefits
       • Easier to use, consistent API
       • Higher performance, better scalability
       • Better fit for the Next Generation TCP/IP Stack
       • Similar to Winsock2, but not the same
       • Easy to port to for existing TDI clients

  4. WSK Refresher
     [Architecture diagram. Labels: user-mode / kernel-mode boundary; WSK client driver with WSK_CLIENT, WSK_SOCKET, ClientCallbacks, and SocketCallbacks; WSK registration library; I/O Manager; WSK subsystem with ClientFunctions and SocketFunctions; Network Module Registrar (NMR); TCP, UDP, and Raw transports (IPv6/IPv4).]

  5. WSK Programming Guidelines And Best Practices

  6. WSK Registration And Deregistration
     • Use the new WSK registration library: WskRegister, WskDeregister, WskCaptureProviderNPI, WskReleaseProviderNPI
     • Network Module Registrar APIs still available

         const WSK_CLIENT_DISPATCH WskSampleClientDispatch = {
             MAKE_WSK_VERSION(1, 0), // WSK version 1.0
             0,                      // Reserved
             NULL                    // No WskClientEvent callback in WSK version 1.0
         };

         WSK_REGISTRATION WskSampleRegistration;

         NTSTATUS DriverEntry(. . .)
         {
             NTSTATUS status;
             WSK_CLIENT_NPI wskClientNpi;
             . . .
             wskClientNpi.ClientContext = NULL;
             wskClientNpi.Dispatch = &WskSampleClientDispatch;
             status = WskRegister(&wskClientNpi, &WskSampleRegistration);
             . . .
         }

  7. WSK Registration And Deregistration
     • Capture the WSK_PROVIDER_NPI, use it, release it
     • Do NOT use the WSK_PROVIDER_NPI after releasing it
     • WaitTimeOut usage in WskCaptureProviderNPI
       • WSK_NO_WAIT
       • WSK_INFINITE_WAIT: do NOT use if calling from DriverEntry!

         NTSTATUS SomeWorkerRoutine(. . .)
         {
             NTSTATUS status;
             WSK_PROVIDER_NPI wskProviderNpi;
             . . .
             status = WskCaptureProviderNPI(&WskSampleRegistration,
                                            WSK_INFINITE_WAIT,
                                            &wskProviderNpi);
             if(NT_SUCCESS(status)) {
                 status = wskProviderNpi.Dispatch->WskSocket(
                              wskProviderNpi.Client, AF_INET6, . . .);
                 WskReleaseProviderNPI(&WskSampleRegistration);
             }
             . . .
         }

  8. WSK Registration And Deregistration
     • WskDeregister
       • Must be called exactly once for each successful WskRegister when the WSK client stops using WSK
       • Will block until
         • All captured provider NPI instances are returned
         • All outstanding calls to provider NPI functions have completed
         • All sockets are closed
       • Must close all sockets and release all captured provider NPI instances for WskDeregister to return
       • Will cause WskCaptureProviderNPI calls waiting in other threads (with WSK_INFINITE_WAIT or some timeout) to return

         VOID DriverUnload(. . .)
         {
             . . .
             WskDeregister(&WskSampleRegistration);
             . . .
         }

  9. IRP Handling
     [Sequence diagram between the WSK Client, the WSK Subsystem, and the IO Manager. Calls shown: IoAllocateIrp(1, …); IoSetCompletionRoutine(Irp, CompletionRoutine, Context, TRUE, TRUE, TRUE); WskSend(Socket, …, Irp); IoCompleteRequest(Irp, …); CompletionRoutine(…, Irp, Context) returning STATUS_MORE_PROCESSING_REQUIRED; then IoFreeIrp(Irp) or IoReuseIrp(Irp, …).]

  10. IRP Handling
      • Simple example that waits for IRP completion synchronously
      • Also demonstrates how to distinguish and optimize for “inline” IRP completion

          NTSTATUS
          SyncIrpCompRtn(PDEVICE_OBJECT Reserved, PIRP Irp, PVOID Context)
          {
              PKEVENT compEvent = (PKEVENT)Context;

              // Signal the event only if the IRP was pended; an inline
              // completion has no waiter to wake up.
              if(Irp->PendingReturned)
                  KeSetEvent(compEvent, 2, FALSE);
              // Stop completion processing so the caller can free the IRP.
              return STATUS_MORE_PROCESSING_REQUIRED;
          }

          NTSTATUS
          SetSocketOption(PWSK_SOCKET Socket, . . .)
          {
              NTSTATUS status;
              CONST WSK_PROVIDER_BASIC_DISPATCH *dispatch = Socket->Dispatch;
              KEVENT compEvent;
              PIRP irp;

              KeInitializeEvent(&compEvent, SynchronizationEvent, FALSE);
              irp = IoAllocateIrp(1, FALSE);
              if(irp == NULL)
                  return STATUS_INSUFFICIENT_RESOURCES;
              IoSetCompletionRoutine(irp, SyncIrpCompRtn, &compEvent, TRUE, TRUE, TRUE);

              status = dispatch->WskControlSocket(Socket, . . ., irp);
              if(status == STATUS_PENDING)
                  KeWaitForSingleObject(&compEvent, Executive, KernelMode, FALSE, NULL);

              status = irp->IoStatus.Status;
              IoFreeIrp(irp);
              return status;
          }

  11. Buffer Ownership And Manipulation
      • Setting up a WSK_BUF
        • WSK_BUF.Mdl
          • IoAllocateMdl(BufferAddress, BufferLength, . . .)
          • MmProbeAndLockPages vs MmBuildMdlForNonPagedPool
        • WSK_BUF.Length
          • Must be <= (BufferLength - WSK_BUF.Offset)
        • WSK_BUF.Offset
          • Must lie within the first MDL if WSK_BUF.Mdl points to a chain of MDLs
      [Diagram: a buffer (BufferAddress, BufferLength) spanning a page boundary and described by WSK_BUF.Mdl, with the MDL ByteOffset, WSK_BUF.Offset, and WSK_BUF.Length marked.]
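The two WSK_BUF constraints on slide 11 come down to simple arithmetic, which can be sketched outside the kernel. The following is a minimal user-mode model, not driver code: `wsk_buf_model` and `wsk_buf_model_is_valid` are hypothetical names introduced here, and reading "within the first MDL" as a strict bound (Offset less than the first MDL's byte count) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode stand-in for the values the slide names.
 * In a driver these would come from WSK_BUF and its MDL chain. */
typedef struct {
    size_t first_mdl_bytes; /* bytes described by the first MDL in the chain */
    size_t total_bytes;     /* bytes described by the whole chain (BufferLength) */
    size_t offset;          /* models WSK_BUF.Offset */
    size_t length;          /* models WSK_BUF.Length */
} wsk_buf_model;

/* Returns true when offset and length satisfy both rules from the slide. */
bool wsk_buf_model_is_valid(const wsk_buf_model *b)
{
    /* Rule 1: Offset must lie within the first MDL
     * (assumed here to mean strictly less than its byte count). */
    if (b->offset >= b->first_mdl_bytes)
        return false;

    /* Guard against unsigned underflow in the subtraction below. */
    if (b->offset > b->total_bytes)
        return false;

    /* Rule 2: Length <= BufferLength - Offset. */
    return b->length <= b->total_bytes - b->offset;
}
```

For example, with a two-page chain (4096 bytes in the first MDL, 8192 in total) and an offset of 100, the largest length this check accepts is 8092.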

  12. Buffer Ownership And Manipulation
      • Example: Copy data from a WSK_DATA_INDICATION list to a buffer

          NTSTATUS
          CopyDataIndicationListToBuffer(
              __in PWSK_DATA_INDICATION DataIndication,
              __in SIZE_T BufSize,
              __out_bcount(BufSize) PUCHAR Buf)
          {
              SIZE_T bytesCopied = 0;

              while(DataIndication != NULL) {
                  PMDL mdl = DataIndication->Buffer.Mdl;
                  ULONG offset = DataIndication->Buffer.Offset;
                  SIZE_T length = DataIndication->Buffer.Length;

                  while(length > 0 && mdl != NULL) {
                      SIZE_T copyLength = min(length, MmGetMdlByteCount(mdl) - offset);
                      PUCHAR sysAddr = (PUCHAR)MmGetSystemAddressForMdlSafe(mdl, LowPagePriority);

                      if(sysAddr == NULL)
                          return STATUS_INSUFFICIENT_RESOURCES;
                      else if((BufSize - bytesCopied) < copyLength)
                          return STATUS_BUFFER_TOO_SMALL;

                      RtlCopyMemory(Buf + bytesCopied, sysAddr + offset, copyLength);
                      offset = 0; // WSK_BUF.Offset applies only to the first MDL
                      bytesCopied += copyLength;
                      length -= copyLength;
                      mdl = mdl->Next;
                  }
                  DataIndication = DataIndication->Next;
              }
              return STATUS_SUCCESS;
          }

  13. Buffer Ownership And Manipulation
      • May “retain” (take temporary ownership of) a WSK data indication by returning STATUS_PENDING from the WskReceiveEvent or WskReceiveFromEvent callbacks
      • Any status other than STATUS_PENDING means the data indication was NOT retained, hence no need to call WskRelease
      • Must release retained data indications via WskRelease
      • Do not retain data indications that carry the WSK_FLAG_RELEASE_ASAP flag if possible. If you do have to retain such indications, release them within a short, bounded amount of time (on the order of a few seconds)

  14. Socket Callbacks Versus Functions
      • Accepting incoming connections
        • WskAccept
          • Client keeps one or more accept IRPs pended in WSK
          • Connections rejected by WSK when no pending IRP exists
        • WskAcceptEvent
          • WSK hands over “sockets” to client for arriving connections
          • Client accepts or rejects
      • Guidance
        • Use WskAcceptEvent to accept as many connections as the system can handle at any given time
        • Use WskAccept to accept only a small, fixed number of connections at any given time
        • WSK does not have an equivalent of the listen backlog in Winsock2

  15. Socket Callbacks Versus Functions
      • Receiving datagrams
        • WskReceiveFrom
          • Data buffer owned by client, must allocate before data arrives
          • Client keeps one or more receive IRPs pended in WSK
          • Datagrams dropped by WSK when no pending IRP exists
        • WskReceiveFromEvent
          • Data buffer owned by WSK, allocated when data arrives
          • Each arriving datagram handed over to client by WSK
      • Guidance
        • Always use WskReceiveFromEvent as long as you do not retain datagram indications too long
        • Use WskReceiveFrom only if you must always copy datagrams into your own buffers anyway
        • WSK does not buffer datagrams

  16. Socket Callbacks Versus Functions
      • Receiving stream data
        • WskReceive
          • Data buffer owned by client, must allocate before data arrives
          • 0-copy into client buffer possible
          • Data buffered by transport if no pending receive IRP exists
        • WskReceiveEvent
          • Data buffer owned by WSK
          • 0-copy into client buffer not possible
          • Data handed over to client until client rejects indication
          • Client needs to use WskReceive to retrieve rejected data
      • Guidance
        • Use WskReceive for large block transfers
        • Combined usage: Get initial data via WskReceiveEvent, then get the rest of the data via WskReceive
        • WskReceiveEvent: the amount of retained data and the time it is retained must be bounded and small

  17. Socket Callbacks Versus Functions
      • Both socket callbacks and the IRP completions for socket functions mostly occur in Deferred Procedure Call (DPC) context
      • Must limit the amount of processing in callback and IRP completion routines
      • Consider using
        • System worker threads for tasks that won’t last too long
        • A dedicated system thread for long-lasting tasks

  18. Memory/Throughput Tradeoff
      • Stream sockets → subject to transport flow control
        • Send requests may remain pended until acknowledged by peer
        • Too much pended send data → poor memory usage
        • Too little pended send data → suboptimal throughput
      • So, how much data to keep pended (ideal send backlog: “ISB”)?
        • As much as the network can sustain
        • As much as the receiver can sustain
      • Use the SIO_WSK_QUERY_IDEAL_SEND_BACKLOG IOCTL and the WskSendBacklogEvent callback
        • Initial ISB to use → SIO_WSK_QUERY_IDEAL_SEND_BACKLOG
        • ISB change notifications → WskSendBacklogEvent
      • Always have two or more WskSend requests pended with ISB worth of data in total
        • Example: ISB = 64 K → 2 WskSend requests, each with 32 K of data
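The closing guidance on slide 18 (two or more pended sends that together carry one ISB worth of data) can be sketched as a small sizing helper. This is an illustrative user-mode sketch under stated assumptions: `split_send_backlog` is a hypothetical name, the even split is one possible policy rather than anything WSK prescribes, and in a driver the ISB value would come from SIO_WSK_QUERY_IDEAL_SEND_BACKLOG or WskSendBacklogEvent.

```c
#include <stddef.h>

/* Hypothetical helper: divide an ideal send backlog (in bytes) across
 * `count` outstanding send requests so that the sizes sum to isb_bytes.
 * Any remainder is spread one byte at a time over the first requests.
 * Returns 0 on success, -1 if the arguments are unusable. */
int split_send_backlog(size_t isb_bytes, size_t count, size_t *sizes)
{
    size_t base, rem, i;

    /* The slide asks for two or more pended sends. */
    if (count < 2 || sizes == NULL)
        return -1;

    base = isb_bytes / count;
    rem = isb_bytes % count;
    for (i = 0; i < count; i++)
        sizes[i] = base + (i < rem ? 1 : 0);
    return 0;
}
```

With an ISB of 65536 bytes and two requests, each request is sized at 32768 bytes, matching the slide's 64 K example. When WskSendBacklogEvent reports a new ISB, the same helper could be rerun to size subsequent sends.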

  19. Transport Address Security
      • Secure by default: Creating a socket with a NULL SecurityDescriptor and binding it to an address results in SO_EXCLUSIVEADDRUSE behavior
      • Refrain from designing applications based on address sharing
      • If you must allow address sharing
        • May set SO_REUSEADDR to TRUE → Anybody else can reuse the address (not good from a security perspective)
        • May use a SecurityDescriptor → Sharing is allowed/denied based on an access check performed by the system
          • WSK (transport) uses the SecurityDescriptor specified by the first socket and the SECURITY_SUBJECT_CONTEXT captured from the OwningProcess and OwningThread specified by the second socket to perform the access check

  20. Dual Family Sockets
      • Use a single IPv6 socket to handle both IPv6 and IPv4 traffic
        • Set the IPV6_V6ONLY option to FALSE (default is TRUE)
        • Bind to wildcard address
      • IPv4 addresses represented in V4MAPPED IPv6 address format
      • Can use the INETADDR_ISV4MAPPED macro from mstcpip.h to check if a given SOCKADDR represents a V4MAPPED address

          // Example dual family listening socket
          ULONG optVal = 0;
          . . .
          status = dispatch->WskControlSocket(IPv6ListeningSocket,
                       WskSetOption, IPV6_V6ONLY, IPPROTO_IPV6,
                       sizeof(optVal), &optVal, 0, NULL, NULL, irp);
          . . .
          status = dispatch->WskBind(IPv6ListeningSocket,
                       (PSOCKADDR)Ipv6WildcardAddress, 0, irp);
          . . .
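To make the V4MAPPED format on slide 20 concrete, here is a portable sketch of the address layout that such a check looks for: 80 zero bits, 16 one bits, then the 32-bit IPv4 address (::ffff:a.b.c.d). This is not the implementation of INETADDR_ISV4MAPPED; `is_v4_mapped` and `extract_v4` are hypothetical helpers that operate on a raw 16-byte IPv6 address, and a driver would normally call the mstcpip.h macro on the SOCKADDR instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when the 16-byte IPv6 address has the IPv4-mapped layout:
 * ten 0x00 bytes, two 0xff bytes, then four bytes of IPv4 address. */
bool is_v4_mapped(const uint8_t addr[16])
{
    static const uint8_t prefix[12] =
        { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };
    return memcmp(addr, prefix, sizeof prefix) == 0;
}

/* Copies the embedded IPv4 address (network byte order) into v4.
 * Returns false, leaving v4 untouched, if addr is not IPv4-mapped. */
bool extract_v4(const uint8_t addr[16], uint8_t v4[4])
{
    if (!is_v4_mapped(addr))
        return false;
    memcpy(v4, addr + 12, 4);
    return true;
}
```

On a dual-family listening socket, a check like this is what lets the client tell whether an accepted connection's remote address is really an IPv4 peer.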

  21. Call To Action
      • Port your existing kernel-mode TDI applications to WSK and use WSK for new development
      • Move from using TDI filter drivers to WFP for network traffic interception
      • Follow the practices outlined in this session to achieve optimal performance and stability from WSK

  22. Additional Resources
      • Web Resources
        • Windows Network Developer Platform (WNDP) Team Blog
        • WNDP Team Connect Site
          • Join the ‘WNDP’ program at
      • Related Sessions
        • How to Use the Windows Filtering Platform to Integrate with Windows Networking
        • Using NDIS 6.0, TCP Chimney Offload, and RSS to Achieve High Performance Networking
      • E-mail: wskapi @

  23. © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.