
Developing Highly Available Multipath Solutions and Device-Specific Modules

Developing Highly Available Multipath Solutions and Device-Specific Modules. Jaivir Aithal Senior Software Development Engineer Device & Storage Technologies jaivira@microsoft.com. AGENDA. Microsoft Multipath IO (MPIO) Deployment and Configuration


Presentation Transcript


  1. Developing Highly Available Multipath Solutions and Device-Specific Modules • Jaivir Aithal • Senior Software Development Engineer • Device & Storage Technologies • jaivira@microsoft.com

  2. AGENDA • Microsoft Multipath IO (MPIO) Deployment and Configuration • Key Enhancements for Windows Server 2008 R2 • Configuration in the absence of storage • Performance optimizations • Health monitoring • Best Practices for MPIO tuning • Registry settings • Tips & Tricks for Device-Specific Module (DSM) writers • How to get a DSM to best work with MPIO’s management UI • Lessons learned through the Microsoft DSM (MSDSM) • Common pitfalls for DSM writers and tips for how to address them.

  3. MPIO Deployment and Configuration • MPIO Optional Component (OC) • Using dism.exe • dism /online /quiet /enable-feature:MultipathIo • Claiming DSM Support • Using MSDSM vs. Vendor DSM • SPC-3 compliance • Migration requirements • Registry restrictions • HKLM\System\CurrentControlSet\Services\<DSM>\Parameters • x86 vs. x64 • System class vs. SCSI Adapter class • Driver signing

  4. DSM Installation • [Flowchart; only the node labels survive in this transcript.] • Decisions: Determine OS; Windows Server 2008 or later?; Windows Server 2008?; MPIO installed? • Hardware IDs: use Root\MPIO or Detected\MPIO • Actions: enable the Optional Component using DISM; enable the Optional Component using PKGMGR; install MPIO, DSM & MPDEV; install only the DSM; restart the storage stack

  5. Enabling Pre-Configuration • Problem Definition • Ability to configure multipath settings without requirement for external storage to be physically attached • Scenarios • Datacenter automation (preconfigure servers, connect storage later) • Configuration utility that sets tunables • Management utility that sets operation settings • Architecture changes • WMI registration by MPIO Control object (FDO) • WMI registration piggy-backing on pseudo-LUN (PDO) • Supported only on Windows Server 2008 R2 and upwards

  6. DSM Changes Required • Implementation Details • MOF changes • Distinguish DSM-centric classes from Device-centric ones • Split WMI classes into two files to avoid common mistakes • Generate the binary data during compile time • Remember to specify the resource name of the new binary MOF • Registration details • Update DsmType to DsmType5 • Pass the structure size as the size of the updated DSM_INIT_DATA • Specify DSM-centric WMI GUIDs using DsmWmiGlobalInfo • Continue specifying Device-centric GUIDs using DsmWmiInfo

  7. DSM-centric MOF example – msdsmdsm.mof

      // This is information that should be available even if no storage is physically present.
      //
      // Example: Supported devices list class.
      [WMI,
       Dynamic,
       Provider("WmiProv"),
       Description("Retrieve MSDSM's supported devices list.") : amended,
       Locale("MS\\0x409"),
       guid("{c362d67c-371e-44d8-8bba-044619e4f245}")]
      class MSDSM_SUPPORTED_DEVICES_LIST
      {
          [key, read] string InstanceName;
          [read] boolean Active;
          [WmiDataId(1), read, Description("Number of supported devices.") : amended] uint32 NumberDevices;
          [WmiDataId(2)] uint32 Reserved;
          [WmiDataId(3),
           read,
           MaxLen(31),
           Description("Array of device hardware identifiers.") : amended,
           WmiSizeIs("NumberDevices")
          ] string DeviceId[];
      };

  8. Device-centric MOF example – msdsm.mof

      // This is information that pertains to a specific instance of the device. Here's an example:
      //
      // Embedded basic-statistics class.
      [WMI, guid("{a34d03ec-6b0b-46a1-9178-82525f41133f}")]
      class MSDSM_DEVICEPATH_PERF
      {
          [WmiDataId(1)] uint64 PathId;
          [WmiDataId(2)] uint32 NumberReads;
          [WmiDataId(3)] uint32 NumberWrites;
          [WmiDataId(4)] uint64 BytesRead;
          [WmiDataId(5)] uint64 BytesWritten;
      };

      // Statistics provider class
      [WMI, Dynamic, Provider("WmiProv"),
       Description("Retrieve MSDSM Performance Information.") : amended,
       Locale("MS\\0x409"),
       guid("{875b8871-4889-4114-93f6-cd064c001cea}")]
      class MSDSM_DEVICE_PERF
      {
          [key, read] string InstanceName;
          [read] boolean Active;
          [WmiDataId(1), read, Description("Number of paths.") : amended] uint32 NumberPaths;
          [WmiDataId(2), read, Description("Array of Performance Information per path for the device.") : amended,
           WmiSizeIs("NumberPaths")] MSDSM_DEVICEPATH_PERF PerfInfo[];
      };

  9. DSM WMI Registration

      typedef struct _DSM_INIT_DATA {
          // Size, in bytes.
          ULONG InitDataSize;

          // DSM entry points.
          DSM_INQUIRE_DRIVER DsmInquireDriver;
          . . .
          DSM_BROADCAST_SRB DsmBroadcastSrb;

          // Wmi entry point and guid information.
          DSM_WMILIB_CONTEXT DsmWmiInfo;

          // Version 2 starts here...
          DSM_TYPE DsmType;
          . . .

          // Version 5 starts here...
          // Wmi entry point and guid information for DSM-centric classes.
          DSM_WMILIB_CONTEXT DsmWmiGlobalInfo;
      } DSM_INIT_DATA, *PDSM_INIT_DATA;

  10. DSM-centric WMI Registration

      // DsmTypeUnknown == mustn't be used.
      // DsmType1 == first version
      // DsmType2 == indicates that DSM uses InterpretErrorEx() and handles WMI calls with
      //             DSM_IDS passed in as extra parameter
      // DsmType3 == indicates that DSM handles cases where completion routine can be called with NULL DsmId
      // DsmType4 == indicates that DSM provides version info
      // DsmType5 == indicates that DSM provides additional DSM-centric (global) WMI classes
      // DsmType6 == not used
      typedef enum _DSM_TYPE {
          DsmTypeUnknown = 0,
          DsmType1,
          DsmType2,
          DsmType3,
          DsmType4,
          DsmType5,
          DsmType6
      } DSM_TYPE, *PDSM_TYPE;

      #define DSM_INIT_DATA_TYPE_1_SIZE (RTL_SIZEOF_THROUGH_FIELD(DSM_INIT_DATA, Reserved))
      #define DSM_INIT_DATA_TYPE_2_SIZE (RTL_SIZEOF_THROUGH_FIELD(DSM_INIT_DATA, DsmType))
      #define DSM_INIT_DATA_TYPE_3_SIZE DSM_INIT_DATA_TYPE_2_SIZE
      #define DSM_INIT_DATA_TYPE_4_SIZE (RTL_SIZEOF_THROUGH_FIELD(DSM_INIT_DATA, DsmVersion))
      #define DSM_INIT_DATA_TYPE_5_SIZE (sizeof(DSM_INIT_DATA))
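The size macros on this slide follow a common pattern for versioned structures: each version's size is "everything up to and including the last field that version added". The user-mode sketch below illustrates that pattern only. SketchInitData, its placeholder field types, and sketch_layout_version are hypothetical names invented for illustration; the slides do not describe how mpio.sys itself validates InitDataSize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for DSM_INIT_DATA (placeholder field types). */
typedef struct SketchInitData {
    unsigned long InitDataSize;   /* size the caller claims to have filled in */
    void *EntryPoints[4];         /* placeholder for the DSM entry points */
    unsigned long Reserved;       /* last field of the version 1 layout */
    int DsmType;                  /* last field of the version 2/3 layout */
    unsigned long DsmVersion[4];  /* last field of the version 4 layout */
    void *DsmWmiGlobalInfo;       /* added by the version 5 layout */
} SketchInitData;

/* User-mode equivalent of RTL_SIZEOF_THROUGH_FIELD. */
#define SIZEOF_THROUGH_FIELD(type, field) \
    (offsetof(type, field) + sizeof(((type *)0)->field))

#define SKETCH_TYPE_1_SIZE SIZEOF_THROUGH_FIELD(SketchInitData, Reserved)
#define SKETCH_TYPE_2_SIZE SIZEOF_THROUGH_FIELD(SketchInitData, DsmType)
#define SKETCH_TYPE_4_SIZE SIZEOF_THROUGH_FIELD(SketchInitData, DsmVersion)
#define SKETCH_TYPE_5_SIZE sizeof(SketchInitData)

/* Returns the newest layout that fits in the claimed size, or 0 if none does.
 * Versions 2 and 3 share a size, so 2 is reported for both. */
int sketch_layout_version(size_t claimed_size)
{
    /* Each newer layout must be strictly larger than the one before it. */
    assert(SKETCH_TYPE_1_SIZE < SKETCH_TYPE_2_SIZE);
    assert(SKETCH_TYPE_2_SIZE < SKETCH_TYPE_4_SIZE);
    assert(SKETCH_TYPE_4_SIZE < SKETCH_TYPE_5_SIZE);

    if (claimed_size >= SKETCH_TYPE_5_SIZE) return 5;
    if (claimed_size >= SKETCH_TYPE_4_SIZE) return 4;
    if (claimed_size >= SKETCH_TYPE_2_SIZE) return 2;
    if (claimed_size >= SKETCH_TYPE_1_SIZE) return 1;
    return 0;
}
```

Because only fields are ever appended, an older caller that passes a smaller size remains describable by an older layout, which is presumably why a DSM adopting DsmType5 must also pass the larger structure size.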

  11. Performance Enhancements • Improvements in Core MPIO stack • Elimination of unnecessary use of spinlocks • Conversion of a spinlock into a reader-writer lock • Minimizing unnecessary memory write operations • Re-laying out members of a data structure to minimize CPU reads • MSDSM enhancements • Make gathering statistics optional • Eliminate unnecessary use of processor-intensive operations • New load balance policy: Least Blocks • Performance Gains in the MPIO stack (i.e. mpio.sys and msdsm.sys) • Preliminary results indicate up to 15% improvement on certain configurations under certain loads. (However, pre-Beta and Beta builds might not indicate what is expected for RTM performance numbers.)

  12. MPIO Health Monitoring • Common Interface for basic statistical data • Querying interface is WMI • Granularity at three levels • LUN • Path • Device Instance (i.e. LUN-Path pairing) • Health packets maintained even after monitored entity has gone offline • Potential advantages • Improve diagnosability • Reduce DSM’s overhead for maintaining these counts • Consumers can implement custom triggers • Consistent interface for management applications, regardless of underlying DSM

  13. MPIO Health Monitoring • [Diagram: MPIO tracks Reads, Writes, Retries, IO Errors and Path Failures; Consumer A and Consumer B each receive a WMI event.]

  14. Health Monitoring WMI Class For LUN

      // Embedded Disk Health Class
      [WMI, guid("{6453c476-0499-42ab-9825-5133282b0b56}")]
      class MPIO_DISK_HEALTH_CLASS
      {
          [WmiDataId(1), read, Description("Number of read requests sent to this device.") : amended] uint64 NumberReads;
          [WmiDataId(2), read, Description("Number of write requests sent to this device.") : amended] uint64 NumberWrites;
          [WmiDataId(3), read, Description("Cumulative number of bytes read by requests sent to this device.") : amended] uint64 NumberCharsRead;
          [WmiDataId(4), read, Description("Cumulative number of bytes written by requests sent to this device.") : amended] uint64 NumberCharsWritten;
          [WmiDataId(5), read, Description("Number of requests sent to this device that were retried.") : amended] uint64 NumberRetries;
          [WmiDataId(6), read, Description("Number of requests sent to this device that failed.") : amended] uint64 NumberIoErrors;
          [WmiDataId(7), read, Description("System time at which this health packet was created for this device.") : amended] uint64 CreateTime;
          [WmiDataId(8), read, Description("Number of path failures experienced by this device.") : amended] uint64 PathFailures;
          [WmiDataId(9), read, Description("System time at which this device went offline/failed.") : amended] uint64 FailTime;
          [WmiDataId(10), read, Description("Flag that indicates if the device is offline/failed.") : amended] boolean DeviceDisabled;
          [WmiDataId(11), read, Description("Count of the number of times that the NumberReads field wrapped.") : amended] uint8 NumberReadsWrap;

  15. Health Monitoring WMI Class For LUN – Contd.

          [WmiDataId(12), read, Description("Count of the number of times that the NumberWrites field wrapped.") : amended] uint8 NumberWritesWrap;
          [WmiDataId(13), read, Description("Count of the number of times that the NumberCharsRead field wrapped.") : amended] uint8 NumberCharsReadWrap;
          [WmiDataId(14), read, Description("Count of the number of times that the NumberCharsWritten field wrapped.") : amended] uint8 NumberCharsWrittenWrap;
          [WmiDataId(15), read] uint8 Pad1[3];
      };

      // Provider Health Information Class
      [WMI, Dynamic, Provider("WmiProv"),
       Description("MPIO Pseudo-LUN Health Information.") : amended,
       Locale("MS\\0x409"),
       guid("{ef04568a-782b-443c-a3db-966ab43775f9}")]
      class MPIO_DISK_HEALTH_INFO
      {
          [key, read] string InstanceName;
          [read] boolean Active;
          [WmiDataId(1), read, Description("Number of Pseudo-LUN Health Packets.") : amended] uint32 NumberPlPackets;
          [WmiDataId(2), read, Description("Reserved for future use.") : amended] uint32 Reserved;
          [WmiDataId(3), read, Description("MPIO Pseudo-LUN Health Info Array.") : amended,
           WmiSizeIs("NumberPlPackets")] MPIO_DISK_HEALTH_CLASS PlHealthPackets[];
      };
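The health class pairs each 64-bit counter (for example NumberReads) with an 8-bit wrap count (NumberReadsWrap). One plausible way to maintain such a pair is sketched below in user-mode C. SketchCounter and sketch_counter_add are hypothetical names, and the slides do not show how MPIO actually updates these fields, so treat this as an illustration of the idea rather than the driver's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of a 64-bit counter with its wrap count. */
typedef struct SketchCounter {
    uint64_t value;  /* plays the role of e.g. NumberReads */
    uint8_t  wraps;  /* plays the role of e.g. NumberReadsWrap */
} SketchCounter;

/* Adds delta to the counter and records a wrap if the 64-bit value overflowed. */
void sketch_counter_add(SketchCounter *counter, uint64_t delta)
{
    assert(counter != NULL);

    uint64_t before = counter->value;
    counter->value += delta;         /* unsigned arithmetic wraps modulo 2^64 */
    if (counter->value < before) {   /* a smaller result means it wrapped */
        counter->wraps++;
    }
}
```

A consumer reading both fields can then tell a genuinely small count apart from one that has rolled over since the health packet was created.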

  16. Health Monitoring WMI Classes – Path & Device Instance • Path Health Information • Embedded class: MPIO_PATH_HEALTH_CLASS • Provider class: MPIO_PATH_HEALTH_INFO • Device Instance Health Information • Embedded class: MPIO_DEVINSTANCE_HEALTH_CLASS • Provider class: MPIO_DEVINSTANCE_HEALTH_INFO • Health Packet Cleanup • Registry value: FlushHealthInterval • Default: 24 hours • Turning OFF Health Monitoring • Registry value: GatherHealthStats • Default: TRUE (i.e. ON)

  17. MPIO Health Reporting – Example 1 Path Health, Disk (pseudo-LUN) Health and DeviceInstance Health Statistics

  18. MPIO Health Reporting – Example 2 Health Statistics output after the user-specified "Health Flush" period has expired and the "orphan" Health packets (associated with failed path 000000077030001) have been discarded.

  19. MPIO Configuration Snapshot • Uses existing WMI classes • Exports the existing MPIO configuration to a text file • Can be used by administrators for troubleshooting • Can be used by DSM writers during development and testing phases • Information written to a file in reverse chronological order (i.e. history maintained) • Default output file used: HKLM\System\CurrentControlSet\Services\mpio\Parameters, DefaultConfigOutputFile

  20. MPIO Tunables

      Tunable                  MIN     MAX        DEFAULT
      PathVerifyEnabled        FALSE   TRUE       FALSE
      PathVerificationPeriod   0       MAXULONG   30s
      RetryCount               0       500        3
      RetryInterval            0       MAXULONG   1s
      PDORemovePeriod          0       MAXULONG   20s

      [Diagram: IO stack showing Application, NTFS, DISK, the MPIO pseudo-LUN, the DSM with DsmID(0) and DsmID(1), and paths A and B to the LUN through HBA 0 / PCI Adapter 0 and HBA 1 / PCI Adapter 1. Surviving callouts:]
      • Number of times retried <= RetryCount
      • InterpretError returns Retry = TRUE
      • When PDORemovePeriod expires
      • LUN continues residing in memory, waiting for a path to come back online
      • PathVerify IRPs, when PathVerificationPeriod expires (PathVerifyEnabled)
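For a configuration utility of the kind mentioned on the pre-configuration slide, the table can be captured as data. The sketch below records the defaults listed above and range-checks RetryCount against its documented 0 to 500 range. SketchTunables and both functions are hypothetical names, and clamping is an assumption made for illustration: the slides do not say whether MPIO clamps or rejects out-of-range values.

```c
#include <assert.h>

/* Hypothetical container for the tunables in the table; periods are in seconds. */
typedef struct SketchTunables {
    int           path_verify_enabled;       /* PathVerifyEnabled */
    unsigned long path_verification_period;  /* PathVerificationPeriod */
    unsigned long retry_count;               /* RetryCount */
    unsigned long retry_interval;            /* RetryInterval */
    unsigned long pdo_remove_period;         /* PDORemovePeriod */
} SketchTunables;

#define SKETCH_RETRY_COUNT_MAX 500UL  /* MAX column for RetryCount */

/* Returns the DEFAULT column of the table. */
SketchTunables sketch_default_tunables(void)
{
    SketchTunables t;
    t.path_verify_enabled = 0;        /* FALSE */
    t.path_verification_period = 30;  /* 30s */
    t.retry_count = 3;
    t.retry_interval = 1;             /* 1s */
    t.pdo_remove_period = 20;         /* 20s */
    return t;
}

/* Assumed policy: clamp a requested RetryCount into the documented range. */
unsigned long sketch_clamp_retry_count(unsigned long requested)
{
    unsigned long clamped =
        requested > SKETCH_RETRY_COUNT_MAX ? SKETCH_RETRY_COUNT_MAX : requested;
    assert(clamped <= SKETCH_RETRY_COUNT_MAX);
    return clamped;
}
```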

  21. DSM Tips & Tricks

  22. Getting a DSM To Work With MPIO UI

      // List of supported GUIDs
      GUID DSM_QuerySupportedLBPoliciesV2GUID = DSM_QuerySupportedLBPolicies_V2Guid;
      ...
      #define DSM_QuerySupportedLBPoliciesV2GUID_Index 0
      ...
      WMIGUIDREGINFO DsmGuidList[] = {
          {&DSM_QuerySupportedLBPoliciesV2GUID, 1, 0},
          ...
      };

      NTSTATUS DriverEntry(. . .)
      {
          DSM_INIT_DATA dsmInitData;

          // Get DSM's version information
          DsmpGetVersion(&dsmInitData.DsmVersion);

          // Set-up the init data
          dsmInitData.InitDataSize = DSM_INIT_DATA_TYPE_4_SIZE;
          dsmInitData.DsmType = DsmType4;
          ...
          // Send the IOCTL to mpio.sys to register.
          DsmSendDeviceIoControlSynchronous(IOCTL_MPDSM_REGISTER, ..., &dsmInitData);
          ...
      }

  23. Getting MPIO UI To Restrict Allowable Path States

      // List of supported GUIDs
      GUID DSM_QueryLBPolicyV2GUID = DSM_QueryLBPolicy_V2Guid;
      ...
      #define DSM_QueryLBPolicyV2GUID_Index 0
      ...
      WMIGUIDREGINFO DsmGuidList[] = {
          {&DSM_QueryLBPolicyV2GUID, 1, 0},
          ...
      };

      NTSTATUS DsmQueryData(...)
      {
          ...
          if (GuidIndex == DSM_QueryLBPolicyV2GUID_Index) {
              PDSM_Load_Balance_Policy_V2 LBPolicy =
                  &(((PDSM_QueryLBPolicy_V2)Buffer)->LoadBalancePolicy);

              for (ULONG inx = 0; inx < DsmIds->Count; inx++) {
                  LBPolicy->DSM_Paths[inx].Reserved = DSM_STATE_ACTIVE_OPTIMIZED_SUPPORTED;

                  // Depending on supported states, OR them in
                  if (activeUnoptimizedSupported) {
                      LBPolicy->DSM_Paths[inx].Reserved |= DSM_STATE_ACTIVE_UNOPTIMIZED_SUPPORTED;
                  }
              }
          }
          ...
      }

  24. Ensuring DSM Works With Virtual Disk Service

      // List of supported GUIDs
      GUID DSM_QuerySupportedLBPoliciesV2GUID = DSM_QuerySupportedLBPolicies_V2Guid;
      GUID DSM_QueryDsmUniqueIdGUID = DSM_QueryUniqueIdGuid;
      ...
      #define DSM_QuerySupportedLBPoliciesV2GUID_Index 0
      #define DSM_QueryDsmUniqueIdGUID_Index 1
      ...
      WMIGUIDREGINFO DsmGuidList[] = {
          {&DSM_QuerySupportedLBPoliciesV2GUID, 1, 0},
          {&DSM_QueryDsmUniqueIdGUID, 1, 0},
          ...
      };

      NTSTATUS DsmQueryData(...)
      {
          ...
          if (GuidIndex == DSM_QueryDsmUniqueIdGUID_Index) {
              PDSM_QueryUniqueId dsmQueryUniqueId = Buffer;

              // Ensure that the 64-bit returned value will be unique
              dsmQueryUniqueId->DsmUniqueId = (ULONGLONG)((ULONG_PTR)DsmContext);
          }
          ...
      }

  25. Avoiding Immediate LUN Tear-down Post-Initialization

      NTSTATUS DsmSetDeviceInfo(
          __in IN PVOID DsmContext,
          __in IN PDEVICE_OBJECT TargetObject,
          __in IN PVOID DsmId,
          __inout IN OUT PVOID *PathId
          )
      {
          PDSM_DEVICE_INFO deviceInfo = DsmId;
          PSCSI_ADDRESS scsiAddress = deviceInfo->ScsiAddress;
          ULONG pathId;

          // It is possible that Port, Bus and Target are all zero.
          // Ensure that the returned PathId is never zero (since MPIO
          // will treat that as NULL).
          pathId = DSM_PATHID_PREFIX;
          pathId <<= 8;
          pathId |= scsiAddress->PortNumber;  // Port
          pathId <<= 8;
          pathId |= scsiAddress->PathId;      // Bus
          pathId <<= 8;
          pathId |= scsiAddress->TargetId;    // Target

          *PathId = ((PVOID)((ULONG_PTR)(pathId)));
          ...
          return status;
      }
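The packing above can be tried out in user mode. The sketch below reproduces the shift-and-OR sequence with plain integers. sketch_make_path_id is a hypothetical name, and the prefix value 0x77 is an assumption: it is chosen only because it is consistent with the path identifier 000000077030001 shown on slide 18, and the slides do not give the actual definition of DSM_PATHID_PREFIX.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed prefix value; the real DSM_PATHID_PREFIX is not shown in the slides. */
#define SKETCH_PATHID_PREFIX 0x77u

/* Packs prefix, port, bus and target into one identifier, one byte each. */
uintptr_t sketch_make_path_id(uint8_t port, uint8_t bus, uint8_t target)
{
    uintptr_t id = SKETCH_PATHID_PREFIX;
    id = (id << 8) | port;    /* Port */
    id = (id << 8) | bus;     /* Bus */
    id = (id << 8) | target;  /* Target */

    /* The non-zero prefix guarantees the result can never be zero. */
    assert(id != 0);
    return id;
}
```

With port 3, bus 0 and target 1 this yields 0x77030001, and with all three fields zero it still yields the non-zero value 0x77000000, which is the property the slide is after.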

  26. Avoiding Bogus Path Flagging On Path Recovery

      BOOLEAN DsmIsPathActive(...)
      {
          ...
          // Set a flag that IsPathActive was successfully called
          deviceInfo->Usable = TRUE;
          return TRUE;
      }

      PVOID DsmLBGetPath(...)
      {
          for (inx = 0; inx < DsmList->Count; inx++) {
              deviceInfo = DsmList->IdList[inx];

              // Don't consider paths that aren't yet usable
              if (deviceInfo->Usable == FALSE) continue;

              // Find the best candidate to return, even if not in A/O
              // Prefer: Active/Unoptimized > StandBy > Unavailable
              pathId = DsmpCheckIfIsBetterCandidatePath(deviceInfo, ...);
          }
          return pathId;
      }

      ULONG DsmInterpretErrorEx(..., PBOOLEAN Retry, PLONG RetryInterval)
      {
          // If SenseData indicates non-A/O path was chosen, retry IO
          if (addSenseQ == 0xA || addSenseQ == 0xB || addSenseQ == 0xC) {
              *Retry = TRUE;
              *RetryInterval = ALUA_STATE_CHANGE_TIME_TAKEN;
          }
          ...
      }
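The selection loop in DsmLBGetPath can be illustrated with a small user-mode sketch: skip paths not yet marked usable, then prefer Active/Optimized, then Active/Unoptimized, then StandBy, then Unavailable. SketchPathState, SketchPath and sketch_pick_path are hypothetical names; the internals of DsmpCheckIfIsBetterCandidatePath are not shown in the slides, so this is one possible reading of the comment, not MSDSM's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical path states, declared in order of preference (lower is better). */
typedef enum SketchPathState {
    SKETCH_ACTIVE_OPTIMIZED = 0,
    SKETCH_ACTIVE_UNOPTIMIZED,
    SKETCH_STANDBY,
    SKETCH_UNAVAILABLE
} SketchPathState;

typedef struct SketchPath {
    int usable;             /* set once the path has been reported active */
    SketchPathState state;  /* current access state of the path */
} SketchPath;

/* Returns the index of the best usable path, or -1 if no path is usable. */
int sketch_pick_path(const SketchPath *paths, size_t count)
{
    assert(paths != NULL || count == 0);

    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!paths[i].usable) {
            continue;  /* don't consider paths that aren't yet usable */
        }
        if (best < 0 || paths[i].state < paths[(size_t)best].state) {
            best = (int)i;
        }
    }
    return best;
}
```

Skipping unusable paths first is what prevents a recovering path from being chosen, and then flagged as failed, before it is actually ready.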

  27. Handling IO In The Absence of Active/Optimized Path

      PVOID DsmLBGetPath(...)
      {
          ...
          // Find the best candidate to return, even if not in A/O
          return DsmpFindBestCandidatePath(...);
      }

      ULONG DsmInterpretErrorEx(..., PBOOLEAN Retry, PLONG RetryInterval, ...)
      {
          // If SenseData indicates access state changed, or implicit
          // transition failed, or TPG in non-active state, retry IO
          if ((sKey == 0x6 && addSn == 0x2A && (aSQ == 0x6 || aSQ == 0x7)) ||
              (sKey == 2 && addSn == 4 && (aSQ == 0xA || aSQ == 0xB || aSQ == 0xC))) {
              sendTPG = TRUE;
              *Retry = TRUE;
              *RetryInterval = ALUA_STATE_CHANGE_TIME_TAKEN;
              errorMask = DSM_RETRY_DONT_DECREMENT;
          }

          // Send an RTPG asynchronously to get updated TPG states.
          // If explicit-only transitions supported, this routine will
          // send an STPG first to make one of the TPGs Active/Optimized
          if (sendTPG) DsmpSetPathForIoRetryALUA(...);
          ...
          return errorMask;
      }
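The sense-data test in DsmInterpretErrorEx is easier to read with the magic numbers named. The sketch below restates the same condition as a standalone predicate. The function and macro names are hypothetical; the numeric values are the ones on the slide, which correspond to the SPC-3 codes for UNIT ATTENTION with an asymmetric access state change or failed implicit transition (06h / 2Ah / 06h, 07h) and NOT READY with the logical unit not accessible because of a transition, standby or unavailable target port state (02h / 04h / 0Ah, 0Bh, 0Ch).

```c
#include <assert.h>
#include <stdint.h>

/* Names are illustrative; values are taken from the slide. */
#define SKETCH_KEY_NOT_READY       0x02
#define SKETCH_KEY_UNIT_ATTENTION  0x06
#define SKETCH_ASC_LUN_NOT_READY   0x04
#define SKETCH_ASC_PARAMS_CHANGED  0x2A

/* Returns 1 if the sense data suggests an ALUA state issue worth retrying. */
int sketch_should_retry_alua(uint8_t sense_key, uint8_t asc, uint8_t ascq)
{
    int retry = 0;

    /* Access state changed (06h) or implicit transition failed (07h). */
    if (sense_key == SKETCH_KEY_UNIT_ATTENTION &&
        asc == SKETCH_ASC_PARAMS_CHANGED &&
        (ascq == 0x06 || ascq == 0x07)) {
        retry = 1;
    }

    /* Transitioning (0Ah), standby (0Bh) or unavailable (0Ch) target port. */
    if (sense_key == SKETCH_KEY_NOT_READY &&
        asc == SKETCH_ASC_LUN_NOT_READY &&
        (ascq == 0x0A || ascq == 0x0B || ascq == 0x0C)) {
        retry = 1;
    }

    assert(retry == 0 || retry == 1);
    return retry;
}
```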

  28. Reducing ALUA Storage Device Initialization Time

      NTSTATUS DsmInquire(...)
      {
          PDSM_DEVICE_INFO deviceInfo;  // Represents this DeviceInstance
          ...
          // For ALUA storage, get the Target Port Groups (TPG) info
          status = DsmpReportTargetPortGroups(TargetDevice, ...);
          if (NT_SUCCESS(status)) deviceInfo->IgnorePathVerify = TRUE;
          ...
          return status;
      }

      NTSTATUS DsmPathVerify(...)
      {
          ...
          // If storage is ALUA, and this is the first time PathVerify
          // is being called, we may be able to skip doing it
          if (deviceInfo->ALUASupport != DSM_DEVINFO_ALUA_NOT_SUPPORTED) {
              if (deviceInfo->IgnorePathVerify == TRUE) {
                  status = STATUS_SUCCESS;

                  // From now on, we should send PathVerify if asked to
                  deviceInfo->IgnorePathVerify = FALSE;
              }
          }
          ...
          return status;
      }
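The IgnorePathVerify flag acts as a one-shot: a successful RTPG during DsmInquire has already proven the path works, so the first PathVerify can be answered without sending IO, and every later one is sent normally. A user-mode sketch of that behavior follows. SketchDevice and sketch_should_send_path_verify are hypothetical names, and the sketch assumes that non-ALUA devices always get a real verification, which matches the branch structure on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device state mirroring the two fields used on the slide. */
typedef struct SketchDevice {
    int alua_supported;      /* ALUASupport != DSM_DEVINFO_ALUA_NOT_SUPPORTED */
    int ignore_path_verify;  /* set after a successful RTPG at inquire time */
} SketchDevice;

/* Returns 1 if a verification IO should be sent, 0 if it can be skipped.
 * The skip is consumed: the next call for the same device returns 1. */
int sketch_should_send_path_verify(SketchDevice *device)
{
    assert(device != NULL);

    if (device->alua_supported && device->ignore_path_verify) {
        device->ignore_path_verify = 0;  /* from now on, verify when asked */
        return 0;
    }
    return 1;
}
```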

  29. Avoid Preventing Cluster Disk Resource Coming Online

      ULONG DsmCategorizeRequest(...)
      {
          if (DsmpReservationCommand(Irp, Srb)) return DSM_WILL_HANDLE;
          ...
      }

      NTSTATUS DsmSrbDeviceControl(...)
      {
          if (opCode == SCSIOP_PERSISTENT_RESERVE_OUT) {
              status = DsmpPersistentReserveOut(...);
          }
          ...
      }

      NTSTATUS DsmpPersistentReserveOut(...)
      {
          if (serviceAction == RESERVATION_ACTION_RESERVE) {
      __RetryRequest:
              status = DsmSendRequest(...);
              if (!NT_SUCCESS(status)) {
                  if ((Srb->SrbStatus & SRB_STATUS_AUTOSENSE_VALID) &&
                      (Srb->SrbStatus & SRB_STATUS_ERROR) &&
                      Srb->ScsiStatus == SCSISTAT_CHECK_CONDITION) {
                      // check if the error is retry-able
                      if (DsmpShouldRetryPRcommand(senseData)) {
                          goto __RetryRequest;
                      }
                  }
              }
          }
          ...
      }
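The retry loop on this slide repeats the reservation for as long as the error is classed as retry-able. A DSM writer may want to bound that loop. The sketch below shows one way to do so in user-mode C. The attempt cap is an assumption that does not come from the slides, and SketchSendFn, sketch_send_with_retries and sketch_flaky_send are hypothetical names; the sender callback stands in for "send the request and report whether it succeeded or failed with a retry-able error".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sender: returns 1 on success, 0 on a retry-able failure. */
typedef int (*SketchSendFn)(void *context);

/* Calls send until it succeeds or max_attempts is reached.
 * Returns 1 on success, 0 if every attempt failed. */
int sketch_send_with_retries(SketchSendFn send, void *context,
                             unsigned max_attempts, unsigned *attempts_used)
{
    assert(send != NULL && attempts_used != NULL);

    *attempts_used = 0;
    while (*attempts_used < max_attempts) {
        (*attempts_used)++;
        if (send(context)) {
            return 1;
        }
    }
    return 0;
}

/* Demo sender: fails while the counter it is given is above zero. */
int sketch_flaky_send(void *context)
{
    unsigned *failures_left = context;
    if (*failures_left > 0) {
        (*failures_left)--;
        return 0;
    }
    return 1;
}
```

In a real DSM the cap would need to be generous enough for the storage to finish whatever transition made the command fail, since giving up too early is exactly what keeps the cluster disk resource from coming online.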

  30. Ensuring DSM Can Be Uninstalled Using MPIOCPL

      ...
      [Contoso_Install.Services]
      AddService=contosodsm,%SPSVCINST_ASSOCSERVICE%,Contosodsm_Service

      [Contosodsm_Service]
      ...
      AddReg = Contosodsm_Addreg

      [Contosodsm_Addreg]
      HKR, Parameters, DsmSupportedDeviceList, %REG_MULTI_SZ%,\
          "Vendor 8Product 16"
      ; The following cannot be grouped (as above)
      HKLM, SYSTEM\CurrentControlSet\Control\MPDEV,\
          MPIOSupportedDeviceList, %REG_MULTI_SZ_APPEND%, "Vendor 8Product 16"

      ; Uninstall Section
      [DefaultUninstall]
      DelReg = Contosodsm_Delreg

      [DefaultUninstall.Services]
      DelService = contosodsm

      [Contosodsm_Delreg]
      HKLM, SYSTEM\CurrentControlSet\Control\MPDEV, MPIOSupportedDeviceList, %REG_MULTI_SZ_DELETE%, "Vendor 8Product 16"

  31. Ensuring DSM Is Presented a Device before MSDSM

      NTSTATUS DriverEntry(...)
      {
          DSM_INIT_DATA dsmInitData;
          ...
          // Ensure this DSM is presented the device before MSDSM
          dsmInitData.Reserved = 0;
          ...
          // Send dsmInitData to mpio.sys via the IOCTL to register.
          DsmSendDeviceIoControlSynchronous(IOCTL_MPDSM_REGISTER, ...);
          ...
      }

      <File: CONTOSODSM.INF>
      ...
      [Contoso_Install.Services]
      AddService=contosodsm,%SPSVCINST_ASSOCSERVICE%,Contosodsm_Service

      [Contosodsm_Service]
      ...
      AddReg = Contosodsm_Addreg

      [Contosodsm_Addreg]
      HKR, Parameters, DsmSupportedDeviceList, %REG_MULTI_SZ%,\
          "Vendor 8Product 16"
      ; The following cannot be grouped (as above)
      HKLM, SYSTEM\CurrentControlSet\Control\MPDEV,\
          MPIOSupportedDeviceList, %REG_MULTI_SZ_APPEND%, "Vendor 8Product 16"
      ...

  32. Call To Action • Revisit existing DSM WMI classes to determine whether preconfiguration feature needs to be implemented • Assess whether any of the performance-related changes can be implemented in your DSM • Consider modifying management applications to implement new health WMI classes • Implement triggers • Implement Version 2 of the classes defined in mpioLBPo.mof • Test your storage with inbox MSDSM • Encourage adoption of SPC-3 ALUA for your storage

  33. RESOURCES • Web Resources • Microsoft Storage Technologies - Multipath I/O: http://www.microsoft.com/MPIO • SCSI Specifications (SPC-3), ratified version: http://t10.org/ftp/t10/drafts/spc3/spc3r23.pdf • Microsoft Windows Server Failover Clustering (WSFC): http://www.microsoft.com/downloads/details.aspx?familyid=75566F16-627D-4DD3-97CB-83909D3C722B&displaylang=en • Windows Management Interface on MSDN: http://msdn.microsoft.com/en-us/library/aa394572.aspx • Contact Information (for feedback, future feature asks) • mpiopm@microsoft.com

  34. Related Sessions

  35. Questions?
