Windows 2008 Failover Clustering Witness/Quorum Models
Node Majority*
Recommended for clusters with odd number of nodes
• Can sustain failures of half the nodes (rounding up) minus one
  • Ex: a seven node cluster can sustain three node failures
• No concurrently accessed disk required
  • Disks/LUNs in Resource Groups are not concurrently accessed
• Cluster stops if majority of nodes fail
• Requires at least 3 nodes in cluster
• Inappropriate for automatic failover in geographically distributed clusters
  • Still deployed in environments that want humans to decide service location
[Diagram: Cluster Status]
* Formerly “Majority Node Set”
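As a rough illustration of the arithmetic above, the sketch below computes the tolerable failures for a node-majority cluster and shows the corresponding quorum command. It assumes the FailoverClusters PowerShell module that ships with Windows Server 2008 R2; on Windows Server 2008 RTM the same change is made through Failover Cluster Manager or cluster.exe.

    # Node majority: n voting nodes tolerate Ceiling(n/2) - 1 failures
    $nodes = 7
    $maxFailures = [math]::Ceiling($nodes / 2) - 1   # 3 for a seven node cluster

    # Switch the cluster to the Node Majority model (local cluster assumed)
    Import-Module FailoverClusters
    Set-ClusterQuorum -NodeMajority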
Node Majority with Witness Disk*
Recommended for clusters with even number of nodes
• Can sustain failures of half the nodes (rounding up) if the disk witness remains online
  • Ex: a six node cluster in which the disk witness is online could sustain three node failures
• Can sustain failures of half the nodes (rounding up) minus one if the disk witness fails
  • Ex: a six node cluster with a failed disk witness could sustain two (3-1=2) node failures
• Witness disk is concurrently accessed by all nodes
  • Acts as tiebreaker
• Witness disk can fail without affecting cluster operations
• Usually used in 2-node clusters, or some geographically dispersed clusters
• Can work with SRDF/CE
  • 64 clusters/VMAX pair limit
• Can work with VPLEX
• Does not work with:
  • RecoverPoint
  • MirrorView
[Diagram: Cluster Status]
* Formerly “quorum” disk
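A minimal sketch of selecting this model with the FailoverClusters PowerShell module (2008 R2 and later); "Cluster Disk 1" is a hypothetical witness disk resource name, and on 2008 RTM the change is made through Failover Cluster Manager instead.

    Import-Module FailoverClusters
    # Node and Disk Majority: every node plus the named witness disk gets a vote
    Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 1"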
Witness Disk
Not generally recommended
• Can sustain failures of all nodes except one
• Loss of witness disk stops the cluster
• Original “legacy” cluster model for Windows until the introduction of Majority Node Set
• Witness disk is the only voter in the cluster
  • Failure of the witness leads to failure of the cluster
[Diagram: Cluster Status]
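For completeness, the legacy disk-only model can be selected the same way; as noted above it is generally discouraged because the witness disk becomes the single point of failure ("Cluster Disk 1" is again a hypothetical resource name).

    Import-Module FailoverClusters
    # Disk Only (legacy): the witness disk holds the only vote
    Set-ClusterQuorum -DiskOnly "Cluster Disk 1"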
Node Majority with File Share Witness (FSW)
Recommended for most geographically distributed clusters
• Can sustain failures of half the nodes (rounding up) if the FSW remains online
  • Ex: a six node cluster in which the FSW is online could sustain three node failures
• Can sustain failures of half the nodes (rounding up) minus one if the FSW fails
  • Ex: a six node cluster with a failed FSW could sustain two (3-1=2) node failures
• Any CIFS share will work
  • FSW is not a member of the cluster
  • One host can serve multiple clusters as a witness
• FSW placement is important
  • Third failure domain
  • Or the FSW itself can be made to automatically fail over
    • Timing issues can be a challenge
• Works with no node limitations on:
  • SRDF/CE
  • MV/CE
  • RP/CE
  • VPLEX
[Diagram: Cluster Status]
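A sketch of pointing the cluster at a file share witness; \\fsw01\ClusterFSW is a hypothetical share hosted outside the cluster (ideally in a third failure domain, as noted above). Assumes the FailoverClusters module (2008 R2 and later).

    Import-Module FailoverClusters
    # Node and File Share Majority: every node plus the share gets a vote
    Set-ClusterQuorum -NodeAndFileShareMajority "\\fsw01\ClusterFSW"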
Why is geographically distributed clustering a special case?
• A two-site configuration will always include a failure scenario that results in loss of majority
  • This is often desired behavior
    • Sometimes it is desirable to have humans control the failover between sites
    • Failover is automated, but not automatic
  • Simple to restart the services on surviving nodes (force quorum): net start clussvc /fq (see the worked example below)
• If automatic failover between sites is required, deploy a FSW in a separate failure domain (third site)
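A worked example of the two-site majority-loss scenario and the manual recovery step, using hypothetical counts (two nodes at each site plus an FSW placed at Site A).

    # 5 voters total; quorum needs Ceiling(5/2) = 3 votes.
    # Losing Site A removes 3 voters (2 nodes + FSW), leaving Site B with 2 votes: the cluster stops.
    # A human decides to bring services up at Site B by forcing quorum on one surviving node:
    net start clussvc /fq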
Things to note
• Successful failover requires that all disks in the resource group (RG) be available to the production node, which requires:
  • Replication between sites, and
  • A method to surface the replicated copies to the nodes in the DR site (Cluster Enabler), OR
  • A virtualization technique whereby the replicated LUNs are always available to the nodes in the DR site (VPLEX)
Quorum Recommendations
http://technet.microsoft.com/en-us/library/cc770830(WS.10).aspx#BKMK_multi_site
FSW Failure Scenarios
3-site configuration – odd # voters
[Diagram: Cluster Status]

FSW Failure Scenarios
Even # voters
[Diagram: Cluster Status]

FSW Failure Scenarios
2-site configuration – even # voters
[Diagram: Cluster Status]

FSW Failure Scenarios
2-site configuration – odd # voters
[Diagram: Cluster Status]
Node Weights
• Nodes can be altered to have no vote:
  • cluster.exe node <NodeName> /prop NodeWeight=0
• Useful when you have an unbalanced configuration
• Hotfix required: http://support.microsoft.com/kb/2494036
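A sketch of removing and reviewing node votes; NODE3 is a hypothetical node name, and the NodeWeight property requires the KB2494036 hotfix referenced above (it is exposed natively in later releases).

    # Remove the vote from one node (cluster.exe syntax from the slide)
    cluster.exe node NODE3 /prop NodeWeight=0

    # Review the resulting weights with the FailoverClusters module
    Import-Module FailoverClusters
    Get-ClusterNode | Format-Table Name, State, NodeWeight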
Dealing with loss of quorum
• It is an outage that requires manual intervention to recover:
  • net start clussvc /fq
• Cluster returns to “unforced” operation when a majority of nodes comes back online
• Does not alter RPO of user data
• Rolling failures may result in reversion of the cluster configuration
  • FSW does not store the cluster database (same behavior as disk witness)
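A sketch of the same recovery flow on Windows Server 2008 R2; Start-ClusterNode -FixQuorum is the PowerShell equivalent of the net start command above, and NODE1 is a hypothetical surviving node.

    Import-Module FailoverClusters
    # Force the cluster service up without quorum on one surviving node
    Start-ClusterNode -Name NODE1 -FixQuorum
    # The forced state clears on its own once a majority of voters returns; verify with:
    Get-ClusterNode | Format-Table Name, State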
FSW Considerations
• FSW can be SMB 1.0
  • No need to be same OS as member nodes
• Same domain, same forest
• Cannot be a member of the cluster
• Can be hosted on NAS (VNX)
• 5 MB of free space
• If Windows, server should be dedicated to FSW
  • 1 server can be witness to multiple clusters
  • Beware of dependencies
• Administrator must have full control share and NTFS perms
• No DFS
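A minimal sketch of preparing a witness share on a Windows host, run from an elevated PowerShell prompt. The folder path, share name, and CONTOSO\ClusterAdmin account are hypothetical; the account is whichever one the slide's full-control requirement applies to in your environment.

    mkdir C:\ClusterFSW
    # Share the folder and grant Full Control at the share level
    net share ClusterFSW=C:\ClusterFSW "/GRANT:CONTOSO\ClusterAdmin,FULL"
    # Grant matching Full Control NTFS permissions on the folder and its contents
    icacls C:\ClusterFSW /grant "CONTOSO\ClusterAdmin:(OI)(CI)F"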
Changing the quorum configuration
Process for FSW
• No downtime required
  • As long as a majority of nodes is available
• Account requirements
  • Member of the Administrators group on each member node
  • Domain user account
• Create the FSW
• Start Failover Cluster Manager
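A short verification sketch around the change (FailoverClusters module, 2008 R2 and later); it checks that a majority of nodes is up before the change and reviews the resulting quorum settings afterwards.

    Import-Module FailoverClusters
    # The change needs no downtime as long as a majority of nodes is available
    Get-ClusterNode | Format-Table Name, State
    # After creating the share and completing the wizard, confirm the new quorum model
    Get-ClusterQuorum | Format-List Cluster, QuorumResource, QuorumType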