
Failover Clustering & Hyper-V: Multi-Site Disaster Recovery



Presentation Transcript


  1. Failover Clustering & Hyper-V: Multi-Site Disaster Recovery. Symon Perriman, Technical Evangelist, Microsoft. Twitter: @SymonPerriman

  2. Multi-Site Clustering: Introduction, Networking, Storage, Quorum

  3. Defining High-Availability
     • High-Availability (HA) with Failover Clustering allows applications or VMs to maintain service availability by moving them between nodes in a cluster
     • But what if there is a catastrophic event at Site A? Fire, flood, earthquake…

  4. Multi-Site Clusters for Disaster Recovery
     • Extends a cluster from being a High-Availability solution to also being a Disaster Recovery solution
     • A node is located at a physically separate site
     • VMs are failed over to a separate physical location

  5. Benefits of a Multi-Site Cluster
     • Protects against loss of an entire location
     • Automates failover
     • Reduced downtime
     • Lower complexity disaster recovery plan
     • Reduces administrative overhead
     • Automatically synchronizes application and cluster changes
     • Easier to keep consistent than standalone servers
     • Top 3 reasons disaster recovery plans fail:
       3. Failure detection failed: no failover
       2. Poor testing: something did not work as expected
       1. No automation: a dependence on people during a disaster

  6. Multi-Site Clustering: Introduction, Networking, Storage, Quorum

  7. Network Considerations
     • Network deployment options:
     • Stretch VLANs across sites
     • Cluster nodes can reside in different subnets
     • (Diagram: Site A and Site B connected by a public network and a redundant network; example addresses 10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1)

  8. Stretching the Network
     • Longer distance traditionally means greater network latency
     • Missed inter-node health checks can cause false failover
     • Cluster inter-node heartbeating is fully configurable
     • SameSubnetDelay (default = 1 second): how often heartbeats are sent
     • SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down
     • CrossSubnetDelay (default = 1 second): how often heartbeats are sent to nodes on dissimilar subnets
     • CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface to nodes on dissimilar subnets is considered down
     • PowerShell (R2): Get-Cluster | fl *
     • Command line: cluster.exe /prop
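
A rough sketch of the arithmetic behind these settings, assuming an unreachable interface is declared down after roughly delay × threshold. The function name and the "relaxed" WAN values are hypothetical illustrations, not recommended settings; check the documented limits before changing a real cluster.

```python
def detection_time_ms(delay_ms: int, threshold: int) -> int:
    """Approximate time before a silent interface is considered down."""
    return delay_ms * threshold


# Defaults quoted on the slide: 1 second between heartbeats, 5 missed heartbeats.
default_detection = detection_time_ms(1000, 5)

# Hypothetical relaxed cross-subnet values for a high-latency WAN link.
relaxed_detection = detection_time_ms(2000, 10)

print(default_detection, relaxed_detection)
```

Raising either value makes false failovers over a slow link less likely, at the cost of reacting more slowly to a real failure.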

  9. Security over the WAN
     • Encrypt inter-node communication (cluster SecurityLevel property)
     • 0 = clear text
     • 1 = signed (default)
     • 2 = encrypted

  10. Updating VM’s IP on Subnet Failover • On cross-subnet failover, if guest is… Best to use DHCP in guest OS for cross-subnet failover

  11. Client Reconnect Considerations
     • Nodes in dissimilar subnets
     • VM obtains a new IP address
     • Clients need that new IP address from DNS to reconnect
     • (Diagram: the VM moves from 10.10.10.111 at one site to 20.20.20.222 at the other; its DNS record is created, updated, replicated between DNS Server 1 and DNS Server 2, and then obtained by the client)

  12. Solution #1: Local Failover First
     • Scale up so that local failover provides high availability
     • No change in IP addresses for HA
     • Local failover avoids going over the WAN and is still usually preferred
     • Cross-site failover is reserved for disaster recovery

  13. Solution #2: Stretch VLANs
     • Deploying a VLAN minimizes client reconnection times
     • IP of the VM never changes (10.10.10.111 at both Site A and Site B in the diagram)

  14. Solution #3: Network Device Abstraction
     • Network device uses a 3rd IP
     • The 3rd IP is the one registered in DNS and used by the client
     • (Diagram: VM = 30.30.30.30 in DNS; site addresses 10.10.10.111 and 20.20.20.222)

  15. Faster Failover for Multi-Subnet Clusters
     • RegisterAllProvidersIP (default = 0 for FALSE): determines whether all IP addresses for a Network Name are registered in DNS
     • TRUE (1): IP addresses are registered whether they are online or offline
     • Ensure the client application is set to try all IP addresses, so clients can reconnect more quickly
     • HostRecordTTL (default = 1200 seconds): controls how long clients cache the DNS record for a cluster network name
     • Shorter TTL: clients pick up updated DNS records sooner
     • Exchange Server 2007+ recommends a value of five minutes (300 seconds)
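
A minimal sketch of the client behaviour that RegisterAllProvidersIP = 1 relies on: the client receives every registered address and tries each until one answers. The function name, the probe callback, and the online/offline state are hypothetical stand-ins for a real connection attempt.

```python
from typing import Callable, Iterable, Optional


def first_reachable(addresses: Iterable[str],
                    probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first address for which probe() succeeds, else None."""
    for address in addresses:
        if probe(address):
            return address
    return None


# Both addresses are registered in DNS; after a failover to the other
# site only the second one is online (assumed state for illustration).
registered = ["10.10.10.111", "20.20.20.222"]
online = {"20.20.20.222"}

chosen = first_reachable(registered, lambda ip: ip in online)
print(chosen)
```

A client that only ever tries the first address would instead wait for the DNS record to change, which is where HostRecordTTL matters.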

  16. Live Migrating Across Sites
     • Live migration moves a VM to another host
     • TCP reconnects make the move unnoticeable to clients
     • Use VLANs to achieve live migrations between sites
     • The IP the client is connected to will not change
     • Plan appropriate bandwidth between sites
     • Live migration may require significant network bandwidth, based on the amount of memory allocated to the VM
     • Migration times will naturally be longer with higher-latency or lower-bandwidth WAN connections
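
For bandwidth planning, a back-of-the-envelope estimate can help. The sketch below computes only the time to copy a VM's memory once over the link; it is a lower bound, since pages that change during the copy are sent again and protocol overhead and latency are ignored. The function name and the example sizes are assumptions for illustration.

```python
def min_transfer_seconds(memory_gib: float, link_mbps: float) -> float:
    """Time to copy the VM's memory once over the link, ignoring overhead."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    memory_bits = memory_gib * 1024 ** 3 * 8
    return memory_bits / (link_mbps * 1_000_000)


# Hypothetical example: an 8 GiB VM over a 1 Gbps link and a 100 Mbps link.
fast_link = min_transfer_seconds(8, 1000)
slow_link = min_transfer_seconds(8, 100)

print(round(fast_link, 1), round(slow_link, 1))
```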

  17. CSV Networking Considerations
     • Cluster Shared Volumes does not support having nodes in dissimilar subnets
     • Use VLANs if you want to use CSV with multi-site clusters

  18. Multi-Subnet vs. VLAN Recap: choosing the right network model depends on your business requirements

  19. Multi-Site Clustering: Introduction, Networking, Storage, Quorum

  20. Storage in Multi-Site Clusters
     • Different from local clusters:
     • Multiple storage arrays, independent per site
     • Nodes commonly access their own site's storage
     • No 'true' shared disk visible to all nodes

  21. Storage Considerations
     • Changes are made on Site A and replicated to the replica at Site B
     • Requires a data replication mechanism between sites

  22. Hardware Replication Partners (hardware storage-based replication)
     • EMC Cluster Enabler: SRDF/CE for DMX arrays, RecoverPoint/CE for Clariion arrays
     • HDS: Hitachi Storage Cluster (HSC)
     • HP Cluster Extension: HP StorageWorks CLX, HP LeftHand
     • IBM: IBM XIV Storage System
     • NetApp: MetroCluster
     • Compellent: LiveVolume

  23. Software Replication Partners (software host-based replication)

  24. Appliance Replication Partners (appliance replication)
     • EMC: VPLEX
     • Datacore: SANsymphony
     • FalconStor: Continuous Data Protector (CDP)

  25. Synchronous Replication
     • Host receives the "write complete" response from the storage after the data is successfully written on both storage devices
     • (Diagram: write request to primary storage, replication to secondary storage, acknowledgement, write complete)

  26. Asynchronous Replication
     • Host receives the "write complete" response from the storage after the data is successfully written to only the primary storage device; the data is replicated later
     • (Diagram: write request to primary storage, write complete, replication to secondary storage)
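
The difference between the two modes can be shown with a toy model. This is an illustrative sketch only: the class and method names are hypothetical, writes are strings held in lists, and returning from write() stands for the "write complete" response.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicatedVolume:
    """Toy model of a primary/secondary storage pair (hypothetical names)."""
    synchronous: bool
    primary: List[str] = field(default_factory=list)
    secondary: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)

    def write(self, block: str) -> None:
        """Returning from this method stands for "write complete"."""
        self.primary.append(block)
        if self.synchronous:
            # Synchronous: the secondary has the data before we return.
            self.secondary.append(block)
        else:
            # Asynchronous: return now, ship the data to the secondary later.
            self.pending.append(block)

    def replicate_pending(self) -> None:
        """Ship queued writes to the secondary (asynchronous mode)."""
        self.secondary.extend(self.pending)
        self.pending.clear()

    def lost_if_primary_fails(self) -> List[str]:
        """Acknowledged writes that exist only on the primary."""
        return list(self.pending)


sync_volume = ReplicatedVolume(synchronous=True)
async_volume = ReplicatedVolume(synchronous=False)
for block in ("a", "b"):
    sync_volume.write(block)
    async_volume.write(block)

print(sync_volume.lost_if_primary_fails())   # nothing is at risk
print(async_volume.lost_if_primary_fails())  # both writes are still queued
async_volume.replicate_pending()
print(async_volume.lost_if_primary_fails())  # queue drained
```

The model leaves out the cost side of the trade-off: in synchronous mode every write waits for the round trip to the secondary site, which is why distance and latency matter when choosing between the two.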

  27. Synchronous vs. Asynchronous

  28. Validation with Replicated Storage
     • Multi-site clusters are not required to pass the Storage tests to be supported
     • Validation Guide and Policy: http://go.microsoft.com/fwlink/?LinkID=119949

  29. What about DFS-Replication?
     • Using the file server DFS-R feature to replicate VM data on a multi-site Failover Cluster is not supported
     • DFS-R performs replication on file close
     • Works well for Office documents like .docx, .pptx, and .xlsx
     • Not designed for application workloads where the file is held open, like VHD

  30. CSV with Replicated Storage
     • Regular cluster disks: one node accesses the disk
     • CSV disks: all nodes can access a disk
     • Which CSV disk is accessed when it appears in multiple sites?
     • Talk to your storage vendor for their support story
     • (Diagram: a VM attempts to access the replica VHD; one copy is read/write, the other read-only)

  31. Storage Virtualization Abstraction
     • Some replication solutions provide complete abstraction in the storage array
     • Servers are unaware of the accessible disk's location
     • Fully compatible with Cluster Shared Volumes (CSV)
     • (Diagram: servers at Site A and Site B are abstracted from storage; virtualized storage presents a logical LUN)

  32. Choosing a Stretched Storage Model: choosing the right model depends on your business requirements. Consult your vendor.

  33. Multi-Site Clustering: Introduction, Networking, Storage, Quorum

  34. Quorum Overview
     • Majority is greater than 50%
     • Possible voters: nodes (1 vote each) + 1 witness (disk or file share)
     • 4 quorum types:
     • Disk only (not recommended)
     • Node and Disk majority
     • Node majority
     • Node and File Share majority
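
The "greater than 50%" rule is simple integer arithmetic. A minimal sketch, with hypothetical function names, assuming each voter (node or witness) holds exactly one vote:

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is strictly more than half."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if the reachable voters form a majority of all voters."""
    return reachable_votes >= votes_needed(total_votes)


# Five voters, e.g. five nodes, or four nodes plus one witness.
print(votes_needed(5))
print(has_quorum(3, 5), has_quorum(2, 5))

# With an even number of voters, an even split leaves neither half with a majority.
print(has_quorum(2, 4))
```

The last case is why a witness is added when the node count is even: it brings the total number of votes to an odd number.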

  35. Replicated Disk Witness
     • The witness decides which partition of nodes stays running when the nodes lose network connectivity
     • The witness disk should be a single decision maker
     • Do not use a replicated disk witness in multi-site clusters unless directed by the vendor

  36. Node Majority (cross-site network connectivity broken)
     • 5-node cluster: majority = 3, with the majority in the primary site
     • Each node asks: can I communicate with a majority of the nodes in the cluster?
     • Yes: stay up
     • No: drop out of cluster membership

  37. Node Majority (disaster at the primary site)
     • 5-node cluster: majority = 3, with the majority in the primary site
     • Surviving nodes ask: can I communicate with a majority of the nodes in the cluster? No, so they drop out of cluster membership
     • The cluster is down, and quorum must be forced manually

  38. Forcing Quorum
     • Forcing quorum is a way to manually override and start a node even though it has not achieved quorum
     • Always understand why quorum was lost
     • Used to bring the cluster online without quorum
     • The cluster starts in a special "forced" state
     • Once a majority is achieved, the cluster drops out of the "forced" state
     • PowerShell (R2): Start-ClusterNode -FixQuorum (or -fq)
     • Command line: net start clussvc /fixquorum (or /fq)

  39. Multi-Site with File Share Witness
     • File share witness (\\Foo\Share) at Site C (branch office), connected to Site A and Site B over the WAN
     • Complete resiliency and automatic recovery from the loss of any 1 site

  40. Multi-Site with File Share Witness (loss of connection between sites)
     • File share witness (\\Foo\Share) at Site C (branch office), connected to Site A and Site B over the WAN
     • Site holding the lock: "Can we communicate with a majority of the voters in the cluster? Yes, including the lock on the FSW, so we stay up"
     • Site without the lock: "Can I communicate with a majority of the nodes in the cluster? No lock on the FSW, so drop out of cluster membership"
     • Complete resiliency and automatic recovery from the loss of connection between sites
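
The two file share witness scenarios can be checked with a small model. This is a sketch under stated assumptions: two nodes at each of Site A and Site B and one witness vote at Site C are hypothetical numbers, and the function simply returns the group of sites that can still see a strict majority of all votes.

```python
from typing import Dict, List, Set


def surviving_sites(votes_per_site: Dict[str, int],
                    partitions: List[Set[str]]) -> Set[str]:
    """Return the partition holding a strict majority of votes, if any."""
    total = sum(votes_per_site.values())
    for group in partitions:
        if sum(votes_per_site[site] for site in group) * 2 > total:
            return set(group)
    return set()


# Hypothetical layout: 2 nodes per site, file share witness at Site C.
with_witness = {"A": 2, "B": 2, "C": 1}

# Link between A and B is broken; A still reaches the witness and holds the lock.
link_broken = surviving_sites(with_witness, [{"A", "C"}, {"B"}])

# Site A is lost entirely; B and the witness can still see each other.
site_a_lost = surviving_sites(with_witness, [{"B", "C"}])

# Same link failure without a witness: neither half has a majority.
no_witness = surviving_sites({"A": 2, "B": 2}, [{"A"}, {"B"}])

print(link_broken, site_a_lost, no_witness)
```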

  41. File Share Witness (FSW) Considerations
     • Simple Windows File Server
     • A single file server can serve as a witness for multiple clusters
     • Each cluster requires its own share
     • The FSW can be made highly available on a separate cluster
     • Recommended to be at a 3rd, separate site to enable automatic site failover
     • The FSW cannot be on a node in the same cluster
     • The FSW should not be in a VM running on the same cluster

  42. Recent Changes
     • Asymmetrical Storage (2008 R2 Service Pack 1): optimized to allow storage that is only visible to a subset of nodes; improves the multi-site cluster experience
     • Node Vote Weight (post-SP1 hotfix): granular control of which nodes have votes in determining quorum; flexibility for multi-site clusters

  43. Quorum Model Recap

  44. Session Summary
     • Multi-site Failover Clusters have many benefits
     • You can achieve HA and DR in a single solution
     • Multi-site clusters have additional considerations
     • Determine network topology across sites
     • Choose a replication solution
     • Plan quorum model and nodes

  45. Multi-Site Clustering Content
     • Design guide: http://technet.microsoft.com/en-us/library/dd197430.aspx
     • Deployment guide/checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx

  46. Additional Information
     • Hyper-V Business Continuity portal: http://www.microsoft.com/virtualization/en/us/solution-continuity.aspx
     • Microsoft Cross-Site Disaster Recovery Solutions whitepaper: http://download.microsoft.com/download/3/6/1/36117F2E-499F-42D7-9ADD-A838E9E0C197/SiteRecoveryWhitepaper_final_120309.pdf

  47. Passion for High-Availability? Are You Up For a Challenge? Become a Cluster MVP! Contact: ClusMVP@Microsoft.com

  48. Stay up to date with TechNet Belux
     • Register for our newsletters and stay up to date: http://www.technet-newsletters.be
     • Technical updates
     • Event announcements and registration
     • Top downloads
     • Join us on Facebook: http://www.facebook.com/technetbe and http://www.facebook.com/technetbelux
     • LinkedIn: http://linkd.in/technetbelux/
     • Twitter: @technetbelux
     • Download the MSDN/TechNet Desktop Gadget: http://bit.ly/msdntngadget

  49. TechDays 2011 On-Demand
     • Watch this session on-demand via TechNet Edge: http://technet.microsoft.com/fr-be/edge/ or http://technet.microsoft.com/nl-be/edge/
     • Download to your favorite MP3 or video player
     • Get access to slides and recommended resources by the speakers

  50. THANK YOU
