Exchange Server 2010 High Availability Deep Dive

Presentation Transcript

  1. SESSION CODE: EXL407 • Scott Schnoll, Principal Technical Writer, Microsoft Corporation • Exchange Server 2010 High Availability Deep Dive • (c) 2011 Microsoft. All rights reserved.

  2. Agenda • Exchange Server 2010 High Availability Deep Dive • Database Availability Group Networks • Active Manager • Best Copy Selection • Datacenter Activation Coordination Mode

  3. Exchange Server 2010 High Availability Deep Dive: Database Availability Group Networks

  4. DAG Networks • A DAG network is a collection of one or more subnets • There are two types of DAG networks • MAPI Network - connects DAG members to network resources (Active Directory, other Exchange servers, DNS, etc.) • Registered in DNS / DNS configured • Uses default gateway • Client for Microsoft Networks/File and Print Sharing enabled • Replication Network - used for/by continuous replication (log shipping and seeding) • Not registered in DNS / DNS not configured • Typically no default gateway • Client for Microsoft Networks/File and Print Sharing disabled

  5. DAG Networks • All DAGs must have: • Exactly one MAPI network • Zero or more Replication networks • Separate network(s) on separate subnet(s) • When there are multiple Replication networks, least recently used (LRU) selection determines which one is used • DAG networks are created automatically when a Mailbox server is added to the DAG • Based on the cluster’s enumeration of networks • Cluster enumeration is based on subnet • One cluster network is created for each subnet
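
The "one cluster network per subnet" rule can be pictured with a small Python sketch. It only models the grouping; the server names, addresses, and the `group_by_subnet` helper are hypothetical and are not part of Exchange or the cluster service.

```python
import ipaddress
from collections import defaultdict

def group_by_subnet(interfaces):
    """Group (server, 'address/prefix') pairs by the subnet they belong to."""
    networks = defaultdict(list)
    for server, cidr in interfaces:
        # ip_interface keeps the host address and derives its containing network
        subnet = ipaddress.ip_interface(cidr).network
        networks[str(subnet)].append(server)
    return dict(networks)

# Hypothetical two-member DAG, each member with a MAPI NIC and a Replication NIC
interfaces = [
    ("EX1", "192.168.0.10/24"),
    ("EX2", "192.168.0.11/24"),
    ("EX1", "10.0.0.10/24"),
    ("EX2", "10.0.0.11/24"),
]
dag_networks = group_by_subnet(interfaces)  # one entry per subnet
```

With these made-up addresses the result has two entries, one per subnet, which mirrors a DAG with one MAPI network and one Replication network.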

  6. DAG Networks • Maximum round-trip latency between all DAG members must be 500 ms or less • Regardless of the latency of the solution, customers should validate that the network between all DAG members is capable of satisfying the data protection and availability goals of the deployment • May need to investigate increasing the number of databases or decreasing the number of mailboxes per database to achieve desired goals

  7. DAG Networks

  8. DAG Networks

  9. DAG Networks • Collapse subnets into two DAG networks and disable replication for the MAPI network:
     Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
     Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
     Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
     Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04

  11. DAG Networks • Automatic detection occurs only when members are added to the DAG • If networks are added after a member is added, you must perform discovery: Set-DatabaseAvailabilityGroup -DiscoverNetworks • DAG network configuration is persisted in the cluster registry • HKLM\Cluster\Exchange\DAG Network • DAG networks include built-in encryption and compression • Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs • Compression: Microsoft XPRESS, based on LZ77 algorithm

  12. DAG Networks • Block cross-network communication to minimize heartbeat traffic • [Diagram: Subnets 1 through 4, with allowed and blocked cross-network paths]

  13. DAG Networks • If using iSCSI storage, configure DAG and cluster to ignore iSCSI networks • Set-DatabaseAvailabilityGroupNetwork -Identity <DAGNetworkName> -ReplicationEnabled:$false -IgnoreNetwork:$true • Cluster network <ClusterNetworkName> /prop Role=0

  14. DAG Networks • When a DAG spans multiple subnets you need an IP address on the MAPI network for each subnet • Use DHCP in site resilience configurations to assign IP addresses to Replication network • Enables delivery of the typically required static routes • If using static IP addresses, use netsh to configure static routes • Configure a DNS TTL on service access connection records that is consistent with your SLA, e.g. ~5 minutes for a one hour RTO SLA

  15. Exchange Server 2010 High Availability Deep Dive: Active Manager

  16. Active Manager • What are the three Active Manager roles? • Standalone • PAM (Primary Active Manager) • SAM (Standby Active Manager) • Transition of role state logged into Microsoft-Exchange-HighAvailability/Operational event log (Crimson Channel)

  17. Active Manager Functionality • Mount and Dismount Databases • Provide Database Availability Information • Provide Interface for Administrative Tasks • Monitor for Failures • Maintains Database and Server State Information

  18. AutoMount on DAG Members • In a DAG, all AutoMount operations are coordinated through the PAM • AutoMount operations occur: • When the first server in the DAG is initialized • When the ownership of the PAM role is changed

  19. AutoMount on DAG Members • Checks msExchMasterServerOrAvailabilityGroup to determine all databases hosted on the DAG • Checks if database can be mounted on startup • If msExchEDBOffline is TRUE, stop processing • If msExchEDBOffline is FALSE, proceed with processing

  20. AutoMount on DAG Members • Checks persistent database information stored in cluster registry • Determines if database is mounted on another DAG member • If the database is mounted on another server, take no action • If the database is not mounted on another server, proceed

  21. AutoMount on DAG Members • Checks AdminDismount in cluster registry: • If AdminDismount is TRUE, take no action • If AdminDismount is FALSE, proceed • Checks persistent database state information in cluster registry for server on which database was last mounted • If server available, issue mount request to Information Store on that server • If server not available or property not set, issue mount request to next server in sorted list

  22. AutoMount on DAG Members • If AutoMount operation succeeds: • Update persistent database state information stored in cluster database • Propagate information to all other DAG members
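
The AutoMount checks described above can be read as a single decision function. The following Python sketch is only a model of that sequence: the dictionary layout, the key names `mounted_on` and `last_mounted_on`, and the choice to take the first available server from the sorted list are assumptions made for illustration. Only `msExchEDBOffline` and `AdminDismount` are names taken from the slides.

```python
def automount_decision(db, cluster_state, available_servers, sorted_servers):
    """Return the server that should receive the mount request, or None for no action."""
    if db["msExchEDBOffline"]:
        return None  # database is not allowed to mount on startup
    state = cluster_state.get(db["name"], {})
    if state.get("mounted_on"):
        return None  # already mounted on another DAG member
    if state.get("AdminDismount"):
        return None  # an administrator dismounted it on purpose
    last = state.get("last_mounted_on")
    if last in available_servers:
        return last  # prefer the server on which it was last mounted
    # Assumption: "next server in sorted list" means the first available one
    for server in sorted_servers:
        if server in available_servers:
            return server
    return None

db1 = {"name": "DB1", "msExchEDBOffline": False}
target = automount_decision(
    db1,
    {"DB1": {"last_mounted_on": "EX1"}},
    available_servers={"EX1", "EX2"},
    sorted_servers=["EX2", "EX1"],
)
```

In this made-up case `target` is `EX1`, because the server on which the database was last mounted is still available.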

  23. Mount / Dismount Database Copy • Mount Database • An administrator action invoked through a task • The last part of a move operation • Dismount Database • An administrator action invoked through a task • The first part of a move operation

  24. Mount Database – DAG Member • Initiate RPC to member of the DAG • If the server contacted is not the PAM, the task is referred to the PAM • If the server is the PAM, continue with no referral • Checks the msExchMasterServerOrAvailabilityGroup to ensure database is hosted in the DAG • If database is hosted in DAG, proceed • If database is not hosted in DAG, error out

  25. Mount Database – DAG Member • Checks if the database is already mounted • If already mounted, task fails • If not already mounted, task continues • PAM invokes callback • This invokes a pre-check for the database mount operation • Persistent database state updated to show mount Initiated

  26. Mount Database – DAG Member • PAM invokes RPC call to Information Store to mount database • If mount fails, task fails • If mount succeeds, task completes successfully • Persistent database state updated to record results of operation and propagated to other members
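
The mount sequence on a DAG member can be sketched as follows. This is a simplified Python model, not Exchange code: `MountError`, the state strings, and the `store_mount` callable that stands in for the RPC to the Information Store are all hypothetical, and the pre-check callback is left out.

```python
class MountError(Exception):
    """Raised when the modeled mount task fails."""

def mount_database(contacted_server, pam, db, dag_name, state, store_mount):
    """Model the mount task; returns True if the task had to be referred to the PAM."""
    referred = contacted_server != pam  # non-PAM members refer the task to the PAM
    if db["msExchMasterServerOrAvailabilityGroup"] != dag_name:
        raise MountError("database is not hosted in this DAG")
    if state.get(db["name"]) == "Mounted":
        raise MountError("database is already mounted")
    state[db["name"]] = "MountInitiated"  # persistent state shows the mount has started
    ok = store_mount(db["name"])  # stands in for the RPC to the Information Store
    state[db["name"]] = "Mounted" if ok else "MountFailed"
    if not ok:
        raise MountError("Information Store could not mount the database")
    return referred

state = {}
db1 = {"name": "DB1", "msExchMasterServerOrAvailabilityGroup": "DAG1"}
was_referred = mount_database("EX2", "EX1", db1, "DAG1", state, lambda name: True)
```

Keeping the state update on both the success and failure paths reflects the slide's point that the result of the operation is recorded either way.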

  27. Dismount Database – DAG Member • Task initiates call to PAM or is referred to PAM • PAM checks that msExchMasterServerOrAvailabilityGroup value matches the DAG • PAM verifies that database is mounted in the DAG by checking persistent database state information stored in registry • If database is mounted, task proceeds • If database is dismounted, task fails

  28. Dismount Database – DAG Member • PAM updates persistent state information in cluster database to show state Initiated • PAM makes RPC call to Information Store on DAG member and invokes dismount • If dismount operation succeeds, persistent database state information stored in cluster database is updated • If dismount operation fails, task fails

  29. Auto Dismount – DAG Member • Occurs when a DAG loses quorum • All DAG members are running (but may not be participating in the cluster) • Databases dismounted as quickly as possible to avoid split-brain • Information Store service is terminated

  30. Auto Dismount – DAG Member • Dismount operation should attempt to update database state information in cluster database • This is the only case where a database operation occurs on a server other than the PAM

  31. Active Manager – Move Database • Move Database • An administrator action invoked by a task • Automatic operation initiated by the PAM (failover) • Begins with a Dismount operation and ends with a Mount operation

  32. Exchange Server 2010 High Availability Deep Dive: Best Copy Selection

  33. Best Copy Selection • Process of finding the best copy of an individual database to activate, given a list of potential copies for activation and their status • Active Manager selects the “best” copy to become the new active copy when the existing active copy fails or when an administrator performs a targetless switchover

  34. Best Copy Selection – RTM • Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary • Selects from the sorted list based on which set of criteria each copy meets • Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy

  35. Best Copy Selection – SP1 • Sorts copies by activation preference when auto database mount dial is set to Lossless • Otherwise, sorts copies by copy queue length, with activation preference used as a secondary sorting key if necessary • Selects from the sorted list based on which set of criteria each copy meets • Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy
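
The difference between the RTM and SP1 sort orders comes down to the sort key. The Python sketch below illustrates this under stated assumptions: the `Copy` class, the `sort_copies` helper, and the queue lengths and preference values are invented for the example and do not come from Exchange.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    server: str
    copy_queue_length: int
    activation_preference: int

def sort_copies(copies, version="SP1", lossless=False):
    """Order candidate copies the way the RTM and SP1 slides describe."""
    if version == "SP1" and lossless:
        # SP1 with the mount dial set to Lossless: activation preference only
        key = lambda c: c.activation_preference
    else:
        # Otherwise: copy queue length first, activation preference as tie-breaker
        key = lambda c: (c.copy_queue_length, c.activation_preference)
    return sorted(copies, key=key)

# Made-up values for three passive copies of one database
copies = [
    Copy("Server2", copy_queue_length=4, activation_preference=2),
    Copy("Server3", copy_queue_length=2, activation_preference=3),
    Copy("Server4", copy_queue_length=8, activation_preference=4),
]
rtm_order = [c.server for c in sort_copies(copies, "RTM")]
# ['Server3', 'Server2', 'Server4']
sp1_lossless_order = [c.server for c in sort_copies(copies, "SP1", lossless=True)]
# ['Server2', 'Server3', 'Server4']
```

The same three copies come out in a different order depending only on the key, which is the whole behavioral change between the two releases in this step.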

  36. Best Copy Selection • Is database mountable? • Is copy queue length <= AutoDatabaseMountDial? • If Yes, database is marked as current active and mount request is issued • If not, next best database tried (if one is available) • During best copy selection, any servers that are unreachable or “activation blocked” are ignored
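
The checks on this slide amount to walking the sorted list and taking the first copy that is reachable, not activation blocked, and within the mount dial. A minimal Python sketch, assuming a hypothetical dictionary layout for each copy and a numeric `mount_dial` threshold expressed in log files:

```python
def pick_copy(sorted_copies, mount_dial):
    """Return the first server whose copy passes the checks, or None if none does."""
    for copy in sorted_copies:
        if not copy["reachable"] or copy["activation_blocked"]:
            continue  # unreachable or activation-blocked servers are ignored
        if copy["copy_queue_length"] <= mount_dial:
            return copy["server"]  # mountable: becomes the new active copy
        # otherwise fall through and try the next best copy
    return None

candidates = [
    {"server": "Server3", "reachable": False, "activation_blocked": False, "copy_queue_length": 0},
    {"server": "Server2", "reachable": True, "activation_blocked": False, "copy_queue_length": 9},
    {"server": "Server4", "reachable": True, "activation_blocked": False, "copy_queue_length": 3},
]
chosen = pick_copy(candidates, mount_dial=6)
```

Here `Server3` is skipped because it is unreachable, `Server2` exceeds the assumed dial of 6, and `Server4` is selected.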

  37. Best Copy Selection

  38. Best Copy Selection – RTM • Four copies of DB1 • DB1 currently active on Server1 • [Diagram: Server1 through Server4 each hold a copy of DB1, one marked with an X]

  39. Best Copy Selection – RTM • Sort list of available copies by Copy Queue Length (using Activation Preference as secondary sort key if necessary): • Server3\DB1 • Server2\DB1 • Server4\DB1

  40. Best Copy Selection – RTM • Only two copies meet the first set of criteria for activation (copy queue length (CQL) < 10; replay queue length (RQL) < 50; content index (CI) = Healthy): • Server3\DB1 • Server2\DB1 • Server4\DB1 • Callout: lowest copy queue length is tried first

  41. Best Copy Selection – SP1 • Four copies of DB1 • DB1 currently active on Server1 • Auto database mount dial set to Lossless • [Diagram: Server1 through Server4 each hold a copy of DB1, one marked with an X]

  42. Best Copy Selection – SP1 • Sort list of available copies by Activation Preference: • Server2\DB1 • Server3\DB1 • Server4\DB1

  43. Best Copy Selection – SP1 • Sort list of available copies by Activation Preference: • Server2\DB1 • Server3\DB1 • Server4\DB1 • Callout: lowest preference value is tried first

  44. Best Copy Selection • After Active Manager determines the best copy to activate • The Replication service on the target server attempts to copy missing log files from the source (ACLL) • If successful, then the database will mount with zero data loss • If unsuccessful (lossy failure), then the database will mount based on the AutoDatabaseMountDial setting • If data loss is outside of dial setting, next copy will be tried
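
The ACLL step and its fallback to the mount dial can be sketched as a loop over the candidates in best-copy order. This Python model is an illustration only: the `activate` helper, the `acll` callable, and the numeric `mount_dial` are hypothetical stand-ins for the Replication service behavior described on the slide.

```python
def activate(candidates, mount_dial, acll):
    """candidates: (server, missing_logs) pairs in best-copy order.

    Returns (server, logs_lost) for the copy that mounts, or (None, None).
    """
    for server, missing_logs in candidates:
        if acll(server):
            return server, 0  # ACLL copied the missing logs: zero data loss
        if missing_logs <= mount_dial:
            return server, missing_logs  # lossy mount, but within the dial setting
        # data loss is outside the dial setting: try the next copy
    return None, None

# Hypothetical scenario: the failed source is unreachable, so ACLL never succeeds
source_unreachable = lambda server: False
result = activate([("Server3", 9), ("Server2", 4)], mount_dial=6, acll=source_unreachable)
```

In this scenario `Server3` would lose 9 logs, which exceeds the assumed dial of 6, so the model moves on and mounts `Server2` with 4 logs lost.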

  45. Best Copy Selection • If an activated database copy is mounted • It will generate new log files (using the same log generation sequence) • Transport Dumpster requests will be initiated for the mounted database to recover lost messages • When original server or database recovers, it will run through divergence detection and either perform an incremental resync or require a full reseed

  46. Exchange Server 2010 High Availability Deep Dive: Datacenter Activation Coordination Mode

  47. Datacenter Activation Coordination Mode • DAC mode is a property of a DAG • Acts as an application-level form of quorum • Controls whether or not a Mailbox server attempts to mount its active databases on startup • Designed to prevent multiple copies of same database mounting on different members due to loss of network (split brain) • Also enables use of Site Resilience tasks • Stop-DatabaseAvailabilityGroup • Restore-DatabaseAvailabilityGroup • Start-DatabaseAvailabilityGroup

  48. Datacenter Activation Coordination Mode • RTM: DAC Mode for DAGs with three or more members that are extended to two Active Directory sites • Don’t enable for two-member DAGs where each member is in different AD site or DAGs where all members are in the same AD site • SP1: DAC Mode can be enabled for all DAGs • If using Third Party Replication (TPR) mode, check with your vendor for guidance on DAC mode

  49. Datacenter Activation Coordination Mode • Uses Datacenter Activation Coordination Protocol (DACP) • A bit in memory (in MSExchangeRepl.exe) set to either: • 0 = can’t mount • 1 = can mount