2-Node Clustering

Presentation Transcript


  1. 2-Node Clustering: Active-Standby Deployment

  2. 2-Node Deployment Topology: Active-Standby Requirements
  • Configuration of a Primary controller in the cluster (Must)
  • The Primary controller services the NorthBound (NB) IP address; the Secondary takes over the NB IP on failover (Must)
  • Configuration of whether the configured Primary controller reasserts leadership after failover and recovery (Must)
  • Configuration of the merge strategy applied on failover and recovery (Want)
  • The Primary controller is master of all devices and leader of all shards (Must)
  • Single-node operation allowed (access to the datastore without quorum) (Want)
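As an illustration of the requirements above, here is a minimal sketch of a configuration holder for the 2-node Active-Standby case, written in Java. All class and field names are hypothetical; this is not an existing OpenDaylight API.

    // Hypothetical configuration holder for the 2-node Active-Standby case.
    // None of these names are part of an existing OpenDaylight API.
    public final class TwoNodeClusterConfig {
        private final String primaryNodeId;        // configured Primary controller (Must)
        private final String northboundIp;         // NB IP serviced by the Primary, taken over by the Secondary on failover (Must)
        private final boolean reassertLeadership;  // whether the Primary reclaims leadership after recovery (Must)
        private final String mergeStrategyClass;   // pluggable merge strategy applied on recovery (Want)

        public TwoNodeClusterConfig(String primaryNodeId, String northboundIp,
                                    boolean reassertLeadership, String mergeStrategyClass) {
            this.primaryNodeId = primaryNodeId;
            this.northboundIp = northboundIp;
            this.reassertLeadership = reassertLeadership;
            this.mergeStrategyClass = mergeStrategyClass;
        }

        public String primaryNodeId()       { return primaryNodeId; }
        public String northboundIp()        { return northboundIp; }
        public boolean reassertLeadership() { return reassertLeadership; }
        public String mergeStrategyClass()  { return mergeStrategyClass; }
    }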

  3. Failure of Primary, Scenario 1: Master Stays Offline
  Failover Sequence
  • The Secondary controller becomes master of all devices and leader of all shards
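The deck does not say how the Secondary decides that the Primary is offline; a heartbeat timeout (also touched on in the Open Issues slide) is one plausible trigger. The sketch below assumes that approach, and every name in it is hypothetical.

    // Hypothetical heartbeat-based failure detector running on the Secondary.
    // If no heartbeat from the Primary arrives within the timeout, the caller
    // starts the failover sequence above (take over device mastership and shard leadership).
    public final class PrimaryFailureDetector {
        private final long timeoutMillis;
        private volatile long lastHeartbeatMillis;

        public PrimaryFailureDetector(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
            this.lastHeartbeatMillis = System.currentTimeMillis();
        }

        /** Invoked whenever a heartbeat from the Primary is received. */
        public void onHeartbeat() {
            lastHeartbeatMillis = System.currentTimeMillis();
        }

        /** Polled periodically by the Secondary to decide whether to fail over. */
        public boolean primaryConsideredDown() {
            return System.currentTimeMillis() - lastHeartbeatMillis > timeoutMillis;
        }
    }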

  4. Failure of Primary, Scenario 2: Primary Comes Back Online
  Recovery Sequence
  • Controller A comes back online and its data is replaced by all of Controller B's data
  • Depending on the Re-assert Leadership configuration:
    • (ON) Controller A becomes master of all devices and leader of all shards
    • (OFF) Controller B stays master of all devices and maintains leadership of all shards
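A minimal sketch of this recovery sequence is shown below, assuming a hypothetical ClusterNode abstraction; it only illustrates how the Re-assert Leadership flag drives the two (ON/OFF) outcomes and is not an ODL implementation.

    // Minimal sketch of the recovery sequence, driven by the Re-assert Leadership flag.
    // ClusterNode and its methods are hypothetical placeholders, not ODL types.
    interface ClusterNode {
        Object snapshotData();
        void replaceDataWith(Object data);
        void becomeMasterOfAllDevices();
        void becomeLeaderOfAllShards();
    }

    public final class RecoveryHandler {
        private final boolean reassertLeadership;

        public RecoveryHandler(boolean reassertLeadership) {
            this.reassertLeadership = reassertLeadership;
        }

        /** Called when the old Primary (Controller A) rejoins the Secondary (Controller B). */
        public void onPrimaryRejoin(ClusterNode controllerA, ClusterNode controllerB) {
            // Controller A's data is replaced by all of Controller B's data.
            controllerA.replaceDataWith(controllerB.snapshotData());

            if (reassertLeadership) {
                // (ON) Controller A takes back mastership of all devices and leadership of all shards.
                controllerA.becomeMasterOfAllDevices();
                controllerA.becomeLeaderOfAllShards();
            }
            // (OFF) Controller B simply keeps the roles it already holds; nothing else to do.
        }
    }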

  5. Network Partition, Scenario 1: During the Network Partition
  Failover Sequence
  • Controller A becomes master of the devices in its network segment and leader of all shards
  • Controller B becomes master of the devices in its network segment and leader of all shards

  6. Network Partition, Scenario 2: Network Partition Recovers
  Recovery Sequence
  • Merge data according to the pluggable merge strategy (default: the Secondary's data is replaced with the Primary's data)
  • Depending on the Re-assert Leadership configuration:
    • (ON) Controller A again becomes master of all devices and leader of all shards
    • (OFF) Controller B becomes master of all devices and leader of all shards
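The merge strategy is described as pluggable; a plausible shape for such a plug-in point is sketched below, together with the default (Primary wins) behaviour from this slide. The interface is hypothetical, not an existing ODL type.

    // Hypothetical plug-in point for the merge strategy; not an existing ODL interface.
    public interface ShardMergeStrategy<T> {
        /** Decide which data the cluster keeps once the partition heals. */
        T merge(T primaryData, T secondaryData);
    }

    // Default behaviour from the slide: the Secondary's data is replaced with the Primary's.
    final class PrimaryWinsMergeStrategy<T> implements ShardMergeStrategy<T> {
        @Override
        public T merge(T primaryData, T secondaryData) {
            return primaryData;
        }
    }

A deployment could swap in a different implementation, for example one that reconciles data shard by shard, without changing the recovery sequence itself.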

  7. No-Op Failures: Failures That Do Not Result in Any Role Changes
  Scenarios
  • Secondary controller failure
  • Any single link failure
  • Secondary controller loses network connectivity (but device connections to the Primary are maintained)

  8. Cluster Configuration Options: Global & Granular Configuration
  Global
  • Cluster Leader (aka "Primary")
    • Allow this to be changed on a live system, e.g. for maintenance
    • Assigned (2-node case), Elected (larger-cluster case)
  • Cluster Leader Northbound IP
  • Reassert Leadership on Failover and Recovery
  • Network Partition Detection Algorithm (pluggable)
  • Global overrides of the per-device/group and per-shard items below
  Per Device / Group
  • Master / Slave
  Per Shard
  • Shard Leader (Shard Placement Strategy, pluggable)
  • Shard Data Merge (Shard Merge Strategy, pluggable)
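One way the global overrides could interact with the per-shard settings listed above is a simple precedence chain: global override first, then the per-shard value, then a default (e.g. the cluster Primary). The sketch below illustrates that idea with invented names.

    import java.util.Map;
    import java.util.Optional;

    // Hypothetical precedence chain for shard leadership: an explicit global
    // override wins, otherwise the per-shard setting, otherwise the default.
    // All names are invented for illustration.
    public final class ShardLeaderConfig {
        private final Optional<String> globalLeaderOverride;
        private final Map<String, String> perShardLeader;   // shard name -> configured leader
        private final String defaultLeader;

        public ShardLeaderConfig(Optional<String> globalLeaderOverride,
                                 Map<String, String> perShardLeader,
                                 String defaultLeader) {
            this.globalLeaderOverride = globalLeaderOverride;
            this.perShardLeader = perShardLeader;
            this.defaultLeader = defaultLeader;
        }

        /** Resolve the leader for a given shard. */
        public String leaderFor(String shardName) {
            return globalLeaderOverride.orElse(
                    perShardLeader.getOrDefault(shardName, defaultLeader));
        }
    }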

  9. HA Deployment Scenarios: Simplified Global HA Settings
  Can we abstract configurations into admin-defined deployment scenarios?
  • e.g. the admin configures 2-Node (Active-Standby):
    • This means the Primary controller is master of all devices and leader of all shards
    • Conflicting configurations are overridden by the deployment scenario
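As a sketch of the abstraction this slide asks about, an admin-selected scenario could expand into the concrete settings it implies, with those settings overriding conflicting fine-grained configuration. All names below are invented for illustration.

    // Hypothetical illustration of an admin-defined deployment scenario that
    // expands into concrete HA settings; not an existing ODL construct.
    public enum DeploymentScenario {
        TWO_NODE_ACTIVE_STANDBY;

        /** Concrete settings implied by a scenario. */
        public static final class HaSettings {
            public final String deviceMaster;          // node that is master of all devices
            public final String shardLeader;           // node that leads all shards
            public final boolean overridesConflicts;   // the scenario wins over conflicting config

            HaSettings(String deviceMaster, String shardLeader, boolean overridesConflicts) {
                this.deviceMaster = deviceMaster;
                this.shardLeader = shardLeader;
                this.overridesConflicts = overridesConflicts;
            }
        }

        public HaSettings expand(String primaryNodeId) {
            // 2-Node (Active-Standby): the Primary is master of all devices and
            // leader of all shards, as stated on the slide.
            return new HaSettings(primaryNodeId, primaryNodeId, true);
        }
    }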

  10. Implementation Dependencies: Potential Changes to Other ODL Projects
  Clustering:
  • Refactoring of the Raft Actor vs. 2-Node Raft Actor code
  • Define the Cluster Leader
  • Define the Northbound Cluster Leader IP alias
  OpenFlow Plugin:
  • OpenFlow Master/Slave roles
  • Grouping of Master/Slave roles (aka "Regions")
  System:
  • Be able to SUSPEND the Secondary controller to support Standby mode

  11. Open Issues: Follow-up Design Discussion Topics
  TBD:
  • Is the Master/Slave definition too tied to OpenFlow? (Generalize?)
  • Should device ownership/mastership be implemented by the OpenFlow Plugin?
  • How to define the Northbound Cluster Leader IP in a platform-independent way? (Linux/Mac OS X: IP alias; Windows: possible)
  • Gratuitous ARP on leader change
  • When both controllers are active in the network-partition scenario, which controller "owns" the Northbound Cluster Leader IP?
  • Define controller-wide SUSPEND behavior (how?)
  • On failure a Primary controller should be elected (in the 2-node case the Secondary is the only candidate)
  • How, and whether, to detect management-plane failure (heartbeat timeout >> worst-case GC pause?)
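For the Linux case mentioned above (IP alias plus gratuitous ARP on leader change), the mechanics could look roughly like the sketch below, which simply shells out to the standard ip and arping tools. This is an illustration of the open issue, not a proposed ODL implementation; it is Linux-specific and assumes root privileges.

    import java.io.IOException;

    // Illustration only: claim the Northbound Cluster Leader IP as a local
    // interface alias and announce it with gratuitous ARP, using standard
    // Linux tools. Hypothetical class; requires root and iputils arping.
    public final class LeaderIpTakeover {
        private final String nbIpWithPrefix;  // e.g. "192.0.2.10/24"
        private final String iface;           // e.g. "eth0"

        public LeaderIpTakeover(String nbIpWithPrefix, String iface) {
            this.nbIpWithPrefix = nbIpWithPrefix;
            this.iface = iface;
        }

        /** Called on the controller that has just become leader. */
        public void takeOverNorthboundIp() throws IOException, InterruptedException {
            // Add the NB IP as an alias on the local interface.
            run("ip", "addr", "add", nbIpWithPrefix, "dev", iface);
            // Gratuitous ARP so neighbours update their ARP caches to the new leader.
            run("arping", "-U", "-c", "3", "-I", iface, nbIpWithPrefix.split("/")[0]);
        }

        private static void run(String... cmd) throws IOException, InterruptedException {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IOException("Command failed: " + String.join(" ", cmd));
            }
        }
    }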
