1 / 32

Linux-HA Release 2

Alan Robertson IBM Linux Technology Center alanr@unix.sh. Linux-HA Release 2. Linux-HA Release 2. What is High-Availability (HA) Clustering? What can HA do for me? What is the Linux-HA project? Linux-HA applications Linux-HA customers Linux-HA release 1 capabilities

yepa
Download Presentation

Linux-HA Release 2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Alan Robertson IBM Linux Technology Center alanr@unix.sh Linux-HA Release 2

  2. Linux-HA Release 2 • What is High-Availability (HA) Clustering? • What can HA do for me? • What is the Linux-HA project? • Linux-HA applications • Linux-HA customers • Linux-HA release 1 capabilities • Linux-HA release 2 capabilities • Comparative Architectures • Release 2 Details • Futures

  3. What Is HA Clustering? • Putting together a group of computers which trust each other to provide a service even when system components fail • When one machine goes down, others take over its work • This involves IP address takeover, service takeover, etc. • New work comes to the “takeover” machine • Not primarily designed for high-performance

  4. What Can HA Clustering Do For You? • It cannot achieve 100% availability– nothing can. • HA Clustering designed to recover from single faults • It can make your outages very short • From about a second to a few minutes • It is like a Magician's (Illusionist's) trick: • When it goes well, the hand is faster than the eye • When it goes not-so-well, it can be reasonably visible • A good HA clustering system adds a “9” to your base availability • 99->99.9, 99.9->99.99, 99.99->99.999, etc. • Complexity is the enemy of reliability!

  5. Single Points of Failure (SPOFs) • A single point of failure is a component whose failure will cause near-immediate failure of an entire system or service • Good HA design eliminates of single points of failure

  6. How Does HA work? Manage redundancy to improve service availability • Like a cluster-wide-super-init on steroids • Even complex services are now “respawn” • on node (computer) death • on “impairment” of nodes • on loss of connectivity • for services that aren't working (not necessarily stopped) • managing very complex dependency relationships

  7. Redundant Communications • Intra-cluster communication is critical to HA system operation • Most HA clustering systems provide mechanisms for redundant internal communication for heartbeats, etc. • External communications is usually essential to provision of service • External communication redundancy is usually accomplished through routing tricks • Having an expert in BGP or OSPF is a help

  8. Redundant Data Access • Replicated • Copies of data are kept updated on more than one computer in the cluster • Shared • Typically Fiber Channel Disk (SAN) • Sometimes shared SCSI • Back-end Storage (“Somebody Else's Problem”) • NFS, SMB • Back-end database

  9. The Desire for HA systems • Who wants low-availability systems? • Why are so few systems High-Availability?

  10. Why isn't everything HA? • Cost • Complexity

  11. The Linux-HA Project • Linux-HA is the oldest high-availability project for Linux, with the largest associated community • The core piece of Linux-HA is called “heartbeat”(though it does much more than heartbeat) • Linux-HA has been in production since 1999, and is currently in use on about ten thousand sites • Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others • Linux-HA is shipped with every major Linux distribution except one.

  12. Linux-HA Release 1 Applications • Load Balancers • Web Servers • Database Servers • Custom Applications • Firewalls • Retail Point of Sale Solutions • Authentication • File Servers • Proxy Servers • Medical Imaging Almost any type server application you can think of – except SAP

  13. Linux-HA customers • Emageon – medical imaging services • Contraloria General de la Republica (Colombian government) • Incredimail bases their mail service on Linux-HA on IBM hardware • Karstadts' uses Linux-HA in each of several hundred stores • Bavarian Radio Station (Munich) coverage of 2002 Olympics in Salt Lake City • Circuit City, Autozone, others uses Linux-HA in each of several hundred stores • Citysavings Bank in Munich (infrastructure) • University of Toledo (US)– 20k student Computer Aided Instruction system • Autostrada – 230 clusters across country • The Weather Channel (weather.com) • Sony (manufacturing) • ISO New England manages power grid using 25 Linux-HA clusters

  14. Linux-HA Release 1 capabilities • Supports 2-node clusters • Can use serial, UDP bcast, mcast, ucast comm. • Fails over on node failure • Fails over on loss of IP connectivity • Capability for failing over on loss of SAN connectivity • Limited command line administrative tools to fail over, query current status, etc. • Active/Active or Active/Passive • Simple resource group dependency model • Requires external tool for resource monitoring • SNMP monitoring

  15. Linux-HA Release 2 capabilities • Built-in resource monitoring • Support for the OCF resource standard • Much Larger clusters supported (>= 8 nodes) • Sophisticated dependency model with rich constraint support (resources, groups, incarnations, master/slave) (needed for SAP) • XML-based resource configuration • Configuration and monitoring GUI • Support for GFS cluster filesystem • Multi-state (master/slave) resource support Initially - no IP, SAN monitoring

  16. Release 2 Credits • Andrew Beekhof – CRM, CIB • Gouchun Shi – significant infrastructure improvements • Sun, Jiang Dong and Huang, Zhen – LRM, Stonithd and testing • Lars Marowsky-Bree – architecture, PHB :-) • Alan Robertson – architecture, project leadership, original heartbeat code and testing

  17. Linux-HA Release 1 Architecture

  18. Linux-HA Release 2 Architecture(add TE and PE)

  19. Resource Objects in Release 2 • Release 2 supports “resource objects” which can be any of the following: • Primitive Resources • OCF, heartbeat-style, or LSB resource agent scripts • Resource Incarnations – need “n” resource objects - somewhere • Resource groups – a group of resources with implied co-location and linear ordering constraints • Multi-state resources (master/slave) • Designed to model master/slave (replication) resources (DRBD, et al)

  20. Basic Dependencies in Release 2 • Ordering Dependencies • start before (implies stop after) • start after (implies stop before) • Mandatory Co-location Dependencies • must be co-located with • cannot be co-located with

  21. Resource Location Constraints • Mandatory Constraints: • Resource Objects can be constrained to run on any selected subset of nodes. Default is none. • Preferential Constraints: • Resource Objects can also be preferentially constrained to run on specified nodes by providing weightings for arbitrary logical conditions • The resource object is run on the node which has the highest weight (score)

  22. Resource Incarnations • Resource Incarnations allow one to have a resource which runs multiple (“n”) times on the cluster • This is useful for managing • load balancing clusters where you want “n” of them to be slave servers • Cluster filesystems • Cluster Alias IP addresses

  23. Resource Groups Resource Groups provide a shorthand for making a creating ordering and co-location dependencies • Each resource object in the group is declared to have linear start-after ordering relationships • Each resource object in the group is declared to have co-location dependencies on each other • This is an easy way of converting release 1 resource groups to release 2

  24. Multi-State (master/slave) Resources • Normal resources can be in one of two stable states: • running • stopped • Multi-state resources can have more than two stable states. For example: • running-as-master • running-as-slave • stopped • This is ideal for modeling replication resources like DRBD

  25. Advanced Constraints • Nodes can have arbitrary attributes associated with them in name=value form • Attributes have types: int, string, version • Constraint expressions can use these attributes as well as node names, etc in largely arbitrary ways • Operators: • =, !=<, >, <=, >=, • defined(attrname), undefined(attrname), • colocated(resource id), notcolocated(resource id)

  26. Advanced Constraints (cont'd) • Each constraint is associated with particular resource, and is evaluated in the context of a particular node. • A given constraint has a boolean predicate associated with it according to the expressions before, and is associated with a weight, and a condition. • If the predicate is true, then the condition is used to compute the weight associated with locating the given resource on the given node. • Supported conditions are: (these distinctions may be unneeded ?) • can: same as prefer with MAXINT weight • cannot: same as prefer with -MAXINT weight • prefer: positive weight • prefer not: same as prefer with negative weight

  27. Security Considerations • Cluster: A computer whose backplane is the Internet • If this isn't frightening, you don't understand... • You may think you have a secure cluster network • You're probably mistaken now • You will be in the future

  28. Secure Networks are Difficult Because... • Security is not often well-understood by admins • Security is well-understood by “black hats” • Network security is easy to breach accidentally • Users bypass it • Hardware installers don't fully understand it • Most security breaches come from “trusted” staff • Staff turnover is often a big issue • Virus/Worm/P2P technologies will create new holes especially for Windows machines

  29. Security Advice • Good HA software should be designed to assume insecure networks • Not all HA software assumes insecure networks • Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication • Crossover cables are reasonably secure – all else is suspect ;-)

  30. References • http://linux-ha.org/ • http://linux-ha.org/download/ • http://wiki.linux-ha.org/NewHeartbeatDesign • New Web site content (in progress) • http://linux-ha.trick.ca/ (pretty - offline!) • http://wiki.linux-ha.org/ (editable) • www.linux-mag.com/2003-11/availability_01.html

  31. Legal Statements • IBM is a trademark of International Business Machines Corporation. • Linux is a registered trademark of Linus Torvalds. • Other company, product, and service names may be trademarks or service marks of others. • This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.

More Related