Explore the challenges and benefits of High-Availability Clustering solutions in this empirical research presentation from DePaul University. Learn about fault tolerance, disaster recovery, and more. Gain insights into configuration difficulties and subjective observations on clustering technology. Conclusions based on empirical evidence offer valuable insights for system administrators.
An Empirical Examination of Current High-Availability Clustering Solutions’ Performance
Jeffrey Absher
DePaul University
Research Symposium Presentation, November 2003
See the actual paper for bibliographical and procedural information and for appropriate academic references.
HA and Related Technology
• Distributed OS
• Load Balancing
• Disaster Recovery
• Fault Tolerance
• HA clustering
HA’s defining traits
• Single points of failure (SPOF) avoided through redundancy
• Single image to the outside world, using a single virtual IP address and hostname
• Automated fault management and recovery
• Multiple access paths from each cluster node to each resource group (set of HA services)
• Simple abstraction for applications and administrators
• Undisrupted (or minimally disrupted) services during failover
“If a computer breaks down, the functions performed by that computer will be handled by some other computer in the cluster.”
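The traits above can be illustrated with a minimal sketch of failover behind a single virtual IP address. It is a toy in-memory model, not the configuration of any real clustering product: the `Node` and `Cluster` classes, the node names, and the address `10.0.0.100` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str            # hypothetical node name
    healthy: bool = True


@dataclass
class Cluster:
    virtual_ip: str      # the single address clients see (assumed value)
    nodes: List[Node]
    active: Optional[Node] = None

    def __post_init__(self) -> None:
        self.active = self._elect()

    def _elect(self) -> Optional[Node]:
        # Pick the first healthy node; redundancy means more than one candidate.
        for node in self.nodes:
            if node.healthy:
                return node
        return None

    def heartbeat(self) -> Optional[Node]:
        # Automated fault management: re-elect only if the active node failed.
        if self.active is None or not self.active.healthy:
            self.active = self._elect()
        return self.active

    def handle(self, request: str) -> str:
        owner = self.heartbeat()
        if owner is None:
            raise RuntimeError("no healthy node left in the cluster")
        # Clients address the virtual IP; the serving node is an internal detail.
        return f"{self.virtual_ip} ({owner.name}): {request}"


node_a, node_b = Node("node-a"), Node("node-b")
cluster = Cluster("10.0.0.100", [node_a, node_b])
before = cluster.handle("GET /")

node_a.healthy = False           # simulate a fault on the active node
after = cluster.handle("GET /")  # same virtual IP, now served by node-b
```

In this sketch the client-facing address never changes; only the node behind it does. A real cluster also has to detect the fault and move storage and services, which is where failover time comes from.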
Subjective Observations
• HA clustering is difficult to configure properly, and the available documentation is lacking.
• Multiple machines must be configured simultaneously, and packages and software must often be installed and configured in a specific order.
• For what should be a loosely coupled system, there are many interdependencies.
• Youn et al. suggest that the design of “administration of clusters…needs improvement”; I agree.
• Vogels et al. state, “Users find it difficult to configure clusters with the desired management … properties. It is difficult to configure applications to be automatically launched in an appropriate order. Lacking solutions to these problems, clusters will remain awkward and time-consuming tools.” I agree.
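The launch-order problem mentioned above is, at its core, a dependency-ordering problem. The sketch below shows one way to derive a start order with Python's standard `graphlib` module (Python 3.9+). The service names and their dependencies are invented for illustration and do not come from the paper.

```python
from graphlib import TopologicalSorter

# Hypothetical resource group: each service maps to the services it depends on.
dependencies = {
    "shared_storage": set(),
    "virtual_ip": set(),
    "database": {"shared_storage"},
    "web_server": {"database", "virtual_ip"},
}

# static_order() yields every service after all of its dependencies.
start_order = list(TopologicalSorter(dependencies).static_order())

# Stopping in the reverse order shuts dependents down first.
stop_order = list(reversed(start_order))
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is the kind of interdependency that would have to be untangled by hand.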
Objective Conclusions Based on Empirical Evidence
• HA is not a perfect solution for every environment, and may be a bad solution for some, depending on the expected faults.
• High failover times on some systems contribute to lower-than-expected performance of HA systems compared with non-HA systems.
• Failover times need to be significantly shorter than the time required for a reboot, or even for a restart of a slow-to-start process.
• Primary-node negotiation time at boot contributes to poor performance during power outages.
• In some cases, clustering was shown to actually decrease the uptime of a service or site.
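The relationship between failover time and uptime can be made concrete with a small availability calculation. This is a sketch using the standard steady-state formula, availability = MTBF / (MTBF + recovery time). Every number below is an assumed, illustrative value rather than a measurement from the paper, and the model assumes the clustered and standalone systems see the same fault rate.

```python
def availability(mtbf_hours: float, outage_seconds: float) -> float:
    """Steady-state availability with one recovery action per failure."""
    outage_hours = outage_seconds / 3600.0
    return mtbf_hours / (mtbf_hours + outage_hours)


# Assumed values for illustration only.
MTBF_HOURS = 720.0             # one fault every 30 days
REBOOT_SECONDS = 300.0         # standalone server: reboot and restart service
FAST_FAILOVER_SECONDS = 30.0   # cluster whose failover beats a reboot
SLOW_FAILOVER_SECONDS = 600.0  # cluster whose failover is slower than a reboot

standalone = availability(MTBF_HOURS, REBOOT_SECONDS)
fast_cluster = availability(MTBF_HOURS, FAST_FAILOVER_SECONDS)
slow_cluster = availability(MTBF_HOURS, SLOW_FAILOVER_SECONDS)

for label, value in [
    ("standalone", standalone),
    ("fast failover", fast_cluster),
    ("slow failover", slow_cluster),
]:
    print(f"{label:>14}: {value:.6f}")
```

Under these assumptions the cluster only improves on the standalone server when its failover time is shorter than the reboot time. Once failover takes longer than a reboot, the clustered service has the lower availability, which matches the conclusion that clustering can decrease uptime.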