
Clustering Technology For Fault Tolerance

Clustering Technology For Fault Tolerance. Jim Gray, Microsoft Research, http://www.research.Microsoft.com/~Gray. What is Wolfpack? A consortium of 60 HW & SW vendors (everybody who is anybody); a set of APIs for clustering and fault tolerance; an enhancement to NT™ Server (in beta test).


Presentation Transcript


  1. Clustering Technology For Fault Tolerance
     Jim Gray, Microsoft Research
     http://www.research.Microsoft.com/~Gray

  2. What is Wolfpack?
     • A consortium of 60 HW & SW vendors (everybody who is anybody)
     • A set of APIs for clustering and fault tolerance
     • An enhancement to NT™ Server (in beta test)
     • Key concepts
       • System: a particular node
       • Cluster: a collection of systems working together
       • Resource: a hardware or software module
       • Resource dependency: one resource needs another
       • Resource group: fails over as a unit; dependencies do not cross group boundaries
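The key concepts on this slide can be illustrated with a small model. The sketch below is not the Wolfpack API; `ResourceGroup`, `add`, `validate`, and `online_order` are hypothetical names chosen for illustration. It assumes a resource group is a set of named resources with dependencies, rejects any dependency that crosses the group boundary, and derives one possible order in which to bring the resources online so that each starts after the resources it needs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class ResourceGroup:
    """Hypothetical model of a resource group (not the real Wolfpack API)."""

    def __init__(self, name):
        self.name = name
        # Maps each resource name to the set of resource names it needs.
        self.depends_on = {}

    def add(self, resource, needs=()):
        self.depends_on[resource] = set(needs)

    def validate(self):
        # Dependencies must not cross group boundaries: every needed
        # resource has to be a member of this same group.
        members = set(self.depends_on)
        for resource, needs in self.depends_on.items():
            outside = needs - members
            if outside:
                raise ValueError(
                    f"{resource} in group {self.name} depends on "
                    f"resources outside the group: {sorted(outside)}"
                )

    def online_order(self):
        # A resource comes online only after everything it needs.
        self.validate()
        return list(TopologicalSorter(self.depends_on).static_order())


group = ResourceGroup("web")
group.add("disk")
group.add("ip_address")
group.add("net_name", needs=["ip_address"])
group.add("web_server", needs=["disk", "net_name"])
print(group.online_order())
```

Because the whole group fails over as a unit, the same ordering could be replayed on the surviving node; taking the group offline would use the reverse order.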

  3. What Wolfpack Supports in V1
     • Two-node failover (twin-tail SCSI)
     • Apps:
       • File, Print, web server, IP address, Net Name
       • Most of Microsoft BackOffice (SQL, Exchange, Viper, Falcon,…)
       • Oracle
       • SAP
       • Many others
     • Easy to program, operate, use

  4. Cluster Advantages
     • Clients and servers made from the same stuff
     • Inexpensive: built with commodity components
     • Fault tolerance:
       • Spare modules mask failures
     • Modular growth:
       • Grow by adding small modules
     • Parallel data search:
       • Use multiple processors and disks

  5. What Happens When a Component Fails?
     • Redundant disk or path: configure around it
     • Non-redundant software: restart
     • Non-redundant hardware: migrate software to surviving nodes
     • Fault detection: 1 ms to 10 sec
     • Failover: 0.1 sec to 1 min
     • This is standard in Tandem, Teradata, VMScluster
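The recovery rules on this slide amount to a small decision table plus a migration step. The sketch below is only an illustration under stated assumptions: `Component`, `recovery_action`, and `fail_over` are hypothetical names, and the round-robin choice of a surviving node is an assumption, since the slide does not say how a target node is picked.

```python
from enum import Enum


class Component(Enum):
    DISK_OR_PATH = "disk or path"
    SOFTWARE = "software"
    HARDWARE = "hardware"


def recovery_action(component, redundant):
    """Map a failed component to the recovery rule listed on the slide."""
    if component is Component.DISK_OR_PATH and redundant:
        return "configure around it"
    if component is Component.SOFTWARE and not redundant:
        return "restart"
    if component is Component.HARDWARE and not redundant:
        return "migrate software to surviving nodes"
    return "not covered by the slide"


def fail_over(placement, failed_node, surviving_nodes):
    """Return a new placement with the failed node's groups moved.

    placement maps a resource group name to the node hosting it.
    Groups are spread over the survivors round-robin (an assumption).
    """
    if not surviving_nodes:
        raise RuntimeError("no surviving nodes to fail over to")
    new_placement = dict(placement)
    moved = [g for g, node in placement.items() if node == failed_node]
    for i, group in enumerate(moved):
        new_placement[group] = surviving_nodes[i % len(surviving_nodes)]
    return new_placement


placement = {"sql": "node_a", "web": "node_b"}
print(recovery_action(Component.HARDWARE, redundant=False))
print(fail_over(placement, "node_a", ["node_b"]))
```

In the two-node configuration that V1 supports, the list of survivors has a single entry, so every group from the failed node lands on the other node.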
