
Computer Cluster

Course at the University of Applied Sciences - FH München

Prof. Dr. Christian Vogt

Selected literature l.jpg
Selected Literature

  • Gregory Pfister: In Search of Clusters, 2nd ed., Pearson 1998

  • Documentation for the Windows Server 2008 Failover Cluster (on the Microsoft web pages)

  • Sven Ahnert: Virtuelle Maschinen mit VMware und Microsoft, 2nd ed., Addison-Wesley 2007 (the 3rd edition is announced for June 26, 2009).

What is a Cluster?

  • Wikipedia says: A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer.

  • Gregory Pfister says: A cluster is a type of parallel or distributed system that:

    • consists of a collection of interconnected whole computers,

    • and is utilized as a single, unified computing resource.

Features (Goals) of Clusters

  • High Performance Computing

  • Load Balancing

  • High Availability

  • Scalability

  • Simplified System Management

  • Single System Image

Basic Types of Clusters

  • High Performance Computing (HPC) Clusters

  • Load Balancing Clusters (aka Server Farms)

  • High-Availability Clusters (aka Failover Clusters)

Load Balancing Clusters

Microsoft Network Load Balancing (1)

Microsoft Network Load Balancing (2)

Shared Everything Cluster

Shared Nothing Cluster

High Availability Cluster (1)

High Availability Cluster (2)

Selected HA Cluster Products (1)

  • VMScluster (DEC 1984, today: HP)

    • Shared everything cluster with up to 96 nodes.

  • IBM HACMP (High Availability Cluster Multiprocessing, 1991)

    • Up to 32 nodes (IBM System p with AIX or Linux).

  • IBM Parallel Sysplex (1994)

    • Shared everything, up to 32 nodes (mainframes with z/OS).

  • Solaris Cluster, aka Sun Cluster

    • Up to 16 nodes.

Selected HA Cluster Products (2)

  • Heartbeat (HA Linux project, started in 1997)

    • No architectural limit for the number of nodes.

  • Red Hat Cluster Suite

    • Up to 128 nodes. Includes a DLM (Distributed Lock Manager).

  • Windows Server 2008 Failover Cluster

    • Was: Microsoft Cluster Server (MSCS, since 1997).

    • Up to 16 nodes on x64 (8 nodes on x86).

  • Oracle Real Application Cluster (RAC)

    • Two or more computers, each running an instance of the Oracle Database, concurrently access a single database.

    • Up to 100 nodes.

Active-Standby Cluster

Active-Active Cluster

Cluster with Virtual Machines (1)

  • One physical machine as hot standby for several physical machines:


Cluster with Virtual Machines (2)

  • Consolidation of several clusters:


Cluster with Virtual Machines (3)

  • Clustering hosts (failing over whole VMs):


iSCSI

  • Internet Small Computer Systems Interface

    • is a storage area network (SAN) protocol,

    • carries SCSI commands over IP networks (LAN, WAN, Internet),

    • is an alternative to Fibre Channel (FC), using an existing network infrastructure.

  • An iSCSI client is called an iSCSI Initiator.

  • An iSCSI server is called an iSCSI Target.

iSCSI Initiator

  • An iSCSI initiator initiates a SCSI session, i.e. sends a SCSI command to the target.

  • A Hardware Initiator (host bus adapter, HBA)

    • handles the iSCSI and TCP processing and Ethernet interrupts independently of the CPU.

  • A Software Initiator

    • runs as a memory-resident device driver,

    • uses an existing network card,

    • leaves all protocol handling to the main CPU.

iSCSI Target

  • An iSCSI target

    • waits for iSCSI initiators' commands,

    • provides the required input/output data transfers.

  • Hardware Target: A storage array (SAN) may offer its disks via the iSCSI protocol.

  • A Software Target:

    • offers (parts of) the local disks to iSCSI initiators,

    • uses an existing network card,

    • leaves all protocol handling to the main CPU.

Logical Unit Number (LUN)

  • A Logical Unit Number (LUN)

    • is the unit offered by iSCSI targets to iSCSI initiators,

    • represents an individually addressable SCSI device,

    • appears to an initiator like a locally attached device,

    • may physically reside on a non-SCSI device, and/or be part of a RAID set,

    • may restrict access to a single initiator,

    • may be shared between several initiators (leaving the handling of access conflicts to the file or operating system, or to some cluster software). Attention: many iSCSI target solutions do not offer this functionality.

CHAP Protocol

  • iSCSI

    • optionally uses the Challenge-Handshake Authentication Protocol (CHAP) for authentication of initiators to the target,

    • does not provide cryptographic protection for the data transferred.

  • CHAP

    • uses a three-way handshake,

    • bases the verification on a shared secret, which must be known to both the initiator and the target.
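
The three-way handshake can be sketched in a few lines of Python. The hash construction (MD5 over identifier, secret, and challenge) follows RFC 1994; the variable names and the 16-byte challenge size are illustrative:

```python
import hashlib
import os

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """RFC 1994 CHAP: response = MD5(identifier || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

# Shared secret, known to both the initiator and the target.
secret = b"shared-secret"

# 1. Challenge: the target sends an identifier and a random challenge.
identifier = 7
challenge = os.urandom(16)

# 2. Response: the initiator hashes identifier + secret + challenge and replies.
response = chap_response(identifier, secret, challenge)

# 3. Success/Failure: the target recomputes the hash and compares.
expected = chap_response(identifier, secret, challenge)
print("authenticated" if response == expected else "rejected")
```

Note that the secret itself never crosses the wire; only the challenge and the hash do, which is why CHAP authenticates but does not encrypt the subsequent data transfer.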

Preparing a Failover Cluster

  • In order to build a Windows Server 2008 Failover Cluster you need to:

    • Install the Failover Cluster feature (in Server Manager).

    • Connect networks and storage.

      • Public network

      • Heartbeat network

      • Storage network (FC or iSCSI, unless you use SAS)

    • Validate the hardware configuration (Cluster Validation Wizard in the Failover Cluster Management snap-in).

Preparing the Shared Storage

  • All disks on a shared storage bus are automatically placed in an offline state when first mapped to a cluster node. This allows storage to be simultaneously mapped to all nodes in a cluster even before the cluster is created. No longer do nodes have to be booted one at a time, disks prepared on one and then the node shut down, another node booted, the disk configuration verified, and so on.

The Cluster Validation Wizard

  • Run the Cluster Validation Wizard (in Failover Cluster Management).

    • Adjust your configuration until the wizard does not report any errors.

    • An error-free cluster validation is a prerequisite for obtaining Microsoft support for your cluster installation.

  • A full test of the Wizard consists of:

    • System configuration

    • Inventory

    • Network

    • Storage

Initial Creation of a Windows Server 2008 Failover Cluster

  • Use the Create Cluster Wizard (in Failover Cluster Management) to create the cluster. You will have to specify

    • which servers are to be part of the cluster,

    • a name for the cluster,

    • an IP address for the cluster.

  • Other parameters will be chosen automatically, and can be changed later.

Fencing

  • (Node) Fencing is the act of forcefully disabling a cluster node (or at least keeping it from doing disk I/O: Disk Fencing).

  • The decision when a node needs to be fenced is taken by the cluster software.

  • Some ways in which a node can be fenced are

    • by disabling its port(s) on a Fibre Channel switch,

    • by (remotely) powering down the node,

    • by using SCSI-3 Persistent Reservations.
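
As an illustration of the second method, here is a minimal, hypothetical sketch of a heartbeat-based fencing decision. The `PowerSwitchAgent` class and the 10-second timeout are assumptions made for the example, not part of any real cluster product:

```python
HEARTBEAT_TIMEOUT = 10  # seconds; an assumption, not a product default

class PowerSwitchAgent:
    """Hypothetical fence agent modeling a network power switch (STONITH-style)."""
    def __init__(self):
        self.powered_off = set()

    def fence(self, node: str) -> None:
        # A real agent would command the switch; here we only record the action.
        self.powered_off.add(node)

def check_and_fence(last_heartbeat: dict, agent, now: float) -> list:
    """Fence every node whose last heartbeat is older than the timeout."""
    fenced = []
    for node, seen in last_heartbeat.items():
        if now - seen > HEARTBEAT_TIMEOUT:
            agent.fence(node)   # node can no longer touch the shared disks
            fenced.append(node)
    return fenced

agent = PowerSwitchAgent()
heartbeats = {"node1": 100.0, "node2": 85.0}   # node2 has been silent for 15 s
print(check_and_fence(heartbeats, agent, now=100.0))  # ['node2']
```

The point of the exercise: fencing is decided from indirect evidence (missed heartbeats), so it must be enforced by a mechanism the suspect node cannot override, such as a power switch or a storage-level reservation.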

SAN Fabric Fencing

  • Some Fibre Channel switches allow programs to fence a node by disabling the switch port(s) that it is connected to.

STONITH

  • “Shoot the other node in the head”.

  • A special STONITH device (a Network Power Switch) allows a cluster node to power down other cluster nodes.

  • Used, for example, in Heartbeat, the Linux HA project.

SCSI-3 Persistent Reservation

  • Allows multiple nodes to access a SCSI device.

  • Blocks other nodes from accessing the device.

  • Supports multiple paths from host to disk.

  • Reservations are persistent across SCSI bus resets and node reboots.

  • Uses reservations and registrations.

  • To eject another system's registration, a node issues a preempt-and-abort command.
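
The interplay of registration, reservation, and preemption can be modeled in a toy Python class. This is a deliberate simplification (a single write-exclusive holder; none of the reservation keys or service-action codes of the real SPC-3 command set), but it shows why a preempt-and-abort fences the ejected node off the disk:

```python
class PersistentReservation:
    """Toy model of SCSI-3 Persistent Reservations: registration + reservation.
    Simplified to one write-exclusive reservation holder."""
    def __init__(self):
        self.registrations = set()   # initiators registered with the device
        self.holder = None           # initiator currently holding the reservation

    def register(self, key: str) -> None:
        self.registrations.add(key)

    def reserve(self, key: str) -> bool:
        """Only a registered initiator may take the (free) reservation."""
        if key not in self.registrations:
            return False
        if self.holder in (None, key):
            self.holder = key
            return True
        return False

    def write(self, key: str) -> bool:
        # Only the reservation holder may perform disk I/O.
        return key == self.holder

    def preempt_and_abort(self, key: str, victim: str) -> bool:
        """Eject another node's registration (the fencing operation)."""
        if key not in self.registrations:
            return False
        self.registrations.discard(victim)
        if self.holder == victim:
            self.holder = key
        return True

disk = PersistentReservation()
disk.register("node1"); disk.register("node2")
disk.reserve("node1")
print(disk.write("node2"))                    # False: node2 is blocked
disk.preempt_and_abort("node2", "node1")      # node2 fences node1
print(disk.write("node2"))                    # True: node2 now holds the disk
```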

Fencing in Failover Cluster

  • Windows Server 2008 Failover Cluster uses SCSI-3 Persistent Reservations.

  • All shared storage solutions (e.g. iSCSI targets) used in the cluster must use SCSI-3 commands, and in particular support persistent reservations.

    (Many open source iSCSI targets do not fulfill this requirement, e.g. the OpenFiler or FreeNAS targets.)

A Cluster Validation Error

  • The Cluster Validation Wizard may report the following error:

Cluster Partitioning (Split-Brain)

  • Cluster Partitioning (Split-Brain) is the situation when the cluster nodes break up into groups which can communicate within their groups, and with the shared storage, but not between groups.

  • Cluster partitioning can lead to serious problems, including data corruption on the shared disks.

Quorum Schemes

  • Cluster partitioning can be avoided by using a Quorum Scheme:

    • A group of nodes is only allowed to run as a cluster when it has quorum.

    • Quorum consists of a majority of votes.

    • Votes can be contributed by

      • Nodes

      • Disks

      • File Shares
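
The majority rule can be stated as a one-line predicate. Since two disjoint partitions can never both hold a strict majority of the same vote total, at most one side of a split keeps running; the node counts below are illustrative:

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """A partition may run as a cluster only with a strict majority of all
    configured votes (nodes, a witness disk, or a file share each add one)."""
    return votes_present > total_votes // 2

# 5-node cluster (5 votes): after a 3-2 split, only the 3-node side survives.
print(has_quorum(3, 5), has_quorum(2, 5))        # True False

# 4 nodes + witness disk (5 votes): after a 2-2 node split, the side that
# owns the witness disk has 3 of 5 votes and keeps quorum.
print(has_quorum(2 + 1, 5), has_quorum(2, 5))    # True False
```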


Votes in Failover Cluster

  • In Windows Server 2008 Failover Cluster, votes can be contributed by

    • a node,

    • a disk (called the witness disk),

    • a file share.


  • A Witness Disk or File Share contains the cluster registry hive in the \Cluster directory. (The same information is also stored on each of the cluster nodes, but may be out of date.)

Quorum Schemes in Windows Server 2008 Failover Cluster (1)

Windows Server 2008 Failover Cluster can use any of four different Quorum Schemes:

  • Node Majority

    • Recommended for a cluster with an odd number of nodes.

  • Node and Disk Majority

    • Recommended for a cluster with an even number of nodes.

Quorum Schemes in Windows Server 2008 Failover Cluster (2)

  • Node and File Share Majority

    • Recommended for a multi-site cluster with an even number of nodes.

  • No Majority: Disk Only

    • A group of nodes may run as a cluster if they have access to the witness disk.

    • The witness disk is a single point of failure.

    • Not recommended. (Only for backward compatibility with Windows Server 2003.)

Failover Cluster Terminology

  • Resources

  • Groups

  • Services and Applications

  • Dependencies

  • Failover

  • Failback

  • Looks-Alive ("Basic resource health check", default interval: 5 sec.)

  • Is-Alive ("Thorough resource health check", default interval: 1 min.)

Services and Applications

  • DFS Namespace Server

  • DHCP Server

  • Distributed Transaction Coordinator (DTC)

  • File Server

  • Generic Application

  • Generic Script

  • Generic Service

  • Internet Storage Name Service (ISNS) Server

  • Message Queuing

  • Other Server

  • Print Server

  • Virtual Machine (Hyper-V)

  • WINS Server

Properties of Services and Applications

  • General:

    • Name

    • Preferred Owner(s) (Must be specified if a failback is desired.)

  • Failover:

    • Period (Default: 6 hours): Number of hours in which the Failover Threshold must not be exceeded.

    • Threshold (Default: 2 [?, 2 for File Server]): Maximum number of times to attempt a restart or failover in the specified period. When this number is exceeded, the application is left in the failed state.

  • Failback:

    • Prevent failback (Default)

    • Allow failback

      • Immediately

      • Failback between (specify range of hours of the day)
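
The Period/Threshold interaction can be sketched as a sliding-window counter. This is a simplified model of the rule described above, not the actual cluster service code; the defaults mirror the slide (6 hours, threshold 2):

```python
from collections import deque

class FailoverPolicy:
    """Sketch of the Failover Threshold/Period rule: at most `threshold`
    restart/failover attempts within a sliding window of `period_hours`."""
    def __init__(self, threshold: int = 2, period_hours: int = 6):
        self.threshold = threshold
        self.period = period_hours * 3600       # window length in seconds
        self.attempts = deque()                 # timestamps of recent attempts

    def may_fail_over(self, now: float) -> bool:
        # Forget attempts that have aged out of the window.
        while self.attempts and now - self.attempts[0] > self.period:
            self.attempts.popleft()
        if len(self.attempts) >= self.threshold:
            return False                        # group stays in the failed state
        self.attempts.append(now)
        return True

p = FailoverPolicy()
print(p.may_fail_over(0))         # True  (1st attempt)
print(p.may_fail_over(60))        # True  (2nd attempt)
print(p.may_fail_over(120))       # False (threshold exceeded within the period)
print(p.may_fail_over(7 * 3600))  # True  (earlier attempts have aged out)
```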

Resource Types

In addition to all services and applications mentioned before:

  • File Share Quorum Witness

  • IP Address

  • IPv6 Address

  • IPv6 Tunnel Address

  • MSMQ Triggers

  • Network Name

  • NFS Share

  • Physical Disk

  • Volume Shadow Copy Service Task

Properties of Resources (1)

  • General:

    • Resource Name

    • Resource Type

  • Dependencies

  • Policies:

    • Do not restart

    • Restart (Default)

      • Threshold: Maximum number of restarts in the period. Default: 1

      • Period: Period for restarts. Default: 15 min.

      • Failover all resources in the service/application if restart fails? Default: yes

      • If restart fails, begin restarting again after ... Default: 1 hour

    • Pending Timeout. Default: 3 minutes

Properties of Resources (2)

  • Advanced Policies:

    • Possible Owners.

    • Basic resource health check interval / Thorough resource health check interval

      • Default: Use standard time period for the resource type

      • Use specified time period (defaults: 5 sec. / 1 min.)

    • Run resource in separate Resource Monitor. Default: no.

  • Further parameters depending on the type of the resource.

New Cluster Architecture

[Diagram: the new cluster storage architecture. The control path runs from the Disk Resource (ClusRes.dll) down through the MS MPIO Filter to the storage enclosure. Major change: ClusDisk is no longer in the disk fencing business.]
Cluster Architecture (W2K)

Cluster Service Components (1)

  • Database Manager

    • Manages the configuration database contained in the registry of each cluster node.

    • Coordinates updates of the database.

    • Makes sure that updates are atomic across the cluster nodes.

  • Node Manager (or: Membership Manager)

    • Maintains cluster membership.

    • The node managers on all cluster nodes communicate in order to determine the failure of a node.

  • Event Processor

    • Is responsible for communicating events to the applications, and to other components of the cluster service.

Cluster Service Components (2)

  • Communication Manager

    • Is responsible for the communication between the cluster services on the cluster nodes, e.g. related to

      • negotiating the entrance of a node into the cluster,

      • information about resource states,

      • failover and failback operations.

  • Global Update Manager

    • Component for distributing update requests to all cluster nodes.

  • Resource/Failover Manager: is responsible for

    • managing the dependencies between resources,

    • starting and stopping resources,

    • initiating failover and failback.

Resource Monitors

  • Resource Monitors handle the communication between the cluster service and resources.

  • A Resource Monitor is a separate process, using resource-specific DLLs.

  • A Resource Monitor uses one "poller thread" per 16 resources for performing the LooksAlive and IsAlive tests.
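
A rough Python model of this poller arrangement (an illustration of the scheme on the slide, not the actual Resource Monitor implementation; the 5 s / 60 s intervals are the defaults mentioned above):

```python
RESOURCES_PER_POLLER = 16

def make_pollers(resources):
    """Split the resource list into groups of at most 16, one per poller thread."""
    return [resources[i:i + RESOURCES_PER_POLLER]
            for i in range(0, len(resources), RESOURCES_PER_POLLER)]

def poll_pass(group, tick, looks_alive, is_alive,
              looks_every=5, is_every=60):
    """One pass of a poller thread at time `tick` (seconds): run the cheap
    LooksAlive check frequently, the thorough IsAlive check less often."""
    failed = []
    for res in group:
        if tick % is_every == 0:
            ok = is_alive(res)        # thorough resource health check
        elif tick % looks_every == 0:
            ok = looks_alive(res)     # basic resource health check
        else:
            continue                  # nothing due for this resource yet
        if not ok:
            failed.append(res)        # reported to the Resource/Failover Manager
    return failed

pollers = make_pollers([f"res{i}" for i in range(40)])
print([len(g) for g in pollers])      # [16, 16, 8]
print(poll_pass(pollers[0], 5,
                looks_alive=lambda r: r != "res3",
                is_alive=lambda r: True))   # ['res3']
```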

Routines in a Resource DLL

  • The resource API for writing your own resource DLLs knows two types of functions:

    • Callback routines, which can be called from the DLL:

      • LogEvent

      • SetResourceStatus

    • Entry-point routines, which are called by the resource monitor:

      • Startup (called once for every resource type)

      • Open (executed when creating a new resource)

      • Online (limit: 300 ms, or asynch. in worker thread)

      • LooksAlive (limit: 300 ms, recommended: < 50 ms)

      • IsAlive (limit: 400 ms, recommended: < 100 ms, or asynch.)

      • Offline (limit: 300 ms, or asynch. in worker thread)

      • Terminate (on error in Offline, or on pending timeout)

      • Close (executed when deleting a resource)

      • ResourceControl and ResourceTypeControl (for "private properties")

Status Control for Resources