FAULT-TOLERANT NETWORKS AND FAULT-TOLERANT ROUTING

FAULT-TOLERANT NETWORKS ANDFAULT-TOLERANT ROUTING SONER DEDEOĞLU

SONER DEDEOĞLU • Introduction • Measures of Resilience • Graph-TheoreticalMeasures • Computer Networks Measures • Common Network Topologies and Their Resilience • Multi-stage and Extra-Stage Networks • Crossbar Networks • Regular Mesh and Interstitial Mesh • Hypercube Network • Fault Tolerant Routing • Hypercube Fault-Tolerant Routing • Origin-Based Routing in the Mesh Outline

SONER DEDEOĞLU introduction

SONER DEDEOĞLU • In shared-memory multiprocessor systems, connecting processors and memories • Processors read from or write into memories. • In distributed systems, connecting processors • Each has its own local memory • Communicate through messages while executing parts of a common application • Wide-area-networks, connecting large number of processors that operate independently • Information sharing through packets • Communication over switchboxes and routers Types of Interconnection Networks

SONER DEDEOĞLU Definition: Network organization One or more paths between source and the destination Uni- or Bi-directional links and switchboxes. If a single path exists between sender and receiver, one fault along the path will disconnect the communication terminals. Fault tolerance is achieved by having multiple paths and/or spare units. Network Topology

SONER DEDEOĞLU measures of resilience

SONER DEDEOĞLU • Node (link) connectivity • Minimum number of nodes (links) that must be removed to disconnect the graph • Distance between nodes • Smallest number of links • Diameter • Longest distance in the graph • Diameter stability • Rate of increase in diameter due to faulty nodes. • Persistence: Smallest number of nodes that must fail in order to increase the diameter of the graph Graph Theoretical Measures

SONER DEDEOĞLU • Reliability - R(t) • Probability that all nodes are operational and communicate during the time interval [0,t] • Path reliability: Same definition for a specific sender-receiver pair. • Bandwidth • Maximum rate of flow of messages • Connectability – Q(t) • Expected number of source-destination pairs still connected at time t in the presence of faults Computer Network Measures

SONER DEDEOĞLU common network topologies and their resilience

SONER DEDEOĞLU • TYPE-1 • Input and output nodes are connected through links and switchboxes. • Multi-stage networks • Crossbar • Resilience measures • Bandwidth • Connectivity • TYPE-2 • Nodes are not connected through switchboxes, but links. Nodes are both computational units and also serve as switches. • Mesh • Hypercube • Resilience measure • (Path) Reliability Types of topologies

SONER DEDEOĞLU • Butterfly network • Built out of 2x2 switches – Two input and two output ports • Straight • Cross • Upper broadcast • Lower broadcast Multi-stage Networks

SONER DEDEOĞLU Multi-stage Networks (cont’d) • k-stage butterfly network • 2k inputs and 2k outputs • 2k-1 switches in each stage • Connections follow a recursive pattern from input to output. • There is only one path from any given input to a specific output. (NOT FAULT TOLERANT) • If a fail occurs, the system still operates but in a degraded mode.

SONER DEDEOĞLU Extra-stage Networks • Duplication of stage-0 at the input of network • Bypass multiplexers around switchboxes at the input and output stages. A failed switch can be bypassed by routing around it. • FAULT TOLERANT up to one faulty switchbox anywhere in the system.

SONER DEDEOĞLU • Bandwidth • Expected number of access requests from the processors that reach the memories • Assumptions • Each processor generate in each cycle, with a probability pr, a request to a memory module, directed to any of the N memory modules with equal probability 1/N. • Requests in each cycle are independent from requests in previous cycles. Resilience Analysis of ButterflyBandwidth Calculation (without failures)

SONER DEDEOĞLU • pr(i) : probability that a link at stage i carries a request – calculated recuresively from processors (i=k-1) to memories (i=0). • Stage k-1 : Two processors feed each link: • Union of two independent events with the probability pr/2 for each. pr(k-1)= pr/2 + pr/2 - (pr/2)2 = pr – pr2/4 • Recursively for a link in stage i-1 (i = k-1, …, 1) pr(i-1) = pr(i) – (pr(i-1) )2/4 • and BW = Npr(0) Resilience Analysis of ButterflyBandwidth Calculation (without failures) (cont’d)

SONER DEDEOĞLU • ql: probability of link failure • pl – ql: probability of fault-free link • pl pr(i) / 2: probability that a request at the input line to stage I will propagate to an output in stage i • Resulting recursive equation is pr(i-1) = pl pr(i) – (pl pr(i))2/4 • and BW = Npr(0) Resilience Analysis of ButterflyBandwidth Calculation (including failures)

SONER DEDEOĞLU • Connectability • Expected number of connected sender-receiver pairs • Senders and receivers are fault-free. • Exactly one path exists between each pair. • Each path has k+1 links k switchboxes. • Probabilities of link and switchbox failures • ql and qs, respectively. • Probabilities of fault-free link and switchbox • pl = 1 – ql and ps = 1 – qs, respectively. • Probability of a fault-free path • plk+1 psk • There are 22k = N2 sender-receiver pairs. Q = 22k plk+1 psk = N2 plk+1 psk Resilience Analysis of ButterflyConnectability(including failures)

SONER DEDEOĞLU • Each processor-memory pair connected by two disjoint paths • Probability of at least one fault-free path = Prob{1st path is fault-free} + Prob{2nd path is fault-free} – Prob{both paths are fault-free} • This probability can assume one of the following two expressions A = (1-ql2) plk (1-ql2) + plk+2 – pl2 B = 2(1-ql2) plk+1 – pl2k+2 (1-ql2)2 • (1-ql2) is the probability that or a switchbox with a bypass multiplexer at least one link is operational • There are 22k = N2 sender-receiver pairs. Q = (A+B)22k/2 = (A+B)N2/2 Resilience Analysis of Extra-StageConnectability

SONER DEDEOĞLU Crossbar Networks • A switchbox for each sender-receiver pair. • NxM crossbar: • N-senders, M-receivers, NM-switches • The (i, j) switchbox connects row i input to column j output and is capable of • propagating a message along row from left link to right link • propagating a message along column from bottom to top link • turning a message from left link to top link • Failure of any switchbox will disconnect certain pairs (NOT FAULT TOLERANT)

SONER DEDEOĞLU Redundant Crossbars • A row and a column of switches are added. • Input and output connections are augmented. • Each input can be send either of two rows and can be received on either two columns. • If a switch becomes faulty, row and column to which it belongs are replace by the space row and column (FAULT TOLERANT)

SONER DEDEOĞLU Probability that a link is faulty: ql Probability that a link is fault-free: pl = 1 – ql Probability fo switchbox failures included in link failure probabilities. For input i to be connected to output j, we have to go through i+j links. Resilience Analysis of CrossbarConnectability

SONER DEDEOĞLU Mesh Networks • 2-Dimensional NxM rectangular mesh network • All nodes are computing • No separate switchboxes • Message sending • A path from source to destination identified and message forwarded along path • Conventional mesh reliability • Rmesh(t) = [R(t)]NM • R(t)  reliability of single node • NOT FAULT TOLERANT

SONER DEDEOĞLU Interstitial Mesh Networks • (1, 4) Interstitial redundancy • A spare node can be switched in to replace any neighbor failed. • Each primary node has a single spare node while each spare node is a spare node for primary nodes • Redundancy overhead = 25% • (4, 4) Interstitial redundancy • Primary node has four spare nodes. • Each spare node is a spare for four primary nodes. • Redundancy overhead = 100%

SONER DEDEOĞLU • (4,1) Interstitial Mesh • Mesh is size of NxM where N and M are even numbers • Cluster: four primary nodes with one spare • Mesh has NM/4 clusters at all. • R(t) : Reliability of a single primary or spare node • Reliability of cluster Rcluster(t)= R5(t) + 5R4(t)[1-R(t)] • Reliability of mesh Rmesh(t) = [Rcluster(t)]NM/4 • (4,4) Interstitial Mesh • No simple algorithm to calculate reliability of (4,4) interstitial mesh Resilience Analysis of Interstitial MeshReliability

SONER DEDEOĞLU Hn : An n-dimensional hypercube network including 2n nodes. A 0-dimensional hypercube H0 is a single node. Hn constructed by connecting the corresponding nodes of two Hn-1 networks. The edges added to connect the corresponding nodes are called dimension-(n-1) edges. Each node in an n-dimensional hypercube has n edges incident upon it. Hypercube Networks

SONER DEDEOĞLU Hypercube Networks (cont’d)

SONER DEDEOĞLU • Hn (n ≥ 2) can tolerate link failures by multiple path between source and destination. • Node failures can disrupt the operation. • Adding fault tolerance to overcome node failures • Adding one or more spare nodes • Increasing number of communication ports of each original node from n to n+1 • Connecting the extra ports through the additional links to spare node. • Using crossbar switches with outputs connected to spare node reduces number of ports of spare node to n+1. Fault Tolerance to Hypercubes

SONER DEDEOĞLU Fault Tolerance to Hypercubes (cont’d)

SONER DEDEOĞLU • Links and nodes all fail independently • Reliability of Hn is the product of • Reliability of 2n nodes, • Probability that every node can communicate with every other node • Exact evaluation of the probability is difficult due to multiple-path connection between source-destination pairs. • Lower bound on the reliability can be evaluated as addition of probabilities of three mutually exclusive cases for which the network is connected. Resilience Analysis of HypercubeReliability

SONER DEDEOĞLU • Hn is decomposed into two Hn-1 hypercubes, A and B, and the dimesion-(n-1) links connecting them. • CASE-1: • Both A and B are operational and at least one of dimension-(n-1) link is functional. • CASE-2: • One of A, B is operational and the other is not, and all dimension-(n-1) links are functional. • CASE-3: • Only one of A,B is operational, exactly one dimension-(n-1) link is faulty and is connected in the nonoperational Hn-1 to a node that has at least one functional link to another node. Resilience Analysis of HypercubeReliability (cont’d)

SONER DEDEOĞLU • Notations • qc : Probability of a node failure • ql : Probability of a link failure • NR(Hn, ql, qc) : Reliability of hypercube Hn • Assumption • Nodes are perfect reliable (qc = 0) • CASE-1: Prob{Case-1} = [NR(Hn-1, ql, 0)]2 (1 - ql2n-1) • CASE-2: Prob{Case-2} = 2NR(Hn-1, ql, 0) [1-NR(Hn-1, ql, 0)] (1 - ql)2n-1 • CASE-3: Prob{Case-3} = 2NR(Hn-1, ql, 0) [1-NR(Hn-1, ql, 0)] 2n-1ql(1-ql)2n-1-1 (1-qln-1) • NR(Hn, ql, 0) = Prob{Case-1} + Prob{Case-2} + Prob{Case-3} Resilience Analysis of HypercubeReliability (cont’d)

SONER DEDEOĞLU fault tolerant routing

SONER DEDEOĞLU • Objective: • Get a message from source to destination despite a subset of the network being faulty. • Basic Idea: • If no shortest or most convenient path is available because of failures, reroute message throught other paths to destination. • Unicast Routing: • A message is sent from a source to just a one destination • Multicast Routing: • Copies of a message sent to a number of nodes. Concepts

SONER DEDEOĞLU • Centralized routing • A central controller knows the network state – faulty links or nodes, congested links - and selects path for each message. • Distributed routing • Each intermediate node decides to which node to send it next. • Unique routing • One path for each source-destination pair • Adaptive routing • Path selected according to network conditions (congestion) Classification of Routing Algorithms

SONER DEDEOĞLU • Basic Idea: • List the dimensions along which thepacket must travel, and traverse them one by one. • As edges are traversed and are crossed off the list. • If, due to a link or node failure, the desired link is not available, another edge in the list, if any, is chosen for traversal • If packet arrives at some node to find all dimensions on its list down, it backtracks to the previous node and tries again. Hypercube Fault Tolerant Routing

SONER DEDEOĞLU • TD: list of dimensions that the message has traveled on - in order of traversal • TDR: list of dimensions in reversed order • ki=1 : Exculsive or operation carried out k times sequentialy. • D: destination • S: source • d = D S • SR(A): the set of relative addresses reachableby traversing each of the dimensions listed in A. • eni: n-bit vector consisting of a 1 in the ith bitposition and 0 everywhere else. • : append operation • TRANSMIT(j): Send packet (d ej, message payload, TD j) along the jth dimensional link from the present node Hypercube Fault Tolerant Routing (cont’d)

SONER DEDEOĞLU Hypercube Fault Tolerant Routing (cont’d)

SONER DEDEOĞLU Hypercube Fault Tolerant Routing (cont’d) • Case study: • We are given an H3 with faulty node 011. • S = 000 wants to send a message to D = 111. • At 000, d = 111, so it sends themessage out on dimension-0, to node 001 • At node 001, d = 110 and TD = (0). This node attempts to send it out on its dimension-1 edge. However, becausenode 011 is down, it cannot do so. • Since bit 2 of d is also 1, it checks and ﬁndsthat the dimension-2 edge to 101 is available. • The message is now sent to 101,from which it makes its way to 111.

SONER DEDEOĞLU Origin Based Routing in Mesh • Assumptions: • Two-dimensional NxN mesh with at most N-1 failures • All faulty regions are square, if not, additional nodes are declared to have pseudo faults • Each node knows the distance along each direction to the nearest faulty region in that direction • One node defined as the origin • Origin chosen so that its row and column do not have any faulty nodes

SONER DEDEOĞLU Origin Based Routing in Mesh (cont’d) • Sending a message from S to D • IN-path: Edges that take the message closer to the origin • OUT-path: takes the message farther away from the origin • Outbox: the smallest rectangular regionthat contains both the origin and the destination • Safe node: V is a safe node with respect to D and a set of faulty nodes F if • V is in the outbox for D • There exists a fault-free OUT-path from V to D • Diagonal band: Diagonal band for D - all nodes V in the outbox such thatxv – yv = xD – yD + e where e {-1,0,1} • Once we get to a safe node, there exists an OUT-path from that node to D

SONER DEDEOĞLU • Three Phase Algorithm • PHASE-1: • The message is routed on an IN path until it reaches the outbox, at node U. • PHASE-2: • Compute the distance from U to the nearest safe node and compare to the distance to the nearest faulty region in that direction. If the safe node is closer than the fault, route to the safe node; otherwise, continue to route on the IN links. • PHASE-3: • Once the message is at a safe node U, if there is a safe nonfaulty neighbor V that is closer to the destination, send it to V ; otherwise, U must be on the edge of a faulty region. In such a case, move the message along the edge of • the faulty region toward the destination D, and turn toward the diagonal band when it arrives at the corner of the faulty square. Origin Based Routing in Mesh (cont’d)

SONER DEDEOĞLU Origin Based Routing in Mesh (cont’d) • Case Study: • Routing a message from node S at northwest end of the network to D. • The message first moves along the INlinks, getting closer to the origin. • It enters the outbox at node A. • Since there is a failure directly east of A, it continues on the IN links untilit reaches the origin. • Then it continues, skirting the edgeof the faulty region until it reaches node B. • At this point, it recognizes the existence of a safe. • node immediately to the north and sends the message through this node to the destination.

SONER DEDEOĞLU references I. Koren and C.M. Krishna, Fault Tolerance Systems, Morgan Kaufmann, 2007. D. M. Blough and N. Bagherzadeh, “Near-Optimal Message Routing and Broadcasting in Faulty Hypercubes,” International Journal of Parallel Programming, Vol. 19, pp. 405–423, October 1990. R. Libeskind-Hadas and E. Brandt, “Origin-Based Fault-Tolerant Routing in the Mesh,” IEEE Symposium on High Performance Computer Architecture, pp. 102–111, 1995.

SONER DEDEOĞLU thanks

FAULT-TOLERANT NETWORKS AND FAULT-TOLERANT ROUTING

FAULT-TOLERANT NETWORKS AND FAULT-TOLERANT ROUTING

Presentation Transcript

Fault-Tolerant Broadcast

Fault Tolerant WSN Routing

Fault-Tolerant Broadcast

Fault-Tolerant CORBA

FAULT TOLERANT CORBA

Fault Tolerant MPI

Fault-Tolerant Consensus

Fault Tolerant Backplane

Tapestry Deployment and Fault-tolerant Routing

FAULT-TOLERANT COMPUTING

FAULT-TOLERANT COMPUTING

Fault Tolerant Configuration

Fault-tolerant Control

fault-tolerant

Tapestry Deployment and Fault-tolerant Routing

Fault-tolerant routing

Fault-Tolerant Consensus

Fault-Tolerant Broadcast

Fault-tolerant Computing