Computing in the RAIN: A Reliable Array of Independent Nodes

Group A3

Ka Hou Wong

Jahanzeb Faizan

Jonathan Sippel


Introduction

Presenter: Ka Hou Wong

Introduction
  • RAIN
    • Research collaboration between Caltech and Jet Propulsion Laboratory
  • Goal
    • Identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components
Hardware Platform
  • Heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces through a network of switches

[Figure: ten computers (C0–C9) connected through a network of four switches (S0–S3); C = computer, S = switch]
Software Platform
  • Collection of software modules that run in conjunction with operating system services and standard network protocols

[Figure: software stack — Application; MPI/PVM; RAIN; TCP/IP; network connections (Ethernet, Myrinet, ATM, Servernet)]
Key Building Blocks For Distributed Computer Systems
  • Communication
    • Fault-tolerant communication topologies
    • Reliable communication protocols
  • Fault Management
    • Group membership techniques
  • Storage
    • Distributed data storage schemes based on error-control codes
Features of RAIN
  • Communication
    • Provides fault tolerance in the network via the following mechanisms
      • Bundled interfaces
      • Link monitoring
      • Fault-tolerant interconnect topologies
Features of RAIN (cont’d)
  • Group membership
    • Identifies healthy nodes that are participating in the cluster
  • Data storage
    • Uses redundant storage schemes over multiple disks for fault tolerance

Communication

Presenter: Jahanzeb Faizan

Communication
  • Fault-tolerant interconnect topologies
  • Network interfaces
Fault-tolerant Interconnect Topologies
  • Goal
    • To connect computer nodes to a network of switches in order to maximize the network’s resistance to partitioning

[Figure: how do you connect n compute nodes (C) to a ring of n switches (S)?]
Naïve Approach
  • Connect the computer nodes to the nearest switches in a regular fashion
  • The result is only 1-fault-tolerant: the network is easily partitioned with two switch failures

[Figure: naïve construction, with each node attached to two adjacent switches on the ring; two switch failures partition the network]
Diameter Construction Approach
  • Connect computer nodes to the switching network in the most non-local way possible
    • Computer nodes are connected to maximally distant switches
    • Nodes of degree 2 connected between switches should form a diameter
Diameter Construction Approach (cont’d)

[Figure: diameter constructions for n = 8 (switches S0–S7, nodes C0–C7) and n = 7 (switches S0–S6, nodes C0–C6)]

Construction (Diameters): Let d_s = 4 and d_c = 2. For all i, 0 ≤ i < n, label the compute nodes c_i and the switches s_i. Connect switch s_i to s_((i+1) mod n), i.e., in a ring. Connect node c_i to switches s_i and s_((i + ⌊n/2⌋ + 1) mod n).

Can tolerate 3 faults of any kind without partitioning the network (see the sketch below)
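To make the contrast with the naïve attachment concrete, here is a minimal Python sketch (not from the paper; the helper names are invented, and the naïve layout is assumed to attach each node to its two nearest switches). It builds both topologies for n = 8 and shows that the same two switch failures partition the naïve network but not the diameter construction:

from collections import deque

def build(n, diameter):
    """Adjacency lists for a ring of n switches ('s', i) with one compute
    node ('c', i) per switch. Naive: c_i attaches to the adjacent switches
    s_i and s_(i+1). Diameter construction: c_i attaches to s_i and the
    maximally distant switch s_((i + n//2 + 1) mod n)."""
    adj = {('s', i): set() for i in range(n)}
    adj.update({('c', i): set() for i in range(n)})
    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)
    for i in range(n):
        link(('s', i), ('s', (i + 1) % n))      # ring of switches
        j = (i + n // 2 + 1) % n if diameter else (i + 1) % n
        link(('c', i), ('s', i))
        link(('c', i), ('s', j))
    return adj

def component_sizes(adj, failed):
    """Sizes of connected components among the surviving vertices."""
    alive = set(adj) - failed
    seen, sizes = set(), []
    for start in alive:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)
    return sorted(sizes)

failed = {('s', 0), ('s', 4)}    # two "opposite" switch failures, n = 8
print(component_sizes(build(8, diameter=False), failed))  # [7, 7]: partitioned
print(component_sizes(build(8, diameter=True), failed))   # [14]: still connected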

Protocol for Link Failure
  • Goal
    • Monitoring of available paths
  • Requirements
    • Correctness
    • Bounded Slack
    • Stability
Correctness
  • Must correctly reflect the true state of the channel

[Figure: bidirectional communication between nodes A and B; if one side sees time-outs, both sides should mark the channel as being down]
Bounded Slack
  • Ensure that both endpoints see a maximum slack of n transitions

[Figure: link-history timelines for nodes A and B (U = link up, D = link down); without a bound, node A may see many more transitions than node B; with bounded slack, A and B see tightly coupled views of the channel]

Stability
  • Each real channel event (i.e., a time-out) should cause at most a bounded number of state transitions at each endpoint
Consistent-History Protocol for Link Failures
  • Monitor available paths in the network for proper functioning
  • Modified ping protocol guarantees that each side of the communication channel sees the same history (bounded slack)
The Protocol
  • Reliable Message Passing
  • Implementation:
    • Sliding window protocol (sketched below)
    • No existing reliable communication layer is needed
    • Reliable messaging is built on top of ping messages
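The bullets above can be illustrated with a generic sliding-window sender (a Python sketch, not the RAIN implementation; the channel interface and all names are invented): unacknowledged messages stay in the window and are retransmitted on time-out, so nothing below this layer needs to be reliable.

class SlidingWindowSender:
    """Generic sliding-window sender: reliable, in-order delivery
    built on an unreliable transport (here, the ping messages)."""

    def __init__(self, channel, window=4):
        self.channel = channel   # anything with transmit(seq, payload)
        self.window = window
        self.base = 0            # oldest unacknowledged sequence number
        self.next_seq = 0        # next sequence number to use
        self.unacked = {}        # seq -> payload still in flight

    def send(self, payload):
        if self.next_seq - self.base >= self.window:
            return False         # window full; caller retries later
        self.unacked[self.next_seq] = payload
        self.channel.transmit(self.next_seq, payload)
        self.next_seq += 1
        return True

    def on_ack(self, ack_seq):
        """Cumulative ack: everything below ack_seq has been delivered."""
        for seq in list(self.unacked):
            if seq < ack_seq:
                del self.unacked[seq]
        self.base = max(self.base, ack_seq)

    def on_timeout(self):
        """Retransmit everything still unacknowledged."""
        for seq in sorted(self.unacked):
            self.channel.transmit(seq, self.unacked[seq])

class LossyChannel:
    """Stand-in for the unreliable ping transport."""
    def transmit(self, seq, payload):
        pass                     # may silently drop messages

sender = SlidingWindowSender(LossyChannel())
sender.send(b'hello')
sender.on_timeout()              # retransmits b'hello'
sender.on_ack(1)                 # window slides past seq 0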
The Protocol (cont’d)

[Figure: protocol overview — ping messages are sent and received over unreliable messaging; the token is sent and received over reliable messaging; link up/down detection is implemented by pings or hardware feedback; tokens are sent on request, and a consistent history is maintained]
Demonstration

[Figure: protocol state machine; states Up/Down are annotated with a token count (t = 2, 1, 0), and transitions are labeled "trigger event / tokens sent", where T is a token-arrival event and tout a time-out event, e.g. T/1, T/0, tout/1]

Group Membership

Presenter: Jonathan Sippel

Group Membership
  • Provides a level of agreement between non-faulty processes in a distributed application
  • Tolerates permanent and transient failures in both nodes and links
  • Based on two mechanisms
    • Token Mechanism
    • 911 Mechanism
Token Mechanism
  • Nodes in the membership are ordered in a logical ring
  • Token passed at a regular interval from one node to the next
  • Token carries the authoritative knowledge of the membership
  • Node updates its local membership information according to the received token, as sketched below
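A minimal sketch of the token mechanism (Python; the names are invented, and the real token carries more bookkeeping than shown): the sequence number grows by one per hop, and each receiving node refreshes its membership view and keeps a local copy.

class Member:
    def __init__(self, name):
        self.name = name
        self.local_members = []   # this node's view of the membership
        self.local_seq = -1       # sequence number of its token copy

    def receive_token(self, token):
        token['seq'] += 1                       # one hop, one increment
        self.local_members = list(token['members'])
        self.local_seq = token['seq']           # keep a local copy
        return token

ring = [Member(n) for n in 'ABCD']              # logical ring order
token = {'seq': 0, 'members': [m.name for m in ring]}
for hop in range(8):                            # two trips around the ring
    token = ring[hop % len(ring)].receive_token(token)
print(ring[3].local_members, ring[3].local_seq)  # ['A', 'B', 'C', 'D'] 8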
Token Mechanism (cont’d)
  • Aggressive Failure Detection

[Figure: logical ring of nodes A, B, C, D, before and after aggressive failure detection]
Token Mechanism (cont’d)
  • Conservative Failure Detection

[Figure: logical ring of nodes A, B, C, D, before and after conservative failure detection]
911 Mechanism
  • When is the 911 Mechanism used?
    • Token Regeneration - Regenerate a token that is lost if a node or a link fails
    • Dynamic Scalability - Add a new node to the system
  • What is a 911 message?
    • Request for the right to regenerate the lost token
    • Must be approved by all the live nodes in the membership
Token Regeneration
  • Only one node is allowed to regenerate the token
  • Token sequence number is used to guarantee mutual exclusivity and is incremented every time the token is passed from one node to the next
  • Each node makes a local copy of the token on receipt
  • Sequence number on the node’s local copy of the token is added to the 911 message and compared to all the sequence numbers on the local copies of the token on the other live nodes
  • 911 request is denied by any node with a more recent copy of the token, as sketched below
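A small sketch of the approval rule (Python; names invented): the 911 request carries the requester's token sequence number, and any live node holding a more recent local copy denies it, so at most one node wins the right to regenerate.

class Node:
    def __init__(self, name, last_seq):
        self.name = name
        self.last_seq = last_seq   # seq number of this node's token copy

def approve_911(requester, live_nodes):
    """All live nodes must approve; any node with a more recent local
    copy of the token denies the regeneration request."""
    return all(peer.last_seq <= requester.last_seq
               for peer in live_nodes if peer is not requester)

live = [Node('A', 41), Node('B', 42), Node('C', 40)]
print(approve_911(live[0], live))   # False: B holds a newer copy (42 > 41)
print(approve_911(live[1], live))   # True: B holds the most recent copy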
Dynamic Scalability
  • 911 message sent by a new node to join the group
  • Receiving node
    • Treats the message as a join request because the originating node is not in the membership
    • Updates the membership the next time it receives the token and sends it to the new node
Data Storage
  • The RAIN system provides a distributed storage system based on a class of erasure-correcting codes called array codes, which give a mathematical means of representing data so that lost information can be recovered
Data Storage (cont’d)
  • Array codes
    • With an (n, k) erasure-correcting code, k symbols of original data are represented with n symbols of encoded data
    • With an m-erasure-correcting code, the original data can be recovered even if m symbols of the encoded data are lost
    • A code is said to be Maximum Distance Separable (MDS) if m = n – k
    • The only operations necessary to encode/decode an array code are simple binary XOR operations
Data Storage (cont’d)

Data Placement Scheme for a (6, 4) Array Code (each node stores two data symbols and one parity symbol; + denotes XOR):

  Node 1: a   A   B+D+e+f
  Node 2: b   B   C+E+f+a
  Node 3: c   C   D+F+a+b
  Node 4: d   D   E+A+b+c
  Node 5: e   E   F+B+c+d
  Node 6: f   F   A+C+d+e
Data Storage (cont’d)

Recovery with the (6, 4) Array Code after the loss of nodes 1 and 2:

  Node 1: ?   ?   ?
  Node 2: ?   ?   ?
  Node 3: c   C   D+F+a+b
  Node 4: d   D   E+A+b+c
  Node 5: e   E   F+B+c+d
  Node 6: f   F   A+C+d+e

The lost symbols are recovered by XOR from the surviving nodes, in order:

A = C + d + e + (A + C + d + e)

b = A + (E + A + b + c) + c + E

a = b + (D + F + a + b) + D + F

B = F + c + (F + B + c + d) + d
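Since the recovery chain is plain XOR arithmetic, it can be checked directly. The Python sketch below (illustrative only; the 4-byte symbol size and helper names are invented) encodes twelve random symbols with the parity layout above, discards nodes 1 and 2, and rebuilds a, A, b, and B:

import os

def xor(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(blocks[0])
    for block in blocks[1:]:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Twelve data symbols a..f, A..F (4-byte blocks, purely for illustration)
sym = {name: os.urandom(4) for name in 'abcdefABCDEF'}

# One parity symbol per node, exactly as on the placement slide
parity = [
    xor(sym['B'], sym['D'], sym['e'], sym['f']),   # node 1
    xor(sym['C'], sym['E'], sym['f'], sym['a']),   # node 2
    xor(sym['D'], sym['F'], sym['a'], sym['b']),   # node 3
    xor(sym['E'], sym['A'], sym['b'], sym['c']),   # node 4
    xor(sym['F'], sym['B'], sym['c'], sym['d']),   # node 5
    xor(sym['A'], sym['C'], sym['d'], sym['e']),   # node 6
]

# Nodes 1 and 2 fail, losing a, A, b, B and their parities.
# Recover from the four surviving nodes using the XOR chain above:
A = xor(sym['C'], sym['d'], sym['e'], parity[5])   # A = C+d+e+(A+C+d+e)
b = xor(A, parity[3], sym['c'], sym['E'])          # b = A+(E+A+b+c)+c+E
a = xor(b, parity[2], sym['D'], sym['F'])          # a = b+(D+F+a+b)+D+F
B = xor(sym['F'], sym['c'], parity[4], sym['d'])   # B = F+c+(F+B+c+d)+d

assert (a, A, b, B) == (sym['a'], sym['A'], sym['b'], sym['B'])
print("recovered a, A, b, B from four surviving nodes")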

Data Storage (cont’d)
  • Distributed store/retrieve operations
    • For a store operation, a block of data of size d is encoded into n symbols, each of size d/k, using an (n, k) MDS array code
    • For a retrieve operation, symbols are collected from any k nodes and decoded (a minimal version is sketched below)
    • The original data can be recovered with up to n – k node failures
    • The encoding scheme provides for dynamic reconfigurability and load balancing
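As a deliberately simple illustration of the store/retrieve path, here is a Python sketch (invented names) using the smallest MDS array code, a (k+1, k) XOR parity scheme with m = n − k = 1, rather than the more powerful (n, k) codes RAIN uses:

from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def store(block, k):
    """Split a d-byte block into k symbols of size d/k and append one
    XOR parity symbol, giving n = k + 1 encoded symbols."""
    assert len(block) % k == 0
    size = len(block) // k
    data = [block[i * size:(i + 1) * size] for i in range(k)]
    return data + [reduce(xor, data)]

def retrieve(symbols, k):
    """Decode from any k of the n = k + 1 symbols (a missing one is None).
    Note: repairs the input list in place."""
    missing = [i for i, s in enumerate(symbols) if s is None]
    assert len(missing) <= 1, "only m = n - k = 1 erasure is tolerated"
    if missing and missing[0] < k:   # a data symbol was lost
        symbols[missing[0]] = reduce(xor, [s for s in symbols if s is not None])
    return b''.join(symbols[:k])

encoded = store(b'abcdefgh', k=4)    # d = 8 bytes -> 5 symbols of 2 bytes
encoded[2] = None                    # one node failure
assert retrieve(encoded, k=4) == b'abcdefgh'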
RAIN Contributions to Distributed Computing Systems
  • Fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures
  • Fault management techniques based on group membership
  • Data storage schemes based on computationally efficient error-control codes
References
  • Vasken Bohossian, Chenggong C. Fan, Paul S. LeMahieu, Marc D. Riedel, Lihao Xu, Jehoshua Bruck, “Computing in the RAIN: A Reliable Array of Independent Nodes,” IEEE Transactions On Parallel and Distributed Systems, Vol. 12, No. 2, February 2001
  • http://www.rainfinity.com/