

Adaptable Virtual Machine Environment for Heterogeneous Clusters

Al Geist, Jim Kohl, Stephen Scott, Philip Papadopoulos,

Oak Ridge National Laboratory

Jack Dongarra, Graham Fagg

University of Tennessee

Vaidy Sunderam, Paul Gray, Mauro Migliardi

Emory University

September 2-4

Blackberry Farm, TN


Harness Plug-in Machine Research

Building on our experience and success with PVM, we are creating a fundamentally new heterogeneous virtual machine based on three research concepts:

  • Parallel Plug-in environment

    • Extend the concept of a plug-in to the parallel computing world.

  • Distributed peer-to-peer control

    • No single point of failure, unlike typical client/server models.

  • Multiple distributed virtual machines merge/split

    • Provide a means for short-term sharing of resources and collaboration between teams.

www.epm.ornl.gov/harness


Motivated by Needs from Simulation Science

  • develop applications by plugging together component models.
  • customize/tune the virtual environment for the application's needs and for performance on existing resources.
  • support long-running simulations despite maintenance, faults, and migration (a dynamically evolving VM).
  • adapt the virtual machine to faults and dynamic scheduling in large clusters (DASE).
  • provide a framework for collaborative simulations (in the spirit of CUMULVS).


Harness Architecture (extends successful PVM design)

[Figure: Hosts A-D each run a component-based HARNESS daemon and together form a virtual machine; another VM can attach through the same mechanisms. Each daemon provides resource management and process control, and is customized and extended by dynamically adding plug-ins (communication, user features). Discovery and registration go through a resource catalog and directory service. Operation within the VM uses distributed control.]
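
As an illustration of the component-based daemon, here is a minimal sketch of run-time plug-in loading via POSIX dlopen. The entry-point name harness_plugin_init is a hypothetical convention for this sketch, not the actual Harness interface.

    /* Minimal sketch: loading a plug-in shared object into the daemon at
     * run time with POSIX dlopen.  "harness_plugin_init" is a hypothetical
     * entry-point convention for this sketch. */
    #include <dlfcn.h>
    #include <stdio.h>

    typedef int (*plugin_init_fn)(void);

    int load_plugin(const char *path)
    {
        void *handle = dlopen(path, RTLD_NOW);        /* load the .so */
        if (handle == NULL) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return -1;
        }
        /* Look up the plug-in's initialization entry point. */
        plugin_init_fn init = (plugin_init_fn)dlsym(handle, "harness_plugin_init");
        if (init == NULL) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            dlclose(handle);
            return -1;
        }
        return init();   /* let the plug-in register its services */
    }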


Harness Research: Parallel Plug-in Environment

  • serial plug-in technology:
    • Netscape plug-in model
    • JavaBeans
    • CORBA IDL
    • ActiveX/DCOM model
  • user-definable control messages

How do you write plug-ins for a heterogeneous distributed virtual machine?

  • One research goal is to understand and implement a parallel plug-in environment within Harness that:
    • provides a method for many users to extend Harness (like Linux)
    • uses a taxonomy based on synchronization needs, with three typical cases (sketched after this list):
      • load a plug-in into a single host of the VM without communication
      • load a plug-in into a single host, then broadcast it to the rest of the VM
      • load a plug-in into every host of the VM with synchronization
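
A minimal sketch of how the three cases could surface in a loading interface; load_plugin, broadcast_load_request, and vm_barrier are assumed helpers, not the Harness API.

    /* Sketch of the three load cases; load_plugin (above),
     * broadcast_load_request, and vm_barrier are assumed helpers. */
    enum load_scope {
        LOAD_LOCAL,        /* single host, no communication         */
        LOAD_BROADCAST,    /* load on one host, broadcast to the VM */
        LOAD_SYNCHRONIZED  /* load on every host, with a barrier    */
    };

    extern int load_plugin(const char *path);
    extern int broadcast_load_request(const char *path);  /* assumed */
    extern int vm_barrier(void);                          /* assumed */

    int vm_load_plugin(const char *path, enum load_scope scope)
    {
        switch (scope) {
        case LOAD_LOCAL:
            return load_plugin(path);
        case LOAD_BROADCAST:
            return load_plugin(path) == 0 ? broadcast_load_request(path) : -1;
        case LOAD_SYNCHRONIZED:
            return broadcast_load_request(path) == 0 ? vm_barrier() : -1;
        }
        return -1;
    }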


Daemon Plug-in Interface Based on Re-definable Message Handlers

[Figure: the Harness daemon dispatches each incoming message by its (source, tag, context) header. Required VM control messages trigger daemon services; data or control messages can also go to user-defined handlers. Plug-ins shown include process spawn, PVM notify, MPI send, and a new user feature. The gPort concept comes from the Common Component Architecture forum.]

Users can define new control messages and can exchange the handlers bound to required control messages.
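
A minimal sketch of such a re-definable handler table keyed by (source, tag, context); the table layout, wildcard convention, and function names are assumptions for illustration.

    /* Sketch of a re-definable handler table keyed by (source, tag,
     * context); ANY acts as a wildcard.  Names are illustrative. */
    #define MAX_HANDLERS 64
    #define ANY (-1)

    typedef void (*msg_handler)(const void *msg, int len);

    struct handler_entry {
        int source, tag, context;
        msg_handler fn;
    };

    static struct handler_entry table[MAX_HANDLERS];
    static int nhandlers;

    /* Register a handler, or exchange the one already bound to this
     * pattern; exchanging is how a plug-in takes over a required
     * control message. */
    void set_handler(int source, int tag, int context, msg_handler fn)
    {
        for (int i = 0; i < nhandlers; i++) {
            struct handler_entry *e = &table[i];
            if (e->source == source && e->tag == tag && e->context == context) {
                e->fn = fn;
                return;
            }
        }
        if (nhandlers < MAX_HANDLERS)
            table[nhandlers++] = (struct handler_entry){source, tag, context, fn};
    }

    /* Dispatch an incoming message to the first matching handler. */
    void dispatch(int source, int tag, int context, const void *msg, int len)
    {
        for (int i = 0; i < nhandlers; i++) {
            struct handler_entry *e = &table[i];
            if ((e->source  == ANY || e->source  == source) &&
                (e->tag     == ANY || e->tag     == tag)    &&
                (e->context == ANY || e->context == context)) {
                e->fn(msg, len);
                return;
            }
        }
    }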


Harness Research: Multiple Virtual Machine Collaboration

1. Send messages between VMs: the distributed virtual machines share information but not resources.

2. Merge into a single asymmetric VM: resources are shared unequally (e.g., a user can use only the I/O resources they contribute, but all CPUs); each user sees a single but different VM.

3. Merge into a single symmetric VM: resources are shared equally among all users (see the sketch after this list).
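
For concreteness, the three modes might be named in an interface roughly as below; the enum and vm_connect signature are assumptions for this sketch, not Harness's API.

    /* Sketch of the three collaboration modes; the enum and signature
     * are illustrative, not a real Harness interface. */
    enum vm_collab_mode {
        VM_MESSAGING,         /* exchange messages; share no resources */
        VM_MERGE_ASYMMETRIC,  /* merge; resources shared unequally     */
        VM_MERGE_SYMMETRIC    /* merge; resources shared equally       */
    };

    int vm_connect(int other_vm, enum vm_collab_mode mode);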


Harness Research: Distributed Control Features

  • No synchronization step: updates occur asynchronously while maintaining consistency.
  • All members can inject change requests at the same time.
  • Members can be added or deleted quickly, because the operation does not require resynchronization of pending changes.
  • Failure of a host does not negate any partially committed changes, i.e., no rollback is required.


Symmetric Peer-to-Peer Distributed Control

  • No single point (or set of points) of failure for Harness: it survives as long as one member still lives.
  • All members know the state of the virtual machine, and their knowledge is kept consistent with respect to the order of changes of state. (An important parallel programming requirement!)
  • No member is more important than any other (at any instant), i.e., there isn't a pass-around "control token".

One of two schemes being investigated for Harness follows.


Phase One of Arbitration: Update Pending List

[Figure: a ring of Harness kernels, one per host; each kernel holds a copy of the VM state.]

1. A task on one host requests that a new host be added.

2. Its kernel sends (host / T# / data) to its neighbor in the ring.

3. Each kernel adds the request to its list of pending changes.

The Harness kernels on each host have arbitrary priorities assigned to them (new kernels are always given the lowest priority).
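
A minimal sketch of phase one as it might look inside a kernel; the ring helpers (my_host, send_to_neighbor, pending_append, start_phase_two) are assumptions, and the request layout simply mirrors the (host / T# / data) triple above.

    /* Sketch of phase one; the ring helpers below are assumed, and the
     * request layout follows the (host / T# / data) triple on the slide. */
    struct request {
        int host;    /* originating host                */
        int tnum;    /* originator's local T#           */
        int trans;   /* global trans#, set in phase two */
        /* ... payload, e.g. "add host X" ...            */
    };

    extern int  my_host(void);                       /* assumed */
    extern void send_to_neighbor(struct request *r); /* assumed */
    extern void pending_append(struct request *r);   /* assumed */
    extern void start_phase_two(struct request *r);  /* assumed */

    /* Called when a phase-one request arrives from the ring neighbor. */
    void phase_one_receive(struct request *r)
    {
        if (r->host != my_host()) {
            pending_append(r);       /* add to the local pending list */
            send_to_neighbor(r);     /* keep it circulating the ring  */
        } else {
            start_phase_two(r);      /* came full circle: now commit  */
        }
    }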


Phase Two of Arbitration: Update Distributed State

1. The originating kernel receives its own initial request back.

2. It creates a unique transaction number and sends (host / T# / trans#) to its neighbor.

3. Each kernel that receives this second request moves the pending data into its state Db and forwards the request.
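
A matching sketch of phase two, reusing struct request and the ring helpers from the phase-one sketch; next_transaction_number, pending_remove, and state_db_apply are assumed helpers.

    /* Sketch of phase two, reusing struct request and the assumed ring
     * helpers from the phase-one sketch. */
    extern int  my_host(void);
    extern void send_to_neighbor(struct request *r);
    extern int  next_transaction_number(void);     /* assumed */
    extern void pending_remove(struct request *r); /* assumed */
    extern void state_db_apply(struct request *r); /* assumed */

    /* Steps 1-2: the originator saw its own request return; commit it. */
    void start_phase_two(struct request *r)
    {
        r->trans = next_transaction_number();  /* unique trans#         */
        send_to_neighbor(r);                   /* send (host/T#/trans#) */
    }

    /* Step 3: each kernel moves the pending data into its state Db. */
    void phase_two_receive(struct request *r)
    {
        pending_remove(r);          /* take it off the pending list  */
        state_db_apply(r);          /* commit the change to state Db */
        if (r->host != my_host())
            send_to_neighbor(r);    /* forward until it returns home */
    }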



Details of Pending List Structure

  • Pending - list of pending transactions; these are forwarded on around the ring.
  • Hold - list of incoming pending transactions that are being held because this host has higher-priority transactions still pending.
  • Mine - list of transactions that this host has injected into the ring and for which it has not yet received the phase-one reply.
  • Inject - list of transactions that local tasks have requested but that can't be injected until the pending transactions in Mine are done.

Incoming (commit or pending) transactions feed these lists.
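
The four lists might be carried per kernel roughly as follows; only the list names come from the slide, and the linked-list layout is assumed.

    /* Sketch of the four per-kernel lists named above, reusing struct
     * request; the linked-list layout is an illustrative assumption. */
    struct txn_node {
        struct request   req;
        struct txn_node *next;
    };

    struct kernel_lists {
        struct txn_node *pending; /* Pending: forwarded txns awaiting commit */
        struct txn_node *hold;    /* Hold: incoming txns parked behind our   */
                                  /* higher-priority pending transactions    */
        struct txn_node *mine;    /* Mine: txns we injected, still awaiting  */
                                  /* the phase-one reply                     */
        struct txn_node *inject;  /* Inject: local requests queued until the */
                                  /* pending transactions in Mine are done   */
    };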


Multiple Asynchronous Updates

Each kernel can be injecting change requests into the ring at the same time. Each kernel holds the start of its second request phase until pending higher-priority requests have committed. The state Dbs are thus maintained in a consistent order across the entire VM.
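
A sketch of that hold rule, reusing the types from the earlier sketches; higher_priority_pending and list_push are assumed helpers.

    /* Sketch of the hold rule, reusing the earlier types; the first two
     * helpers are assumed. */
    extern int  higher_priority_pending(struct kernel_lists *k,
                                        struct request *r);           /* assumed */
    extern void list_push(struct txn_node **list, struct request *r); /* assumed */
    extern void phase_two_receive(struct request *r);

    /* On receiving a phase-two (commit) token: commit only if no
     * higher-priority request is still pending locally. */
    void on_commit_token(struct kernel_lists *k, struct request *r)
    {
        if (higher_priority_pending(k, r))
            list_push(&k->hold, r);   /* park until those commit  */
        else
            phase_two_receive(r);     /* safe: commit and forward */
    }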


Adding New Host without Clearing Existing Pending Lists

  • A two-phase commit is done on "add host" so that all state Dbs are updated.
  • The new host is assigned the lowest priority, so the changes it begins injecting don't affect any pending changes.
  • The requesting host sends the new host a copy of its state Db and pending list, then adjusts links to add the new host to the ring (sketched below).
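
A sketch of the add-host splice described above; every helper name here is assumed for illustration.

    /* Sketch of splicing a new host into the ring; all helpers assumed. */
    extern int  my_host(void);
    extern int  my_neighbor(void);
    extern int  lowest_priority(void);
    extern void set_priority(int host, int prio);
    extern void send_state_db(int host);
    extern void send_pending_list(int host);
    extern void set_link(int from, int to);

    void add_host(int new_host)
    {
        set_priority(new_host, lowest_priority()); /* can't disturb pending */
        send_state_db(new_host);                   /* copy of the state Db  */
        send_pending_list(new_host);               /* copy of pending list  */
        set_link(new_host, my_neighbor());         /* new host -> old next  */
        set_link(my_host(), new_host);             /* this host -> new host */
    }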


Deleting Host without Clearing Existing Pending Lists

  • A two-phase commit is done on "delete host" so that the state Dbs are updated.
  • The requesting host adjusts links to bypass the deleted host.
  • No changes are required in the "pending changes" lists.


Fast Recovery from Host Failure Using Existing Control Structure

  • Kernel "A" detects the failure of a host (a) by seeing the TCP link drop, or (b) by being unable to communicate over or reestablish the link.
  • Kernel "A" checks its host list and tries to establish a link with the next host in the ring, continuing around the ring until successful (see the sketch below).
  • Kernel "A" inserts delete-host request(s) into the control ring.
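
A sketch of that recovery walk; all helper names are assumed.

    /* Sketch of the recovery walk; the helpers are assumed. */
    extern int  my_neighbor(void);
    extern int  next_in_ring(int host);
    extern int  try_connect(int host);         /* nonzero on success */
    extern void inject_delete_host(int host);
    extern void set_my_neighbor(int host);

    void recover_from_neighbor_failure(void)
    {
        int h = my_neighbor();
        while (!try_connect(h)) {     /* walk the ring until a host answers */
            inject_delete_host(h);    /* delete-host request for a dead one */
            h = next_in_ring(h);
        }
        set_my_neighbor(h);           /* close the ring around the failures */
    }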


Parallel Recovery from Multi-Host Failure

  • Kernels detect the failure of hosts (a) by seeing the TCP link drop, or (b) by being unable to communicate over or reestablish the link.
  • In parallel, each kernel tries to establish a link with the next live host in the ring.
  • Each kernel inserts delete-host request(s) into the control ring.



Status of Harness Research

  • Working prototype

    • demonstrates a pluggable daemon and no single point of failure.

  • IceT package

    • demonstrated merging and splitting of multiple virtual machines and soft-installation of different communication APIs (MPI and CCTL).

  • SNIPE environment

    • shows the use of a resource catalog to manage distributed resources.

  • PVM 3.4

    • extends heterogeneity by transparently clustering Windows and Unix boxes.

  • Common Component Architecture Forum

www.epm.ornl.gov/harness

