gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments

8/1/08

Ken Hironaka, Hideo Saito,

Kei Takahashi, Kenjiro Taura

The University of Tokyo

www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy


Barriers of Grid Environments

  • Grid = Multiple Clusters (LAN/WAN)

  • Complex environment

    • Dynamic node joins

    • Resource removal/failure

      • Network and nodes

    • Connectivity

      • NAT/firewall

Grid-enabled frameworks are crucial to facilitate computing in these environments

[Figure labels: firewall, join, leave]

What type of applications?

  • Typical Usage

    • Standalone jobs

    • No interaction among nodes

  • Parallel and distributed Applications

    • Orchestrate nodes for a single application

      • Map an existing application on the Grid

    • Requires complex interaction

      ⇒ frameworks must make it simple and manageable

Common Approaches(1)

  • Programming-less

    • Batch Scheduler

      • Task placement (inter-cluster)

      • Transparent retries on failure

    • Enables minimal interaction

      • Pass data via files/raw sockets

      • Embarrassingly parallel tasks

      • Very limited range of applications

[Figure labels: SUBMIT, execute, redo]

Common Approaches(2)

  • Incorporate some user programming

    • e.g., Master-Worker framework

      • Program the master/worker(s)

        • Job distribution

        • Handling worker join/leave

        • Error handling

  • Enables simple interaction

    • Still limited in application

[Figure labels: doJob(), error(), join()]

For more complex interaction (a larger problem set), a framework must allow more flexible, general programming

The most flexible approach

  • Parallel Programming Languages

    • Extend existing languages: retains flexibility

    • Countless past examples

      • (MultiLisp[Halstead ‘85], JavaRMI, ProActive[Huet et al. ‘04], …)

    • Problem: not in the context of the Grid

      • Node joins/leaves?

      • Resolve connectivity with NAT/firewall?

    • Coding becomes complex/overwhelming

Can we not complement this?

Our Contribution

  • Grid-enabled distributed object-oriented framework

    • A focus on coping with complex environments

      • Joins, failures, connectivity

    • Simple programming & minimal configuration

      • Simple tool to act as a glue for the Grid

    • Implemented parallel applications on a Grid environment with 900 cores (9 clusters)

Agenda

  • Introduction

  • Related Work

  • Proposal

  • Evaluation

  • Conclusion

Programming-less frameworks

  • Condor/DAGMan [Thain et al. ‘05]

    • Batch scheduler

    • Transparent retries / handles multiple clusters

    • Extremely limited interaction among nodes

      • Tasks with DAG dependencies

      • Pass on data using intermediate/scratch files

[Figure labels: central manager, assign, task, busy nodes, cluster, interaction using files]

“Restricted” Programming frameworks

  • Master-Worker Model: Jojo2 [Aoki et al. ‘06], OmniRPC [Sato et al. ‘01],

    Ninf-C [Nakata et al. ‘04], NetSolve [Casanova et al. ‘96]

    • Event driven master code: handle join/leave

  • Map-Reduce [Dean et al. ‘05]

    • define 2 functions: map(), reduce()

    • Partial retries when nodes fail

  • Ibis – Satin [Wrzesinska et al. ‘06]

    • Distributed divide-and-conquer

    • Random work stealing: accommodate join/leave

  • Effective for specialized problem sets

    • Specializing in a problem/model makes mapping/programming easy

    • For “unexpected models”, users have to resort to out-of-band/ad hoc means

[Figure labels: join handler, failure handler, join; input data, Map(), Reduce(); fib(n), divide, fib(n-1)]

Distributed Object Oriented frameworks

  • ABCL [Yonezawa ‘90]

    JavaRMI, Manta [Maassen et al. ‘99]

    ProActive [Huet et al. ‘04]

  • Distributed Object oriented

    • Disperse objects among resources

  • Load delegation/distribution

    • Method invocations

    • RMI (Remote Method Invocation)

    • Async. RMIs for parallelism

  • RMI:

    • good abstraction

  • Extension of general language:

    • Allow flexible coding

[Figure labels: foo, foo.doJob(args), compute, RMI, async. RMI]

Hurdles for DOO on the Grid

  • Race conditions

    • Simultaneous RMIs on 1 object

    • Active Objects

      • 1 object = 1 thread

      • Deadlocks, e.g., on recursive calls

  • Handling asynchronous events

    • e.g., handling node joins

    • Why not event driven?

      • The flow of the program is segmented and hard to follow

  • Handling joins/failures

    • Difficult to handle them transparently in a reasonable manner

[Figure labels: deadlock between a.g() and b.f(); event loop ("if join: add", "if done: give more"); failure ("Checkpoint?", "Automatic retry?")]

Hurdles for Implementation

  • Connectivity with NAT/firewall

    • Solution: Build an overlay

  • Existing implementations

    • ProActive [Huet et al. ‘04]

      • Tree topology overlay

      • User must hand-write connectable points

    • Jojo2 [Aoki et al. ‘06]

      • 2-level Hierarchical topology

        • SSH / UDP broadcast

      • assumes network topology/setting

        • out of user control

  • Requirements

    • Minimal user burden

[Figure labels: configuration file, configure each link, connection, firewall]

Summary of the Problems

  • Distributed Object-Oriented on the Grid

    • Thread race conditions

    • Event handling

    • Node join/leave

    • Underlying connectivity

Proposal: gluepy

  • Grid enabled distributed object oriented framework

    • As a Python library

    • glue together Grid resources via simple and flexible coding

  • Resolve the issues in an object-oriented paradigm

    • SerialObjects

      • define “ownership” for objects

      • blocking operations unblock on events

    • Constructs for handling Node join/leave

      • Resolve the “first reference” problem

      • Failures are abstracted as exceptions

    • Connectivity (NAT/firewall)

      • Peers automatically construct an overlay

The Basic Programming Model

  • RemoteObjects

    • Created/mapped to a process

    • Accessible from other processes (RMI)

    • Passive Objects

      • Threads are not bound to objects

  • Thread

    • Simply to gain parallelism

    • RMIs / async. invocations (RMIs) implicitly spawn a thread

  • Future

    • Returned for async. invocation

    • placeholder for result

    • An uncaught exception is stored and re-raised at collection
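The future behaviour described above (a placeholder returned at once, with an uncaught exception stored and re-raised at collection) can be sketched with Python's standard `concurrent.futures` module. This is only a local analogue under stated assumptions: it uses threads in one process rather than gluepy RMIs, and `work` is a hypothetical stand-in for a remote method.

```python
from concurrent.futures import ThreadPoolExecutor


def work(x):
    # Hypothetical stand-in for a remote method; it fails for one input.
    if x == 3:
        raise ValueError("remote method failed for input 3")
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns immediately with a future (placeholder for the result).
    futures = [pool.submit(work, x) for x in (1, 2, 3)]

results = []
for f in futures:
    try:
        results.append(f.result())        # collection returns the value ...
    except ValueError as e:
        results.append("error: %s" % e)   # ... or re-raises the stored exception

print(results)
```

The caller only sees the failure when it collects the result, which keeps the invoking code sequential.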

[Figure labels: processes A and B; a.f() spawns a thread for the RMI; "F = a.f() async" spawns a thread for the async. invocation and stores the result of f() in F]

Programming in gluepy

  • Basics: RemoteObject

    • Inherit Base class

    • Externally referenceable

  • Async. invocation with futures

    • No explicit threads

    • Easier to maintain sequential flow

  • Mutual exclusion? Events?

    ⇒ SerialObjects

    class Peer(RemoteObject):          # inherit RemoteObject
        def run(self, arg):
            # work here...
            return result

    futures = []
    for p in peers:
        f = p.run.future(arg)          # async. RMI: run() on all peers
        futures.append(f)
    waitall(futures)                   # wait for all results
    for f in futures:
        print f.get()                  # read all results

“ownership” with SerialObjects

  • SerialObjects

    • Objects with mutual exclusion

    • RemoteObject sub-class

  • No explicit locks

  • Ownership for each object

    • call ⇒ acquire

    • return ⇒ release

    • Method execution by only 1 thread

      • The “owner thread”

  • Owner releases ownership on blocking operations

    • e.g., waitall(), RMI to another SerialObject

    • Pending threads contest for ownership

    • Arbitrary thread is scheduled

    • Eliminate deadlocks for recursive calls
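The ownership rule above (call acquires, return releases, and a blocking operation gives ownership up) can be sketched locally with a non-reentrant lock. This is an illustrative sketch under stated assumptions, not gluepy's implementation: `Serial`, `invoke`, and `blocking_call` are hypothetical names, and plain method calls stand in for RMIs.

```python
import threading


class Serial:
    """Hypothetical stand-in for a SerialObject: one owner at a time."""

    def __init__(self):
        self._owner_lock = threading.Lock()   # deliberately non-reentrant

    def invoke(self, method_name, *args):
        # call => acquire ownership, return => release ownership
        with self._owner_lock:
            return getattr(self, method_name)(*args)

    def blocking_call(self, target, method_name, *args):
        # A blocking operation gives up ownership while it waits,
        # then re-contests for ownership before continuing.
        self._owner_lock.release()
        try:
            return target.invoke(method_name, *args)
        finally:
            self._owner_lock.acquire()


class A(Serial):
    def __init__(self):
        super().__init__()
        self.b = None

    def g(self, n):
        return self.blocking_call(self.b, "f", n) + 1

    def h(self, n):
        return n * 2


class B(Serial):
    def __init__(self, a):
        super().__init__()
        self.a = a

    def f(self, n):
        # Recursive call back into a: it would deadlock if a still held its lock.
        return self.blocking_call(self.a, "h", n)


a = A()
b = B(a)
a.b = b
result = a.invoke("g", 5)   # a.g -> b.f -> a.h
print(result)
```

Because `a` gives up ownership while it waits on `b`, the recursive call `a.h()` can acquire it; with the lock held across the call, the same chain would block forever.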

[Figure: threads (Th) wait on an object owned by one thread; when the owner blocks it gives up ownership and a new owner thread is chosen; after it unblocks, it re-contests for ownership]

Signals to SerialObjects

  • We don’t want event-driven loops!

  • Events → “signals”

    • Blocking ops unblock on a signal

  • Signals to objects

    • Unblock a thread blocking in the object’s context

      • If none, unblock the next thread that blocks

    • The unblocked thread can handle the signal (event)

[Figure labels: SIGNAL sent to object; thread (Th) unblocks and handles it]

SerialObjects in gluepy

    class DistQueue(SerialObject):
        def __init__(self):
            self.queue = []

        def add(self, x):
            self.queue.append(x)
            if len(self.queue) == 1:
                self.signal()          # signal & wake a blocked pop()

        def pop(self):
            while len(self.queue) == 0:
                wait([])               # block until signal
            x = self.queue.pop(0)      # atomic section until the next blocking op
            return x

  • e.g., a queue

    • pop()

      • blocks on an empty queue

    • add()

      • calls signal() to unblock a waiter

  • Atomic section:

    • Between blocking ops in a method

    • Can update object attributes and invoke methods on non-Serial objects
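For readers without gluepy at hand, the same block-until-signal pattern can be sketched with the standard library's `threading.Condition`. This is a single-process analogue under stated assumptions, not the gluepy API: `LocalQueue` is a hypothetical name, `notify()` plays the role of `signal()`, and `Condition.wait()` plays the role of `wait([])`.

```python
import threading


class LocalQueue:
    """Single-process analogue of the DistQueue idea."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def add(self, x):
        with self._cond:               # atomic section
            self._items.append(x)
            self._cond.notify()        # wake one blocked pop()

    def pop(self):
        with self._cond:
            while not self._items:     # re-check after every wake-up
                self._cond.wait()      # releases the lock while blocked
            return self._items.pop(0)


q = LocalQueue()
received = []
consumer = threading.Thread(target=lambda: received.append(q.pop()))
consumer.start()      # blocks inside pop() if the queue is still empty
q.add("job-1")        # unblocks the consumer
consumer.join()
print(received)
```

As with SerialObjects, the waiting thread gives up the lock while it is blocked, so `add()` can run and wake it.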

Managing dynamic resources

  • Node Join:

    • Python process starts

  • Node leave:

    • Process termination

  • Constructs for node joins/leaves

    • Node join ⇒ “first reference” problem

      • Object lookup: obtain a reference to existing objects in the computation

    • Node Leave

      ⇒ RMI exception

      • Catch to handle failure

[Figure labels: joining node does a lookup on objects in the computation; an RMI to an object on a failed node raises an exception]

e.g.: Master-worker in gluepy (1/3)

    class Master(SerialObject):
        ...
        def nodeJoin(self, node):
            self.nodes.append(node)
            self.signal()               # signal for join

        def run(self):
            assigned = {}
            while True:
                while len(self.nodes) > 0 and len(self.jobs) > 0:
                    # ASYNC. RMIS TO IDLE WORKERS (elided on the slide)
                readys = wait(futures)  # block; a join signal also unblocks this
                if readys is None:
                    continue            # loop again to handle the join
                for f in readys:
                    # HANDLE RESULTS (elided on the slide)

  • Handles join/leave

  • Code for join:

    • join will invoke signal

    • signal will unblock the main master thread

e.g.: Master-worker in gluepy (2/3)

    for f in readys:
        node, job = assigned.pop(f)
        try:
            print "done:", f.get()
            self.nodes.append(node)     # worker is idle again
        except RemoteException, e:
            self.jobs.append(job)       # failure handling: resubmit the job

  • Failure handling

    • Exception on collection

    • Handle exception to resubmit task
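The resubmit-on-exception loop can be tried out locally with `concurrent.futures`. This is a sketch under stated assumptions rather than gluepy code: a thread pool stands in for the workers, and `WorkerLost`, `run_job`, and `master` are hypothetical names; the first attempt at job 2 is made to fail so the retry path is exercised.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class WorkerLost(Exception):
    """Hypothetical stand-in for an RMI failure caused by a leaving node."""


attempts = {}


def run_job(job):
    attempts[job] = attempts.get(job, 0) + 1
    if job == 2 and attempts[job] == 1:
        raise WorkerLost("worker left while running job 2")
    return job * job


def master(jobs, max_workers=2):
    pending = list(jobs)
    assigned = {}                        # future -> job
    done = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or assigned:
            while pending:               # hand out all pending jobs
                job = pending.pop(0)
                assigned[pool.submit(run_job, job)] = job
            ready, _ = wait(list(assigned), return_when=FIRST_COMPLETED)
            for f in ready:
                job = assigned.pop(f)
                try:
                    done[job] = f.result()   # exception surfaces at collection
                except WorkerLost:
                    pending.append(job)      # resubmit the task
    return done


results = master([1, 2, 3])
print(results)
```

No job is lost: the failed one reappears in `pending` and is handed out again on the next pass.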

e.g.: Master-worker in gluepy (3/3)

  • Deployment

    • Master exports object

    • Workers get reference

      and do RMI to join

Master init:

    master = Master()
    master.register("master")       # export the object under a name
    master.run()

Worker init:

    worker = Worker()
    master = RemoteRef("master")    # lookup on join
    master.nodeJoin(worker)
    while True:
        sleep(1)

Automatic Overlay Construction (1)

  • Solution for connectivity

    • Automatically construct an overlay

  • TCP overlay

    • On boot, acquire other peer info.

    • Each node connects to a small number of peers

    • Establish a connected connection graph

[Figure labels: NAT, global IP, firewall, attempted connection, established connections]

Automatic Overlay Construction (2)

  • Firewalled clusters

    • Automatic port-forwarding

    • User configures SSH info

  • Transparent routing

    • Peer-to-peer communication is routed

    • (AODV [Perkins ‘97])

[Figure labels: firewall traversal, SSH, P-to-P communication]

    # config file
    use src_pat dst_pat, prot=ssh, user=kenny

RMI failure detection on Overlay

  • Problem with overlay

    • A route consists of a number of connections

  • RMI failure ⇒ failure of any intermediate connection

  • Path pointers

    • Recorded on each forwarding node

    • The RMI reply returns along the path it came

  • Failure of intermediate connection

    • The preceding forwarding node back-propagates the failure
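The path-pointer idea can be illustrated with a toy simulation. This is a sketch under stated assumptions, not gluepy's routing code: `Node`, `send_rmi`, and the `broken_link` parameter are hypothetical, and the route is a fixed list of hops rather than one discovered by a routing protocol.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.path_pointers = {}   # request id -> previous hop


def send_rmi(route, request_id, broken_link=None):
    """Forward a request along route; return (status, hops travelled back)."""
    reached = 0
    for i in range(1, len(route)):
        if broken_link == (route[i - 1].name, route[i].name):
            break                                  # this connection has failed
        route[i].path_pointers[request_id] = route[i - 1]
        reached = i
    status = "reply" if reached == len(route) - 1 else "failure"
    # The reply, or the failure notice raised by the last node reached,
    # follows the recorded path pointers back to the invoker.
    back = []
    node = route[reached]
    while request_id in node.path_pointers:
        node = node.path_pointers.pop(request_id)
        back.append(node.name)
    return status, back


route = [Node(n) for n in ("invoker", "hop1", "hop2", "handler")]
ok = send_rmi(route, request_id=1)
failed = send_rmi(route, request_id=2, broken_link=("hop2", "handler"))
print(ok)
print(failed)
```

In the failing case the node just before the broken connection is the one that starts the back-propagation, so the invoker learns of the failure without ever reaching the handler.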

[Figure labels: RMI invoker, RMI handler, path pointer, back-propagated failure]

Agenda

  • Introduction

  • Related Work

  • Proposal

  • Evaluation

  • Conclusion

Experimental Environment

InTrigger Grid Platform in Japan

Max. scale: 9 clusters, over 900 cores

Clusters (name: cores): istbs: 316, tsubame: 64, mirai: 48, okubo: 28, hongo: 98, hiro: 88, chiba: 186, kyoto: 70, suzuk: 72, imade: 60, kototoi: 88

[Figure labels: requires SSH forwarding, global IPs, private IPs, firewall, all packets dropped]

Necessary Configuration

  • Configuration necessary for Overlay

    • 2 clusters (tsubame, istbs) require SSH port forwarding to other clusters

      ⇒ 2 lines of configuration

Add connection instructions by regular expression:

    # istbs cluster uses SSH for inter-cluster conn.
    use 133\.11\.23\. (?!133\.11\.23\.), prot=ssh, user=kenny

    # tsubame cluster gateway uses SSH for inter-cluster conn.
    use 131.112.3.1 (?!172\.17\.), prot=ssh, user=kenny
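How the two patterns on a `use` line select connections can be checked with Python's `re` module. This is a sketch under stated assumptions: the slides do not say how gluepy applies the patterns, so prefix matching with `re.match` is assumed here, `uses_ssh` is a hypothetical helper, and the address 157.82.22.1 is made up for illustration.

```python
import re

# (source pattern, destination pattern) pairs copied from the two config lines.
SSH_RULES = [
    (r"133\.11\.23\.", r"(?!133\.11\.23\.)"),
    (r"131.112.3.1", r"(?!172\.17\.)"),
]


def uses_ssh(src_ip, dst_ip, rules=SSH_RULES):
    """Return True if some rule matches both addresses (prefix match assumed)."""
    return any(re.match(src, src_ip) and re.match(dst, dst_ip)
               for src, dst in rules)


print(uses_ssh("133.11.23.5", "157.82.22.1"))   # leaves the 133.11.23.* cluster
print(uses_ssh("133.11.23.5", "133.11.23.9"))   # stays inside the cluster
print(uses_ssh("131.112.3.1", "172.17.0.4"))    # gateway to its private side
```

The negative lookahead `(?!...)` is what expresses "any destination outside this prefix", which is why one line per cluster is enough.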

Overlay Construction Simulation

  • Evaluate the overlay construction scheme

  • For different cluster configurations, varied the number of attempted connections per peer

  • 1000 trials for each cluster/attempted-connection configuration

28 global / 238 private peers case: 95%
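A simulation of this kind can be sketched in a few lines. The model below is an assumption, not the authors' simulator: every peer attempts connections to `attempts` random other peers, an attempt succeeds only when the target has a global IP, and a trial counts as successful when the resulting graph is connected. All function and parameter names are hypothetical.

```python
import random


def is_connected(n, edges):
    # Depth-first search from node 0 over undirected edges.
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n


def connected_fraction(n_global, n_private, attempts, trials, seed=0):
    """Fraction of trials in which the overlay ends up connected."""
    rng = random.Random(seed)
    n = n_global + n_private          # peers 0..n_global-1 have global IPs
    ok = 0
    for _ in range(trials):
        edges = set()
        for node in range(n):
            others = [p for p in range(n) if p != node]
            for target in rng.sample(others, min(attempts, len(others))):
                if target < n_global:  # only global-IP peers accept connections
                    edges.add((node, target))
        ok += is_connected(n, edges)
    return ok / trials


print(connected_fraction(n_global=3, n_private=5, attempts=7, trials=20))
```

Raising `attempts` trades more connection attempts per peer for a higher chance of a connected overlay; the simplified model ignores that private peers inside one cluster can usually reach each other directly.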

Dynamic Master-Worker

  • Master object distributes work to Worker objects

    • 10,000 tasks as RMIs

  • Workers repeat join/leave

    • Tasks for failed nodes are redistributed

    • No tasks were lost during the experiment

A Real-life Application

  • A combinatorial optimization problem

    • Permutation Flow Shop Problem

    • Parallel branch-and-bound

      • Master-Worker like

      • Requires periodic exchange of bounds

    • Code

      • 250 lines of Python code as glue code

      • Worker node starts up sequential C++ code

        • Communicate with local Python through pipes

Master-Worker interaction

  • Master does RMI to worker

    • Worker: periodic RMI to master

    • Not your typical master-worker

    • requires a flexible framework like ours
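The two-way interaction (master hands out work, workers periodically call back to exchange bounds) can be sketched locally. This is an illustration under stated assumptions, not the authors' 250 lines of glue code: threads stand in for worker nodes, a lock stands in for SerialObject ownership, and `BoundMaster`, `worker`, and `period` are hypothetical names; only `exchange_bound()` appears on the slide.

```python
import threading


class BoundMaster:
    """Holds the best (lowest) bound found so far by any worker."""

    def __init__(self):
        self._lock = threading.Lock()
        self.best = float("inf")

    def exchange_bound(self, local_best):
        # The worker reports its bound and receives the global one.
        with self._lock:
            self.best = min(self.best, local_best)
            return self.best


def worker(master, candidate_costs, period=2):
    local_best = float("inf")
    for i, cost in enumerate(candidate_costs, 1):
        if cost < local_best:
            local_best = cost
        if i % period == 0:            # periodic call back to the master
            local_best = master.exchange_bound(local_best)
    return master.exchange_bound(local_best)


master = BoundMaster()
threads = [
    threading.Thread(target=worker, args=(master, costs))
    for costs in ([9, 7, 8, 5], [6, 10])
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(master.best)
```

A tighter bound found by one worker reaches the others at their next exchange, which is what lets them prune more of their own search.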

[Figure labels: Master calls doJob() on Worker; Worker calls exchange_bound() on Master]

Performance

  • Work Rate

    • c_i: total comp. time per core

    • N: num. of cores

    • T: completion time

  • Slight drop with 950 cores

    • due to master node becoming overloaded
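The work-rate formula itself is not part of this transcript. Given the three symbols defined above, one natural reading (an assumption, not the authors' stated definition) is the fraction of the total available core time that was spent computing:

```latex
\text{work rate} = \frac{\sum_{i=1}^{N} c_i}{N \cdot T}
```

Under this reading a value of 1 would mean every core computed for the entire run, and time spent idle or waiting on the master lowers the rate.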

Troubleshoot Search Engine

  • Ever stuck debugging, or troubleshooting?

  • Re-rank query results obtained from Google

    • Use results from machine learning web-forums

    • Perform natural language processing on page contents at query time

  • Use a Grid backend

    • Computationally intensive

    • Requires good response time

      • within tens of seconds

[Figure labels: query “vmware kernel panic” sent to the search engine; backend nodes compute]

Troubleshoot Search Engine Overview

[Figure labels: Python CGI, doSearch(), async. doQuery(), async. doWork(), graph extraction, rescoring, parsing]

Leveraged sync/async RMIs to seamlessly integrate parallelism into a sequential program.

Merged CGIs with Grid backend

Agenda

  • Introduction

  • Related Work

  • Proposal

  • Evaluation

  • Conclusion

Conclusion

  • gluepy: Grid enabled distributed object oriented framework

    • Supports simple and flexible coding for complex Grid

      • SerialObjects

      • Signal semantics

      • Object lookup / exception on RMI failure

      • Automatic overlay construction

    • as a tool to glue together Grid resources simply and flexibly

  • Implemented and evaluated applications on the Grid

    • Max. scale: 900 cores (9 clusters)

      • NAT/Firewall, with runtime joins/leaves

    • Parallelized real-life applications

      • Take full advantage of gluepy constructs for seamless programming

Questions?

  • gluepy is available from its homepage

    www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
