
DIANE Overview

Germán Carrera, Alfredo Solano (CNB/CSIC). EMBRACE course, Monday 19th to Friday 23rd of February. CNB-CSIC, Madrid.


Presentation Transcript


  1. DIANE Overview. Germán Carrera, Alfredo Solano (CNB/CSIC). EMBRACE course, Monday 19th to Friday 23rd of February. CNB-CSIC, Madrid.

  2. DIANE Overview
     • DIANE is a lightweight distributed framework for parallel scientific applications in the master-worker model. It assumes that a job can be split into a number of independent tasks, which is typical of many scientific applications.
     • As opposed to standard message-passing libraries such as MPI, the DIANE framework takes care of all synchronization, communication and workflow management details on behalf of the application. The execution of a job is fully controlled by the framework, which decides when and where the tasks are executed.
     • DIANE is a thin software layer that works on top of more fundamental middleware such as LSF, PBS or the Grid Resource Brokers. It can also work in a standalone mode and does not require any complex underlying software.

  3. DIANE Overview
     • Main Features and Design Principles
     • The big picture

  4. DIANE Overview
     • Master-Worker Workflow Model
     • DIANE is based on a pull model: workers ask the master for tasks. The master decides how to assign tasks to workers, and the user may optimize this process for a particular application.
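
The pull model can be illustrated with a minimal sketch in plain Python. This is not DIANE code: `run_pull_model` and its parameters are hypothetical names, and threads with a shared queue stand in for the remote workers and the master. Each worker asks for its next task only when it is free, so faster workers naturally take more tasks.

```python
import queue
import threading


def run_pull_model(tasks, n_workers, perform):
    """Run `tasks` on `n_workers` threads; each worker pulls its next task.

    Hypothetical illustration only: the queue plays the role of the master.
    """
    pending = queue.Queue()
    for task_id, payload in enumerate(tasks):
        pending.put((task_id, payload))

    results = {}
    lock = threading.Lock()

    def worker_loop():
        while True:
            try:
                # The worker pulls: it asks for work only when it is idle.
                task_id, payload = pending.get_nowait()
            except queue.Empty:
                return  # no tasks left, the worker stops
            output = perform(payload)
            with lock:
                results[task_id] = output

    threads = [threading.Thread(target=worker_loop) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return the results in task order, regardless of which worker ran them.
    return [results[i] for i in range(len(tasks))]


squares = run_pull_model([1, 2, 3, 4, 5], n_workers=2, perform=lambda x: x * x)
print(squares)  # [1, 4, 9, 16, 25]
```

In this sketch the assignment policy is simply first come, first served; the slide notes that in DIANE the user may tune how the master assigns tasks.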

  5. DIANE Overview
     • Active feedback versus batch operation mode
     • Typically the End User interacts with the GRID through some sort of User Interface. The User Interface may be as simple as a set of command-line tools, or a more complex GUI-based application that contains modules to prepare and monitor jobs (Application and Job Handler, respectively).

  6. DIANE Overview
     • In the active feedback operation mode, each Worker pulls new subjobs as soon as it becomes available. Fast feedback to the Job Master allows the end user to work interactively.

  7. DIANE Overview
     • Core framework
     • The DIANE core framework does not depend on any concrete application (in particular, on any data analysis software) and is explicitly designed so that application-specific parts are implemented as a separate component. The core framework is implemented in Python, with CORBA running in the backend in a way that is completely transparent to applications.
     • Supported languages for applications
     • C++ and Python application components are supported directly and may be configured at runtime according to different usage scenarios (as threads or as separate processes). An application written in any other language and provided as an executable file (e.g. FORTRAN, Java) may also be used.
     • Error Recovery
     • Users may specify customized error recovery policies if needed. A set of default policies is provided and may be used immediately. Users may write and add special recovery policies by implementing simple Python functions.
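
The slides do not show what a recovery policy looks like, so the following is only a sketch of the general idea of a policy written as a simple Python function. Every name here (`retry_up_to`, `run_with_policy`, `make_flaky`, the `"retry"` and `"give_up"` return values) is hypothetical and is not part of the DIANE API.

```python
def retry_up_to(max_attempts):
    """Build a policy that allows a failed task up to `max_attempts` tries."""
    def policy(task_id, attempt, error):
        # Hypothetical contract: return "retry" to rerun the task, else stop.
        return "retry" if attempt < max_attempts else "give_up"
    return policy


def run_with_policy(task_id, action, policy):
    """Call `action` until it succeeds or the policy stops retrying."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return ("ok", action(), attempt)
        except Exception as error:
            if policy(task_id, attempt, error) != "retry":
                return ("failed", repr(error), attempt)


def make_flaky(failures_before_success):
    """Return a task that raises a few times and then returns 42."""
    state = {"calls": 0}

    def action():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("simulated crash")
        return 42
    return action


recovered = run_with_policy(1, make_flaky(2), retry_up_to(3))
abandoned = run_with_policy(2, make_flaky(5), retry_up_to(3))
print(recovered)  # ('ok', 42, 3)
print(abandoned[0], abandoned[2])
```

Keeping the policy as a plain function separates the decision (retry or give up) from the loop that executes tasks, which is presumably what makes such policies easy to swap.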

  8. DIANE Overview
     • Job Monitoring and Outbound Connectivity from Worker Nodes
     • A remote client (user) gets full information about the state of a job and may connect and disconnect at any time. An administrator may set up any number of proxies between the Client and the Master, so outbound connectivity from worker nodes is not required. In this way DIANE can easily be adapted to the local policies of computing centers.
     • Example: each of the commands below is executed on a different machine.

       Connecting a remote client directly to the job master:

           % diane.startmaster --job=test   # cluster
           % diane.startclient --job=test   # end user

       Connecting a remote client through a proxy:

           % diane.startmaster --job=test   # cluster
           % diane.startclient --proxy      # proxy on a gateway machine of the cluster
           % diane.startclient --job=test   # end user

  9. DIANE Overview
     • Simple single-user job execution on a local cluster
     • A user of LSF at CERN may start a new job by running the master on a local desktop machine while submitting the workers as individual jobs to LSF:

           % diane.startjob --job=test --workers=30 --broker=LSF --broker-options=-q8nm

     • Software building blocks
     • Master/Worker components may be arranged in a variety of ways to build more sophisticated systems or to integrate into existing frameworks.

  10. DIANE Overview
      • How DIANE fits into the GRID picture
      • DIANE runs on top of low-level GRID services.

  11. DIANE Overview
      • Quick start
      • JobInitData contains application-specific parameters for the 'crashTest' application, which is used to simulate application failures in different time patterns. Here we use it to make sure everything was installed correctly.

            diane.startjob -j $DIANE_TOP/dev/workspace/testOK.job -w2@localhost --wms=xterm

            DIANE: 22:18:39: Initializing: appname = crashTest
            DIANE: 22:18:39: starting new job: id = 2
            DIANE: 22:18:39: number of registered workers = 0
            DIANE: 22:18:39: client running...
            [<function app_ok at 0x8294904>, 5.8641818780727872]
            [<function app_ok at 0x8294904>, 10.35566135792468]
            [<function app_ok at 0x8294904>, 11.051037211240827]
            [<function app_ok at 0x8294904>, 10.967285308043389]
            [<function app_ok at 0x8294904>, 9.5686214756534991]
            [<function app_ok at 0x8294904>, 4.4414560806191457]
            [<function app_ok at 0x8294904>, 11.219275775397689]
            [<function app_ok at 0x8294904>, 8.9302782987551801]
            [<function app_ok at 0x8294904>, 11.908280602567558]
            [<function app_ok at 0x8294904>, 9.5101635521295869]
            DIANE: 22:18:39: job plan: #10 tasks
            <thread:JobControl>: 22:18:39: current job processing time: 0 s

  12. DIANE Overview
      • At the same time, two xterm windows should pop up automatically: you will see the worker immediately put to work by the master, and tasks successively dispatched, executed and integrated.

            switching to current user job workspace: /home/moscicki/diane.workspace/jobs/2
            DIANE: 22:19:41: reading master address from the default file: MasterOID
            DIANE: 22:19:42: registering new worker with wid = 1
            worker: 22:19:42: initializing job 2, worker id 1
            worker: 22:19:42: job initialization finished with the status: ok
            <thread:JobControl>: 22:19:43: dispatching taskid=1 to worker wid=1
            worker: 22:19:43: starting task #1
            doing action: <function app_ok at 0x82d8294> sleeping: 5.8641818780727872
            worker: 22:19:48: task 1 finished with the status: ok
            DIANE: 22:19:48: recieved result, taskid =1 status: ok from worker: 1
            integrating result... waiting...5.8641818780727872
            DIANE: 22:19:54: Integrated result successfully...
            <thread:JobControl>: 22:19:54: dispatching taskid=2 to worker wid=1
            worker: 22:19:54: starting task #2
            doing action: <function app_ok at 0x82d8294> sleeping: 10.35566135792468
            worker: 22:20:05: task 2 finished with the status: ok

  13. DIANE Overview
      • At the end of job execution you should see output like this:

            <thread:JobControl>: 22:22:55: job completed ok, quitting control loop
            DIANE: 22:22:55: notifying workers about finished job
            DIANE: 22:22:55: deactivating master
            worker: 22:22:55: notification from master: job 2 finished
            worker: 22:22:55: worker cleanup status: ok
            DIANE: 22:22:55: Trying to terminate server...
            DIANE: 22:22:55: notifying client
            Job terminated, id= 2
            Summary =
            DIANE: 22:22:55: Trying to terminate server...
            DIANE: 22:22:55: job output in: /home/moscicki/diane.workspace/jobs/2

  14. DIANE Overview
      • You can construct applications by creating Planner, Integrator and Worker objects in Python.
      • You decide what data structures are exchanged between these objects.
      • More examples may be found in $DIANE_TOP/dev/applications.

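  15. DIANE Overview

      import random

      class Planner:
          def env_createPlan(self, jobData, chunkNum):
              # jobData layout: [number of tasks, random seed, prob, avg_wait, std_dev]
              init_list = []
              random.seed(jobData[1])
              prob = jobData[2]
              avg_wait = jobData[3]
              std_dev = jobData[4]
              # ...
              # `failures` is defined elsewhere in the application (not shown on the slide).
              for i in range(jobData[0]):
                  # list() keeps random.choice working on both Python 2 and Python 3
                  action = random.choice(list(failures.values()))
                  # each task is an [action, wait time in seconds] pair
                  init_list.append([action, random.gauss(avg_wait, std_dev)])
              return (None, init_list)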

  16. DIANE Overview

      import time

      class Integrator:
          def env_init(self, job_data):
              pass

          def env_addPartialOutput(self, wait):
              # simulate the cost of integrating one partial result
              print("integrating result... waiting..." + repr(wait))
              if wait > 0:
                  time.sleep(wait)
              return 1

          def env_getResult(self):
              return None

  17. DIANE Overview

      import time

      class Worker:
          def env_init(self, init_data):
              return 1

          def env_performWork(self, what):
              # `what` is one [action, wait] pair produced by the Planner
              action = what[0]
              wait = what[1]
              print("doing action: " + str(action) + " sleeping: " + repr(wait))
              if wait > 0:
                  time.sleep(wait)
              return action(wait)

          def env_done(self):
              return 1
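
To see how the three objects could fit together, here is a minimal self-contained sketch that drives toy Planner, Worker and Integrator classes sequentially in one process. The `env_*` method names come from the slides; everything else is an assumption made for illustration: `run_locally` is a hypothetical stand-in for the framework's control loop, the toy classes compute a sum of squares instead of sleeping, and passing the first element of the plan to `Worker.env_init` is a guess about its meaning.

```python
class Planner:
    def env_createPlan(self, jobData, chunkNum):
        # One task per input number; no shared worker init data (assumption).
        return (None, list(jobData))


class Worker:
    def env_init(self, init_data):
        return 1

    def env_performWork(self, what):
        # The "work" of this toy application: square one number.
        return what * what

    def env_done(self):
        return 1


class Integrator:
    def env_init(self, job_data):
        self.total = 0

    def env_addPartialOutput(self, partial):
        # Merge one task result into the running total.
        self.total += partial
        return 1

    def env_getResult(self):
        return self.total


def run_locally(job_data):
    """Hypothetical sequential stand-in for the framework's control loop."""
    init_data, tasks = Planner().env_createPlan(job_data, len(job_data))
    integrator = Integrator()
    integrator.env_init(job_data)
    worker = Worker()
    worker.env_init(init_data)
    for task in tasks:
        # dispatch -> execute -> integrate, as in the log output shown earlier
        integrator.env_addPartialOutput(worker.env_performWork(task))
    worker.env_done()
    return integrator.env_getResult()


result = run_locally([1, 2, 3])
print(result)  # 14
```

In the real framework the loop is distributed: the master dispatches tasks to remote workers and integrates results as they arrive, which is why the application only supplies these three objects.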

  18. DIANE Overview
      • DIANE is free software under the GPL license.
      • You can download it at: http://ganga.web.cern.ch/
