Workflow design and implementation issues in the VL-e project


Workflow design and implementation issues in the VL-e project. P.Adriaans A Belloum. Outline. Background The Workflow design problem Virtual Laboratory for e-Science Our approach Challenges and research lines Activities. Workflow Design: The problem. Solution 1: Incremental clustering.

Presentation Transcript


Workflow design and implementation issues in the VL-e project

P. Adriaans, A. Belloum


Outline

  • Background

  • The Workflow design problem

  • Virtual Laboratory for e-Science

  • Our approach

  • Challenges and research lines

  • Activities


Workflow Design: The problem


Solution 1: Incremental clustering


Solution 2: Feature analysis


1700 Comparisons

3500 Comparisons


The Workflow design problem

  • A workflow is an inherent part of the problem solving heuristics

  • Induction of optimal workflows is an important research issue

  • Manipulating workflows is an important aspect of e-Science


The KDD process

  • Data selection

  • Cleaning

    • Domain consistency

    • De-duplication

    • Disambiguation

  • Enrichment

  • Coding

  • Data Mining

    • Clustering

    • Segmentation

    • Prediction

  • Reporting & application

[Other diagram labels: Information requirements, Action, external data, Feedback]


Adaptive Information Disclosure

  • Information Retrieval: Formulate query, Fire query, Search, Construct answer, Display results

  • User support: Alternatives, Disambiguation, Query Expansion, Filtering, Relevance score, Link to concept tree

  • Ontology Learning: Data Selection, Preprocessing, Named Entity Recognition, Relation Recognition, Advanced Constraint Recognition, Validation, Version Management

[Other diagram labels: Ontology, Domain selection]


[Figure labels: Application, ICT overhead]

  • Traditional position of ICT in science:

    • Application running on a single machine…

    • Little ICT overhead, no collaboration and/or sharing of data and information

  • Evolving technological developments like Web & Grid and Service Oriented Architecture allow sharing of data and information, thus enabling scientific applications to do experiments that had not been possible before…

    • Larger ICT overhead

  • e-Science is based on Web & Grid and other application-supporting ICT…

    • Infrastructure will be helpful!


[Figure labels: Application, ICT overhead]

  • Typical e-Science applications require more than a single resource, as well as sharing of resources

  • Moreover:

    • Resources (computing, storage, networks) are often geographically distributed across different security domains; building such a system:

      • introduces a large ICT overhead

      • requires extensive ICT knowledge

  • The application scientist is forced to focus on ICT problems rather than science

  • Recent developments in Web & Grid based e-Science frameworks like VL-e provide basic services that help hide computing resources, boosting the development of data- and compute-intensive e-Science on a large-scale distributed infrastructure.

  • The application scientist can focus on their own science rather than ICT problems


[Architecture diagram, labels by layer]

  • Application specific service: Medical Application, Telescience, Bio ASP

  • Generic service & Virtual Lab. services: Virtual Lab. rapid prototyping (interactive simulation), Virtual Laboratory, Additional Grid Services (OGSA services)

  • Grid & Network Services: Grid Middleware, Surfnet, Network Service (lambda networking)

  • Other labels: Application feedback, Application Potential

  • Environments: VL-e Experimental Environment, VL-e Proof of Concept Environment

  • Components: Stable Application & VL-e component, Unstable Application & VL-e component

  • VL-e Certification Environment: a set of tests that have to be passed before any application software or VL-e component can be deployed on the VL-e Proof of Concept Environment


Mission

Effectively reuse existing workflow management systems, and provide a generic e-Science framework for different application domains.

A generic framework can

  • Improve the reuse of workflow components and workflows in different experiments

  • Reduce the cost of learning different systems

  • Allow users to work in a consistent environment when the underlying infrastructure changes


Two phase approach

  • Recommend suitable workflow systems for different application domains:

    • Analyze typical application use cases

    • Define small projects with different application domains

    • Review existing workflow systems

    • Recommend four workflow systems: Triana, Taverna, Kepler, and VLAMG

  • Long term:

    • Extend VLAMG and develop our own generic workflow framework

Recommendation report: scientific workflow management in PoC R1, VL-e internal report, Oct 17, 2005.


Lessons learned from phase 1

  • In the scientific community there are two types of workflow users: end-users and application developers.

  • The two categories of users have completely different requirements: ease of use, ease of developing new applications, and ease of migrating legacy applications

  • How to introduce a new WMS to a domain scientist?

    • Because it has a well-defined architecture?

    • Or because it lets them keep their current work style?

  • How to reuse existing work?

    • Support multiple WMS systems or add more options to one WMS?

  • How to efficiently include the user in the computing loop?

Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia


Computer support for problem solving

[Figure labels: Distributed data sharing & dissemination; Distributed resources; Distributed parallel computing; Visualization, remote resource invocation]

  • Problem Solving Environment (E. Gallopoulos et al., IEEE CS Eng. 1994):

    • Organizes different software components/tools

    • Allows a user to assemble these tools at a high level of abstraction

    • Controls runtime behavior of experiments

    • Examples: MATLAB, Ptolemy, etc.

Scientific Workflow Management: organize and execute on grid-enabled resources!

Traditional PSE: organize and execute resources locally!


Diversity in SWMS

  • Taverna:

    • Web services based language: Scufl

    • Engine: FreeFluo

    • Graphical visualization of the workflow

  • Triana:

    • Components

    • Task graph

    • Data/control flow

  • Kepler:

    • Actor, director

    • MoML

    • Execution models

  • Pegasus:

    • Based on DAGMan

    • VDL

    • DAG

  • DAGMan:

    • Computing tasks

    • DAG
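Several of the systems listed above describe a workflow as a DAG of computing tasks. The following is a minimal sketch of that idea only, not of how DAGMan or Pegasus are implemented: each task runs once all of its prerequisites have run. The names `run_dag`, `tasks`, and `deps` are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real scheduler.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run every task after all of its prerequisites.

    tasks: dict mapping a task name to a zero-argument callable (hypothetical).
    deps:  dict mapping a task name to the set of task names it depends on.
    Returns the order in which the tasks were executed.
    """
    order = []
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()
        order.append(name)
    return order


if __name__ == "__main__":
    # Toy diamond-shaped workflow: fetch -> (clean, stats) -> report.
    tasks = {name: (lambda name=name: print("running", name))
             for name in ("fetch", "clean", "stats", "report")}
    deps = {"clean": {"fetch"}, "stats": {"fetch"}, "report": {"clean", "stats"}}
    run_dag(tasks, deps)
```

A real engine would submit independent tasks (here `clean` and `stats`) to different resources in parallel; this sketch runs them sequentially in one process.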


A workflow bus paradigm

Workflow bus

Z. Zhao et al., “Workflow bus for e-Science”, to appear IEEE e-Science 2006, Amsterdam


ws-VLAM Engine: architecture

[Architecture diagram: Service host(s) and compute element(s)]

[Diagram labels: Client; Delegate; Delegation service; GT4 Java Container; ws-RTSM Factory; ws-RTSM Instance; GRAM services; pre-ws-GRAM; Job functions; Worker nodes; Workflow components]


Ongoing work

  • Objective:

    • Invoke the ws-VLAM RTSM GT4 service from the Kepler/Taverna environment to execute a predefined Application workflow.

  • ws-VLAM Application workflow:

    • Scientific experiments composed of software components that need to be executed on Grid-enabled resources (CPU intensive)

    • A potential VLAM Application workflow can be described as:

      • a pipeline of processes exchanging streams of data.
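The "pipeline of processes exchanging streams of data" can be illustrated with a small sketch. It assumes nothing about the actual VLAM component API: Python generators stand in for the processes, and all stage names (`source`, `scale`, `keep_even`, `pipeline`) are hypothetical.

```python
def source(n):
    """Produce a stream of n integers (stands in for a data-producing component)."""
    for i in range(n):
        yield i


def scale(stream, factor):
    """Consume one stream and emit a transformed stream."""
    for x in stream:
        yield x * factor


def keep_even(stream):
    """Filter stage: pass on only the even values."""
    for x in stream:
        if x % 2 == 0:
            yield x


def pipeline(stream, *stages):
    """Connect stages so that the output stream of one feeds the next."""
    for stage in stages:
        stream = stage(stream)
    return stream


if __name__ == "__main__":
    result = pipeline(source(5), lambda s: scale(s, 3), keep_even)
    print(list(result))
```

Each stage pulls items from the previous one as they become available, so no stage needs the whole data set in memory. In a grid setting the stages would be separate processes connected by network streams rather than generators in a single interpreter.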


Execute the ws-VLAM workflow in Kepler/Taverna

  • A predefined Application workflow developed in VLAM can be executed as a single step in Kepler/Taverna

    • (no need to graphically recompose the whole workflow).

  • The predefined Application workflow will be executed on any remote computing resource where the VLAM-RTSM GT4 Web service is installed.

  • Advantages:

    • Compose workflows in which sub-workflows that require grid resources are executed on grid-enabled resources, while the rest of the workflow is executed using other Kepler actors or Taverna processors

    • It is also more efficient, since it avoids the overhead that would result from wrapping every workflow component as a separate web service or a separate remote grid execution.
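The efficiency argument above can be made concrete with a toy model. This is a sketch under stated assumptions, not the Kepler or Taverna API: a workflow is a list of functions over a dict, and `submit` is a hypothetical stand-in for one remote invocation of the service that runs the sub-workflow.

```python
def run_workflow(steps, data):
    """Run the steps in order, passing each step's result to the next."""
    for step in steps:
        data = step(data)
    return data


def as_single_step(sub_steps, submit=run_workflow):
    """Wrap a whole predefined sub-workflow as ONE step of an outer workflow.

    `submit` is a hypothetical stand-in for a single remote invocation;
    by default the sub-workflow simply runs locally.
    """
    def step(data):
        return submit(sub_steps, data)
    return step


def counting_submit(calls):
    """Build a submit function that records each (simulated) remote call."""
    def submit(steps, data):
        calls.append(len(steps))
        return run_workflow(steps, data)
    return submit


if __name__ == "__main__":
    calls = []
    sub_workflow = [
        lambda d: {**d, "x": d["x"] + 1},
        lambda d: {**d, "x": d["x"] * 2},
    ]
    outer = [
        lambda d: {**d, "x": 1},                                # local step
        as_single_step(sub_workflow, counting_submit(calls)),   # one remote call
        lambda d: {**d, "done": True},                          # local step
    ]
    print(run_workflow(outer, {}), "remote calls:", len(calls))
```

The outer workflow sees one step and pays for one invocation, regardless of how many components the wrapped sub-workflow contains.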


Execute the ws-VLAM workflow in Kepler/Taverna

[Architecture diagram labels: Kepler/Taverna workbench; VLAM actor or Taverna processor (to be developed); RTSM client; workflow description (XML); (1) Proxy; Delegate; Delegation service; (2) Service invocation; RTSM-GT4 Web service (available on DAS2); DAS2 or PoC facilities; GT4 Java Container; ws-RTSM Factory; ws-RTSM Instance; GRAM services; pre-ws-GRAM; worker nodes; workflow components]

  • Kepler/Taverna users can access some of the parameters of the Application workflow to change their default values

  • Kepler/Taverna users have to specify the location of the input data file as a URL and will get back a URL if the workflow generates data files

  • Graphical output of the Application workflow is handled automatically by the VLAM Taverna processor/Kepler actor.
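The user-facing contract described in these bullets (override some defaults, pass an input URL, possibly get an output URL back) might be sketched as follows. The function names, the `exposed` set, and the `execute` callback are all hypothetical; the slides do not define the actual actor or processor interface, which was still to be developed.

```python
def effective_parameters(defaults, exposed, overrides):
    """Merge user overrides into the workflow defaults.

    Only parameters listed in `exposed` may be changed by the user;
    anything else is rejected (an assumption made for this sketch).
    """
    hidden = set(overrides) - set(exposed)
    if hidden:
        raise ValueError(f"parameters not exposed to the user: {sorted(hidden)}")
    return {**defaults, **overrides}


def run_request(input_url, defaults, exposed, overrides, execute):
    """Run a predefined workflow on the data at input_url.

    `execute` is a hypothetical callback standing in for the remote service;
    it returns an output URL, or None if the workflow generates no data files.
    """
    params = effective_parameters(defaults, exposed, overrides)
    return execute(input_url, params)


if __name__ == "__main__":
    defaults = {"iterations": 10, "tolerance": 0.01}

    def fake_execute(url, params):
        # Placeholder: pretend the result is stored next to the input.
        return url + ".out"

    print(run_request("http://example.org/in.dat", defaults,
                      {"iterations"}, {"iterations": 50}, fake_execute))
```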


Research scope and lines

  • Focus 1: Interoperability and integration between workflow systems

  • Focus 2: Composition of meta workflows

  • Focus 3: Provenance at meta workflows

  • Focus 4: Enactment and orchestration of meta workflows

  • Focus 5: Human in the loop computing in meta workflows

Z. Zhao, A. Belloum, M. Bubak: A research plan of VL-e SP2.5, V0.2, September 9, 2006


http://www.vl-e.nl/