Workflow design and implementation issues in the VL-e project

P. Adriaans, A. Belloum. Outline: Background; The Workflow design problem; Virtual Laboratory for e-Science; Our approach; Challenges and research lines; Activities. Workflow Design: The problem. Solution 1: Incremental clustering.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Workflow design and implementation issues in the VL-e project' - maik


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
outline
Outline
  • Background
  • The Workflow design problem
  • Virtual Laboratory for e-Science
  • Our approach
  • Challenges and research lines
  • Activities
[Figure: 1700 Comparisons vs. 3500 Comparisons]

The Workflow design problem
  • A workflow is an inherent part of the problem solving heuristics
  • Induction of optimal workflows is an important research issue
  • Manipulating workflows is an important aspect of e-Science
The KDD process
  • Data selection
  • Cleaning:
    • Domain consistency
    • De-duplication
    • Disambiguation
  • Enrichment
  • Coding
  • Data Mining:
    • Clustering
    • Segmentation
    • Prediction
  • Reporting & application
  • Inputs and loop: Information requirements, External data, Action, Feedback
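The KDD stages above form a chain of data transformations, each feeding the next. A minimal Python sketch of that chain follows, assuming records are plain dicts; the field names (`name`, `age`, `city`, `band`) and the external lookup table are hypothetical, and the mining step is a simple segment count rather than any specific VL-e algorithm.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, object]


def select(records: List[Record]) -> List[Record]:
    """Data selection: keep only records that carry the attribute we need."""
    return [r for r in records if r.get("age") is not None]


def clean(records: List[Record]) -> List[Record]:
    """Cleaning: normalise names (domain consistency), then de-duplicate."""
    seen = set()
    result = []
    for r in records:
        r = {**r, "name": str(r["name"]).strip().lower()}
        key = (r["name"], r["age"])
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result


def enrich(records: List[Record], external: Dict[str, str]) -> List[Record]:
    """Enrichment: join in external data (hypothetical name -> city table)."""
    return [{**r, "city": external.get(r["name"], "unknown")} for r in records]


def code_records(records: List[Record]) -> List[Record]:
    """Coding: map the numeric age onto a categorical band for mining."""
    return [{**r, "band": "under40" if r["age"] < 40 else "40plus"} for r in records]


def mine(records: List[Record]) -> Counter:
    """Data mining (segmentation): count records per (band, city) segment."""
    return Counter((r["band"], r["city"]) for r in records)


def kdd(records: List[Record], external: Dict[str, str]) -> Counter:
    """Run the stages in order; the resulting segments are what gets reported."""
    return mine(code_records(enrich(clean(select(records)), external)))
```

For example, `kdd([{"name": "Ann ", "age": 34}, {"name": "ann", "age": 34}, {"name": "Bob", "age": 51}], {"ann": "Amsterdam"})` collapses the two "Ann" records into one during cleaning and yields one record in each of two segments. The feedback loop in the diagram would correspond to re-running the chain with revised selection criteria, which this sketch leaves out.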

Adaptive Information Disclosure
  • Information Retrieval:
    • Formulate query, Fire query, Search, Construct answer, Display results
    • User support: Alternatives, Disambiguation, Query Expansion, Filtering, Relevance score, Link to Concept tree
  • Ontology Learning:
    • Data Selection, Preprocessing, Named Entity Recognition, Relation Recognition, Advanced Constraint Recognition, Validation, Version Management
    • Ontology, Domain selection
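Two of the user-support functions listed for the Information Retrieval lane, query expansion via a concept tree and relevance scoring with filtering, can be sketched in Python. This assumes the concept tree is a hypothetical dict mapping each concept to its narrower concepts and that relevance is a plain term-overlap count; the slides do not say how VL-e implements either.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical concept tree: each concept maps to its narrower concepts.
CONCEPT_TREE: Dict[str, List[str]] = {
    "protein": ["enzyme", "antibody"],
    "enzyme": ["kinase"],
}


def expand(term: str, tree: Dict[str, List[str]]) -> Set[str]:
    """Query expansion: add every concept below `term` in the tree."""
    result = {term}
    stack = [term]
    while stack:
        for child in tree.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def search(query: List[str], docs: Dict[str, str], tree: Dict[str, List[str]],
           min_score: int = 1) -> List[Tuple[str, int]]:
    """Fire the expanded query, score documents by term overlap, filter, rank."""
    terms: Set[str] = set()
    for q in query:
        terms |= expand(q.lower(), tree)
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = len(terms & words)  # relevance score = number of matching terms
        if score >= min_score:      # filtering
            scored.append((doc_id, score))
    # Construct the answer: highest score first, ties broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

With documents `{"d1": "kinase activity in yeast", "d2": "antibody and enzyme assays", "d3": "galaxy survey"}`, the query `["protein"]` matches neither d1 nor d2 literally, but after expansion it returns d2 (two matching concepts) ahead of d1 (one), and filters out d3.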

[Figure: Application vs. IT overhead]

  • Traditional position of ICT in science:
    • Application running on a single machine…
    • Little ICT overhead, no collaboration and/or sharing of data and information
  • Evolving technologies such as the Web, the Grid and Service Oriented Architecture allow sharing of data and information, enabling scientific applications to do experiments that were not possible before…
    • Larger ICT overhead
  • e-Science is based on the Web, the Grid and other application-supporting ICT…
    • Infrastructure will be helpful!



  • Typical e-Science applications require more than a single resource, as well as sharing of resources
  • Moreover:
    • resources (computing, storage, networks) are often geographically distributed across different security domains; building such a system:
      • introduces a large ICT overhead
      • requires extensive ICT knowledge
  • Application scientists are forced to focus on ICT problems rather than on science
  • Recent developments in Web- and Grid-based e-Science frameworks such as VL-e provide basic services that hide the underlying computing resources, to boost the development of data- and computation-intensive e-Science on large-scale distributed infrastructure.
  • Application scientists can focus on their own science rather than on ICT problems


[Figure: VL-e layered architecture and environments]
  • Application feedback
  • Application specific services: Medical Application, Telescience, Bio ASP
  • Potential generic services & Virtual Lab. services
  • Virtual Lab. rapid prototyping (interactive simulation)
  • Virtual Laboratory
  • Additional Grid Services (OGSA services)
  • Grid Middleware
  • Grid & Network Services
  • Surfnet Network Service (lambda networking)

  • VL-E Experimental Environment: unstable application & VL-e components
  • VL-E Certification Environment: a set of tests that have to be passed before any application software or VL-e component can be deployed on the VL-e Proof of Concept environment
  • VL-E Proof of Concept Environment: stable application & VL-e components
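The certification environment acts as a gate between the experimental and proof-of-concept environments. A minimal Python sketch of that gate follows, assuming each component carries a list of test callables and that deployment simply moves it from one environment to the other; the class and method names are hypothetical and do not describe actual VL-e tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Component:
    """A hypothetical application or VL-e component with its certification tests."""
    name: str
    tests: List[Callable[[], bool]] = field(default_factory=list)


@dataclass
class Environments:
    experimental: Dict[str, Component] = field(default_factory=dict)
    proof_of_concept: Dict[str, Component] = field(default_factory=dict)

    def certify(self, name: str) -> List[int]:
        """Run every certification test; return the indices of failing tests."""
        component = self.experimental[name]
        return [i for i, test in enumerate(component.tests) if not test()]

    def promote(self, name: str) -> bool:
        """Deploy to the proof-of-concept environment only if all tests pass."""
        if self.certify(name):
            return False  # stays in the experimental environment
        self.proof_of_concept[name] = self.experimental.pop(name)
        return True
```

A component whose tests all return `True` moves to `proof_of_concept`; one with a failing test stays in `experimental`, and `certify` reports which test failed. A real gate would presumably also reject components that have no tests at all.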

Mission

Effectively reuse existing workflow management systems, and provide a generic e-Science framework for different application domains.

A generic framework can

  • Improve the reuse of workflow components and workflows in different experiments
  • Reduce the cost of learning different systems
  • Allow users to work in a consistent environment when the underlying infrastructure changes
Two phase approach
  • Recommend suitable workflow systems for different application domains:
    • Analyze typical application use cases
    • Define small projects with different application domains
    • Review existing workflow systems
    • Recommend four workflow systems: Triana, Taverna, Kepler, and VLAMG
  • In the long term:
    • Extend VLAMG and develop our own generic workflow framework

Recommendation report: Scientific workflow management in PoC R1, VL-e internal report, Oct 17, 2005.

Lessons learned from phase 1
  • In the scientific community there are two types of workflow users: end users and application developers.
  • The two categories of users have completely different requirements: ease of use, ease of developing new applications, and ease of migrating legacy applications
  • How to introduce a new WMS to a domain scientist?
    • Because it has a well-defined architecture?
    • Or because it allows them to keep their current working style?
  • How to reuse existing work?
    • Support multiple WMSs or add more options to one WMS?
  • How to efficiently include the user in the computing loop?

Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia

[Figure: distributed data sharing & dissemination, distributed resources, distributed parallel computing, visualization, remote resource invocation]

Computer support for problem solving

  • Problem Solving Environment (E. Gallopoulos et al., IEEE CS Eng. 1994):
    • Organize different software components/tools
    • Allow a user to assemble these tools at a high level of abstraction
    • Control the runtime behavior of experiments
    • Examples: MATLAB, Ptolemy, etc.

Scientific Workflow Management: organize and execute on Grid-enabled resources!

Traditional PSE: organize and execute resources locally!

Diversity in SWMS
  • Taverna:
    • Web services based language: Scufl
    • FreeFluo engine
    • Graphical visualization of workflows
  • Triana:
    • Components
    • Task graph
    • Data/control flow
  • Kepler:
    • Actor, director
    • MoML
    • Execution models
  • Pegasus:
    • Based on DAGMan
    • VDL
    • DAG
  • DAGMan:
    • Computing tasks
    • DAG
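Pegasus and DAGMan both describe a workflow as a DAG of computing tasks, in which a task may start only after all of its parent tasks have finished. A minimal Python sketch of that execution rule, using the standard-library `graphlib` module (Python 3.9+); the task names are hypothetical, and "running" a task is a local function call rather than a Grid job submission.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(deps: Dict[str, Set[str]],
            actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every task after all of its parents; return the execution order."""
    sorter = TopologicalSorter(deps)  # deps maps task -> set of parent tasks
    sorter.prepare()                  # raises graphlib.CycleError on a cycle
    order: List[str] = []
    while sorter.is_active():
        # Every task returned here has all parents finished, so a real engine
        # could submit them in parallel; this sketch runs them one at a time.
        for task in sorted(sorter.get_ready()):
            actions[task]()
            order.append(task)
            sorter.done(task)
    return order
```

For a diamond-shaped workflow `{"stage_in": set(), "sim_a": {"stage_in"}, "sim_b": {"stage_in"}, "merge": {"sim_a", "sim_b"}}`, the order is `stage_in`, then `sim_a` and `sim_b` (independent of each other), then `merge`.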
A workflow bus paradigm

[Figure: Workflow bus]

Z. Zhao et al., “Workflow bus for e-Science”, to appear in IEEE e-Science 2006, Amsterdam
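The slide names the workflow bus paradigm without detail. As one possible reading only, assuming the bus is a shared channel through which sub-workflows run by different WMS (for example Taverna and VLAM) exchange named data items, a minimal publish/subscribe sketch in Python could look like this. All names are hypothetical, and this is not the design from the cited paper.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class WorkflowBus:
    """Hypothetical bus: sub-workflows subscribe to topics and publish results."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[object], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[object], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: object) -> None:
        # Deliver synchronously to every participant listening on this topic.
        for handler in list(self._subscribers[topic]):
            handler(payload)


def make_engine(name: str, bus: WorkflowBus, listen: str, emit: str,
                step: Callable[[object], object], log: List[str]) -> None:
    """Attach one WMS's sub-workflow to the bus (illustrative wrapper only)."""
    def on_input(payload: object) -> None:
        log.append(name)               # record which engine handled the data
        bus.publish(emit, step(payload))
    bus.subscribe(listen, on_input)
```

Chaining two engines, `make_engine("taverna", bus, "raw", "annotated", ...)` and `make_engine("vlam", bus, "annotated", "done", ...)`, lets a single `bus.publish("raw", data)` flow through both sub-workflows without either engine knowing about the other, which is the kind of loose coupling between workflow systems the research lines below aim at.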

ws-VLAM Engine: architecture

[Figure: ws-VLAM engine architecture]
  • Client (delegates a proxy to the Delegation service)
  • Service host(s) and compute element(s):
    • GT4 Java Container: ws-RTSM Factory, ws-RTSM Instance, Delegation service, GRAM services
    • pre-ws-GRAM, job functions
    • Worker nodes running the workflow components
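The ws-RTSM Factory / ws-RTSM Instance pair resembles the factory/instance pattern used by many GT4 (WSRF) services: a long-lived factory creates one stateful instance per workflow run, and the client then addresses that instance. A minimal Python sketch, assuming instances are identified by generated keys and that creating one requires a previously delegated credential; the class names, keys, and states are hypothetical and no Globus API is used.

```python
import itertools
from typing import Dict, Optional


class RTSMInstance:
    """Hypothetical per-run resource holding the state of one workflow."""

    def __init__(self, key: str, workflow_xml: str, credential: str) -> None:
        self.key = key
        self.workflow_xml = workflow_xml
        self.credential = credential
        self.state = "Created"

    def start(self) -> None:
        # A real engine would submit the workflow components via GRAM here.
        self.state = "Running"


class RTSMFactory:
    """Hypothetical factory: creates and looks up RTSM instances."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._instances: Dict[str, RTSMInstance] = {}
        self._delegated: Dict[str, str] = {}

    def delegate(self, user: str, proxy: str) -> None:
        # Stands in for the separate Delegation service: store the user's proxy.
        self._delegated[user] = proxy

    def create(self, user: str, workflow_xml: str) -> str:
        proxy: Optional[str] = self._delegated.get(user)
        if proxy is None:
            raise PermissionError(f"no delegated credential for {user}")
        key = f"rtsm-{next(self._ids)}"
        self._instances[key] = RTSMInstance(key, workflow_xml, proxy)
        return key

    def lookup(self, key: str) -> RTSMInstance:
        return self._instances[key]
```

The ordering mirrors the client steps in the architecture: delegate a proxy first, then ask the factory for an instance, then drive that instance. Calling `create` for a user who has not delegated a credential raises `PermissionError`.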

Ongoing work
  • Objective:
    • Invoke the ws-VLAM RTSM GT4 service from the Kepler/Taverna environment to execute a predefined Application workflow.
  • ws-VLAM Application workflow:
    • Scientific experiments composed of software components that need to be executed on Grid-enabled resources (CPU intensive)
    • A potential VLAM Application workflow can be described as:
      • a pipeline of processes exchanging streams of data.
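A pipeline of processes exchanging streams of data can be illustrated in a single Python process with generators: each stage consumes its input stream item by item and yields an output stream, so downstream stages start working before upstream stages have finished. The stage names and the smoothing/threshold logic are hypothetical; in ws-VLAM the stages would be separate components on Grid resources connected by data streams, which this sketch does not attempt.

```python
from typing import Iterable, Iterator, List


def read_samples(values: Iterable[float]) -> Iterator[float]:
    """Source stage: emit raw samples one at a time."""
    for v in values:
        yield v


def smooth(stream: Iterable[float], window: int = 3) -> Iterator[float]:
    """Filter stage: moving average over the last `window` samples."""
    buf: List[float] = []
    for v in stream:
        buf.append(v)
        if len(buf) > window:
            buf.pop(0)
        yield sum(buf) / len(buf)


def threshold(stream: Iterable[float], limit: float) -> Iterator[bool]:
    """Final stage: flag smoothed samples above `limit`."""
    for v in stream:
        yield v > limit
```

Composing the stages, `threshold(smooth(read_samples([1, 2, 3, 10, 10, 10])), 5)` produces one flag per sample as soon as that sample has passed through the earlier stages; only the last two smoothed values exceed the limit.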
Execute the ws-VLAM workflow in Kepler/Taverna
  • A predefined Application workflow developed in VLAM can be executed as a single step in Kepler/Taverna
    • (no need to recompose the whole workflow graphically).
  • The predefined Application workflow will be executed on any remote computing resource where the VLAM-RTSM GT4 Web service is installed.
  • Advantages:
    • Compose workflows in which sub-workflows that require Grid resources are executed on Grid-enabled resources, while the rest of the workflow is executed using other Kepler actors or Taverna processors
    • It is also more efficient, since it avoids the overhead that would result from wrapping every workflow component as a separate web service or a separate remote Grid execution.
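The efficiency argument comes down to the number of remote invocations. A minimal Python sketch of a mixed workflow, assuming hypothetical `LocalStep` objects for ordinary Kepler actors / Taverna processors and a `RemoteSubWorkflow` that ships a predefined VLAM sub-workflow to the Grid as one step; `FakeGrid` merely counts calls and stands in for the RTSM service.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Fn = Callable[[object], object]


@dataclass
class LocalStep:
    """A step run by an ordinary Kepler actor / Taverna processor."""
    fn: Fn


@dataclass
class RemoteSubWorkflow:
    """A predefined VLAM sub-workflow submitted to the Grid as one step."""
    components: List[Fn]


class FakeGrid:
    """Hypothetical stand-in for the RTSM service; counts remote calls."""

    def __init__(self) -> None:
        self.calls = 0

    def run(self, components: List[Fn], data: object) -> object:
        self.calls += 1  # one invocation for the whole sub-workflow
        for fn in components:
            data = fn(data)
        return data


def execute(steps: List[Union[LocalStep, RemoteSubWorkflow]], data: object,
            grid: FakeGrid) -> object:
    """Run local steps in-process and each sub-workflow with one remote call."""
    for step in steps:
        if isinstance(step, LocalStep):
            data = step.fn(data)
        else:
            data = grid.run(step.components, data)
    return data
```

A workflow with a three-component remote sub-workflow between two local steps costs one remote call here; wrapping each component as its own service would cost three, and the gap grows with the size of the sub-workflow.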
Execute the ws-VLAM workflow in Kepler/Taverna

[Figure: executing a ws-VLAM workflow from Kepler/Taverna]
  • Kepler/Taverna workbench: VLAM Actor or Taverna processor (to be developed), acting as RTSM Client, with the workflow description (XML)
    • (1) Proxy: delegate to the Delegation service
    • (2) Service invocation: ws-RTSM Factory
  • RTSM-GT4 Web service (available on DAS2), on DAS2 or PoC facilities:
    • GT4 Java Container: ws-RTSM Factory, ws-RTSM Instance, Delegation service, GRAM services
    • pre-ws-GRAM
    • Worker nodes running the workflow components

  • Kepler/Taverna users can access some of the parameters of the Application workflow to change their default values
  • Kepler/Taverna users have to specify the location of the input data file as a URL, and will get back a URL if the workflow generates data files
  • Graphical output of the Application workflow is handled automatically by the VLAM Taverna processor / Kepler actor.
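On the Kepler/Taverna side, the user-facing inputs described above are the workflow description (XML), overrides for some parameter defaults, and an input data URL. A minimal Python sketch using the standard `xml.etree.ElementTree` module, assuming a hypothetical description format in which each `<parameter>` element carries `name` and `default` attributes; the actual VLAM description schema is not given in the slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical workflow description; not the actual VLAM schema.
DESCRIPTION = """
<workflow name="example">
  <parameter name="iterations" default="10"/>
  <parameter name="input_url" default=""/>
  <parameter name="threshold" default="0.5"/>
</workflow>
"""


def prepare_request(description: str, overrides: Dict[str, str]) -> Dict[str, str]:
    """Return the effective parameter values after applying user overrides."""
    root = ET.fromstring(description.strip())
    params = {p.get("name"): p.get("default", "") for p in root.iter("parameter")}
    unknown = set(overrides) - set(params)
    if unknown:
        # Only parameters exposed by the workflow description may be changed.
        raise KeyError(f"not exposed by the workflow: {sorted(unknown)}")
    params.update(overrides)
    if not params.get("input_url"):
        raise ValueError("an input data URL is required")
    return params
```

For instance, overriding `iterations` and supplying `input_url="http://example.org/input.dat"` keeps the default `threshold` of `"0.5"`, while omitting the input URL raises `ValueError`. Restricting overrides to declared parameters matches the slide's point that users reach only *some* of the workflow's parameters.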
Research scope and lines
  • Focus 1: Interoperability and integration between workflow systems
  • Focus 2: Composition of meta workflows
  • Focus 3: Provenance at meta workflows
  • Focus 4: Enactment and orchestration of meta workflows
  • Focus 5: Human in the loop computing in meta workflows

Z. Zhao, A. Belloum, M. Bubak: A research plan of VL-e SP2.5 V0.2, September 9, 2006
