Workflow design and implementation issues in the VL-e project

Authors: P. Adriaans, A. Belloum. Outline: Background; The Workflow design problem; Virtual Laboratory for e-Science; Our approach; Challenges and research lines; Activities. Workflow Design: The problem. Solution 1: Incremental clustering.

Presentation Transcript
Outline
  • Background
  • The Workflow design problem
  • Virtual Laboratory for e-Science
  • Our approach
  • Challenges and research lines
  • Activities
slide7

1700 Comparisons

3500 Comparisons

The Workflow design problem
  • A workflow is an inherent part of problem-solving heuristics
  • Induction of optimal workflows is an important research issue
  • Manipulating workflows is an important aspect of e-Science
slide9

The KDD process

[Figure: the KDD process. Stages shown: Data selection; Cleaning (domain consistency, de-duplication, disambiguation); Enrichment; Coding; Data Mining (clustering, segmentation, prediction); Reporting & application. Other labels: Information requirements; Action; external data; Feedback.]

slide10

Adaptive Information Disclosure

[Figure: Adaptive Information Disclosure. Labels: Formulate query; Fire query; Search; Construct answer; Display results. User support: Alternatives; Disambiguation; Query expansion; Filtering; Relevance score; Link to concept tree. Further labels: Data selection; Preprocessing; Named entity recognition; Relation recognition; Advanced constraint recognition; Validation; Version management; Ontology; Domain selection; Ontology Learning; Information Retrieval.]

slide11

[Figure: Application versus IT/ICT overhead]

  • Traditional position of ICT in science:
    • Application running on a single machine
    • Little ICT overhead, no collaboration or sharing of data and information
  • Evolving technological developments like Web & Grid and Service Oriented Architecture allow sharing of data and information, thus enabling scientific applications to do experiments that were not possible before
    • Larger ICT overhead
  • e-Science is based on Web & Grid and other application-supporting ICT
    • Infrastructure will be helpful!

slide12

[Figure: Application versus IT/ICT overhead]

  • Typical e-Science applications require more than a single resource, as well as sharing of resources
  • Moreover:
    • Resources (computing, storage, networks) are often geographically distributed across different security domains. Building such a system:
      • introduces a large ICT overhead
      • requires extensive ICT knowledge
  • The application scientist is forced to focus on ICT problems rather than science
  • Recent developments in Web & Grid based e-Science frameworks like VL-e provide basic services that help hide the computing resources, boosting the development of data- and compute-intensive e-Science on a large-scale distributed infrastructure.
  • The application scientist can focus on their own science rather than ICT problems

slide16

[Figure: VL-e layered architecture. Labels: Application feedback; Application specific service; Medical Application; Telescience; Bio ASP; Application Potential; Generic service & Virtual Lab. services; Virtual Lab. rapid prototyping (interactive simulation); Virtual Laboratory; Additional Grid Services (OGSA services); Grid Middleware; Grid & Network Services; Surfnet; Network Service (lambda networking).]

[Figure labels: VL-e Experimental Environment; VL-e Proof of Concept Environment; Stable application & VL-e component; Unstable application & VL-e component.]

VL-e certification environment: a set of tests that have to be passed before any application software or VL-e component can be deployed on the VL-e proof of concept environment.
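
The certification rule above (all tests must pass before deployment) can be sketched as a simple gate. This is only an illustration: the slides do not describe the actual VL-e test suite, so the function names and the two example checks below are assumptions made up for the sketch.

```python
from typing import Callable, Dict, List

# Hypothetical check type: takes a component name, returns True on pass.
Check = Callable[[str], bool]


def certify(component: str, checks: Dict[str, Check]) -> List[str]:
    """Run every check and return the names of the ones that failed."""
    return [name for name, check in checks.items() if not check(component)]


def can_deploy_to_poc(component: str, checks: Dict[str, Check]) -> bool:
    """A component is deployable only if no certification check fails."""
    return not certify(component, checks)


# Illustrative checks only; the real VL-e tests are not given in the slides.
checks = {
    "has_name": lambda c: bool(c.strip()),
    "no_spaces": lambda c: " " not in c,
}

print(can_deploy_to_poc("vlam-module", checks))
print(certify("bad name", checks))
```

Returning the list of failed checks, rather than a bare boolean, lets a developer see what blocks promotion from the experimental to the proof of concept environment.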

Mission

Effectively reuse existing workflow management systems, and provide a generic e-Science framework for different application domains.

A generic framework can

  • Improve the reuse of workflow components and workflows in different experiments
  • Reduce the cost of learning different systems
  • Allow users to work in a consistent environment when the underlying infrastructure changes
Two phase approach
  • Phase 1: recommend suitable workflow systems for different application domains:
    • Analyze typical application use cases
    • Define small projects with different application domains
    • Review existing workflow systems
    • Recommend four workflow systems: Triana, Taverna, Kepler, and VLAMG
  • Long term:
    • Extend VLAMG and develop our own generic workflow framework

Recommendation report: "Scientific workflow management in PoC R1", VL-e internal report, Oct 17, 2005.

Lessons learned from phase 1
  • In the scientific community there are two types of workflow users: end-users and application developers.
  • The two categories of users have completely different requirements: ease of use, ease of developing new applications, and ease of migrating legacy applications
  • How to introduce a new WMS to a domain scientist?
    • Because it has a well-defined architecture?
    • Or because it lets them keep their current work style?
  • How to reuse existing work?
    • Support multiple WMSs or add more options to one WMS?
  • How to efficiently include the user in the computing loop?

Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia

slide20

[Figure labels: Distributed data sharing & dissemination; Distributed resources; Distributed parallel computing; Visualization; Remote resource invocation]

Computer support for problem solving

  • Problem Solving Environment (E. Gallopoulos et al., IEEE CS Eng. 1994):
    • Organizes different software components/tools
    • Allows a user to assemble these tools at a high level of abstraction
    • Controls runtime behavior of experiments
    • Examples: MATLAB, Ptolemy, etc.

Scientific Workflow Management: organize and execute on grid-enabled resources!

Traditional PSE: organize and execute resources locally!

Diversity in SWMS
  • Taverna:
    • Web services based language: Scufl
    • Engine: FreeFluo
    • Graphical visualization of workflows
  • Triana:
    • Components
    • Task graph
    • Data/control flow
  • Kepler:
    • Actor, director
    • MoML
    • Execution models
  • Pegasus:
    • Based on DAGMan
    • VDL
    • DAG
  • DAGMan:
    • Computing tasks
    • DAG
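
Several of the systems listed above describe a workflow as a directed acyclic graph (DAG) of computing tasks. A minimal sketch of that idea, assuming Python 3.9+ and its standard `graphlib` module, follows. It is not the API of DAGMan, Pegasus, or any other listed system; the task names and the `run_dag` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps):
    """Run each task after all of its dependencies; return the execution order.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of task names it depends on
    """
    order = []
    # static_order() raises graphlib.CycleError if the graph is not a DAG
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical four-task workflow; each task just records that it ran.
log = []
tasks = {
    "fetch": lambda: log.append("fetch"),
    "clean": lambda: log.append("clean"),
    "mine": lambda: log.append("mine"),
    "report": lambda: log.append("report"),
}
deps = {"fetch": set(), "clean": {"fetch"}, "mine": {"clean"}, "report": {"mine"}}

order = run_dag(tasks, deps)
print(order)
```

A real engine would also dispatch independent tasks in parallel and handle retries; the sketch only shows the ordering constraint that the DAG encodes.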
A workflow bus paradigm

Workflow bus

Z. Zhao et al., “Workflow bus for e-Science”, to appear IEEE e-Science 2006, Amsterdam

ws-VLAM Engine: architecture

[Figure: ws-VLAM engine architecture. Labels: Client; Service host(s) and compute element(s); GT4 Java Container; Delegation service; Delegate; ws-RTSM Factory; ws-RTSM Instance; GRAM services; pre-ws-GRAM; Job functions; Worker nodes; Workflow components.]

Ongoing work
  • Objective:
    • Invoke the ws-VLAM RTSM GT4 service from the Kepler/Taverna environment to execute a predefined Application workflow.
  • ws-VLAM Application workflow:
    • Scientific experiments composed of software components that need to be executed on Grid-enabled resources (CPU intensive)
    • A potential VLAM Application workflow can be described as:
      • a pipeline of processes exchanging streams of data.
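
The "pipeline of processes exchanging streams of data" can be illustrated with Python generators, where each stage consumes items from the previous one as they become available. This is a local, single-process sketch only; the stage names (`source`, `scale`, `threshold`) are invented for the example and say nothing about how VLAM components actually communicate.

```python
from typing import Iterable, Iterator


def source(values: Iterable[float]) -> Iterator[float]:
    """First stage: emit the raw data items one by one."""
    for v in values:
        yield v


def scale(stream: Iterable[float], factor: float) -> Iterator[float]:
    """Middle stage: transform each item as it arrives."""
    for v in stream:
        yield v * factor


def threshold(stream: Iterable[float], limit: float) -> Iterator[float]:
    """Last stage: pass on only the items at or above the limit."""
    for v in stream:
        if v >= limit:
            yield v


# Stages are chained; no stage waits for the previous one to finish.
pipeline = threshold(scale(source([1.0, 2.0, 3.0]), factor=2.0), limit=4.0)
result = list(pipeline)
print(result)
```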
Execute the ws-VLAM workflow in Kepler/Taverna
  • A predefined Application workflow developed in VLAM can be executed as a single step in Kepler/Taverna
    • (no need to graphically recompose the whole workflow).
  • The predefined Application workflow will be executed on any remote computing resource where the VLAM-RTSM GT4 Web service is installed.
  • Advantages:
    • Compose workflows in which sub-workflows that require grid resources are executed on grid-enabled resources, while the rest of the workflow is executed using other Kepler actors or Taverna processors
    • It is also more efficient, since it avoids the overhead of wrapping every workflow component as a separate web service or a separate remote grid execution.
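
The efficiency argument above can be made concrete by counting remote invocations. The sketch below uses a hypothetical `RemoteExecutor` class as a stand-in for a remote grid service (it is not the RTSM, GRAM, Kepler, or Taverna API) and compares wrapping each component separately with submitting the whole sub-workflow as one step.

```python
class RemoteExecutor:
    """Hypothetical stand-in for a remote grid service; counts invocations."""

    def __init__(self):
        self.calls = 0

    def run(self, components, data):
        self.calls += 1  # one remote round trip per call
        for component in components:
            data = component(data)
        return data


def per_component(executor, components, data):
    """Wrap every component as its own remote call."""
    for component in components:
        data = executor.run([component], data)
    return data


def single_step(executor, components, data):
    """Submit the whole predefined sub-workflow as one remote call."""
    return executor.run(components, data)


# Illustrative three-component sub-workflow.
sub_workflow = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

a, b = RemoteExecutor(), RemoteExecutor()
result_a = per_component(a, sub_workflow, 5)
result_b = single_step(b, sub_workflow, 5)
print(result_a, a.calls, result_b, b.calls)
```

Both strategies compute the same result; only the number of remote round trips differs, which is the overhead the slide refers to.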
Execute the ws-VLAM workflow in Kepler/Taverna

[Figure: invoking ws-VLAM from the Kepler/Taverna workbench. Labels: Kepler/Taverna workbench; VLAM Actor or Taverna processor (to be developed); RTSM Client; Workflow description (XML); (1) Proxy; Delegate; (2) Service invocation; RTSM-GT4 Web service (available on DAS2); DAS2 or PoC facilities; GT4 Java Container; Delegation service; ws-RTSM Factory; ws-RTSM Instance; GRAM services; pre-ws-GRAM; Worker nodes; Workflow components.]

  • Kepler/Taverna users can access some of the parameters of the Application workflow to change their default values
  • Kepler/Taverna users have to specify the location of the input data file as a URL and will get back a URL if the workflow generates data files
  • Graphical output of the Application workflow is handled automatically by the VLAM Taverna processor/Kepler actor.
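
The user-facing contract described above (override some default parameters, pass the input location as a URL) can be sketched as a small request object. Everything here is an assumption for illustration: the slides mark the actor/processor as "to be developed", so `WorkflowRequest`, `validate_url`, the parameter names, and the example.org URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urlparse


def validate_url(url: str) -> str:
    """Accept only strings with a scheme and a host part, e.g. http://host/path."""
    parts = urlparse(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"input location must be a URL: {url!r}")
    return url


@dataclass
class WorkflowRequest:
    """Hypothetical request a workbench actor/processor might build."""

    input_url: str
    defaults: Dict[str, str]  # parameters the predefined workflow exposes
    overrides: Dict[str, str] = field(default_factory=dict)

    def parameters(self) -> Dict[str, str]:
        """Merge user overrides into the defaults; reject unexposed names."""
        unknown = set(self.overrides) - set(self.defaults)
        if unknown:
            raise KeyError(f"not an exposed parameter: {sorted(unknown)}")
        return {**self.defaults, **self.overrides}


# Illustrative values only.
request = WorkflowRequest(
    input_url=validate_url("http://example.org/input.dat"),
    defaults={"iterations": "10", "tolerance": "1e-3"},
    overrides={"iterations": "50"},
)
params = request.parameters()
print(params)
```

Rejecting unknown parameter names reflects the point that users see only some of the workflow's parameters, not all of them.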
Research scope and lines
  • Focus 1: Interoperability and integration between workflow systems
  • Focus 2: Composition of meta workflows
  • Focus 3: Provenance at meta workflows
  • Focus 4: Enactment and orchestration of meta workflows
  • Focus 5: Human in the loop computing in meta workflows

Z. Zhao, A. Belloum, M. Bubak: A research plan of VL-e SP2.5, V0.2, September 9, 2006