Workflow design and implementation issues in the VL-e project


  1. Workflow design and implementation issues in the VL-e project. P. Adriaans, A. Belloum

  2. Outline • Background • The Workflow design problem • Virtual Laboratory for e-Science • Our approach • Challenges and research lines • Activities

  3. Workflow Design: The problem

  4. Solution 1: Incremental clustering

  5. Solution 2: Feature analysis

  6. 1700 Comparisons 3500 Comparisons

  7. The Workflow design problem • A workflow is an inherent part of the problem-solving heuristics • Induction of optimal workflows is an important research issue • Manipulating workflows is an important aspect of e-Science

  8. The KDD process (diagram) • Stages: Data selection; Cleaning (domain consistency, de-duplication, disambiguation); Enrichment; Coding; Data mining (clustering, segmentation, prediction); Reporting & application • Other diagram labels: Information requirements; Action; External data; Feedback

  9. Adaptive Information Disclosure (diagram) • Query handling: Formulate query; Fire query; Search; Construct answer; Display results • User support: Alternatives; Disambiguation; Query expansion; Filtering; Relevance score; Link to concept tree • Other diagram labels: Data selection; Preprocessing; Named entity recognition; Relation recognition; Advanced constraint recognition; Validation; Version management; Ontology; Domain selection; Ontology learning; Information retrieval

  10. Traditional position of ICT in science: • Application running on a single machine • Little ICT overhead, no collaboration and/or sharing of data and information • Evolving technological developments like Web & Grid and Service Oriented Architecture allow sharing of data and information, thus enabling scientific applications to do experiments that had not been possible before • Larger ICT overhead • e-Science is based on Web & Grid and other application-supporting ICT • Infrastructure will be helpful! (Diagram labels: Application, ICT overhead)

  11. Typical e-Science applications require more than a single resource, as well as sharing of resources • Moreover, resources (computing, storage, networks) are often geographically distributed across different security domains • Building such a system introduces a large ICT overhead and requires extensive ICT knowledge • The application scientist is forced to focus on ICT problems rather than science • Recent developments in Web & Grid based e-Science frameworks like VL-e provide basic services that help hide computing resources, boosting the development of data- and compute-intensive e-Science on a large-scale distributed infrastructure • The application scientist can focus on his own science rather than ICT problems (Diagram labels: Application, ICT overhead)

  12. VL-e layers and environments (diagram) • Diagram labels: Application feedback; Application-specific service (Medical application, Telescience, Bio ASP); Application-potential generic service & Virtual Lab. services; Virtual Lab. rapid prototyping (interactive simulation); Virtual Laboratory; Additional Grid services (OGSA services); Grid middleware; Grid & network services; Surfnet network service (lambda networking) • Environments: VL-e experimental environment; VL-e proof of concept environment; Stable application & VL-e component; Unstable application & VL-e component • VL-e certification environment: a set of tests that have to be passed before any application software or VL-e component can be deployed on the VL-e proof of concept environment

  13. Mission: Effectively reuse existing workflow management systems, and provide a generic e-Science framework for different application domains. A generic framework can • Improve the reuse of workflow components and workflows in different experiments • Reduce the cost of learning different systems • Allow users to work in a consistent environment when the underlying infrastructure changes

  14. Two-phase approach • Phase 1: recommend suitable workflow systems for different application domains: • Analyze typical application use cases • Define small projects with different application domains • Review existing workflow systems • Recommend four workflow systems: Triana, Taverna, Kepler, and VLAMG • Phase 2 (long term): extend VLAMG and develop our own generic workflow framework. Recommendation report: “Scientific workflow management in PoC R1”, VL-e internal report, Oct 17, 2005.

  15. Lessons learned from phase 1 • In the scientific community there are two types of workflow users: end users and application developers • The two categories of users have completely different requirements: easy to use, easy for developing new applications, and easy for migrating legacy applications • How to introduce a new WMS to a domain scientist? • Because it has a well-defined architecture? • Or because it lets them keep their current work style? • How to reuse existing work? • Support multiple WMS systems or add more options to one WMS? • How to efficiently include the user in the computing loop? Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia

  16. Computer support for problem solving • Problem Solving Environment (E. Gallopoulos et al., IEEE CS Eng. 1994): • Organizes different software components/tools • Allows a user to assemble these tools at a high level of abstraction • Controls runtime behavior of experiments • Examples: MATLAB, Ptolemy, etc. • Traditional PSE: organize and execute resources locally • Scientific Workflow Management: organize and execute on grid-enabled resources (Diagram labels: Distributed data sharing & dissemination; Distributed resources; Distributed parallel computing; Visualization; Remote resource invocation)

  17. Diversity in SWMS • Taverna: web-services-based language (Scufl); engine (FreeFluo); graphical visualization of workflows • Triana: components; task graph; data/control flow • Kepler: actor, director; MoML; execution models • Pegasus: based on DAGMan; VDL; DAG … • DAGMan: computing tasks; DAG
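Several of the systems listed on slide 17 describe a workflow as a directed acyclic graph (DAG) of tasks. The following is only a minimal Python sketch of that idea, not the syntax or API of DAGMan, Pegasus, or any other system named on the slide. The task names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real workflow engine.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
workflow = {
    "select_data": set(),
    "clean": {"select_data"},
    "cluster": {"clean"},
    "predict": {"clean"},
    "report": {"cluster", "predict"},
}

def execution_order(dag):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(workflow)
```

Any order returned this way runs `select_data` first and `report` last; `cluster` and `predict` are independent of each other, so an engine would be free to run them in parallel.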

  18. A workflow bus paradigm (diagram: Workflow bus). Z. Zhao et al., “Workflow bus for e-Science”, to appear in IEEE e-Science 2006, Amsterdam

  19. ws-VLAM Engine: architecture (diagram) • Diagram labels: Service host(s) and compute element(s); GT4 Java Container; GRAM services; pre-ws-GRAM; Job functions; ws-RTSM Factory; ws-RTSM Instance; Client; Delegate; Delegation service; Worker nodes; Workflow components

  20. Ongoing work • Objective: • Invoke the ws-VLAM RTSM GT4 service from the Kepler/Taverna environment to execute a predefined Application workflow • ws-VLAM Application workflow: • Scientific experiments composed of software components that need to be executed on Grid-enabled resources (CPU intensive) • A potential VLAM Application workflow can be described as a pipeline of processes exchanging streams of data
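The "pipeline of processes exchanging streams of data" on slide 20 can be illustrated with a small Python sketch. This is an assumption-laden toy: Python generators stand in for the distributed components, all run in one local process, and the stage names (`read_samples`, `scale`, `threshold`) are hypothetical rather than taken from VLAM.

```python
from typing import Iterable, Iterator

def read_samples(values: Iterable[float]) -> Iterator[float]:
    """Source stage: emit the input values one at a time."""
    for value in values:
        yield value

def scale(stream: Iterator[float], factor: float) -> Iterator[float]:
    """Middle stage: transform each item as it arrives."""
    for value in stream:
        yield value * factor

def threshold(stream: Iterator[float], limit: float) -> Iterator[float]:
    """Final stage: pass on only items at or above the limit."""
    for value in stream:
        if value >= limit:
            yield value

def run_pipeline(values: Iterable[float]) -> list:
    """Connect the stages; each one consumes the stream of the previous one."""
    return list(threshold(scale(read_samples(values), 2.0), 5.0))

result = run_pipeline([1.0, 2.5, 4.0])
```

Each stage pulls items from its predecessor one at a time, so no stage needs the full data set before it starts working, which is the property a streaming pipeline relies on.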

  21. Execute the ws-VLAM workflow in Kepler/Taverna • A predefined Application workflow developed in VLAM can be executed as a single step in Kepler/Taverna (no need to graphically recompose the whole workflow) • The predefined Application workflow will be executed on any remote computing resource where the VLAM-RTSM GT4 Web service is installed • Advantages: • Compose workflows in which sub-workflows that require grid resources are executed on grid-enabled resources, while the rest of the workflow is executed using other Kepler actors or Taverna processors • It is also more efficient, since it avoids the overhead that would result from wrapping every workflow component as a separate web service or a separate remote grid execution
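Running a predefined workflow "as a single step", as slide 21 describes, can be sketched as a composite step. The Python below is a hypothetical illustration only: the helper names are invented, and plain local functions stand in for what would be a single remote invocation in the design on the slide.

```python
from typing import Callable, List

Step = Callable[[object], object]

def make_subworkflow_step(steps: List[Step]) -> Step:
    """Wrap a predefined chain of steps so the parent workflow sees one step."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run

def run_workflow(steps: List[Step], data):
    """Run the parent workflow: feed each step the previous step's output."""
    for step in steps:
        data = step(data)
    return data

# Hypothetical predefined sub-workflow (stands in for a VLAM workflow).
predefined = make_subworkflow_step([lambda xs: [x * x for x in xs], sum])

# The parent workflow has three steps; the second hides two inner steps.
parent = [sorted, predefined, lambda total: f"total={total}"]
result = run_workflow(parent, [3, 1, 2])
```

The parent only pays the cost of one call for the whole sub-workflow, which mirrors the efficiency argument on the slide: the inner components are not wrapped and invoked individually.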

  22. Execute the ws-VLAM workflow in Kepler/Taverna (diagram) • Diagram labels: Kepler/Taverna workbench; VLAM actor or Taverna processor (to be developed); RTSM client; Workflow description (XML); (1) Proxy; (2) Service invocation; RTSM-GT4 Web service (available on DAS2); DAS2 or PoC facilities; GT4 Java Container; GRAM services; ws-RTSM Factory; ws-RTSM Instance; pre-ws-GRAM; Delegate; Delegation service; Worker nodes; Workflow components • Kepler/Taverna users can access some of the parameters of the Application workflow to change the default values • Kepler/Taverna users have to specify the location of the input data file as a URL and will get back a URL if the workflow generates data files • Graphical output of the Application workflow is handled automatically by the VLAM Taverna processor/Kepler actor
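The user-facing contract on slide 22 (override some default parameters, pass the input location as a URL) can be sketched as follows. Everything here is an assumption for illustration: the parameter names, the `build_request` helper, and the dictionary layout are hypothetical and do not describe the actual ws-VLAM interface.

```python
from urllib.parse import urlparse

# Hypothetical parameters that a predefined workflow might expose.
DEFAULTS = {"iterations": 10, "tolerance": 0.01}

def build_request(input_url, overrides=None):
    """Merge user overrides into the defaults and validate the input URL."""
    parsed = urlparse(input_url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"input location must be a URL: {input_url!r}")
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"parameters not exposed by the workflow: {sorted(unknown)}")
    return {"input_url": input_url, "parameters": {**DEFAULTS, **overrides}}

request = build_request("http://example.org/data/in.dat", {"iterations": 50})
```

Rejecting unknown parameter names reflects the slide's wording that users can change only "some of the parameters"; the rest of the predefined workflow stays fixed.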

  23. Research scope and lines • Focus 1: Interoperability and integration between workflow systems • Focus 2: Composition of meta workflows • Focus 3: Provenance at meta workflows • Focus 4: Enactment and orchestration of meta workflows • Focus 5: Human-in-the-loop computing in meta workflows. Z. Zhao, A. Belloum, M. Bubak: A research plan of VL-e SP2.5, V0.2, September 9, 2006

  24. http://www.vl-e.nl/
