An Introduction to Scientific Workflows

Dr. Shiyong Lu (Shiyong@wayne.edu), Department of Computer Science, Wayne State University. The scientific workflow paradigm: workflows are used to automate various data analysis tasks, which might produce further data.

Presentation Transcript


  1. An Introduction to Scientific Workflows. Dr. Shiyong Lu (Shiyong@wayne.edu), Department of Computer Science, Wayne State University

  2. The scientific workflow paradigm
     • Workflows are used to automate various data analysis tasks, which might produce further data.
     • Provenance is captured automatically to record the history of workflow evolution and data derivation, so that data and workflows can be reproduced when necessary.

  3. Growing areas of scientific workflow applications: neuroinformatics, bioinformatics, oceanography, astronomy

  4. What is a scientific workflow?
     • A scientific workflow is a formal specification of a scientific process, which represents, streamlines, and automates the steps from dataset selection and integration, through computation and analysis, to final data product presentation and visualization.
     • It is an artifact in its own right for scientists to patent, reuse, publish, and share (myexperiment.org).
     • Who-discovers-first might become who-comes-up-with-the-right-scientific-workflow-first!

  5. A scientific workflow example (C. Lin, et al. 2008)

  6. A more complex scientific workflow example (Alhayafi et al., 2008)

  7. What is a Scientific Workflow Management System?
     • A scientific workflow management system (SWFMS) is a system that supports the specification, modification, run, re-run, and monitoring of scientific workflows.
     • It supports a high-level workflow specification language (e.g., WSL/TSL, XSCUFL).
     • An end user describes their workflows using that language, typically with a graphical workflow designer.
     • The SWFMS interprets a workflow written in that language through a coordinated execution of the tasks in the workflow.

  8. Major components of a SWFMS (Lin et al., 2008)

  9. Workflow design
     • Workflow design can be performed with the assistance of any workflow design tool, typically with a graphical user interface for ease of manipulation by scientists.
     • The resulting scientific workflow is usually represented in a scientific workflow specification language (e.g., SWL, XSCUFL, MOML).
     • A standard scientific workflow language has yet to appear, which is one major reason for poor interoperability among SWFMSs today.

  10. Workflow enactment
     • The workflow engine performs workflow enactment: it creates a workflow case and schedules task invocations in an order determined by the workflow logic.
     • It manages the movement of controlflows and dataflows.
     • It manages workflow status.
     • It collects provenance: workflow evolution and data derivation history.
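
The scheduling step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the engine of any particular SWFMS: the function `enact`, the `depends_on` mapping, and the three toy tasks are hypothetical names introduced here. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to derive an invocation order from the dataflow dependencies and tracks a simple per-task status.

```python
from graphlib import TopologicalSorter


def enact(tasks, depends_on, inputs):
    """Run each task after the tasks it depends on, passing data products along.

    tasks:      task name -> callable (hypothetical task implementations)
    depends_on: task name -> list of upstream product names, in argument order
    inputs:     source data products that exist before the run
    """
    products = dict(inputs)   # data products available so far
    status = {}               # workflow status management
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(depends_on).static_order():
        if name in products:  # a source data product, nothing to invoke
            continue
        args = [products[dep] for dep in depends_on.get(name, ())]
        status[name] = "running"
        products[name] = tasks[name](*args)  # dataflow: outputs feed later tasks
        status[name] = "finished"
    return products, status


# Hypothetical three-step workflow: clean -> mean -> report.
tasks = {
    "clean": lambda raw: [x for x in raw if x is not None],
    "mean": lambda xs: sum(xs) / len(xs),
    "report": lambda m: f"mean={m:.1f}",
}
depends_on = {"clean": ["raw"], "mean": ["clean"], "report": ["mean"]}
products, status = enact(tasks, depends_on, {"raw": [1.0, None, 3.0]})
print(products["report"])  # mean=2.0
```

A real engine would also create a separate workflow case per run and record provenance at each invocation; both are left out here to keep the ordering logic visible.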

  11. Task management
     • It is responsible for the resource provisioning, scheduling, and monitoring of task execution.
     • Abstraction of various local and remote heterogeneous services and software tools as workflow tasks.
     • Abstraction of a subworkflow as a composite task (with its internal implementation hidden).
     • Registration, annotation, and searching of tasks.
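
One way to picture the composite-task abstraction is the sketch below. It assumes a deliberately small, hypothetical interface (`Task.run` taking one input and returning one output); the class names and the two example tasks are illustrative and not taken from any SWFMS. A `CompositeTask` exposes the same `run` method as an atomic task, so callers cannot tell that a subworkflow sits behind it.

```python
class Task:
    """An atomic workflow task wrapping some tool or service call."""

    def __init__(self, name, fn):
        self.name = name
        self._fn = fn

    def run(self, data):
        return self._fn(data)


class CompositeTask(Task):
    """A subworkflow presented as a single task; its steps stay hidden."""

    def __init__(self, name, steps):
        self.name = name
        self._steps = list(steps)

    def run(self, data):
        # Feed each step's output into the next one (a linear subworkflow).
        for step in self._steps:
            data = step.run(data)
        return data


# Hypothetical tasks standing in for heterogeneous tools.
normalize = Task("normalize", lambda xs: [x / max(xs) for x in xs])
threshold = Task("threshold", lambda xs: [x for x in xs if x >= 0.5])
preprocess = CompositeTask("preprocess", [normalize, threshold])

# A minimal task registry, searchable by name.
registry = {t.name: t for t in (normalize, threshold, preprocess)}
result = registry["preprocess"].run([1, 2, 4])
print(result)  # [0.5, 1.0]
```

The linear chain is a simplification; a subworkflow in practice can be an arbitrary graph of tasks behind the same interface.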

  12. Data product management
     • Responsible for the management of large amounts of source, intermediate, and final data products from the execution of scientific workflows.
     • Registration, annotation, searching, and replication of data products for reuse, publishing, sharing, presentation, and visualization.
     • Representation and location transparency: abstract various heterogeneous and distributed data sets as data products.
     • A petascale data product system is needed for large-scale, data-intensive scientific workflows.

  13. Provenance management
     • Scientific workflow provenance is one kind of metadata that captures the derivation history of a data product, including the original data sources, intermediate data products, and the workflow tasks that were applied to produce the data product.
     • Capturing provenance is critical for scientific workflows to support reproducibility, result interpretation, and problem diagnosis.
     • Provenance management concerns the efficiency and effectiveness of recording, representing, storing, querying, and visualizing provenance.
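
As a rough illustration of recording and querying derivation history, the sketch below keeps, for each data product, the task that produced it and the products it was derived from. The `ProvenanceStore` class, its method names, and the file and task names in the example are all hypothetical; real provenance systems use richer models and persistent storage.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceStore:
    # Maps a data product id to (task name, ids of the input products).
    derived_by: dict = field(default_factory=dict)

    def record(self, product, task, inputs):
        """Record that `task` produced `product` from `inputs`."""
        self.derived_by[product] = (task, list(inputs))

    def lineage(self, product):
        """Return (original sources, intermediate products, applied tasks)."""
        sources, intermediates, tasks = set(), set(), []
        seen, stack = set(), [product]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current not in self.derived_by:
                sources.add(current)  # nothing produced it: an original source
                continue
            task, inputs = self.derived_by[current]
            tasks.append(task)
            if current != product:
                intermediates.add(current)
            stack.extend(inputs)
        return sources, intermediates, tasks


# Hypothetical two-step derivation.
store = ProvenanceStore()
store.record("aligned.bam", "align", ["reads.fq", "ref.fa"])
store.record("variants.vcf", "call_variants", ["aligned.bam"])
sources, intermediates, tasks = store.lineage("variants.vcf")
print(sorted(sources), sorted(intermediates), sorted(tasks))
```

Walking the records backwards from a final product is what supports reproducibility and problem diagnosis: the query returns exactly the sources and tasks needed to re-derive the product.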

  14. Workflow monitoring
     • Monitoring the progress and status of the execution of scientific workflows is very important, particularly for long-running scientific workflows.
     • Because scientific workflows can be dynamically changed by end users and can orchestrate heterogeneous services over unreliable networks, many exceptions and failures might occur.
     • The complexity and scale of data analysis and computation in scientific workflows impose additional challenges on workflow monitoring and failure handling.
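
A minimal sketch of status monitoring combined with one simple failure-handling policy (retrying a task a fixed number of times) might look as follows. The function `run_with_monitoring`, its parameters, and the `flaky_service` stand-in are assumptions made for illustration; the slides do not prescribe a particular policy.

```python
import time


def run_with_monitoring(task, retries=3, delay=0.0, log=None):
    """Run `task`, logging status changes and retrying on failure."""
    log = [] if log is None else log
    for attempt in range(1, retries + 1):
        log.append((attempt, "running"))
        try:
            result = task()
        except Exception as exc:  # e.g. a remote service timing out
            log.append((attempt, f"failed: {exc}"))
            time.sleep(delay)     # wait before the next attempt
        else:
            log.append((attempt, "finished"))
            return result, log
    raise RuntimeError(f"task failed after {retries} attempts")


# Hypothetical remote service that fails twice before succeeding.
calls = {"n": 0}


def flaky_service():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("timeout")
    return "ok"


result, log = run_with_monitoring(flaky_service)
for attempt, state in log:
    print(attempt, state)
```

Passing a shared `log` list in from the caller keeps the status history available even when the final attempt fails and the function raises.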

  15. Scientific workflows vs. business workflows: a user’s perspective
     • Dataflow-oriented vs. controlflow-oriented.
     • Reproducible vs. non-reproducible.
     • Data-centric vs. business-centric => data parallelism vs. concurrency control (ACID)?
     • Scalable vs. correct. Scientific workflows use a trial-and-error approach in which an error is also “a success” on its own, and scalability is the main concern; business workflows cannot tolerate errors, which often result in economic loss.
     • Explicit vs. implicit.
     • Mutable vs. immutable.
     • Static vs. dynamic binding to resources.

  16. Scientific workflows vs. business workflows: an architectural perspective. [Figure: business workflows (Hollingsworth, 1995) vs. scientific workflows (Lin et al., 2008)]

  17. Scientific workflows vs. business workflows: a workflow language perspective
     • Visual programming-in-the-large or not?
     • Dataflow programming model vs. imperative programming model
     • Dataflow constructs vs. controlflow constructs
     • Hierarchical workflow composition
     • Single assignment property?
     • Physical and logical data models
     • Task and workflow level exception handling
