The design and implementation of a workflow analysis tool



Vasa Curcin

Department of Computing

Imperial College London


Scientific workflow field

Scientific workflows: a high-level programming language with explicit graphical representation of flow of data and/or control

Research into automation of processes supporting scientific research

Significant role in providing middleware for UK eScience programme: Taverna, Discovery Net, Triana

Lingua franca of service-oriented computing


Deluge of workflows

  • Workflow systems and languages: Pegasus, Meandre, Taverna, Discovery Net, Triana, Kepler, Orange, Pentaho, KNIME, Trident, YAWL, LONI, GenePattern, Galaxy, VisTrails, UGENE, Wildfire, BPEL

  • Application domains: Cheminformatics, Environmental Science, Astronomy, Sensor informatics, Business Intelligence, Bioinformatics


Workflow analysis

  • There is a need for formal models to capitalize on the benefits of this infrastructure

    • Work evaluated on Discovery Net workflow

    • Concepts applicable to other workflow systems

  • Some aims

    • Minimise cost of data movement and processing

    • Provide technology for workflow clients and warehouses (indexing, guided construction…)

  • Tasks

    • Safeness

    • Instance bounds

    • Static workflow optimization

    • Establishing polymorphic type profiles of workflows


Underlying models

  • Control flow model

    • Process calculus definitions

    • Communication along named channels

      • Fixed for atomic execution, dynamic for streaming

    • New instance of the process launched as soon as the node receives a token

    • Computational tree logic modelling execution states

  • Data flow model

    • Nodes associated with lambda calculus formulas and term graphs

    • Polymorphic type transformations

    • Rewrite rules defined for sets of nodes as term graph transformations

  • Embedding

    • Way of combining the control and data semantics


Workflow analysis tool

  • Similarity checker

    • Bisimilarity of processes

  • Process profiler

    • Deadlock/livelock detection

    • Reachability

    • Task bounds

  • Composability checker

    • Design-time tests

    • Type requirements

    • Polymorphic properties

  • Equivalence checker

    • Functional equivalence

  • Optimizer

    • Rewrite rules for transformations


Similarity checker

[Figure: Workflow → Process model → Model checker]

  • Based purely on the pi-calculus process model

    • Workflows translated into the process model

    • Parallel composition of independent node processes with named channels

    • Compared in terms of:

      • Internal executions (node actions)

      • Set of observable outputs (only the relevant outputs are declared observable)

  • Model checker used to test different types of bisimilarity

    • Node executions conveniently represented as silent actions

    • Strong bisimulation becomes strict one-to-one workflow action mapping

    • Weak bisimulation ignores internal actions and communications and focuses on visible outputs
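The difference between the two notions can be illustrated on a tiny labelled transition system. The following is a minimal sketch, not the ABC tool or the presenter's implementation: the dictionary encoding, the `TAU` label, and the function names are all assumptions made for illustration. Node executions are modelled as silent `tau` steps and outputs as visible actions.

```python
from itertools import product

TAU = "tau"  # silent action: stands for an internal node execution (assumed label)


def all_states(lts):
    """Every state mentioned as a source or as a target of a transition."""
    return set(lts) | {t for edges in lts.values() for _, t in edges}


def tau_closure(lts, state):
    """All states reachable from `state` by zero or more silent steps."""
    seen, stack = {state}, [state]
    while stack:
        s = stack.pop()
        for action, target in lts.get(s, ()):
            if action == TAU and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def saturate(lts):
    """Weak transitions: tau* for TAU, and tau* a tau* for a visible action a."""
    weak = {}
    for s in all_states(lts):
        edges = set()
        for mid in tau_closure(lts, s):
            edges.add((TAU, mid))  # zero or more silent steps
            for action, target in lts.get(mid, ()):
                if action != TAU:
                    for end in tau_closure(lts, target):
                        edges.add((action, end))
        weak[s] = edges
    return weak


def bisimilar(lts, p, q):
    """Naive greatest fixpoint: drop pairs until every move can be matched."""
    relation = set(product(all_states(lts), repeat=2))

    def matched(a, b):
        # Every move of a must be answered by an equally labelled move of b
        # that leads to a pair still in the relation.
        return all(
            any(act2 == act and (t, u) in relation for act2, u in lts.get(b, ()))
            for act, t in lts.get(a, ())
        )

    changed = True
    while changed:
        changed = False
        for a, b in list(relation):
            if not (matched(a, b) and matched(b, a)):
                relation.discard((a, b))
                changed = True
    return (p, q) in relation


# Two hypothetical workflows: one node execution vs. two, then the same output.
lts = {
    "p0": [(TAU, "p1")],
    "p1": [("out", "p2")],
    "q0": [(TAU, "q1")],
    "q1": [(TAU, "q2")],
    "q2": [("out", "q3")],
}
print(bisimilar(lts, "p0", "q0"))            # strong bisimilarity
print(bisimilar(saturate(lts), "p0", "q0"))  # weak bisimilarity
```

Weak bisimilarity is checked here by running the strong check on the saturated system, which is a standard reduction. The pairwise refinement is quadratic in the number of states and is only meant for small examples.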


Similarity checker: example

  • ABC (Another Bisimilarity Checker) used



Process profiling

[Figure: Workflow → Process model → Kripke frame]

  • The process algebra representation translated into a Kripke frame

    • Enumerated states denoting the number of instances of each workflow node

    • Transitions of the frame are the node executions

    • Use CTL formulas to query

    • NuSMV model checker employed

  • Allows questions such as:

    • Reachability of a particular state

    • Detection of deadlocks and livelocks

    • Safety - some state always executing

    • Bounds on the number of instances of a node
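The state space described above can be sketched by explicit enumeration. This is an illustrative assumption-laden sketch, not the tool's translation and not NuSMV input: it assumes an acyclic workflow (so the state space is finite), treats a state as a tuple of instance counts, and lets a node execution consume one instance and launch one instance of each successor. The function names and the example graph are hypothetical.

```python
from collections import deque


def build_frame(successors, initial):
    """Enumerate reachable states; a state holds one instance count per node.

    Assumes an acyclic workflow graph, otherwise exploration may not terminate.
    """
    nodes = sorted(successors)
    start = tuple(initial.get(n, 0) for n in nodes)
    frame, queue = {}, deque([start])
    while queue:
        state = queue.popleft()
        if state in frame:
            continue
        moves = []
        for i, node in enumerate(nodes):
            if state[i] == 0:
                continue  # no pending instance of this node, so it cannot execute
            nxt = list(state)
            nxt[i] -= 1  # the executing instance finishes
            for succ in successors[node]:
                nxt[nodes.index(succ)] += 1  # each successor receives a token
            moves.append((node, tuple(nxt)))
        frame[state] = moves
        queue.extend(target for _, target in moves)
    return nodes, start, frame


def max_instances(nodes, frame, node):
    """Largest instance count of `node` over all reachable states."""
    i = nodes.index(node)
    return max(state[i] for state in frame)


def terminal_states(frame):
    """States with no outgoing transition (termination or deadlock)."""
    return [state for state, moves in frame.items() if not moves]


# Hypothetical diamond-shaped workflow: A fans out to B and C, both feed D.
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
nodes, start, frame = build_frame(successors, {"A": 1})
print(len(frame), max_instances(nodes, frame, "D"), terminal_states(frame))
```

In this example D can have two simultaneous instances because each incoming token launches a new instance, which is the kind of bound the profiler is meant to report.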


Process profiling: example

  • Reachability

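    • $\mathrm{EF}\,F_\tau^{1}$ – Is there an execution that reaches one instance of F?

    • $\mathrm{AF}\,F_\tau^{1}$ – Do all executions eventually reach one instance of F?

  • Livelocks

    • $\mathrm{AG}\,(C_\tau \rightarrow \mathrm{AG}\,\mathrm{AF}\,C_\tau)$ – Is there always a livelock with C?

    • $\mathrm{EF}\,(C_\tau \rightarrow \mathrm{AG}\,\mathrm{AF}\,C_\tau)$ – Can there be a livelock with C?

  • Instance bounds

    • $\max x.\ \mathrm{EF}\,A_\tau^{x}$ – What is the maximum number of instances of A?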
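Queries of this shape can be evaluated on a small explicit state graph with the usual fixpoint characterisations of EF, AF and AG. The sketch below is a hand-rolled illustration rather than NuSMV: the graph, the state names and the labelling sets `F` and `C` are hypothetical, and terminal states are given self-loops because CTL assumes every state has a successor.

```python
def make_total(graph):
    """CTL assumes a total transition relation; terminal states get a self-loop."""
    return {s: (set(ts) or {s}) for s, ts in graph.items()}


def ef(graph, target):
    """EF: states from which SOME path reaches `target` (least fixpoint)."""
    result = set(target)
    changed = True
    while changed:
        changed = False
        for s, succs in graph.items():
            if s not in result and succs & result:
                result.add(s)
                changed = True
    return result


def af(graph, target):
    """AF: states from which ALL paths eventually reach `target`."""
    result = set(target)
    changed = True
    while changed:
        changed = False
        for s, succs in graph.items():
            if s not in result and succs <= result:
                result.add(s)
                changed = True
    return result


def ag(graph, target):
    """AG p is the complement of EF (not p)."""
    return set(graph) - ef(graph, set(graph) - set(target))


# Hypothetical frame: s1 loops forever executing C, s3 holds one instance of F.
graph = make_total({"s0": {"s1", "s2"}, "s1": {"s1"}, "s2": {"s3"}, "s3": set()})
F = {"s3"}
C = {"s1"}

print("s0" in ef(graph, F))  # some execution reaches F
print("s0" in af(graph, F))  # do all executions reach F?

# AG (C -> AG AF C): an implication p -> q is the set (not p) union q.
c_recurs = (set(graph) - C) | ag(graph, af(graph, C))
print("s0" in ag(graph, c_recurs))
```

The first query succeeds through s2, the second fails because the path that enters s1 never reaches F, and the third holds because C keeps executing once s1 is entered.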


Composability checker

[Figure: Workflow → Data model → Type formulas]

  • Polymorphic type formulas for the workflow components/fragments

  • When composing:

    • The output and input of each fragment compared in terms of free and bound type variables

    • If no clashes, free variables resolved to form the type formula of the composition

    • Inference engine developed specifically for the tool

  • Determines:

    • If a workflow fragment can be reused on a new input

    • Find compatible services in the warehouse
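The composition step can be approximated with ordinary first-order unification over type terms. This is a generic sketch and not the tool's inference engine, whose details the slides do not give. The encoding is an assumption: type variables are strings starting with `?`, constructed types are tuples such as `("Table", "?a")`, a fragment is an `(input, output)` pair, and the two fragments are assumed to use distinct variable names.

```python
def is_var(t):
    """Type variables are strings starting with '?' (an assumed convention)."""
    return isinstance(t, str) and t.startswith("?")


def resolve(t, subst):
    """Apply the substitution to a type term until no bound variable remains."""
    while is_var(t) and t in subst:
        t = subst[t]
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(arg, subst) for arg in t[1:])
    return t


def occurs(var, t):
    """True if `var` appears inside the (already resolved) term `t`."""
    if t == var:
        return True
    return isinstance(t, tuple) and any(occurs(var, arg) for arg in t[1:])


def unify(a, b, subst):
    """Return an extended substitution making a and b equal, or None on a clash."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b) else {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # two different concrete types


def compose(first, second):
    """Type of `first` followed by `second`, or None if they cannot be composed."""
    (in1, out1), (in2, out2) = first, second
    subst = unify(out1, in2, {})
    if subst is None:
        return None
    return resolve(in1, subst), resolve(out2, subst)


# Hypothetical fragments: load : Path -> Table<Row>, keep : Table<a> -> Table<a>,
# count : Table<b> -> Int.
load = ("Path", ("Table", "Row"))
keep = (("Table", "?a"), ("Table", "?a"))
count = (("Table", "?b"), "Int")

print(compose(load, keep))
print(compose(keep, count))
print(compose(count, keep))
```

The same `compose` call could serve as a warehouse lookup filter: a candidate service is compatible with a fragment when the result is not `None`.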


Composability checker: example

  • Fragment of three nodes LMN

    • Input q, with required attributes A, B, D

    • Two outputs u, v

    • A is present in both outputs, B only in u, D in neither

  • Two outputs can be joined with O
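The example can be restated as plain set arithmetic over attribute names. This sketch encodes only what the slide states about the LMN fragment. The rule that a join needs at least one shared attribute is an assumption about node O, and attributes beyond A, B and D are ignored because the slide does not say how they propagate.

```python
# Attributes the LMN fragment requires on its input q (from the slide).
REQUIRED = frozenset({"A", "B", "D"})

# Which of the required attributes survive on each output (from the slide).
OUTPUTS = {
    "u": frozenset({"A", "B"}),
    "v": frozenset({"A"}),
}


def missing_attributes(input_attrs):
    """Required attributes the candidate input does not provide."""
    return REQUIRED - set(input_attrs)


def join_keys(left, right):
    """Attributes shared by two outputs (assumed to be what a join would use)."""
    return OUTPUTS[left] & OUTPUTS[right]


print(missing_attributes({"A", "B", "C", "D"}))  # empty: the fragment can be reused
print(missing_attributes({"A", "B"}))            # D is missing
print(join_keys("u", "v"))                       # A is available to join on
```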


Equivalence tester / optimizer

[Figure: Workflow → Data model → Node equivalences]

  • Uses a set of node equivalence rules

    • Defined for each workflow system or node subset

    • Algorithm applies allowed transformations to reduce two workflows to the same expression

  • Combined with rewrite heuristics

    • Node-specific again

    • Simple example: relational model again


Equivalence tester/optimizer: example

  • Relational workflow searching for Adverse Drug Reactions in GPRD database

  • Rewrite rules

    • Set of relational equivalences

  • Heuristics

    • Early projections/selections

    • Late joins

    • Easy scenario – brute force algorithm works
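One relational equivalence and the "early selections" heuristic can be sketched as a rewrite on expression trees. This is an illustration of the general idea, not the tool's rule set: the tuple encoding, the function names, and the table and attribute names are hypothetical and are not taken from GPRD. The only rule applied is that a selection may move below a join when one side provides every attribute its predicate uses.

```python
# Expression encoding (assumed for this sketch):
#   ("table", name, attribute_names)
#   ("select", predicate_text, attributes_used, child)
#   ("join", left, right)


def attrs(expr):
    """Attribute names available in the result of an expression."""
    kind = expr[0]
    if kind == "table":
        return set(expr[2])
    if kind == "select":
        return attrs(expr[3])
    if kind == "join":
        return attrs(expr[1]) | attrs(expr[2])
    raise ValueError(f"unknown node kind: {kind}")


def push_selections(expr):
    """Move each selection below a join when one side has all its attributes."""
    kind = expr[0]
    if kind == "table":
        return expr
    if kind == "join":
        return ("join", push_selections(expr[1]), push_selections(expr[2]))
    if kind == "select":
        _, predicate, needed, child = expr
        child = push_selections(child)
        if child[0] == "join":
            _, left, right = child
            if set(needed) <= attrs(left):
                return ("join", push_selections(("select", predicate, needed, left)), right)
            if set(needed) <= attrs(right):
                return ("join", left, push_selections(("select", predicate, needed, right)))
        return ("select", predicate, needed, child)
    raise ValueError(f"unknown node kind: {kind}")


def equivalent(a, b):
    """Two workflows are judged equal if they rewrite to the same expression."""
    return push_selections(a) == push_selections(b)


# Hypothetical tables standing in for a patient/prescription schema.
patients = ("table", "patients", ("pid", "age"))
prescriptions = ("table", "prescriptions", ("pid", "drug"))

late = ("select", "age > 65", ("age",), ("join", patients, prescriptions))
early = ("join", ("select", "age > 65", ("age",), patients), prescriptions)

print(push_selections(late) == early)
print(equivalent(late, early))
```

Because it knows a single rule, this check can only confirm equivalence and never refute it. A fuller rule set would also need commutativity of selections and joins, which is where the search over allowed transformations comes in.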


Related and future work

  • Data typing

    • COMAD for Kepler

  • Workflow process analysis

    • GWorkflowDL

    • YAWL

  • New workflow tools with relational structures

    • KNIME

    • Orange

    • Pentaho

  • Extensions:

    • Streaming – blocking and batching

    • Improved state reduction algorithms for CTL model

    • Adding more type constructs for polymorphism


Summary

  • Workflow analysis is needed to improve the uptake and exploitation of workflows

    • Enterprise environments

    • Profile resource usage, risk of failure, execution time

    • Support reuse and repurposing

  • Separation of control and data aspects allows use of existing model checkers and familiar techniques

    • Process algebras, temporal logics, type polymorphisms, term graphs

  • Current version works on Discovery Net/InforSense

    • KNIME, Pentaho very similar – only require extra parsers

    • Full streaming process model for Taverna in the works

