An open provenance model for scientific workflows

An Open Provenance Model for Scientific Workflows . Professor Luc Moreau [email protected] University of Southampton www.ecs.soton.ac.uk/~lavm. Provenance & PASOA Teams. University of Southampton

Presentation Transcript


An Open Provenance Model for Scientific Workflows

Professor Luc Moreau

[email protected]

University of Southampton

www.ecs.soton.ac.uk/~lavm


Provenance & PASOA Teams

  • University of Southampton

    • Luc Moreau, Paul Groth, Simon Miles, Victor Tan, Miguel Branco, Sofia Tsasakou, Sheng Jiang, Steve Munroe, Zheng Chen

  • IBM UK (EU Project Coordinator)

    • John Ibbotson, Neil Hardman, Alexis Biller

  • University of Wales, Cardiff

    • Omer Rana, Arnaud Contes, Vikas Deora, Ian Wootten, Shrija Rajbhandari

  • Universitat Politècnica de Catalunya (UPC)

    • Steven Willmott, Javier Vazquez

  • SZTAKI

    • Laszlo Varga, Arpad Andics, Tamas Kifor

  • German Aerospace

    • Andreas Schreiber, Guy Kloss, Frank Danneman


Contents

  • Motivation

  • Provenance Concept Map

  • Process documentation in a concrete bioinformatics application

  • Conclusions


Motivation


Peer Review/Audit

  • Academic publishing

  • Accounting

  • Healthcare

  • Banking


e-Science datasets

  • How to undertake peer-reviewing and validation of e-Scientific results?


Current Solutions

  • Proprietary, Monolithic

  • Silos, Closed

  • Do not inter-operate with other applications

  • Not adaptable to new regulations


Provenance

  • Oxford English Dictionary:

    • the fact of coming from some particular source or quarter; origin, derivation

    • the history or pedigree of a work of art, manuscript, rare book, etc.;

    • concretely, a record of the passage of an item through its various owners.

  • Concept vs representation


Application Drivers

Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and recovery rate of patients

Aerospace engineering: maintain a historical record of design processes, up to 99 years.

Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval)

High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN)


Provenance Concept Map


[Figure: concept map relating Process, Process Documentation, Provenance (concept), Provenance (representation), Provenance Query, P-structure, P-assertions, Application, Data product and Services. Edge labels: "documents", "is defined as a past", "has a structure", "produces", "is an execution of", "is represented by", "has", "operates over", "is obtained by", "contains", "assert", "consists of".]


Making Applications Provenance Aware

  • Application: assert p-assertions and record them as Process Documentation in the Provenance Store

  • Data Product: obtain the provenance of data by issuing provenance queries to the Provenance Store
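The record-then-query cycle on this slide can be illustrated with a minimal in-memory sketch. Everything here is hypothetical and chosen for illustration only: the `PAssertion` fields, the `ProvenanceStore` class and its `record` and `query` methods are not the PASOA interfaces, which the slides do not show.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PAssertion:
    """One assertion made by an actor about an interaction (hypothetical shape)."""
    asserter: str      # identity of the actor making the assertion
    interaction: str   # identifier of the interaction it pertains to
    kind: str          # "interaction", "relationship" or "actor_state"
    content: str       # content kept as plain text in this sketch


class ProvenanceStore:
    """In-memory stand-in for a provenance store (illustration only)."""

    def __init__(self):
        self._by_interaction = defaultdict(list)

    def record(self, assertion):
        """Record a p-assertion as part of the process documentation."""
        self._by_interaction[assertion.interaction].append(assertion)

    def query(self, interaction, kind=None):
        """Return the p-assertions recorded for an interaction, optionally by kind."""
        found = self._by_interaction.get(interaction, [])
        return [a for a in found if kind is None or a.kind == kind]


store = ProvenanceStore()
store.record(PAssertion("service-A", "i1", "interaction", "I received M1"))
store.record(PAssertion("service-A", "i1", "actor_state", "I used algorithm x.y.z"))
print(store.query("i1", kind="actor_state"))
```

The application only ever appends assertions; reconstructing the provenance of a data product is left entirely to the query side, which is the separation the slide describes.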


Process Documentation

[Figure: a service applying functions f1 and f2, exchanging messages M1, M2, M3 and M4]

  • Interaction p-assertions: "I received M1, M4", "I sent M2, M3"

  • Relationship p-assertions: "M3 = f1(M1)", "M2 = f2(M1,M4)", "M2 is in reply to M1"

  • Service state p-assertions: "I received M1 at time t", "I used algorithm x.y.z"


Data flow

  • Interaction p-assertions allow us to specify a flow of data between services

  • Relationship p-assertions allow us to characterise the flow of data “inside” a service

  • Overall data flow (internal + external) constitutes a DAG, which characterises the process that led to a result
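As a rough sketch of how such a DAG can be walked, the snippet below encodes the two relationships from the Process Documentation slide, M3 = f1(M1) and M2 = f2(M1, M4), as a dictionary and collects everything a message transitively depends on. The dictionary layout and the `provenance` function are assumptions made for this example, not part of the model.

```python
# Each derived message maps to the function and inputs it was obtained from.
# This mirrors the relationship p-assertions M3 = f1(M1) and M2 = f2(M1, M4).
derived_from = {
    "M3": {"function": "f1", "inputs": ["M1"]},
    "M2": {"function": "f2", "inputs": ["M1", "M4"]},
}


def provenance(message, derived_from):
    """Return the set of messages that `message` transitively depends on."""
    seen = set()
    stack = [message]
    while stack:
        current = stack.pop()
        # Messages with no entry are treated as inputs with no recorded cause.
        for parent in derived_from.get(current, {}).get("inputs", []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(provenance("M2", derived_from)))  # ['M1', 'M4']
```

In a multi-service run, interaction p-assertions would link the output of one service to the input of the next, so the same traversal would cross service boundaries.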


Process Documentation in a Concrete Bioinformatics Application


Biology

  • How do protein sequences fold into a 3D structure?

  • Structure of protein sequences may help to answer this question.

  • Structure can be quantified by textual compressibility.

  • Which amino acid groupings maximise compressibility?
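To make the idea of textual compressibility concrete, here is a small sketch that recodes a sequence according to an amino acid grouping and measures how well the result compresses. It uses Python's `zlib` as a stand-in compressor; the slides do not say which compressor the experiment uses, and the sequence and grouping below are made up for illustration.

```python
import zlib


def recode(sequence, groups):
    """Replace each amino acid letter by the symbol of the group containing it."""
    symbol_of = {}
    for index, group in enumerate(groups):
        for letter in group:
            symbol_of[letter] = chr(ord("a") + index)
    # Raises KeyError if the sequence contains a letter outside every group.
    return "".join(symbol_of[letter] for letter in sequence)


def compressibility(text):
    """Compressed size divided by original size; lower means more compressible."""
    raw = text.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical sequence and grouping, not taken from the experiment.
sequence = "MKVLAAGIVGLLLAQ" * 20
groups = ["AVLIMG", "KQ"]

print(compressibility(sequence))
print(compressibility(recode(sequence, groups)))
```

Searching for the best grouping would then amount to evaluating `compressibility(recode(sequence, groups))` over many candidate groupings, with each evaluation being a workflow run whose process can be documented.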


Collaboration Diagram


Actual Call DAG


The P-Structure

The logical structure of a provenance store


Interaction Record

The set of p-assertions pertaining to a given interaction (i.e., message exchange between a sender and a receiver)


Interaction Key

A unique identifier for an interaction:

  • Sender identity

  • Receiver identity

  • Local id
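One way to picture an interaction key is as an immutable value made of those three parts, which lets the sender and the receiver address the same interaction record independently. The class and field names below are assumptions for this sketch, and the URLs are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionKey:
    """Hypothetical value type: sender identity, receiver identity, local id."""
    sender: str
    receiver: str
    local_id: str


# Sender and receiver each build the key from the same three parts.
key = InteractionKey("http://example.org/client", "http://example.org/blast-wrapper", "42")
same = InteractionKey("http://example.org/client", "http://example.org/blast-wrapper", "42")

# Frozen dataclasses are hashable, so the key can index interaction records.
records = {key: []}
records[same].append("sender view: I sent M1")
print(records[key])
```

Because equality is by value, the two actors never need to share an object or coordinate through a central allocator to refer to the same interaction.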


View

The set of p-assertions created by an asserter involved in an interaction (sender or receiver view)


Asserter

The identity of an asserter


Interaction P-Assertion

An assertion of the contents of a message by an actor that has sent or received that message


Interaction P-Assertion Content

The content of an interaction p-assertion: here, the invocation of blast (through a wrapper)


Interaction Content

Provenance-related information passed in application messages


Actor State P-Assertion

An assertion made by an actor about its internal state in the context of a specific interaction


Relationship P-Assertion

With respect to an interaction, a relationship p-assertion is an assertion, made by an actor, that describes how the actor obtained output data or the whole message sent in that interaction by applying some function to input data or messages from other interactions.


Subject Id

The identity of the subject of a relationship


Object Id

The identity of the object of a relationship
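The following sketch shows one possible shape for a relationship p-assertion with a subject id and object ids. It assumes that the subject is the derived output and the objects are the inputs it was computed from, and it reuses the M2 = f2(M1, M4) example from the Process Documentation slide. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataRef:
    """Points at a data item within a given interaction (hypothetical shape)."""
    interaction: str
    data_id: str


@dataclass(frozen=True)
class RelationshipPAssertion:
    asserter: str                  # actor making the assertion
    subject: DataRef               # assumed here: the derived output
    relation: str                  # name of the function that was applied
    objects: Tuple[DataRef, ...]   # assumed here: inputs from other interactions


def causes_of(subject: DataRef, assertions: List[RelationshipPAssertion]) -> List[DataRef]:
    """Collect the objects of every relationship whose subject matches."""
    found = []
    for assertion in assertions:
        if assertion.subject == subject:
            found.extend(assertion.objects)
    return found


m1 = DataRef("interaction-1", "M1")
m4 = DataRef("interaction-4", "M4")
m2 = DataRef("interaction-2", "M2")
assertion = RelationshipPAssertion("service-A", m2, "f2", (m1, m4))
print(causes_of(m2, [assertion]))
```

Keeping the interaction identifier inside each reference is what lets a relationship point across interactions, as the definition of a relationship p-assertion requires.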


Process Documentation Characteristics

  • Common logical structure of the provenance store shared by all asserting and querying actors

  • Can be produced autonomously, asynchronously by the different application components

  • Open, extensible model, for which we are producing a public specification

  • Tools can operate on it (e.g. visualisation, reasoning)


Performance (HPDC’05)


Standardisation Philosophy

  • Thin layer common between systems: extensible data model

  • Model can be extended for specific:

    • technologies (WS, Web, …), or

    • application domains (Bio, Healthcare, Desktop, …)

  • Service interfaces


Proposed List of Specifications

  • Profiles: Generic Profiles, Domain Specific Profiles

  • Specifications: WS-Prov-Intro, WS-Prov-Glo, WS-Prov-Primer, WS-Prov-DM, WS-Prov-DM-Sec, WS-Prov-DM-Link, WS-Prov-DM-Infer, WS-Prov-DM-DS, WS-Prov-DM-Rel, WS-Prov-Rec, WS-Prov-Query

  • Technology Bindings: WS-Prov-SOAP, WS-Prov-WWW


Conclusions


To Sum Up

Standardising the documentation of Business Processes

  • Domains: Finance, Distribution, Aerospace, Healthcare, Automobile, Pharmaceutical

  • Apply: Provenance Architecture, Methodology

  • Record: Provenance Store

  • Query: Compliance check, Rerun/Reproduce, Analyse

Slide from John Ibbotson


Conclusions

  • Crucial topic for many applications

  • Full architectural specification

  • Implementation available for download

  • Methodology to make applications provenance-aware

  • Draft standardisation proposal to be released

  • www.pasoa.org

  • www.gridprovenance.org


Provenance Challenge

Provenance Challenge Workshop at OGF18, Washington, September 11-14

twiki.ipaw.info


Questions

