
RWS Provenance Experiments in Kepler (Kepler + PR + RWS)



Presentation Transcript


  1. RWS Provenance Experiments in Kepler (Kepler + PR + RWS)
     Norbert Podhorszki, Ilkay Altintas, Bertram Ludaescher, in collaboration with Shawn Bowers and Timothy McPhillips

  2. Initial Provenance Framework (IPAW’06, Altintas et al.)
     • Vision:
       • Modeled as a separate concern in the system
       • Optional drag-and-drop feature
     • Listen to execution and save information (customizable):
       • Context: the who, what, where, when, and why associated with the run
       • Input data and its associated metadata
       • Workflow outputs and intermediate data products
       • Workflow definition (entities, parameters, connections): a specification of what exists in the workflow, which can have a context of its own
       • Information about the workflow evolution (workflow trail)

  3. Kepler System Architecture (IPAW’06, Altintas et al.)
     [Architecture diagram. Components shown: Authentication; GUI; Kepler GUI Extensions; Vergil; Documentation; Provenance Recorder; Smart Re-run / Failure Recovery; SMS; Kepler Object Manager; Type System Ext; Actor&Data SEARCH; Kepler Core Extensions; Ptolemy]

  4. Kepler Provenance Recorder (IPAW’06, Altintas et al.)
     • Parametric and customizable
     • Different report formats
     • Variable levels of verbosity: all, some, medium, on error
     • Multiple cache destinations
     • Saves information on user name, date, run, etc.

  5. Read-Write-ReSet Model (IPAW’06, McPhillips et al.)
     • Event sequence of an actor firing: r, r, …, r, w, w, …, w, r, …, r, w, …, w, …
     • What about actor state? What about “real” dependencies?
     • The reset event s defines when the actor “cuts off” dependencies
       • A semantic notion, known to the actor [developer] (or part of a higher-order scheme)
     • With a reset: r, r, …, r, w, w, …, w, [s!], r, …, r, w, …, w, …
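
One way to read the reset rule on this slide, as a minimal sketch: within one actor's event sequence, each written token is taken to depend on every token read since the most recent reset event s. The event tuples, token names, and the function name `rws_dependencies` are assumptions made for illustration; this is not the Kepler implementation.

```python
def rws_dependencies(events):
    """Derive write-on-read dependencies from one actor's RWS event sequence.

    events: list of (kind, token) pairs, kind in {"r", "w", "s"};
            token is None for a reset event (hypothetical encoding).
    Returns a dict mapping each written token to the tokens read
    since the last reset.
    """
    reads_since_reset = []
    deps = {}
    for kind, token in events:
        if kind == "r":
            reads_since_reset.append(token)
        elif kind == "w":
            deps[token] = list(reads_since_reset)
        elif kind == "s":
            # The reset "cuts off" dependencies on everything read so far.
            reads_since_reset = []
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return deps


# Hypothetical trace: r, r, w, [s!], r, w
trace = [("r", "t1"), ("r", "t2"), ("w", "t3"),
         ("s", None), ("r", "t4"), ("w", "t5")]
deps = rws_dependencies(trace)

# The same trace without the reset event.
deps_no_reset = rws_dependencies([e for e in trace if e[0] != "s"])
```

With the reset, `t5` depends only on `t4`; without it, `t5` would also depend on `t1` and `t2`.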

  6. Goals of the PR+RWS Experiments
     • Use the RWS model for Kepler workflows
       • Both single-level and nested workflows (fun starts here :-)
     • Extend the Kepler Provenance Recorder
       • Modify the methods of the provenance listener class
       • Add classes to store execution data about the workflow
       • Generate the send-receive relations of the tokens correctly
       • Count actor firings correctly
     • Disclaimer: initially only one workflow run is targeted
       • (but the approach can handle multiple actor firings due to pipeline parallelism)
       • Future: queries over several runs and workflow-provenance
       • (others in Kepler are already doing this → merge efforts in the future)

  7. Implementation: Data Model
     • Port-actor relationship
       • portTable(Port, Actor, type)
       • type is r for real or v for virtual (transparent)
     • Token-object relationship
       • tokenTable(Token, Object)
     • Object-value relationship
       • objectTable(Object, Value, Type)
       • Type is currently not recorded
     • RWS trace
       • traceTable(Port, Event, Token, FiringCounter)
       • Event is r for read, w for write, or s for state-reset
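
The four relations above can be sketched as plain tuples. The sample rows, port and actor names, and the helper functions below are hypothetical and only show how the tables join; the slides do not say how the tables are stored.

```python
from collections import namedtuple

# The four relations from the slide, one named tuple type per table.
PortRow = namedtuple("PortRow", ["port", "actor", "type"])        # type: "r" real, "v" virtual
TokenRow = namedtuple("TokenRow", ["token", "object"])
ObjectRow = namedtuple("ObjectRow", ["object", "value", "type"])  # type is not recorded
TraceRow = namedtuple("TraceRow", ["port", "event", "token", "firing_counter"])

# Hypothetical sample data: actor A writes token t1, actor B reads it.
port_table = [PortRow(".wf.A.out", ".wf.A", "r"),
              PortRow(".wf.B.in", ".wf.B", "r")]
token_table = [TokenRow("t1", "o1")]
object_table = [ObjectRow("o1", "42", None)]
trace_table = [TraceRow(".wf.A.out", "w", "t1", 1),
               TraceRow(".wf.B.in", "r", "t1", 1)]


def value_of(token):
    """Join tokenTable and objectTable to find a token's value."""
    obj = next(row.object for row in token_table if row.token == token)
    return next(row.value for row in object_table if row.object == obj)


def writer_of(token):
    """Join traceTable and portTable to find the actor that wrote a token."""
    port = next(row.port for row in trace_table
                if row.token == token and row.event == "w")
    return next(row.actor for row in port_table if row.port == port)
```

For the sample rows, `value_of("t1")` gives the string `"42"` and `writer_of("t1")` gives `".wf.A"`.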

  8. Implementation: Class Hierarchy
     • Extends the existing provenance execution listener with
       • Methods
       • More event listeners
     • Supporting classes
       • RWSPortInfo, RWSActorInfo
         • Data structures for building and containing info about the workflow (and counters for event record)
       • RWSEvent
         • Handles RWS events

  9. Execution: Initialization
     • Initialization phase: initialize()
       • Generate RWS portMap
         • RWSPortInfo for each port (info locally known at a port)
         • RWSPortInfo for each port (build connection info)
       • Generate RWS actorMap
         • RWSActorInfo for each actor
       • Create new RWS event list
       • Record static wf info (portTable)

  10. Execution: Event Handling and Modifications
      • Just before the run: validate()
        • Called before the model is executed; event handling methods are extended here
        • Subscribe to token listeners: TokenSend, TokenGet
      • When the workflow is modified: changeExecuted()
        • Called when something is changed in the workflow
        • Re-generate RWS portMap

  11. Execution: During the workflow run
      • When a token event occurs:
        • TokenSendEvent()
          • New RWS event w (traceTable)
          • Print sent token’s info: token id, object id, value (tokenTable, objectTable)
          • For each connected transparent port: generate virtual TokenGet event
        • TokenGetEvent()
          • New RWS event r (traceTable)
          • If it is a transparent port: generate virtual TokenSend event
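
A possible reading of this slide, sketched under stated assumptions: a send is recorded as a w event, and for every connected transparent port a virtual get (r) is recorded, which in turn triggers a virtual send (w) from that port. The class `RwsRecorder`, its method names, and the port names are hypothetical; firing counters and the token/object tables are left out, and the connection graph is assumed to be acyclic.

```python
class RwsRecorder:
    """Toy recorder for RWS read/write events with transparent ports."""

    def __init__(self, connections, transparent_ports):
        self.connections = connections            # port -> list of connected ports
        self.transparent = set(transparent_ports)
        self.trace = []                           # (port, event, token) rows

    def token_send(self, port, token):
        self.trace.append((port, "w", token))
        # A transparent port never reports its own get, so record a virtual one.
        for dest in self.connections.get(port, []):
            if dest in self.transparent:
                self.token_get(dest, token)

    def token_get(self, port, token):
        self.trace.append((port, "r", token))
        # A transparent port passes the token on: record a virtual send.
        if port in self.transparent:
            self.token_send(port, token)


# Hypothetical nested workflow: A.out -> S.in (transparent) -> S.C.in
recorder = RwsRecorder(
    connections={"A.out": ["S.in"], "S.in": ["S.C.in"]},
    transparent_ports=["S.in"],
)
recorder.token_send("A.out", "t1")   # real send reported by the engine
recorder.token_get("S.C.in", "t1")   # real get reported by the engine
```

After these two real events, the trace holds four rows: the write at `A.out`, the virtual read and write at `S.in`, and the read at `S.C.in`.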

  12. A Kepler Workflow Implementation
      RWS TRACE
      Table         # of elements   size in KB
      portTable     81              4
      tokenTable    30              2
      objectTable   30              3
      traceTable    86              6

  13. Query 1.a
      Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
      Answer a. List of actors that contributed to the result (21 actors). They appear in the reverse of the order in which they were executed.
      ?- q1b_actors('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ActorList), print(ActorList).
      [ .pc.Convert_x, .pc.Slicer_x, .pc.SoftMean, .pc.Reslice3, .pc.Reslice2, .pc.Reslice4, .pc.Reslice1, .pc.AlignWarp3, .pc.RefImg, .pc.RefHdr, .pc.InputHdr3, .pc.InputImg3, .pc.AlignWarp2, .pc.InputHdr2, .pc.InputImg2, .pc.AlignWarp4, .pc.InputHdr4, .pc.InputImg4, .pc.AlignWarp1, .pc.InputImg1, .pc.InputHdr1 ]
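
The definition of q1b_actors is not shown in the slides. As a rough sketch of what such a lineage query computes, the function below walks upstream from a result token through "depends on" links and collects the actors that wrote each token. The function name `lineage`, the two dictionaries, and the toy data are assumptions for illustration only.

```python
def lineage(token, depends_on, written_by):
    """Collect upstream actors and tokens for a result token.

    depends_on: token -> list of tokens it was derived from
    written_by: token -> actor that wrote it
    Both lists come back in visiting order, starting at the result,
    so later-executed actors tend to appear first.
    """
    actors, tokens, seen = [], [], set()
    stack = [token]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        tokens.append(current)
        actor = written_by.get(current)
        if actor is not None and actor not in actors:
            actors.append(actor)
        stack.extend(depends_on.get(current, []))
    return actors, tokens


# Toy three-stage pipeline (made-up data): actor1 -> actor2 -> actor3
depends_on = {"t3": ["t2"], "t2": ["t1"]}
written_by = {"t1": "actor1", "t2": "actor2", "t3": "actor3"}
actor_list, value_list = lineage("t3", depends_on, written_by)
```

For the toy pipeline, the actors come back as `actor3`, `actor2`, `actor1`, which mirrors the reversed execution order noted in the answer above.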

  14. Query 1.b
      Answer b. List of intermediate values created by the workflow (26 values).
      ?- q1b_values('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ValueList), print(ValueList).
      [ "/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage4/atlas-x.pgm",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp3.warp",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy3.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy3.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp2.warp",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp4.warp",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp1.warp",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.hdr" ]

  15. Improved PC workflow (cf. COMAD wf)
      • A more generic workflow that accepts any number of images
      • Smaller number of actors
        • This affects the number of values, as it requires additional array operations
      • cf. also the COMAD approach and the Taverna approach (but we fire AlignWarp individually here)
      RWS TRACE
      Table         # of elements   size in KB
      portTable     42              2
      tokenTable    51              3
      objectTable   39              4
      traceTable    150             9

  16. Improved PC workflow

  17. Query 1
      Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
      Answer a. List of actors that contributed to the result (15 actors). They appear in the reverse of the order in which they were executed.
      ?- q1b_actors('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ActorList), print(ActorList).
      [ .pca.Convert, .pca.Slicer, .pca.hdrrepeat, .pca.seqXYZ, .pca.imgrepeat, .pca.SoftMeanArray, .pca.imgarray, .pca.hdrarray, .pca.Reslice, .pca.AlignWarp, .pca.RefHdr, .pca.InputHdr, .pca.InputImg, .pca.RefImg, .pca.Ramp ]

  18. Query 1
      Answer b. List of intermediate values created by the workflow (33 values). It includes internal data values (arrays) in addition to the original file names.
      ?- q1b_values('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ValueList), print(ValueList).
      [ "/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage4/atlas-x.pgm",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.hdr",
        "x",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.img",
        { "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
          "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
          "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
          "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img" },
        { "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.hdr",
          "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.hdr",
          "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.hdr",
          "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.hdr" },
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp1.warp",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.hdr",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.img",
        "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.img",
        1, etc...

  19. Nested workflow: tricky example
      [Figure: nested workflow containing the composite actor S]

  20. The trick
      • Multi-port of Ptolemy
        • Two distinct channels going into S and out from S
      • A’s output is delivered to S.C
      • B’s output is delivered to S.D
      • S.C’s output is delivered to E
      • S.D’s output is delivered to F

  21. Lineage of actors and values
      Who contributed to the value C.1 that arrived at E?
      ?- q1('"C.1"', ActorList, ValueList).
      ActorList = ['.WF15.S.C', '.WF15.S', '.WF15.A']
      ValueList = ['"C.1"', '1', '1']
      Who contributed to the value D.2 that arrived at F?
      ?- q1('"D.2"', ActorList, ValueList).
      ActorList = ['.WF15.S.D', '.WF15.S', '.WF15.B']
      ValueList = ['"D.2"', '2', '2']

  22. Single-level lineage of actors and values
      Who contributed to the value C.1 that arrived at E?
      ?- q1b('"C.1"', ActorList, ValueList).
      ActorList = ['.WF15.S', '.WF15.A']
      ValueList = ['"C.1"', '1']
      Who contributed to the value D.2 that arrived at F?
      ?- q1b('"D.2"', ActorList, ValueList).
      ActorList = ['.WF15.S', '.WF15.B']
      ValueList = ['"D.2"', '2']

  23. Conclusions
      • First attempt at combining the Kepler PR and the Kepler RWS provenance model
        • Both published at IPAW 2006
      • Query 1 was successfully answered.
      • Queries 2 and 3 are answerable but have not been implemented yet.
      • Queries on multiple runs and workflow design provenance are out of the scope of this initial prototype.
        • Other groups in Kepler are focusing on this.

  24. Some related references
      • Provenance Framework/Recorder:
        • I. Altintas, O. Barney, E. Jaeger-Frank. Provenance Collection Support in the Kepler Scientific Workflow System. IPAW 2006, Chicago, Illinois, May 2006.
      • RWS Model:
        • Shawn Bowers, Timothy McPhillips, Bertram Ludaescher, Shirley Cohen, Susan B. Davidson. A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows. International Provenance and Annotation Workshop (IPAW'06), Chicago, Illinois, USA, May 3-5, 2006.
