
VisTrails

VisTrails: Second Provenance Challenge. Tommy Ellkvist, David Koop, Juliana Freire. Joint work with: Erik Andersen, Steven P. Callahan, Emanuele Santos, Carlos E. Scheidegger, Cláudio Silva, and Huy T. Vo. Outline: VisTrails Introduction; VisTrails Demo; Provenance Model and API


Presentation Transcript


  1. VisTrails Second Provenance Challenge Tommy Ellkvist David Koop Juliana Freire Joint work with: Erik Andersen, Steven P. Callahan, Emanuele Santos, Carlos E. Scheidegger, Cláudio Silva, and Huy T. Vo

  2. Outline • VisTrails Introduction • VisTrails Demo • Provenance Model and API • Challenge Results • Issues and Future Work

  3. VisTrails • Comprehensive provenance infrastructure for computational tasks • Support for exploratory tasks such as visualization and data mining • Workflows are iteratively refined as users generate and test hypotheses • New change-based provenance model • Uniformly captures data and workflow provenance

  4. Change-based Provenance • Provenance is stored as a tree of actions (e.g., add module, add connection)

  5. Provenance: Storing Actions • Each change writes new actions to the tree

     <action id="27" prevId="26" user="dakoop" date="2007-06-20">
       <add what="module" objectId="12">
         <module id="12" name="vtkProperty" cache="1">
           <location id="17" x="-7.0" y="97.0"/>
         </module>
       </add>
       <add what="connection" objectId="13">
         <connection id="13">
           <port type="source" moduleId="10"/>
           <port type="destination" moduleId="12"/>
         </connection>
       </add>
     </action>
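A minimal sketch of how a workflow version could be rebuilt from such an action tree: follow the prevId links back to the root, then replay the actions oldest-first. The in-memory record layout, the root id of 0, the "delete" operation, and the name given to module 10 are assumptions made for illustration; the slide only shows "add" actions, and this is not the VisTrails implementation.

```python
# Hypothetical, simplified action records. Keys mirror the XML attributes on
# the slide (id, prevId, what, objectId); everything else is illustrative.
ACTIONS = {
    26: {"prevId": 0, "ops": [
        ("add", "module", 10, {"name": "exampleSource"}),  # made-up module
    ]},
    27: {"prevId": 26, "ops": [
        ("add", "module", 12, {"name": "vtkProperty"}),
        ("add", "connection", 13, {"source": 10, "destination": 12}),
    ]},
}


def materialize(actions, version_id, root_id=0):
    """Rebuild the workflow at version_id by replaying actions from the root."""
    # Walk from the requested version back to the root via prevId.
    chain = []
    current = version_id
    while current != root_id:
        chain.append(current)
        current = actions[current]["prevId"]

    # Replay the collected actions in chronological order.
    workflow = {"module": {}, "connection": {}}
    for action_id in reversed(chain):
        for op, what, object_id, payload in actions[action_id]["ops"]:
            if op == "add":
                workflow[what][object_id] = payload
            elif op == "delete":  # assumed operation, not shown on the slide
                workflow[what].pop(object_id, None)
    return workflow


workflow_27 = materialize(ACTIONS, 27)
workflow_26 = materialize(ACTIONS, 26)
```

Because every version is just a node in the tree, an earlier version such as 26 can be materialized the same way without storing a full copy of each workflow.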

  6. Change-based Provenance • Data provenance: where does a specific data product come from? • Workflow evolution: how has the workflow structure changed over time? • Treat workflow versions as data: store the provenance of the workflows themselves

  7. Layered Provenance

  8. Layered Provenance

  9. Layered Provenance

  10. Layered Provenance

  11. VisTrails Provenance • Normalized information: no redundancy! • Each layer provides more specific information but refers to parent layers • Workflow Evolution → Workflow → Execution • Extensible storage options • Support for both relational and XML • Flexible annotation framework: users can specify application-specific provenance information

  12. Provenance for Reproducibility and Beyond • Infrastructure for querying and reusing provenance • Query workflows by example • Create workflows by analogy • Collaborative exploration • Scalable derivation of data products

  13. VisTrails Demo

  14. Supporting Different Provenance Backends • VisTrails has powerful tools to query and reuse provenance information • There are many powerful workflow systems that produce such information • Problem: How to integrate different provenance backends? • Our approach: A mediation-based approach to provenance interoperability

  15. Mediator Architecture Mapping from global schema to data source specific schema

  16. Mediated Provenance Mapping from general model to engine-specific model

  17. Combining Provenance • Establish model • Produce an API for this model • Wrap provenance access for each system so that queries become native over their provenance data
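The three steps above can be sketched as a small mediator: one abstract API, one wrapper per system, and a dispatcher that routes each call to the right wrapper. All class names, the in-memory wrapper, and the "prefix:local_id" identifier convention are assumptions for illustration (the convention is modeled on identifiers such as vt3:4 that appear in the Query 1 results); a real wrapper would translate the call into a native XPath or SPARQL query.

```python
from abc import ABC, abstractmethod


class ProvenanceWrapper(ABC):
    """Common API that every system-specific wrapper implements (hypothetical)."""

    prefix = ""

    @abstractmethod
    def get_executed_modules(self, wf_exec_id):
        """Return the ids of the modules run by one workflow execution."""


class InMemoryWrapper(ProvenanceWrapper):
    """Stand-in for a real wrapper; answers from a dict instead of a store."""

    def __init__(self, prefix, runs):
        self.prefix = prefix
        self._runs = runs

    def get_executed_modules(self, wf_exec_id):
        return list(self._runs.get(wf_exec_id, []))


class Mediator:
    """Routes an API call to the wrapper that owns the identifier's prefix."""

    def __init__(self, wrappers):
        self._wrappers = {w.prefix: w for w in wrappers}

    def get_executed_modules(self, qualified_id):
        prefix, _, local_id = qualified_id.partition(":")
        wrapper = self._wrappers[prefix]
        # Re-qualify the results so they stay unique across systems.
        return [f"{prefix}:{m}" for m in wrapper.get_executed_modules(local_id)]


mediator = Mediator([
    InMemoryWrapper("vt3", {"run1": ["0", "1"]}),       # made-up data
    InMemoryWrapper("pas2", {"run9": ["softmean"]}),    # made-up data
])
vt_modules = mediator.get_executed_modules("vt3:run1")
```

Keeping the dispatcher ignorant of each store's query language is what lets a new system be added by writing only a new wrapper.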

  18. Provenance Model • Follows the layered architecture • Each version maps to a workflow • Workflows are modeled as graphs • Parameters capture module state • User-defined annotations are available at each layer of the model • Module Definition stores information about the computational pieces

  19. Provenance Model

  20. Provenance API • Implements common access queries and operations over the provenance model • Examples: getParent(module) getChildren(module) getUpstream(module) getDownstream(module) getAnnotations(module | workflow | …) getDataItems(module_exec) getParameters(module) getVersion(time) getExecutedModules(workflow) getConnection(data_item) getPorts(connection) findModulesByParameter(search_params) findModulesByAnnotation(search_params) findExecutionsByAnnotation(search_params) findVersionsByModules(search_params)

  21. Provenance API Example: getExecutedModules(wf_exec)

      VisTrails (XPath)

          def getExecutedModules(self, wf_exec):
              newdataitems = []
              q = '//exec[@id="' + wf_exec.pid.key + '"]/@moduleId'
              dataitems = self.logcontext.xpathEval(q)

      Pasoa (XPath)

          def getExecutedModules(self, wf_exec):
              q = "//ps:relationshipPAssertion[ps:localPAssertionId='" + wf_exec.pid.key + "']/ps:relation"
              dataitems = self.context.xpathEval(q)

      Taverna (SPARQL)

          def getExecutedModules(self, wf_exec):
              " "
              q = '''
              SELECT ?mi
              FROM <''' + self.path + '''>
              WHERE { <''' + wf_exec.pid.key + '''> <http://www.mygrid.org.uk/provenance#runsProcess> ?mi }
              '''
              return self.processQueryAsList(q, pModuleInstance)
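For readers who want to try the XPath variant, here is a self-contained sketch using Python's standard xml.etree.ElementTree rather than the xpathEval call on the slide. The log document and the function name are made up for illustration; only the exec element and its id and moduleId attributes come from the slide's query. ElementTree supports attribute predicates but cannot select attribute nodes directly, so the attribute is read with .get() instead of a trailing /@moduleId step.

```python
import xml.etree.ElementTree as ET

# Hypothetical execution log; the real VisTrails log has more structure.
LOG_XML = """<log>
  <exec id="7" moduleId="10"/>
  <exec id="7" moduleId="12"/>
  <exec id="8" moduleId="10"/>
</log>"""


def get_executed_module_ids(root, wf_exec_id):
    """Return the moduleId of every exec element recorded for wf_exec_id."""
    # Note: wf_exec_id is interpolated into the path, so it must not
    # contain quote characters.
    matches = root.findall(f".//exec[@id='{wf_exec_id}']")
    return [element.get("moduleId") for element in matches]


root = ET.fromstring(LOG_XML)
module_ids = get_executed_module_ids(root, "7")
```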

  22. Provenance API Results • Implemented queries for each system and a combination of all three • Annotation issues for a couple of queries • Example: Query 1 Results

      vt3:4 --> vt3:7
      vt3:1 --> vt3:4
      vt3:0 --> vt3:1
      pas2:http://relation.org/softmean --> vt3:0
      myg1:urn:www.mygrid.org.uk/process#reslice1 --> pas2:http://relation.org/softmean
      myg1:urn:www.mygrid.org.uk/process#reslice2 --> pas2:http://relation.org/softmean
      myg1:urn:www.mygrid.org.uk/process#reslice3 --> pas2:http://relation.org/softmean
      myg1:urn:www.mygrid.org.uk/process#reslice4 --> pas2:http://relation.org/softmean
      myg1:urn:www.mygrid.org.uk/process#align_warp1 --> myg1:urn:www.mygrid.org.uk/process#reslice1
      myg1:urn:www.mygrid.org.uk/process#align_warp2 --> myg1:urn:www.mygrid.org.uk/process#reslice2
      myg1:urn:www.mygrid.org.uk/process#align_warp3 --> myg1:urn:www.mygrid.org.uk/process#reslice3
      myg1:urn:www.mygrid.org.uk/process#align_warp4 --> myg1:urn:www.mygrid.org.uk/process#reslice4
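A result like this can be read as an edge list and walked backwards from the final module. The sketch below takes a subset of the Query 1 edges and collects everything upstream of vt3:7 with a breadth-first traversal. It assumes that "A --> B" means A feeds B and that the text before the first colon names the source system; the function names are hypothetical and are not the deck's getUpstream implementation.

```python
from collections import defaultdict, deque

# Subset of the Query 1 edges, read as (source, destination) pairs.
EDGES = [
    ("vt3:4", "vt3:7"),
    ("vt3:1", "vt3:4"),
    ("vt3:0", "vt3:1"),
    ("pas2:http://relation.org/softmean", "vt3:0"),
    ("myg1:urn:www.mygrid.org.uk/process#reslice1",
     "pas2:http://relation.org/softmean"),
    ("myg1:urn:www.mygrid.org.uk/process#align_warp1",
     "myg1:urn:www.mygrid.org.uk/process#reslice1"),
]


def get_upstream(edges, target):
    """Return every node that target transitively depends on, nearest first."""
    parents = defaultdict(list)
    for source, destination in edges:
        parents[destination].append(source)

    seen = {target}
    order = []
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def system_of(identifier):
    """Return the system prefix, i.e. the text before the first colon."""
    return identifier.split(":", 1)[0]


upstream = get_upstream(EDGES, "vt3:7")
systems = [system_of(node) for node in upstream]
```

The traversal crosses from vt3 into pas2 and then myg1 without any special casing, which is the point of giving every identifier a system prefix.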

  23. Provenance API Integration • Developed VisTrails Provenance Query Language for first challenge • Plan to integrate API with query language • Plan to integrate query language with VisTrails interfaces

  24. Interoperability Issues • Uniquely identifying intermediate results • Intermediate file names were not specified and varied • Tracing ids is difficult for users; this should be transparent • A common query language should use concepts familiar to users • Mediator vs. warehousing approach

  25. Performance Issues • Redundant information can make queries inefficient • What is the best storage backend? • RDBMS vs. XML database? • What is the best data model? • XML vs. Relational vs. RDF? • Need good benchmarks with large data!

  26. Questions?

  27. Mediated Provenance • User queries → Prov API → General Provenance Model → wrappers → Taverna, Pasoa, … • Mapping from generic provenance model into the models of different systems

  28. Mediator Architecture • User SQL/ODBC queries → Mediator → Global Schema → wrappers → Data Sources • Mapping from global schema into source schemas
