
Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18. Yogesh L. Simmhan Beth Plale, Dennis Gannon, Srinath Perera Indiana University. Outline. Architecture of Karma Workflow Setup & Collecting Provenance Provenance Traces “canonical” Challenge Queries Suggested Variations.


Presentation Transcript


  1. Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18 Yogesh L. Simmhan Beth Plale, Dennis Gannon, Srinath Perera Indiana University

  2. Outline • Architecture of Karma • Workflow Setup & Collecting Provenance • Provenance Traces • “canonical” Challenge Queries • Suggested Variations

  3. Provenance Collection: Challenges & Uses • Linked Environments for Atmospheric Discovery (LEAD) project • Weather & Severe Storm Prediction Applications • Provenance on workflow (process) & data products at fine granularity • Dynamic, Long running workflows • Helps scientists to • search for workflows & data products • estimate data quality, • track workflow execution, and • analyze & mine data products from runs

  4. Karma Provenance Framework • Lightweight – does not duplicate existing metadata cataloging efforts • myLEAD personal metadata catalog • ResCat service & data registry • Glue to integrate metadata on data & services with runtime workflow information • Scalability [1] – 500 users, 100s of workflows, 10,000s of data products [1] Performance Evaluation of the Karma Provenance Framework, Simmhan, Y., et al.; IPAW, 2006

  5. Karma Provenance Framework • Key provenance activities generated during the lifetime of a workflow • Workflow | Service Invoked • Data Consumed • Data Produced • Sending Response • Activities modeled as XML messages • Published asynchronously by service | workflow | client • Presently uses the WS-Eventing messaging system • Activities stored in a relational database
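As a rough illustration of "activities modeled as XML messages", the sketch below builds one such message with Python's standard library. The element names (workflowID, serviceID, timestamp, dataProductID) and the function build_activity are hypothetical and are not taken from the Karma schema; only the activity type names come from the slide.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_activity(activity_type, workflow_id, service_id, data_ids=()):
    """Build a provenance activity as an XML string.

    The element layout here is an assumption made for illustration only;
    it is not the real Karma message format.
    """
    root = ET.Element(activity_type)  # e.g. "DataProduced", "ServiceInvoked"
    ET.SubElement(root, "workflowID").text = workflow_id
    ET.SubElement(root, "serviceID").text = service_id
    ET.SubElement(root, "timestamp").text = datetime.now(timezone.utc).isoformat()
    for data_id in data_ids:
        ET.SubElement(root, "dataProductID").text = data_id
    return ET.tostring(root, encoding="unicode")


message = build_activity(
    "DataProduced", "workflow-1", "SoftmeanService", ["lead:uuid:example-output"]
)
print(message)
```

In the framework described above, a message like this would be handed to the notification system rather than printed, so the publishing service does not wait for the provenance store.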

  6. Karma Architecture [1] • [Architecture diagram; recoverable labels follow] • Workflow Engine: a Workflow Instance orchestrates Service 1, Service 2, … Service 9, Service 10 • 10 data products consumed & produced by each service • Workflow publishes WorkflowInvoked & SendingResponse activities • Services publish ServiceInvoked & SendingResponse, DataProduced & DataConsumed activities • Provenance activities published as async notifications on the message bus (WS-Eventing, WS-Messenger Notification Broker) • Karma Provenance Service: Provenance Listener (subscribes & listens to activity notifications), Activity DB, Provenance Query API • Service API: query for workflow, process, & data provenance • Provenance Browser Client [1] A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., Submitted to ICWS Conference, 2006

  7. Provenance Challenge Workflow • Applications modeled as web-services • Generic Factory toolkit creates web-service wrappers for command-line applications • Service invokes a shell-script/application, passing command-line arguments • Created services automatically instrumented to generate provenance using Karma client library • Workflow composed as GPEL* script • XBaya Workflow composer GUI • Central GPEL workflow engine orchestrates execution *Grid Process Execution Language, an extension of the Business Process Execution Language (BPEL)

  8. Provenance Challenge Workflow

  9. Provenance Traces – Building Block Queries • Data Provenance: get[Recursive]DataProvenance • What (ID), where (URL), when (Timestamp) • How (Process, inputs)

  10. Provenance Traces – Building Block Queries • Process Provenance: getProcessProvenance • What (ID), when (Timestamp), who (Invoker) • State (execution/completion status) • Input & Output data products

  11. Provenance Traces – Building Block Queries • Workflow Trace: getWorkflowTrace • What (ID), when (Timestamp), who (Invoker) • State (execution/completion status) • Process provenance of workflow steps
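The three building-block queries return nested records: a workflow trace contains process provenance for each step, and each process lists its input and output data products. A minimal sketch of that shape follows, assuming hypothetical Python class and field names; the actual Karma responses are XML documents whose schema is not shown in these slides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ProcessProvenance:
    process_id: str        # what
    timestamp: datetime    # when
    invoker: str           # who
    state: str             # execution/completion status
    inputs: List[str] = field(default_factory=list)   # data product IDs consumed
    outputs: List[str] = field(default_factory=list)  # data product IDs produced


@dataclass
class WorkflowTrace:
    workflow_id: str
    timestamp: datetime
    invoker: str
    state: str
    steps: List[ProcessProvenance] = field(default_factory=list)

    def data_products(self) -> List[str]:
        """All data IDs consumed or produced by any step, in first-seen order."""
        seen: List[str] = []
        for step in self.steps:
            for data_id in step.inputs + step.outputs:
                if data_id not in seen:
                    seen.append(data_id)
        return seen
```

Keeping data products as plain IDs mirrors the lightweight design stated earlier: descriptive metadata stays in the external catalog, and the trace only links IDs to processes.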

  12. Provenance Challenge Queries • ! Answered by Karma Service API directly • Answered by Karma Service API, with post-processing by client • ~ Answered by access to backend DB (SQL) • Not answered

  13. Provenance Challenge Queries: Q1 • Find everything that caused Atlas X Graphic to be as it is • ! Answered by Karma Service API directly • This is the recursive data provenance of the Atlas X Graphic file • A call to getRecursiveDataProvenance('lead:uuid:1157946992-atlas-x.gif') returns this [www]
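To make "recursive data provenance" concrete, here is a small sketch that walks an in-memory provenance store from one data product back through all of its ancestors. It only illustrates the idea; the real getRecursiveDataProvenance runs inside the Karma service, and the DataProvenance class, the dictionary store, and the function name are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataProvenance:
    data_id: str
    produced_by: Optional[str] = None                     # producing process
    using_data: List[str] = field(default_factory=list)   # input data IDs


def get_recursive_data_provenance(
    store: Dict[str, DataProvenance], data_id: str
) -> List[DataProvenance]:
    """Return the record for data_id followed by those of all its ancestors."""
    seen = set()
    records: List[DataProvenance] = []

    def visit(current: str) -> None:
        # Skip data already visited and raw inputs with no recorded producer.
        if current in seen or current not in store:
            return
        seen.add(current)
        record = store[current]
        records.append(record)
        for parent in record.using_data:
            visit(parent)

    visit(data_id)
    return records
```

The seen set keeps a data product that feeds several downstream steps from being reported more than once.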

  14. Provenance Challenge Queries: Q2 • Find the process that led to Atlas X Graphic, excluding everything prior to softmean • Answered by Karma Service API, with post-processing by client • First call getDataProvenance • Then recursively get data provenance until 'SoftmeanService' is seen • Returns this [www]
      let $dataList := ['lead:uuid:1157946992-atlas-x.gif']
      while ($dataList != empty) do
          // get data provenance for this level
          $dataProvenance = karma.getDataProvenance($dataList[0])
          // print process information & remove data from list
          print $dataProvenance; $dataList.delete(0)
          // found Softmean: stop
          if ($dataProvenance.getProducedBy() == 'SoftmeanService') break
          // get input data used by this data & recurse up the tree
          foreach ($inputData in $dataProvenance.getUsingData()) do
              $dataList.add($inputData)
      end
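The client-side loop on this slide can be sketched in runnable Python as follows. FakeKarmaClient is a stand-in written for this example and is not the Karma client library; of the service names used in the usage note, only 'SoftmeanService' appears in the slides, and the rest are placeholders.

```python
from collections import deque


class FakeKarmaClient:
    """Stand-in for the Karma service (an assumption for this sketch).

    records maps a data ID to (producing service, list of input data IDs).
    """

    def __init__(self, records):
        self._records = records

    def get_data_provenance(self, data_id):
        return self._records.get(data_id)


def provenance_up_to(client, start_id, stop_service="SoftmeanService"):
    """Collect (data ID, producing service) pairs until stop_service is seen."""
    queue = deque([start_id])
    trace = []
    while queue:
        data_id = queue.popleft()
        record = client.get_data_provenance(data_id)
        if record is None:  # no recorded producer, e.g. a raw input file
            continue
        produced_by, inputs = record
        trace.append((data_id, produced_by))
        if produced_by == stop_service:
            break  # found Softmean: stop, as in the slide's pseudocode
        queue.extend(inputs)  # move one level up the provenance tree
    return trace
```

As on the slide, the walk stops at the first data product produced by the stop service, which suits a workflow with a single softmean step; a workflow with several would need per-branch pruning instead of a single break.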

  15. Provenance Challenge: Q4 • Find all invocations of align_warp with parameter "-m 12" that ran on a Monday • ~ Answered by access to backend DB (SQL) • Use SQL query to get matching invocations • Call getProcessProvenance to get description of align_warp • Returns this [www]
      SELECT invokee.workflow_id, invokee.service_id,
             invokee.workflow_node_id, invokee.workflow_timestep,
             invoker.workflow_id, invoker.service_id,
             invoker.workflow_node_id, invoker.workflow_timestep
      FROM invocation_state_table invocation,
           entity_table invokee,
           entity_table invoker,
           notification_table notifications
      WHERE invokee.entity_id = invocation.invokee_id
        AND invoker.entity_id = invocation.invoker_id
        AND notifications.source_id = invocation.invokee_id
        AND notifications.notification_type = 'ServiceInvoked'
        AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
        AND notifications.notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%'
        AND DayOfWeek(invocation.request_receive_time) = 2; -- 1=Sunday, 2=Monday, ...
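The two filters in this query, a LIKE match on the stored notification XML and a day-of-week test on the request time, can be tried out on a toy database. The sketch below uses Python's sqlite3 with a single simplified table; the table name, columns, and sample rows are assumptions and do not reproduce the Karma schema. SQLite has no DayOfWeek function, so the sketch uses strftime('%w', ...), which numbers Sunday as 0 and Monday as 1.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a simplified, hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invocations ("
        "service_id TEXT, request_receive_time TEXT, notification_xml TEXT)"
    )
    conn.executemany(
        "INSERT INTO invocations VALUES (?, ?, ?)",
        [
            # 2006-09-11 was a Monday; 2006-09-12 was a Tuesday.
            ("AlignWarpService", "2006-09-11 10:00:00",
             "<ModelMenuNumber>12</ModelMenuNumber>"),
            ("AlignWarpService", "2006-09-12 10:00:00",
             "<ModelMenuNumber>12</ModelMenuNumber>"),
            ("AlignWarpService", "2006-09-11 11:00:00",
             "<ModelMenuNumber>3</ModelMenuNumber>"),
        ],
    )
    return conn


def monday_invocations(conn, service_id, model_number):
    """Invocations of service_id with the given model number that ran on a Monday."""
    pattern = f"%<ModelMenuNumber>{model_number}</ModelMenuNumber>%"
    cursor = conn.execute(
        "SELECT service_id, request_receive_time FROM invocations "
        "WHERE service_id = ? AND notification_xml LIKE ? "
        "AND strftime('%w', request_receive_time) = '1'",  # '1' = Monday in SQLite
        (service_id, pattern),
    )
    return cursor.fetchall()


print(monday_invocations(make_demo_db(), "AlignWarpService", 12))
```

Matching a parameter with LIKE on serialized XML is simple but depends on the exact serialization; it works here because the parameter is stored as a whole element.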

  16. Provenance Challenge: Q9 • Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files. • Not answered • We do not expect to answer such queries through the provenance system • We push the provenance information to external metadata management systems such as myLEAD, which can answer such “join” queries on data product metadata and provenance

  17. Variations of Workflow • Workflows with loops • Workflows whose structure changes dynamically • or, as a simpler case, workflows with conditional branches • Hierarchical composition of workflows • workflows invoking other workflows • ~Similar to user-views (UPenn), nested-workflows (myGrid), …

  18. Variations of Queries • Find all [workflows | processes] with a particular execution status [completed | failed | waiting for input] • Dynamic attribute of provenance? • Query for client view and service view of the provenance • Check for differences

  19. Acknowledgements • Alek Slominski (GPEL Engine) • Satoshi Shirasuna (XBaya Composer) • LEAD Members • NSF • Questions • www.extreme.indiana.edu/karma

  20. Sample Activities Published • More here [www]

  21. Karma DB Schema
