
Capturing provenance data

Capturing provenance data. Dr Alison McKay (in place of Dr Richard Bagshaw), University of Leeds, School of Mechanical Engineering. Purpose of presentation: to present the DAME provenance research and to discuss the experiences of deploying this technology in Grid-based systems.


Presentation Transcript


  1. Capturing provenance data Dr Alison McKay (in place of Dr Richard Bagshaw) University of Leeds, School of Mechanical Engineering

  2. Purpose of presentation • to present the DAME provenance research • to discuss the experiences of deploying this technology in Grid-based systems

  3. Outline of presentation • What do we mean by “provenance data”? • What are we aiming for? • What does achieving this goal entail? • What progress has been made to date? • What remains to be done?

  4. Provenance Data • Recording the history of data and its place of origin

  5. DAME Provenance Architecture (diagram; labels only)
      • Workflow Definition (BPEL)
      • Workflow Script
      • Workflow Manager
      • Workflow Advisor
      • Workflow Instance (shown five times)
      • Service Instance
      • Provenance Database
      • Provenance Viewer

  6. Outline of presentation • What do we mean by “provenance data”? • What are we aiming for? • What does achieving this goal entail? • What progress has been made to date? • What remains to be done?

  7. RR Integrated Product Development process (diagram; labels only)
      • Stage 1: New Project Planning
      • Stage 2: Full Concept Definition
      • Stage 3: Propulsion System Realisation
      • Stage 4: In-Service Monitoring & Technical Support
      • Other labels: Identify the Need; Business Concept Definition; Preliminary Concept Definition; Engine Launch; Entry into Service; Capability Acquisition

  8. DAME provenance data users (diagram; labels only) • Provenance Requirement • Legal Implications • Contractual Obligations • Audit Trail • Troubleshooting • Re-run diagnosis

  9. Potential benefits
      • failure mode curves
      • Position and shape depend on
          • engine type (from PDM/SDM)
          • engine state (e.g., age)
          • events (e.g., from QUOTE data)
      • Chart annotations: axis label "Time"; labels "T" and "extra T"; "position of an engine, i.e., its current state of health"; "this line shows when failure occurs; its position and shape depend upon its operating environment"

  10. Specific tasks to be supported
      • Create an audit trail (Who, What, Where, Why, When, Which, hoW)
      • Re-execute a workflow process
          • repeat a workflow process (same Grid resources & services, sequence and data)
          • rerun a workflow process (same Grid resources & services and sequence on different data)
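
The difference between repeating and rerunning a workflow on slide 10 can be shown with a minimal sketch. This is not the DAME implementation: the `WorkflowRun` record, its field names and the `repeat` and `rerun` helpers are hypothetical, and the resource names are borrowed from slide 19 purely as placeholders.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical record of one executed workflow (names are illustrative)."""
    resources: Tuple[str, ...]   # Grid resources and services that were used
    sequence: Tuple[str, ...]    # order in which the steps were executed
    data: str                    # identifier of the input data set


def repeat(run: WorkflowRun) -> WorkflowRun:
    """Repeat: same resources and services, same sequence, same data."""
    return replace(run)


def rerun(run: WorkflowRun, new_data: str) -> WorkflowRun:
    """Rerun: same resources, services and sequence, but different data."""
    return replace(run, data=new_data)


original = WorkflowRun(
    resources=("SDM", "XTO"),
    sequence=("select engine", "get control files", "run XTO"),
    data="dataset-A",   # placeholder identifier
)
repeated = repeat(original)
rerun_result = rerun(original, "dataset-B")
```

In this sketch a repeat yields a record equal to the original, while a rerun differs only in its `data` field, which is the property an audit trail would need in order to tell the two apart.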

  11. Outline of presentation • What are we aiming for? • What does achieving this goal entail? • What progress has been made to date? • What remains to be done?

  12. Initial requirements • Support the re-execution of workflows with new data * • Provide provenance data for the Workflow Advisor • Provide a viewer for captured provenance data (* As opposed to repeating a given workflow using the same data and resources)

  13. DS&S perspective on requirements • Origin of data fully traceable (including time and date stamps) • Processed data traceable through application software • Any human interaction/annotations must be captured
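
The three requirements on slide 13 (traceable origin with time and date stamps, traceability through application software, captured annotations) can be sketched as a chain of linked records. The `DataItem` class, the `trace_to_origin` helper and all example values are assumptions made for illustration; the sketch also assumes the derivation links contain no cycles.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class DataItem:
    """Hypothetical provenance entry for one piece of data."""
    item_id: str
    produced_by: str                      # application software, or the original source
    derived_from: Optional[str] = None    # item_id of the input; None marks the origin
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    annotations: List[str] = field(default_factory=list)  # human notes


def trace_to_origin(store: Dict[str, DataItem], item_id: str) -> List[DataItem]:
    """Follow derived_from links from an item back to its origin."""
    chain = []
    current: Optional[str] = item_id
    while current is not None:
        item = store[current]
        chain.append(item)
        current = item.derived_from
    return chain


# Placeholder items; the software names are invented for the example.
store = {
    "raw": DataItem("raw", produced_by="engine data download"),
    "features": DataItem("features", produced_by="feature extraction", derived_from="raw"),
    "diagnosis": DataItem("diagnosis", produced_by="diagnosis tool", derived_from="features"),
}
store["diagnosis"].annotations.append("Reviewed by engineer")

lineage = [item.item_id for item in trace_to_origin(store, "diagnosis")]
```

Each record carries its own timestamp and annotations, so walking the chain gives both the origin of a result and the software steps it passed through.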

  14. Research issues (matrix; columns: Specify, Define, Execute / deploy)
      • Product: Define = Product Data Management system; Execute / deploy = Service Data Manager
      • Process: Define = Workflow process definition; Execute / deploy = Workflow execution data

  15. Process definition (as defined) (data model diagram; labels only, layout not recoverable)
      • Entities: [GRID] resource; GRID resource usage; process element; process definition; process relationship; composition relationship; connection relationship; process element relationship
      • Attribute and role labels: id; name; description; start; end; date_and_time; caller; callee; why_used; executed_by; process outcome; of; related; relating
      • Cardinality markers: (1); *

  16. Process definition (as executed)
      • Case: Case_id, User_id, Open_date, Close_date, Flight_start_date, Deadline_date, Tail_number, Airline, Airport, Stand, Quote_diagnosis, Quote_status, Engineer, Engineer_active, Engineer_why, Analyst, Analyst_active, Analyst_why, Expert, Expert_active, Expert_why
      • Workflow: Workflow_sequence_number, Workflow_id, Workflow_author_id, Workflow_name, Workflow_description, Workflow_start_date, Workflow_end_date, Workflow_ip_data_type, Workflow_op_data_type, Workflow_diagnosis, Workflow_status
      • Resource: Resource_sequence_number, Resource_id, Resource_name, Resource_type, Resource_description, Resource_start_time, Resource_end_time, Resource_location, Resource_configuration, Resource_version_number, Resource_status, Resource_req_no_of_processors, Resource_req_memory, Resource_req_operating_system, Resource_req_op_sys_ver_number
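
The Case, Workflow and Resource field lists on slide 16 suggest a three-level record. The sketch below uses a subset of the slide's field names (lower-cased); the field types, the nesting of workflows inside a case and resources inside a workflow, and the reading of the sequence numbers as execution order are all assumptions, since the slide does not state them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    # Field names are taken from the slide (lower-cased); types are assumed.
    resource_sequence_number: int
    resource_id: str
    resource_name: str
    resource_version_number: str
    resource_status: str


@dataclass
class Workflow:
    workflow_sequence_number: int
    workflow_id: str
    workflow_name: str
    workflow_status: str
    # Assumption: a workflow owns the resources it used.
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Case:
    case_id: str
    user_id: str
    tail_number: str
    quote_status: str
    # Assumption: a case owns the workflows run for it.
    workflows: List[Workflow] = field(default_factory=list)


def resources_in_order(case: Case) -> List[str]:
    """List resource names in execution order, using the sequence numbers."""
    names = []
    for wf in sorted(case.workflows, key=lambda w: w.workflow_sequence_number):
        for res in sorted(wf.resources, key=lambda r: r.resource_sequence_number):
            names.append(res.resource_name)
    return names


case = Case("case-1", "user-1", "tail-1", "open")   # placeholder values
wf = Workflow(1, "wf-1", "example workflow", "complete")
wf.resources.append(Resource(2, "r-2", "XTO", "1.0", "complete"))
wf.resources.append(Resource(1, "r-1", "SDM", "1.0", "complete"))
case.workflows.append(wf)

ordered = resources_in_order(case)
```

Recording the version number and sequence number for every resource is what would make a later repeat of the same workflow checkable against the original run.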

  17. MyGrid Workflow Provenance
      • Workflow instance capture
      • Workflow overview
          • Workflow ID, Status, Start Time, End Time, overall inputs and outputs, Service List
      • Service Invocations
          • Status, Start Time, End Time, WSDL URI, DataSets x 2
      • Inputs and Outputs
          • ID, Name, Type, Value
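
The capture of a workflow instance and its service invocations listed on slide 17 can be sketched as a small recorder. This is not the myGrid API: the class names, the `invoke` and `finish` methods and the status values "running", "completed" and "failed" are hypothetical, and only the recorded items (status, start time, end time, inputs, outputs) follow the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class ServiceInvocation:
    name: str
    inputs: Dict[str, Any]
    start_time: datetime
    status: str = "running"
    end_time: Optional[datetime] = None
    outputs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class WorkflowInstance:
    workflow_id: str
    start_time: datetime = field(default_factory=_now)
    status: str = "running"
    end_time: Optional[datetime] = None
    invocations: List[ServiceInvocation] = field(default_factory=list)

    def invoke(self, name: str, service: Callable[..., Dict[str, Any]],
               **inputs: Any) -> Dict[str, Any]:
        """Call a service and record its inputs, outputs, times and status."""
        record = ServiceInvocation(name, dict(inputs), _now())
        self.invocations.append(record)
        try:
            record.outputs = service(**inputs)
            record.status = "completed"
        except Exception:
            record.status = "failed"
            raise
        finally:
            record.end_time = _now()
        return record.outputs

    def finish(self) -> None:
        """Close the instance and derive its overall status."""
        self.end_time = _now()
        failed = any(inv.status == "failed" for inv in self.invocations)
        self.status = "failed" if failed else "completed"


def double(value: int) -> Dict[str, Any]:
    """Stand-in for a real service call."""
    return {"result": value * 2}


instance = WorkflowInstance("wf-001")
result = instance.invoke("double", double, value=21)
instance.finish()
```

Recording the invocation before the call and closing it in a `finally` clause means a failed service still leaves a complete provenance entry.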

  18. Outline of presentation • What do we mean by “provenance data”? • What are we aiming for? • What does achieving this goal entail? • What progress has been made to date? • What remains to be done?

  19. (Untitled workflow diagram; labels only)
      • User executed process steps: Look at SDM to select an engine; Get XTO control files for selected engine; Run XTO for selected engine
      • Resources shown: SDM; MySQL-SDM2; XTO Control Files; XTO; XTO CR1
      • Legend: Interface (transfer) resource; Interface (search) resource; Data storage resource; Transient data resource; Compute resource; Application resource; User executed process step; Data interface; GRID resource

  20. BOM data viewer (diagram; labels only)
      • Components: Graphical user interface; Web service: Structure constructor; Web service: Database; Product data database
      • Implementation labels: Software (Microsoft .Net); Software (Java); Software (Java)

  21. Outline of presentation • What do we mean by “provenance data”? • What are we aiming for? • What does achieving this goal entail? • What progress has been made to date? • What remains to be done?

  22. Remaining tasks • Support the re-execution of workflows with new data • Provide provenance data for the Workflow Advisor • Provide a viewer for captured provenance data • Provide an audit trail for accountability purposes

  23. Provenance research issues • Provenance requirements and scope • Provenance data security • Data storage format • Centralised provenance data • Stop points for audit trails • Repeatability of GRID resources

  24. Longer term research (matrix; columns: Specify, Define, Execute / deploy)
      • Product: Specify = Requirements definition; Define = Product Data Management system; Execute / deploy = Service Data Manager
      • Process: Specify = Workflow process specification; Define = Workflow process definition; Execute / deploy = Workflow execution data
