
WINGS/Pegasus Provenance Challenge



Presentation Transcript


  1. WINGS/Pegasus Provenance Challenge. Ewa Deelman, Yolanda Gil, Jihie Kim, Gaurang Mehta, Varun Ratnakar. USC Information Sciences Institute

  2. WINGS/Pegasus: Workflow Instance Generation and Selection
     • Workflow templates specify complex analysis sequences
     • Workflow instances specify data
     Example requests shown in the diagram:
     • “Validate this workflow based on the component specs”
     • “Show me workflows that generate hazard maps”
     • “Run that with the USGS data set”
     • “Here is a new wave propagation model, takes in a series of fault ruptures, is compiled for MPI”
     [Figure labels, in slide order: WINGS; Workflow Creation; Workflow Selection; Workflow Libraries; EXPERT SCIENTIST; Ontologies: Domain terms, Component types, Workflow Products; Workflow Template; Specifies data requirements; Specifies execution requirements; Application Components; SCIENTIST; (OWL); Data Selection; Data Repositories; Component Specification; Preexisting data collections; Workflow execution results; Workflow Instance; SCIENTIST RESEARCHING NEW MODELS; DAGMan/Globus; Pegasus; Executable Workflow]

  3. Workflow Template [Figure labels: Collections; Computational nodes]

  4. Workflow Instance

  5. Executable Workflow

  6. Metadata Constraints (in OWL ontology)
     • Constraints on files
       • Metadata attributes: data types and default values
     • Constraints on collections and collections of collections
       • Type of each element
       • Relations between the metadata of a collection and the metadata of its individual items
     • Component-level constraints on metadata attributes of input/output files or collections
       • Deriving metadata of output files from metadata of input files
     • Template-level constraints on metadata attributes of files or collections
       • Input/output files of different components can have the same metadata
       • Checking the number of items in collections
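The constraint kinds listed on this slide can be illustrated with a minimal Python sketch. It is not the WINGS implementation, which expresses these constraints in an OWL ontology. The class names, helper functions, and metadata values below are hypothetical; only the attribute names (hasPatientID, hasIndexID) and the example file and collection names are borrowed from the nested-collections slide.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    # Hypothetical stand-in for a file with metadata attributes.
    name: str
    metadata: dict = field(default_factory=dict)


@dataclass
class FileCollection:
    # Hypothetical stand-in for a collection of files with its own metadata.
    name: str
    items: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def check_collection(coll, shared_keys, expected_size=None):
    """Return a list of constraint violations for one collection."""
    problems = []
    # Constraint on the number of items in the collection.
    if expected_size is not None and len(coll.items) != expected_size:
        problems.append(
            f"{coll.name}: expected {expected_size} items, found {len(coll.items)}"
        )
    # Relation between collection metadata and item metadata:
    # every item must agree with the collection on the shared keys.
    for item in coll.items:
        for key in shared_keys:
            if item.metadata.get(key) != coll.metadata.get(key):
                problems.append(f"{item.name}: {key} differs from {coll.name}")
    return problems


def derive_output_metadata(input_file, propagated_keys, output_name):
    """Component-level rule: copy selected metadata from an input to an output."""
    derived = {
        key: input_file.metadata[key]
        for key in propagated_keys
        if key in input_file.metadata
    }
    return DataFile(output_name, derived)


# Illustrative data; the metadata values are assumptions.
images = FileCollection(
    "C-AnaImages_P112_p12",
    items=[
        DataFile("img112_12.part1", {"hasPatientID": "112", "hasIndexID": 1}),
        DataFile("112_12.part2", {"hasPatientID": "112", "hasIndexID": 2}),
    ],
    metadata={"hasPatientID": "112"},
)

print(check_collection(images, ["hasPatientID"], expected_size=2))
print(derive_output_metadata(images.items[0], ["hasPatientID"], "warp_params").metadata)
```

Returning a list of violations rather than raising on the first one lets a template-level check report every inconsistent file in a single pass.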

  7. Provenance records
     • Workflow templates specify complex analyses sequences
     • Workflow instances specify data
     Example requests shown in the diagram:
     • “Show me workflows that generate hazard maps”
     • “Run that with the USGS data set”
     [Figure labels, in slide order: WINGS; Workflow Creation; Workflow Selection; Workflow Libraries; EXPERT SCIENTIST; Ontologies: Domain terms, Component types, Workflow Products; Workflow Template; Specifies data requirements; Specifies execution requirements; Application Components; SCIENTIST; (OWL); Data Selection; Data Repositories; Component Specification; Preexisting data collections; Workflow execution results; Workflow Instance; VDS PTC; DAGMan/Globus; Pegasus; Executable Workflow]

  8. Queries answered
     • Keys to provenance
       • Capturing the correct metadata and propagating it through the template and instance
       • Capturing runtime information
     • Used SPARQL (with scripting) and SQL to pose queries
       • Queries 1, 2, 5, 6, 8: queries to the File and Workflow Instance ontologies
       • Query 4: query to the VDS PTC
       • Queries 3, 7, 9: not answered for lack of time

  9. Constraints on Nested Collections
     [Figure: ontology diagram. Recoverable labels, grouped by the captions on the slide:]
     • Domain independent definitions: File, Collection, FileCollection, FileList, CollectionList, CollOf; relations hasType and hasItems; Metadata:Int, Metadata:String
     • Domain dependent definitions (constraints on collection element types): AnatomyImageFile, AnatomyImagesOfPatientInPeriod, AnatomyImagesOfPatient; attributes hasIndexID, hasPeriodID, hasPatientID
     • Skolem instance definitions (metadata constraints on collections & their elements): CC-AnatomyImages-Skolem, C-AnatomyImages-Skolem, AnatomyImage-Skolem; PatientID1, PeriodID1, IndexID1; relations hasPatientID, hasTimePeriodID, hasIndexID, hasType, hasItems
     • Example files and collections: CC-AnaImages-for-Patient112; C-AnaImages_P112_p1, C-AnaImages_P112_p2, …, C-AnaImages_P112_p12; files img112_12.part1, 112_12.part2, part3, 112_12.part5, 127_6.part2, img112_1.part1, img112-2.part1, 112-2.part2, 112_2.par3, …

  10. Refinement provenance (in design)
      • We consider not only the provenance of the executing application but also that of the refinement process that maps an abstract workflow (workflow instance) onto a set of resources
      • The refinement process can be multi-staged
      • Stages of the refinement can execute on a variety of resources
      • We capture the provenance of the entire workflow as well as of its constituents
      • The representations of the refinement provenance and of the workflow provenance are uniform

  11. Original Workflow Workflow 1

  12. 1st executable partition mapped onto resources

  13. Chain of Refinement and Execution Steps

  14. Definition of refinement and execution provenance
      <object id> [[I/O: data input/output] [function performed] [performance info] [optional annotations]]
      Could include a justification of the reasons for the tasks performed
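The record layout above can be mirrored in a small data structure. The following Python sketch is an assumption-laden illustration rather than the authors' format definition: the class name, field names, and the render method are hypothetical, and the bracket layout follows the example records on the next slide.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    # Hypothetical field names for the five parts of a record.
    object_id: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    function: str = ""
    performance: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

    def render(self):
        """Serialize the record in a bracketed layout similar to the slide's."""

        def tags(values):
            # Wrap each value in angle brackets, as the slide does.
            return "".join(f"<{value}>" for value in values)

        function = f"<{self.function}>" if self.function else ""
        return (
            f"<{self.object_id}>[[I:{tags(self.inputs)}] [O:{tags(self.outputs)}] "
            f"[{function}] [{tags(self.performance)}] [{tags(self.annotations)}]]"
        )


# Values taken from the clustering step on the refinement-records slide.
record = ProvenanceRecord(
    object_id="id6",
    inputs=["workflow7"],
    outputs=["workflow8"],
    function="clustering",
    performance=["host12", "12mins"],
)
print(record.render())
# prints: <id6>[[I:<workflow7>] [O:<workflow8>] [<clustering>] [<host12><12mins>] []]
```

Because the same structure describes both a refinement step and an executed task, one record type is enough for the uniform representation the slides call for.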

  15. Provenance records relating to the refinement process
      <Workflow8>[[I:<AnatomyImage3@S1><AnatomyHeader3@S1><ReferenceImage@S2><ReferenceHeader@S2><AnatomyImage4@S1><AnatomyHeader4@S1>] [O:<WarpParams3><WarpParams4><ReslicedImage3@S1><ReslicedHeader3@S1><ReslicedImage4><ReslicedHeader4>] [<description of tasks in workflow 8> (could be in the form of a DAX, the XML DAG format used by Pegasus), <task_id1_align_warp><task_id2_align_warp><task_id3_reslice><task_id4_reslice>] [<R1,R2><20hours (cumulative time)>…..][]]
      <task_id1_align_warp>[I:<AnatomyImage3@S1><AnatomyHeader3@S1>, <ReferenceImage@S2><ReferenceHeader@S2> O:<WarpParams3>] [<R1><1hr>…][]]
      <id1>[I:[<workflow1>O:<<workflow2>;<workflow3>] [<partition>][<host22><2 mins><…>] [<planning horizon set at 5 hours>]
      <id2>[I:[<workflow2>O:<workflow4>] [<reduction>][ <…..>][<……>]
      <id5>[I:<workflow6> O:<workflow7>] [<registration>][<…>][<using primary RLS host14>]
      <id6>[I:<workflow7> O:<workflow8>] [<clustering>][<host12><12mins>][]
      <id7>[I:[<workflow8>O:< Ø>] [<dagman_exec>][][]
      Thanks to Luc Moreau for his input!
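To show how such records chain together, here is a minimal Python sketch that walks backward from a workflow through the refinement steps that produced it. The step table is transcribed from records id1, id2, id5, id6, and id7 on this slide, with workflow names spelled consistently. The tuple layout and the trace_back function are hypothetical. Records id3 and id4 do not appear on the slide, so they are left out rather than guessed, and the trace stops where the recorded chain stops.

```python
# Refinement steps from the slide: (record id, function, inputs, outputs).
# Records id3 and id4 are not shown on the slide and are not reconstructed here.
STEPS = [
    ("id1", "partition", ["workflow1"], ["workflow2", "workflow3"]),
    ("id2", "reduction", ["workflow2"], ["workflow4"]),
    ("id5", "registration", ["workflow6"], ["workflow7"]),
    ("id6", "clustering", ["workflow7"], ["workflow8"]),
    ("id7", "dagman_exec", ["workflow8"], []),
]


def trace_back(target, steps):
    """Return the recorded steps that lead to `target`, most recent first."""
    # Index each step by the workflows it produced.
    producers = {out: step for step in steps for out in step[3]}
    chain = []
    seen = set()
    current = target
    while current in producers and current not in seen:
        seen.add(current)
        step_id, function, inputs, _outputs = producers[current]
        chain.append((step_id, function, current))
        if not inputs:
            break
        # Every step in the table above takes a single input workflow.
        current = inputs[0]
    return chain


for step_id, function, produced in trace_back("workflow8", STEPS):
    print(f"{produced} was produced by {function} ({step_id})")
```

Tracing workflow8 reaches workflow6 and then stops, because no record on the slide names the step that produced workflow6.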
