
Provenir ontology: Towards a Framework for eScience Provenance Management

Provenir ontology: Towards a Framework for eScience Provenance Management. Satya S. Sahoo, Amit P. Sheth, Kno.e.sis Center, Wright State University. Microsoft eScience Workshop 2009, Pittsburgh, Oct 16.


Presentation Transcript


  1. Provenir ontology: Towards a Framework for eScience Provenance Management. Satya S. Sahoo, Amit P. Sheth, Kno.e.sis Center, Wright State University. Microsoft eScience Workshop 2009, Pittsburgh, Oct 16

  2. Outline
     • Provenance: A Tale of Two Use Cases
     • Provenance Ontologies: A Modular Approach
     • Provenir: A Foundational Model of Provenance
     • Provenance Query Infrastructure
     • Application to Parasite Research

  3. Provenance in GlycoProtein Analysis
     [Workflow diagram] Cell Culture → extract → Glycoprotein Fraction → proteolysis → Glycopeptides Fraction → Separation technique I → n Glycopeptides Fractions → PNGase → n Peptide Fractions → Separation technique II → n*m Peptide Fractions → Mass spectrometry → ms data and ms/ms data → Data reduction → ms peaklist and ms/ms peaklist
     Further steps and outputs shown: binning, Peptide identification, N-dimensional array, Parent protein and peptide list, Signal integration, Data correlation, Peptide list
     Callouts on the slide: "?" and "Proteolytic enzyme"

  4. Provenance in Parasite Research
     • Provenance, from the French word “provenir”, describes the lineage or history of a data entity
     • For Verification and Validation of Data Integrity, Process Quality, and Trust
     • Issues in Provenance Management
       • Interoperability
       • Consistent Modeling
       • Reduce Terminological Heterogeneity
     [Workflow diagram: Gene Knockout and Strain Creation*] Gene Name → Sequence Extraction → 3' & 5' Region; Drug Resistant Plasmid → Plasmid Construction → Knockout Construct Plasmid; T.cruzi sample → Transfection → Transfected Sample → Drug Selection → Selected Sample → Cell Cloning → Cloned Sample
     Callouts on the slide: "Gene Name" and "?"
     *T.cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia

  5. Outline
     • Provenance: A Tale of Two Use Cases
     • Provenance Ontologies: A Modular Approach
     • Provenir: A Foundational Model of Provenance
     • Provenance Query Infrastructure
     • Application to Parasite Research

  6. Ontologies for Provenance Modeling
     • Advantages of using Ontologies
       • Formal Description: Machine Readability, Consistent Interpretation
       • Use Reasoning: Knowledge Discovery over Large Datasets
     • Problem: A gigantic, monolithic Provenance Ontology is not feasible
     • Solution: Modular Approach using a Foundational Ontology
     [Diagram: FOUNDATIONAL ONTOLOGY extended by domain modules PARASITE EXPERIMENT, GLYCOPROTEIN EXPERIMENT, and OCEANOGRAPHY]

  7. Outline
     • Provenance: A Tale of Two Use Cases
     • Provenance Ontologies: A Modular Approach
     • Provenir: A Foundational Model of Provenance
     • Provenance Query Infrastructure
     • Application to Parasite Research

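  8. Provenir Ontology
     [Diagram: the strain-creation workflow annotated with Provenir concepts]
     • Workflow: Gene Name → Sequence Extraction → 3' & 5' Region; Drug Resistant Plasmid → Plasmid Construction → Knockout Construct Plasmid; T.cruzi sample → Transfection → Transfected Sample → Drug Selection → Selected Sample → Cell Cloning → Cloned Sample
     • Provenir classes shown: AGENT (e.g. Transfection Machine), DATA, PROCESS
     • Provenir properties shown: has_agent, participates_in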

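  9. Provenir Ontology Schema
     [Schema diagram]
     • Classes: AGENT, DATA, PROCESS, PARAMETER, DATA COLLECTION, SPATIAL, THEMATIC, TEMPORAL
     • is_a: SPATIAL, THEMATIC, and TEMPORAL are kinds of PARAMETER; PARAMETER and DATA COLLECTION are kinds of DATA
     • Properties: located_in, has_temporal_value, participates_in, has_agent, preceded_by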
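
The schema on slide 9 can be written down as a small class hierarchy plus a set of properties. The following Python sketch is only an illustration: the dictionary encoding, the lowercase names, and the helper functions are assumptions, and the property ranges are one plausible reading of the flattened diagram rather than the published ontology definition.

```python
# Hypothetical in-memory encoding of the slide-9 schema (not the OWL file).
# Child class -> parent class, following the is_a arrows on the slide.
SUBCLASS_OF = {
    "parameter": "data",
    "data_collection": "data",
    "spatial_parameter": "parameter",
    "thematic_parameter": "parameter",
    "temporal_parameter": "parameter",
}

# Property -> class of the value it points to (assumed reading of the diagram).
PROPERTY_RANGE = {
    "has_agent": "agent",
    "participates_in": "process",
    "preceded_by": "process",
    "located_in": "spatial_parameter",
    "has_temporal_value": "temporal_parameter",
}


def ancestors(cls):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    return cls == ancestor or ancestor in ancestors(cls)


def value_allowed(prop, value_cls):
    """Check that a value's class fits the assumed range of a property."""
    return is_a(value_cls, PROPERTY_RANGE[prop])
```

For example, `is_a("spatial_parameter", "data")` holds because a spatial parameter is a parameter, and a parameter is data.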

  10. Domain-specific Provenance: Parasite Experiment ontology
     [Diagram: PARASITE EXPERIMENT ONTOLOGY classes linked by is_a to PROVENIR ONTOLOGY classes]
     • PROVENIR ONTOLOGY classes: agent, data, process, parameter, data_collection, spatial_parameter, temporal_parameter, domain_parameter
     • PARASITE EXPERIMENT ONTOLOGY classes: transfection_machine, location, drug_selection, sample, Time:DateTimeDescription, transfection, cell_cloning, transfection_buffer, strain_creation_protocol, Tcruzi_sample
     • Properties shown: has_agent, has_participant, has_parameter
     *Parasite Experiment ontology available at: http://wiki.knoesis.org/index.php/Trykipedia
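
The modular approach on slide 10 amounts to attaching domain classes to foundational classes through is_a links. The sketch below illustrates that idea in Python. The two dictionaries and the `provenir_root` helper are hypothetical, and several parent assignments (for example `transfection_buffer` under `domain_parameter` and `location` under `spatial_parameter`) are guesses from the flattened diagram, not confirmed by the ontology itself.

```python
# Foundational module: child class -> parent class (names from the slide).
PROVENIR = {
    "parameter": "data",
    "data_collection": "data",
    "spatial_parameter": "parameter",
    "temporal_parameter": "parameter",
    "domain_parameter": "parameter",
}

# Domain module: each class is anchored under a Provenir class.
# Parent choices are assumptions read off the diagram labels.
PARASITE_EXPERIMENT = {
    "transfection_machine": "agent",
    "transfection": "process",
    "drug_selection": "process",
    "cell_cloning": "process",
    "sample": "data",
    "Tcruzi_sample": "sample",
    "transfection_buffer": "domain_parameter",
    "location": "spatial_parameter",
}


def provenir_root(cls, *modules):
    """Follow is_a links across all modules up to a top-level Provenir class."""
    parents = {}
    for module in modules:
        parents.update(module)
    while cls in parents:
        cls = parents[cls]
    return cls
```

Because every domain class resolves to agent, data, or process, a query written against the foundational classes can also reach domain-specific instances. Another domain module would be added as one more dictionary without touching `PROVENIR`.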

  11. Trident Ontology for Oceanography

  12. Outline
     • Provenance: A Tale of Two Use Cases
     • Provenance Ontologies: A Modular Approach
     • Provenir: A Foundational Model of Provenance
     • Provenance Query Infrastructure
     • Application to Parasite Research

  13. Provenance Query Classification
     Classified Provenance Queries into Three Categories
     • Type 1: Querying for Provenance Metadata
       • Example: Which gene was used to create the cloned sample with ID = 65?
     • Type 2: Querying for Specific Data Set
       • Example: Find all knockout construct plasmids created by researcher Michelle using “Hygromycin” drug resistant plasmid between April 25, 2008 and August 15, 2008
     • Type 3: Operations on Provenance Metadata
       • Example: Were the two cloned samples 65 and 46 prepared under similar conditions? Compare the associated provenance information
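
A Type 2 query selects data by constraints on its provenance. The Python sketch below mirrors the example on slide 13 as a plain filter over records. The `PlasmidRecord` type, the `find_plasmids` function, and all sample records are invented for illustration; the actual system queries an RDF store, not a Python list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlasmidRecord:
    """Hypothetical flattened provenance record for one knockout construct plasmid."""
    plasmid_id: str
    researcher: str
    drug_resistant_plasmid: str
    created_on: date


def find_plasmids(records, researcher, resistance, start, end):
    """Return IDs of plasmids whose provenance satisfies every constraint."""
    return [
        r.plasmid_id
        for r in records
        if r.researcher == researcher
        and r.drug_resistant_plasmid == resistance
        and start <= r.created_on <= end  # inclusive date window
    ]


# Made-up sample data, for illustration only.
RECORDS = [
    PlasmidRecord("kcp-1", "Michelle", "Hygromycin", date(2008, 5, 10)),
    PlasmidRecord("kcp-2", "Michelle", "Neomycin", date(2008, 6, 1)),
    PlasmidRecord("kcp-3", "Michelle", "Hygromycin", date(2008, 9, 1)),
    PlasmidRecord("kcp-4", "Alex", "Hygromycin", date(2008, 7, 4)),
]

matches = find_plasmids(
    RECORDS, "Michelle", "Hygromycin", date(2008, 4, 25), date(2008, 8, 15)
)
```

Only the first record meets all three constraints (researcher, drug resistant plasmid, and date window).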

  14. Provenance Query Operators
     Four Query Operators, based on the Query Classification
     • provenance(): Closure operation; returns the complete set of provenance metadata for the input data entity
     • provenance_context(): Given a set of constraints defined on provenance, retrieves the datasets that satisfy the constraints
     • provenance_compare(): Compares two sets of provenance information by adapting the RDF graph equivalence definition
     • provenance_merge(): Combines two sets of provenance information using the RDF graph merge
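
Three of the operators on slide 14 can be sketched over a set of (subject, property, object) triples. This is a simplified Python illustration, not the authors' implementation: the function bodies, the instance names, and the sample graph are assumptions. It covers only triples without blank nodes, where RDF graph equivalence reduces to set equality and RDF merge reduces to set union.

```python
def provenance(graph, entity):
    """Closure: collect every triple reachable from entity via subject -> object."""
    seen, frontier, result = {entity}, [entity], set()
    while frontier:
        node = frontier.pop()
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    frontier.append(o)
    return result


def provenance_compare(g1, g2):
    """Equivalence of two ground triple sets (no blank nodes assumed)."""
    return g1 == g2


def provenance_merge(g1, g2):
    """Merge of two ground triple sets (no blank nodes assumed)."""
    return g1 | g2


# Made-up instance graph using property names from the Provenir slides.
GRAPH = {
    ("cloned_sample_65", "participates_in", "cell_cloning_1"),
    ("cell_cloning_1", "preceded_by", "drug_selection_1"),
    ("drug_selection_1", "preceded_by", "transfection_1"),
    ("transfection_1", "has_agent", "transfection_machine_1"),
    ("cloned_sample_46", "participates_in", "cell_cloning_2"),
}

p65 = provenance(GRAPH, "cloned_sample_65")
p46 = provenance(GRAPH, "cloned_sample_46")
```

Here `p65` contains the four triples leading back from cloned sample 65 and excludes the unrelated triple about sample 46. A real comparison of "similar conditions" would need a looser notion than strict set equality, since the two samples carry different identifiers.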

  15. Provenance Query Engine Architecture
     • Available as API for integration with provenance management systems
     • Input:
       • Type of provenance query operator: provenance()
       • Input value to query operator: cloned sample 65
       • User details to connect to underlying Oracle RDF store
     [Architecture diagram components: QUERY OPTIMIZER, TRANSITIVE CLOSURE]

  16. Outline
     • Provenance: A Tale of Two Use Cases
     • Provenance Ontologies: A Modular Approach
     • Provenir: A Foundational Model of Provenance
     • Provenance Query Infrastructure
     • Application to Parasite Research

  17. T.cruzi SPSE Provenance Management System

  18. Conclusions
     • Provenir ontology as a foundational model for provenance
     • Extensible to model domain-specific provenance
       • Parasite Experiment ontology
       • Trident ontology
       • ProPreO ontology
     • Query Infrastructure to support provenance modeled using Provenir ontology
     • Application in an NIH-funded project for Parasite Research

  19. Acknowledgement
     • Roger Barga, Microsoft Research, eScience
     • D. Brent Weatherly, Center for Tropical and Emerging Diseases, University of Georgia
     • Flora Logan, The Wellcome Trust Sanger Institute, Cambridge, UK
     • Raghava Mutharaju, Kno.e.sis Center, Wright State University
     • Pramod Anantharam, Kno.e.sis Center, Wright State University

  20. References
     • Provenir ontology: http://wiki.knoesis.org/index.php/Provenir_Ontology
     • Provenance Management in Parasite Research: http://knoesis.wright.edu/library/resource.php?id=00712
     • Provenance Management Framework: http://knoesis.wright.edu/research/semsci/application_domain/sem_prov/
     • T.cruzi Semantic Problem Solving Environment: http://knoesis.wright.edu/research/semsci/application_domain/sem_life_sci/tcruzi_pse/
