
ONTOLOGY-DRIVEN DISCOVERY OF SCIENTIFIC COMPUTATIONAL ENTITIES

ONTOLOGY-DRIVEN DISCOVERY OF SCIENTIFIC COMPUTATIONAL ENTITIES. Pearl Brazier Department of Computer Science University of Texas-Pan American November 2, 2010. Outline. Motivation Research Goals and Objectives Significance of Contribution Background Information and Context


Presentation Transcript


  1. ONTOLOGY-DRIVEN DISCOVERY OF SCIENTIFIC COMPUTATIONAL ENTITIES. Pearl Brazier, Department of Computer Science, University of Texas-Pan American. November 2, 2010

  2. Outline • Motivation • Research Goals and Objectives • Significance of Contribution • Background Information and Context • Research Efforts • GEO-SEED Architecture • Scientific Computational Entity Discovery Ontology • RDF Repository • Usability and Performance Studies • Conclusions and Future Work

  3. Motivation: Geosciences Web Services • The Web contains many scientific resources • Scientific data (shared datasets, experimental results) • Geosciences web services metadata • Resources are currently shared via • publication • human contact • web portals • Metadata annotations are needed to • assist collaboration • allow machine processing

  4. Research Goal: To investigate an ontology-driven discovery approach that can be distributed on the Web and that can support the elicitation, documentation, and registration of computational entities and other resources.

  5. Background Information and Context

  6. Cyberinfrastructure/e-science • Supports building new types of scientific and engineering knowledge environments and organizations • Supports modern in-silico experiments that can lead to important scientific discoveries through scientific data repositories, semantic mediation services, and scientific workflows • Describes computationally intensive science, which is carried out in highly distributed network environments, or science that uses immense data sets

  7. Web Technologies-1 • Web 2.0 Technologies • Include social networks and Wiki technologies • Used by humans • Semantic Web • Allows machines to understand the meaning of information on the Web • Used by machines and automated agents • Supports core standards such as RDF, SPARQL, OWL. Example RDF/XML:
      <?xml version="1.0" encoding="utf-8" ?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:s="http://www.schemaweb.info/schemas/meta/rdf/" xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <s:Schema rdf:about="http://www.schemaweb.info/schema/SchemaDetails.aspx?id=62">
          <s:id>62</s:id>
          <s:name>Wine Ontology</s:name>
          <s:description xml:lang="en-gb">Sample ontology used in the OWL specification documents.</s:description>
          <s:namespace>http://www.w3.org/TR/2003/CR-owl-guide-20030818/wine#</s:namespace>
        </s:Schema>
      </rdf:RDF>

  8. Web Technologies-2 • Ontologies • Capture concepts and the relationships among them • Provide a standard vocabulary and classifications • Web Services Metadata • WSDL • WSDL-S • OWL-S and SWSO

  9. Research Objectives and Activities

  10. Objective 1: Define an ontology for scientific computational entities that supports the development of a repository and a system that can retrieve computational entities. Activities: • Define use cases for the ontology. • Determine the essential elements of an ontology that documents the features and relationships used to identify computational entities and distinguish one from another.

  11. Objective 2: Define an architecture that supports an ontology-driven approach. Activities: • Investigate efficient approaches for storing information. • Investigate the relationships among registration, annotation, and knowledge extraction.

  12. Objective 3: Evaluate the usability of a system based on the ontology-driven discovery approach. Activities: • Design and implement a prototype system based on the ontology-driven discovery approach. • Conduct a usability study of the prototype system with computer scientists and geoscientists (novices and experts).

  13. Objective 4: Evaluate the performance of a system based on the ontology-driven approach. Activities: • Design the schema and implement a relational RDF repository that supports efficient storage and querying of documented scientific computational entities based on the ontology-driven discovery approach. • Run a simulation to analyze the performance of the system.

  14. Research Contributions and Significance • Designed a new Scientific Computational Entity Discovery Ontology • More comprehensive and domain-specific than existing discovery ontologies, enabling scientists to share their computational entities more easily • Created a novel design for organizing the RDF data • Uses SPARQL queries for the RDF representation • Supports more efficient query evaluation • Developed the GEO-SEED wiki using Web 2.0 and Semantic Web technologies • Supports discovery and sharing of scientific computational entities

  15. Research Efforts: GEO-SEED Architecture and Ontology

  16.

  17. GEO-SEED ARCHITECTURE

  18. GEO-SEED Scientific Computational Entity Discovery Ontology

  19. Computational Entity Profiles

  20. General Profile Descriptors

  21. QoS Profile Descriptors

  22. Deployment Profile Descriptors

  23. Invocation Profile Descriptors

  24. Implementation Descriptors

  25. Geoscience Descriptors

  26. Scientific Computational Entity Discovery Ontology

  27. Research Efforts: RDF Repository

  28. Schema Mapping Strategies • Five approaches to generating database schemas: • Schema-Oblivious • Schema-Aware • Data-Driven • User-Customizable • Hybrid

  29. Schema-Oblivious (Triple Table). Table: Triple (subject, predicate, object). Extracted RDF Triples:
      <:WS1> <rdf:type> <:WebService> .
      <:WS1> <:describedBy> <:GP1> .
      <:WS1> <:describedBy> <:QoSP1> .
      <:GP1> <rdf:type> <:GeneralProfile> .
      <:QoSP1> <rdf:type> <:QoSProfile> .
      <:GP1> <:subject> <:Gridding> .
      <:GP1> <:author> "Pearl Brazier" .
      <:QoSP1> <:trust> "5" .
      <:QoSP1> <:availability> "0.9" .
      <:QoSP1> <:overallRating> "4" .
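
As a rough illustration of the schema-oblivious layout, the sketch below loads the ten triples from this slide into a single three-column table. It uses Python's built-in `sqlite3` module as a stand-in; the slides do not name the database engine or column types used by GEO-SEED, so the `TEXT` columns and the `build_triple_table` helper are assumptions.

```python
import sqlite3

# The ten triples from the slide, with prefixed names kept as plain strings.
TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", ":describedBy", ":GP1"),
    (":WS1", ":describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", ":subject", ":Gridding"),
    (":GP1", ":author", "Pearl Brazier"),
    (":QoSP1", ":trust", "5"),
    (":QoSP1", ":availability", "0.9"),
    (":QoSP1", ":overallRating", "4"),
]


def build_triple_table(conn):
    """Create one generic table that holds every triple (hypothetical helper)."""
    conn.execute("CREATE TABLE Triple (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO Triple VALUES (?, ?, ?)", TRIPLES)


conn = sqlite3.connect(":memory:")
build_triple_table(conn)

# Every descriptor of one profile is a separate row in the same table.
rows = conn.execute(
    "SELECT predicate, object FROM Triple WHERE subject = ? ORDER BY predicate",
    (":QoSP1",),
).fetchall()
for predicate, obj in rows:
    print(predicate, obj)
```

Because all data lands in one table, the schema never changes when new properties appear, at the cost of one row per descriptor.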

  30. Schema-Aware (Property Table). Tables shown: Property_type, Property_author, Property_describedBy. Extracted RDF triples: the same ten triples as on slide 29.
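
A minimal sketch of the property-table idea, assuming one two-column table per predicate. The table names `Property_type`, `Property_author`, and `Property_describedBy` come from the slide; the naming rule for the remaining predicates, the `(subject, object)` column layout, and the use of `sqlite3` are assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict

TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", ":describedBy", ":GP1"),
    (":WS1", ":describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", ":subject", ":Gridding"),
    (":GP1", ":author", "Pearl Brazier"),
    (":QoSP1", ":trust", "5"),
    (":QoSP1", ":availability", "0.9"),
    (":QoSP1", ":overallRating", "4"),
]


def table_name(predicate):
    """Map 'rdf:type' to 'Property_type', ':author' to 'Property_author', etc."""
    return "Property_" + predicate.split(":")[-1]


def build_property_tables(conn, triples):
    """Create one (subject, object) table per predicate and return the table names."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[table_name(predicate)].append((subject, obj))
    for name, rows in grouped.items():
        conn.execute(f'CREATE TABLE "{name}" (subject TEXT, object TEXT)')
        conn.executemany(f'INSERT INTO "{name}" VALUES (?, ?)', rows)
    return sorted(grouped)


conn = sqlite3.connect(":memory:")
tables = build_property_tables(conn, TRIPLES)
described = conn.execute(
    "SELECT subject, object FROM Property_describedBy ORDER BY object"
).fetchall()
print(tables)
print(described)
```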

  31. User-Customizable (Profile Tables). Tables shown: GeneralProfile, QoSProfile. Extracted RDF triples: the same ten triples as on slide 29.
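
The profile-table layout can be sketched as a pivot: all descriptors of one profile instance become one row. The table names `GeneralProfile` and `QoSProfile` come from the slide; the column lists in `PROFILE_COLUMNS`, the untyped columns, and the `build_profile_tables` helper are assumptions, since the slides do not show the actual GEO-SEED schema.

```python
import sqlite3

# Only the profile-related triples from slide 29 are needed here.
TRIPLES = [
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", ":subject", ":Gridding"),
    (":GP1", ":author", "Pearl Brazier"),
    (":QoSP1", ":trust", "5"),
    (":QoSP1", ":availability", "0.9"),
    (":QoSP1", ":overallRating", "4"),
]

# Assumed mapping from profile class to the columns of its table.
PROFILE_COLUMNS = {
    ":GeneralProfile": ["subject", "author"],
    ":QoSProfile": ["trust", "availability", "overallRating"],
}


def build_profile_tables(conn, triples):
    """Pivot the triples of each profile instance into a single table row."""
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    values = {}
    for s, p, o in triples:
        if p != "rdf:type":
            values.setdefault(s, {})[p.lstrip(":")] = o
    for profile_type, columns in PROFILE_COLUMNS.items():
        table = profile_type.lstrip(":")
        conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY, {', '.join(columns)})")
        for instance, instance_type in types.items():
            if instance_type == profile_type:
                row = [instance] + [values.get(instance, {}).get(c) for c in columns]
                marks = ", ".join("?" * len(row))
                conn.execute(f"INSERT INTO {table} VALUES ({marks})", row)


conn = sqlite3.connect(":memory:")
build_profile_tables(conn, TRIPLES)
qos_row = conn.execute(
    "SELECT trust, availability, overallRating FROM QoSProfile WHERE id = ?", (":QoSP1",)
).fetchone()
general_row = conn.execute(
    "SELECT subject, author FROM GeneralProfile WHERE id = ?", (":GP1",)
).fetchone()
print(qos_row, general_row)
```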

  32. SPARQL Query. Retrieves the quality-of-service descriptors of the Web service :WS1:
      SELECT ?profile ?pre ?obj
      WHERE {
        :WS1 :describedBy ?profile .
        ?profile rdf:type :QoSProfile .
        ?profile ?pre ?obj .
      }
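
The query's three triple patterns can be read as three filtering steps over the triples from slide 29. The pure-Python sketch below mimics that evaluation by hand; it is not a SPARQL engine, and the `qos_descriptors` function name is hypothetical.

```python
TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", ":describedBy", ":GP1"),
    (":WS1", ":describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", ":subject", ":Gridding"),
    (":GP1", ":author", "Pearl Brazier"),
    (":QoSP1", ":trust", "5"),
    (":QoSP1", ":availability", "0.9"),
    (":QoSP1", ":overallRating", "4"),
]


def qos_descriptors(triples, service):
    """Return (?profile, ?pre, ?obj) bindings for the QoS profiles of a service."""
    # Pattern 1: <service> :describedBy ?profile
    profiles = {o for s, p, o in triples if s == service and p == ":describedBy"}
    # Pattern 2: ?profile rdf:type :QoSProfile
    qos_profiles = {
        s for s, p, o in triples
        if s in profiles and p == "rdf:type" and o == ":QoSProfile"
    }
    # Pattern 3: ?profile ?pre ?obj (every triple about a matching profile)
    return sorted(t for t in triples if t[0] in qos_profiles)


bindings = qos_descriptors(TRIPLES, ":WS1")
for profile, pre, obj in bindings:
    print(profile, pre, obj)
```

Note that the third pattern also matches the `rdf:type` triple of the profile itself, so that row is part of the result alongside the three ratings.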

  33. Query Complexity Comparison • Triple Table: Two Joins: Triple ⋈ Triple ⋈ Triple. Note: Tables can get large • Property Table: Two Joins: describedBy ⋈ type ⋈ (trust ⋃ reliability ⋃ availability ⋃ ⋯ ⋃ userReview). Note: Union result is not indexed

  34. OR Many Joins: (describedBy ⋈ type ⋈ trust) ⋃ (describedBy ⋈ type ⋈ reliability) ⋃ (describedBy ⋈ type ⋈ availability) ⋃ ⋯ ⋃ (describedBy ⋈ type ⋈ userReview). Note: Indexed, but recomputes (describedBy ⋈ type) many times • Profile Table: One Join: describedBy ⋈ QoSProfile
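
To make the join counts concrete, the sketch below runs the QoS lookup for `:WS1` against a triple table (two self-joins) and against a profile table (one join). It uses `sqlite3` with assumed column names and types; the SQL is an illustrative translation, not the queries used in the study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Triple (subject TEXT, predicate TEXT, object TEXT);
INSERT INTO Triple VALUES
  (':WS1',   ':describedBy',   ':GP1'),
  (':WS1',   ':describedBy',   ':QoSP1'),
  (':GP1',   'rdf:type',       ':GeneralProfile'),
  (':QoSP1', 'rdf:type',       ':QoSProfile'),
  (':QoSP1', ':trust',         '5'),
  (':QoSP1', ':availability',  '0.9'),
  (':QoSP1', ':overallRating', '4');

CREATE TABLE describedBy (entity TEXT, profile TEXT);
CREATE TABLE QoSProfile (id TEXT PRIMARY KEY, trust INTEGER,
                         availability REAL, overallRating INTEGER);
INSERT INTO describedBy VALUES (':WS1', ':GP1'), (':WS1', ':QoSP1');
INSERT INTO QoSProfile VALUES (':QoSP1', 5, 0.9, 4);
""")

# Triple table: Triple JOIN Triple JOIN Triple (two joins), one row per descriptor.
triple_rows = conn.execute("""
    SELECT t3.subject, t3.predicate, t3.object
    FROM Triple t1
    JOIN Triple t2 ON t2.subject = t1.object
    JOIN Triple t3 ON t3.subject = t2.subject
    WHERE t1.subject = ? AND t1.predicate = ':describedBy'
      AND t2.predicate = 'rdf:type' AND t2.object = ':QoSProfile'
    ORDER BY t3.predicate
""", (":WS1",)).fetchall()

# Profile table: describedBy JOIN QoSProfile (one join), one row per profile.
profile_rows = conn.execute("""
    SELECT q.id, q.trust, q.availability, q.overallRating
    FROM describedBy d
    JOIN QoSProfile q ON q.id = d.profile
    WHERE d.entity = ?
""", (":WS1",)).fetchall()

print(triple_rows)
print(profile_rows)
```

Both queries carry the same three ratings; the profile-table version returns them as columns of a single row, which is where the saved join comes from.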

  35. Empirical Comparison of the Three Approaches • Created a GEO-SEED dataset that describes 10,000 web services. • Defined six common queries using SPARQL • Ran queries on PC with 3.00 GHz Intel Core 2 CPU, 4GB RAM, 750 GB disk space running • Evaluated the execution time

  36. Performance Test Queries • Find web services that implement a computational entity with the name “gridding” • Find web services, along with their user reviews and overall quality-of-service ratings, that implement a computational entity “gridding” • Find web services that implement a computational entity with the name “gridding” and that have trust ≥ 4 and availability ≥ 0.8 ratings • Retrieve a general profile of a particular Web service • Retrieve a quality-of-service profile of a particular Web service • Retrieve quality-of-service profiles of two Web services

  37. Performance Study Results

  38. Research Efforts: GEO-SEED Wiki Prototype

  39. Overview • GEO-SEED consists of two components: a Wiki and an RDF repository • The Wiki serves as a collaborative environment for sharing knowledge about geosciences web services • Provides an interface for human interaction • The RDF repository serves as a metadata database readily accessible by machines and automated agents

  40. Conclusions

  41. Conclusions • The GEO-SEED architecture supports a new-generation Web portal • A metadata repository for sharing and discovering scientific computational entities in geosciences • The ontology-driven profiles approach supports usability for • Humans • Machines • A unique user-customizable profile table design for storing the RDF data allows efficient queries of large metadata collections

  42. Future Work • Explore user-guided metadata extraction algorithms for the Wiki • Explore coupling GEO-SEED with an existing scientific workflow management system (SWFMS) • Extend the project to support annotation and discovery of scientific workflows and datasets in geosciences • Refine the prototype to address user interface issues

  43. Thank You! Questions? (Photos: Spring 2010, Summer 2010, Cactus from El Paso 2005)

  44. UTEP Computer Science Dissertation Defense. Presentations • Abraham, John, Brazier, Pearl, Chebotko, Artem, Navarro, Jaime, and Piazza, Anthony, "Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance", in Proc. of the 7th IEEE International Conference on Services Computing (SCC'10), Miami, Florida, USA, July 5-10, 2010. Acceptance rate: 18%. • Brazier, Pearl, Chebotko, Artem, Gonzalez, Eric, Kashlev, Andrey, and Piazza, Anthony, "Supporting Geosciences Web Services Metadata Management and Discovery", in Proc. of the 7th IEEE International Conference on Services Computing (SCC'10), Miami, Florida, USA, July 5-10, 2010. • Brazier, Pearl, Chebotko, Artem, Gates, Ann Q., Piazza, Anthony, and Salayandia, Leonardo (2009), "Web 2.0 and Semantic Web Portal for Annotation and Discovery of Web Services in Geosciences", presented and published in 2009 International Conference on Semantic Web and Web Services (SWWS 2009), Las Vegas, Nevada, July 13-16, CSREA Press, USA. • Brazier, Pearl, Chebotko, Artem, Gates, Ann Q., and Salayandia, Leonardo (2009), "GEO-SEED: A Metadata Repository for Geosciences Web Service Discovery", presented and published in IEEE 2009 Third International Workshop on Scientific Workflows (SWF 2009), Los Angeles, CA, July 6-10.

  45. UTEP Dissertation Defense

  46. Research Efforts: Usability Study

  47. Usability Study Overview • 31 invitations sent to Geology faculty and students and Computer Science faculty and students (17 responses) • Steps in the study: • Register • Log in • Submit a computational entity • Search for a computational entity • Add a user rating for an entity • Complete a survey rating the experience

  48. Overall, GEO-SEED would be a useful tool for sharing

  49. Other Usability Study Results

  50. Descriptive Statistical Analysis • BINOM-DIST • Grouped responses into two groups • Strongly Disagree + Disagree • Agree + Strongly Agree • Compared p-values against the 0.05 threshold • t Test • Used 4 groups • Strongly Disagree, Disagree, Agree, Strongly Agree • Compared p-values against the 0.05 threshold
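
The two-group analysis can be illustrated with a one-sided binomial tail probability, which is presumably what BINOM-DIST refers to. In the sketch below the response count of 17 comes from the usability study overview, but the number of agreeing responses (`HYPOTHETICAL_AGREE`) is invented for illustration and is not a study result; the null hypothesis of a 50/50 split is likewise an assumption.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


RESPONSES = 17           # number of survey responses reported in the study overview
HYPOTHETICAL_AGREE = 14  # made-up count of "Agree + Strongly Agree" answers

# Probability of seeing at least this many agreeing answers if agreement
# and disagreement were equally likely (assumed null hypothesis).
p_value = binomial_upper_tail(HYPOTHETICAL_AGREE, RESPONSES)
significant = p_value < 0.05
print(f"p-value = {p_value:.4f}, significant at 0.05: {significant}")
```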
