State of the Art for Ontology Repositories

Frank Olken National Science Foundation CISE/IIS/III folken@nsf.gov Presentation to Ontology Summit NIST Gaithersburg, MD April 28, 2008. State of the Art for Ontology Repositories.

Presentation Transcript


  1. State of the Art for Ontology Repositories • Frank Olken • National Science Foundation, CISE/IIS/III • folken@nsf.gov • Presentation to Ontology Summit, NIST, Gaithersburg, MD • April 28, 2008

  2. Disclaimer • Opinions expressed in this talk are solely those of the author, and do not reflect the positions of either the National Science Foundation, CISE, IIS or Lawrence Berkeley National Laboratory. F. Olken, Ontology Summit 2008

  3. This talk: I will address key issues in the design and implementation of ontology repositories and some of the major technologies being used to address these issues.

  4. Outline • What is an ontology repository? • Why does one want one? • Macro vs. Micro Issues • Implementation Issues

  5. Implementation Issues • Ontology acquisition, ingestion • Macro vs. micro issues • Centralized vs. Decentralized • Ontology representation • Ontology search, query • Ontology Integration • Auxiliary tools • SOA, etc.

  6. What is an Ontology Repository? • System for storing, searching, retrieving multiple ontologies • Support for ontology integration • Variously: • Tools for ontology creation, editing, visualization • Tools for ontology annotation, curation, ....

  7. Multiple Ontologies • This is the source of the hardest problems in building ontology repositories: • Scale • Diverse ontology representations • Ontology integration (mapping) • Namespace issues • Complex provenance issues

  8. Why would you want an OR? • You need to deal with multiple ontologies • Usual reasons for ontologies: • Natural Language Processing support • Data Integration, Exchange • Data semantics • Support for DB queries • DB, application design • Classification / Indexing of documents, etc. • Creation / maintenance / use of controlled vocabularies

  9. Ontology Acquisition • Manual acquisition and loading • e.g. XMDR • Useful if ontology representations are very diverse. • Spidering the web to find ontologies (e.g., Nutch) • Google (etc.) search to find ontologies • How does one recognize an ontology? • Use of OWL, RDF, CL, etc. • Lots of is-a, part-of relations ... • Comments that assert file is an ontology

  10. Ontology Ingestion • Parsing ontology, syntactic validation • Consistency checking (no cycles in partial orders: taxonomies, partonomies) • Conversion to common representation (?) • Syntactic translation • Semantic translation • e.g., CWA vs. OWA • Indexing, transitive closure computations, ...
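
The consistency check named on this slide (no cycles in taxonomies or partonomies) can be sketched as a depth-first search. This is a minimal illustration, assuming the immediate is-a or part-of links have already been parsed into (child, parent) pairs; the function name `find_cycle` is hypothetical and not from the talk.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return a list of nodes forming a cycle, or None if the relation is acyclic.

    `edges` is an iterable of (child, parent) pairs, e.g. immediate is-a links.
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = defaultdict(int)

    def visit(node, path):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:  # edge back into the current path: a cycle
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):  # copy the keys: visit() may add new ones
        if color[node] == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None
```

The recursion is fine for shallow hierarchies; a very deep taxonomy would need an explicit stack instead.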

  11. Centralized vs. Federated Architectures • Centralized: collect ontologies into one place • High startup, maintenance costs • Fast retrieval, facilitates integration • Federated: ontologies stay put • Low startup, maintenance costs • Less performance, reliability • More requirements on ontology sites • Hybrid • Centralize ontology level metadata, indices • Leave individual ontologies in place

  12. Macro vs. Micro-level Issues • Macro-level • Searching across a collection of ontologies and their metadata • Micro-level • Searching, inferencing, within individual ontologies

  13. Macro & Micro similarities • Most (not all) macro and micro level issues are essentially the same and can use the same technologies for implementation.

  14. Macro-level Support • Over collections of ontologies • Use an ontology of ontologies • e.g., taxonomy of subject matter • Ontology of ontology metadata

  15. Ontology Search • Text-based search • Natural language definitions • Symbols • E.g., Lucene, UIMA • Semantic Search • Over ontology representation (RDF, OWL, CL) • e.g., SPARQL, etc. • e.g., faceted search (e.g., Siderean) • e.g., navigation over taxonomies, etc.
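
Text-based search over natural language definitions can be illustrated with a toy inverted index. This is only a sketch of the idea, not the API of Lucene or UIMA; the function names and the sample definitions are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(definitions):
    """Map each token to the set of terms whose name or definition contains it."""
    index = defaultdict(set)
    for term, text in definitions.items():
        for token in tokenize(term + " " + text):
            index[token].add(term)
    return index


def search(index, query):
    """Return the terms that contain every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = None
    for token in tokens:
        hits = index.get(token, set())
        results = set(hits) if results is None else results & hits
    return results


definitions = {
    "Vehicle": "A machine that transports people or cargo",
    "Car": "A wheeled motor vehicle used for transport",
}
index = build_index(definitions)
```

There is no stemming here, so `transports` and `transport` are different tokens; a production text engine would normalize them.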

  16. Ontology Representations • Text • Frames (OBO) • Graphs (RDF) • Logics (OWL-DL, OWL Full, CL)

  17. Text Representation • Obvious candidate for ontology representation of informal ontologies, with natural language definitions, etc. .... • A lowest common denominator representation for more formal ontology representations • Readily supports handling diverse ontology representations (must add tags for underlying ontology representation language) • Only supports text search directly

  18. Frame Representations • Each frame is a collection of: • (slot, value) pairs or (slot, value list) • Originally deployed in Lisp • Secondary Storage • Each frame is a BLOB • Or, decompose into finer grained DB entries • Current uses: • OBO (open biological ontology) format
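
The two secondary-storage options for frames on this slide can be sketched as follows. The frame below uses made-up identifiers and slot names, and JSON stands in for whatever BLOB encoding a real system would choose.

```python
import json

# A frame as (slot, value list) pairs. All identifiers here are hypothetical.
frame = {
    "id": ["EX:0001"],
    "name": ["motor vehicle"],
    "is_a": ["EX:0000"],
    "synonym": ["automobile", "car"],
}


def to_blob(frame):
    """Option 1: store the whole frame as one opaque value (a BLOB)."""
    return json.dumps(frame, sort_keys=True)


def to_rows(frame):
    """Option 2: decompose the frame into (frame_id, slot, value) rows."""
    frame_id = frame["id"][0]
    return [
        (frame_id, slot, value)
        for slot, values in frame.items()
        if slot != "id"
        for value in values
    ]


blob = to_blob(frame)
rows = to_rows(frame)
```

The BLOB form is cheap to write and read whole, while the row form lets the database index and query individual slots.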

  19. Graph Representations • a.k.a. Semantic networks, semantic graphs • Examples: RDF, RDF schemas, XLinks • List of edges, each edge: • Subject • Predicate (relation name, attribute name) • Object (or attribute value) • Very flexible • Only support binary relations directly
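
The edge-list view, and the restriction to binary relations, can be illustrated with plain tuples. A common workaround for relations with more than two participants is to introduce an extra node and hang one binary edge per role off it; the helper `nary_to_triples` and all names in the data are hypothetical.

```python
# Each edge is a (subject, predicate, object) tuple.
triples = [
    ("Car", "is_a", "Vehicle"),
    ("Wheel", "part_of", "Car"),
    ("Car", "wheel_count", 4),  # the object may be an attribute value
]


def nary_to_triples(node_id, relation, roles):
    """Encode an n-ary relation as binary edges attached to one new node."""
    edges = [(node_id, "type", relation)]
    edges += [(node_id, role, filler) for role, filler in roles.items()]
    return edges


# A three-way "sale" relation (seller, buyer, item) expressed as binary edges.
sale = nary_to_triples(
    "sale1", "Sale", {"seller": "Alice", "buyer": "Bob", "item": "Car42"}
)
```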

  20. Types of Graphs • Trees • Simple Taxonomies (isa), Partonomies (partof) • Multi-faceted Classifications • Taxonomies with multiple facets • e.g., Vehicles: purpose, propulsion, wheels, axles, color • Directed acyclic graphs • Multiple inheritance • Partial orders
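
A multi-faceted classification such as the vehicle example can be sketched as items tagged with one value per facet, with selection by any combination of facets. The items, facet values, and function names below are illustrative assumptions.

```python
from collections import Counter

# Each item carries one value for each facet it is classified under.
vehicles = {
    "bicycle": {"purpose": "passenger", "propulsion": "human", "wheels": 2},
    "sedan": {"purpose": "passenger", "propulsion": "engine", "wheels": 4},
    "pickup": {"purpose": "cargo", "propulsion": "engine", "wheels": 4},
}


def select(items, **facets):
    """Return the names of items matching every requested facet value."""
    return sorted(
        name
        for name, props in items.items()
        if all(props.get(facet) == value for facet, value in facets.items())
    )


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(props[facet] for props in items.values() if facet in props)
```

For example, `select(vehicles, purpose="passenger", wheels=4)` narrows by two facets at once, which a single tree-shaped taxonomy cannot express without duplicating branches.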

  21. Types of graphs • Arbitrary directed graphs • Allows arbitrary binary relationships • Named graphs • Allows separate inclusion hierarchy • Allow edges to point to/from subgraphs

  22. Partial Orders • Many ontologies are Partial Orders (i.e., directed acyclic graphs), e.g., taxonomies, partonomies, ... • Merging ontologies which are partial orders should also yield partial orders • See work of Cliff Joslyn (PNNL)

  23. Note: • RDF graphs are collections of edges (triples) • No naked nodes allowed

  24. Graph Implementations • Represent graph as: • Triple store (as on previous slide) • Quad store (support named graphs) • Standalone system, relational DBMS, column store

  25. Quad stores & Named graphs • Quad stores allow named graphs • (named graph, subject, predicate, object) • Named graphs (quads) allow one to name subgraphs (collections of edges) and to refer to them by name • Hence, subjects and objects are no longer just nodes, but may be subgraphs (collections of edges)
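
The quad layout can be sketched with four-element tuples. In this illustration the graph names, predicates, and the `g:default` graph holding statements about other graphs are all assumptions chosen for the example.

```python
# Each quad is (named graph, subject, predicate, object).
quads = [
    ("g:vehicles", "Car", "is_a", "Vehicle"),
    ("g:vehicles", "Truck", "is_a", "Vehicle"),
    ("g:parts", "Wheel", "part_of", "Car"),
    # Statements whose subject or object is itself a named graph.
    ("g:default", "g:vehicles", "curated_by", "Alice"),
    ("g:default", "g:parts", "derived_from", "g:vehicles"),
]


def edges_in(quads, graph_name):
    """Return the triples that belong to one named graph."""
    return [(s, p, o) for g, s, p, o in quads if g == graph_name]


def statements_about(quads, graph_name):
    """Return the triples that use a named graph as their subject or object."""
    return [(s, p, o) for _, s, p, o in quads if graph_name in (s, o)]
```

The last two quads show a subgraph being used as a node, which is what lets provenance or curation metadata attach to a whole collection of edges.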

  26. Secondary storage of graphs • Long skinny relations • Triples or quads • Column stores (MonetDB, Vertica) • Multiple indices sorted by: subject, predicate, object, combinations, ... • Clusters of edges (Cogito)
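
The idea of keeping multiple sorted indices over one long, skinny triple relation can be sketched in memory. This sketch assumes three orderings (subject-predicate-object, predicate-object-subject, object-subject-predicate), string-valued terms, and a hypothetical class name `TripleIndex`; with these three orderings, any combination of bound positions is a prefix of one of them.

```python
from bisect import bisect_left


class TripleIndex:
    """Keeps three sorted copies of the triples, one per ordering."""

    ORDERS = ("spo", "pos", "osp")

    def __init__(self, triples):
        self.indices = {
            order: sorted(
                tuple(dict(zip("spo", t))[k] for k in order) for t in triples
            )
            for order in self.ORDERS
        }

    @staticmethod
    def _prefix_scan(rows, prefix):
        """Binary-search to the first row with this prefix, then scan forward."""
        start = bisect_left(rows, prefix)
        out = []
        for row in rows[start:]:
            if row[:len(prefix)] != prefix:
                break
            out.append(row)
        return out

    def match(self, s=None, p=None, o=None):
        """Return (s, p, o) triples matching the bound positions (None = any)."""
        bound = {"s": s, "p": p, "o": o}
        n_bound = sum(v is not None for v in bound.values())
        for order in self.ORDERS:
            prefix = []
            for key in order:
                if bound[key] is None:
                    break
                prefix.append(bound[key])
            if len(prefix) == n_bound:  # this ordering covers every bound value
                rows = self._prefix_scan(self.indices[order], tuple(prefix))
                return [
                    tuple(dict(zip(order, row))[k] for k in "spo") for row in rows
                ]
        return []
```

Each extra ordering triples nothing but the index storage, which is the trade-off behind the "combinations" on the slide: more indices mean faster lookups and more space.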

  27. Semantic graph query languages • SPARQL is now the primary candidate • Undergoing W3C “standardization”

  28. Logic-based Ontology Representations • Description Logic (e.g., OWL-DL) • Restricted to make it decidable and computationally tractable • Typically, lacks cardinality constraints, arithmetic • Datalog (Horn clause logic + recursion) • Prolog based • First Order Logic (e.g., Common Logic) • IKL (FOL + name propositions)

  29. Logic-based representations • Precise, formal semantics • Expressiveness (esp. FOL) • Issues of scaling, decidability, computational tractability • Esp. for FOL • Description Logics growing usage • DL + rules languages to approx. FOL

  30. Materialization of Partial Orders • Partial orders = taxonomies, partonomies • Typically specified as direct “edges” • Immediate is-a, or part-of relations • Naïve implementation requires repeated traversal of the partial order graph. • Materialization of the transitive closure of the partial order (e.g., taxonomy) can reduce query times • However, initialization and maintenance are expensive in time and storage
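
The trade-off on this slide can be made concrete with a small sketch that materializes the transitive closure of immediate is-a edges. The function name `materialize` and the sample taxonomy are illustrative assumptions.

```python
from collections import defaultdict


def materialize(edges):
    """Map every node to the set of all its ancestors (transitive closure).

    `edges` holds immediate (child, parent) links, e.g. direct is-a relations.
    """
    parents = defaultdict(set)
    nodes = set()
    for child, parent in edges:
        parents[child].add(parent)
        nodes.update((child, parent))

    closure = {}
    for node in nodes:
        seen = set()
        stack = list(parents.get(node, ()))
        while stack:  # walk upward once, at build time
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        closure[node] = seen
    return closure


edges = [("poodle", "dog"), ("dog", "mammal"), ("mammal", "animal")]
closure = materialize(edges)
stored_pairs = sum(len(ancestors) for ancestors in closure.values())
```

After materialization, "is a poodle an animal?" is a single set lookup (`"animal" in closure["poodle"]`) instead of a graph traversal. The cost shows up in `stored_pairs`: this three-edge chain already stores six ancestor pairs, and every edit to the taxonomy requires the closure to be updated.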

  31. Ontology Constraints • Type constraints • Range, domain constraints • Cardinality constraints on relations • DB Integrity constraints • Functional dependencies • Inclusion dependencies (foreign key constraints) • Invertibility • Disjointness (of subclasses)
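
Three of the constraint kinds listed here (domain and range, cardinality, disjointness) can be sketched as closed-world integrity checks over a set of triples. The data layout, the function name `check_constraints`, and the example predicate are assumptions for illustration; under open-world semantics, as in OWL, domain and range statements license inferences rather than reject data.

```python
def check_constraints(triples, types, schema, disjoint):
    """Return a list of human-readable constraint violations.

    types:    instance -> set of class names it belongs to
    schema:   predicate -> {"domain": cls, "range": cls, "max": int or None}
    disjoint: iterable of (class_a, class_b) pairs that must not overlap
    """
    violations = []
    counts = {}
    for s, p, o in triples:
        rule = schema.get(p)
        if rule is None:
            continue
        if rule["domain"] not in types.get(s, set()):
            violations.append(f"domain: {s} is not a {rule['domain']} ({p})")
        if rule["range"] not in types.get(o, set()):
            violations.append(f"range: {o} is not a {rule['range']} ({p})")
        counts[(s, p)] = counts.get((s, p), 0) + 1

    for (s, p), n in counts.items():
        limit = schema[p].get("max")
        if limit is not None and n > limit:
            violations.append(f"cardinality: {s} has {n} values for {p}, max {limit}")

    for instance, classes in types.items():
        for a, b in disjoint:
            if a in classes and b in classes:
                violations.append(f"disjointness: {instance} is both {a} and {b}")
    return violations
```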

  32. Need for Provenance • Fiction: • Ontologists write definitions ab initio • Reality: • Most “definitions” are written by: • Administrators (e.g., Code of Federal Regulations) • Legislatures (legislation) • Judges (court decisions) • Professional bodies (accounting regulations)

  33. Implications for Provenance • We need to track the provenance of definitions • Typically this requires citations to external documents • May also require tracking of individual “definition” decisions .... • Varying granularity requirements • Individual definitions • Collections of axioms, definitions • Examples: see ISO 11179, XMDR
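
The two granularities mentioned here (individual definitions versus collections) can be sketched as a small data model in which a definition falls back to the citations of its containing collection. The class and field names are hypothetical and are not taken from ISO 11179 or XMDR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Citation:
    source: str        # external document, e.g. a regulation or court decision
    locator: str = ""  # section, paragraph, or page within that document


@dataclass
class Definition:
    term: str
    text: str
    citations: List[Citation] = field(default_factory=list)


@dataclass
class DefinitionSet:
    name: str
    definitions: List[Definition] = field(default_factory=list)
    citations: List[Citation] = field(default_factory=list)

    def provenance_of(self, term: str) -> Optional[List[Citation]]:
        """Prefer definition-level citations; fall back to the collection's."""
        for definition in self.definitions:
            if definition.term == term:
                return definition.citations or self.citations
        return None


# Placeholder data: the sources named here are invented for the example.
collection = DefinitionSet(
    name="example glossary",
    citations=[Citation("Example Regulation")],
    definitions=[
        Definition("vehicle", "a means of transport",
                   [Citation("Example Statute", "Section 2")]),
        Definition("trailer", "an unpowered towed vehicle"),
    ],
)
```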

  34. Other Tools • Ontology Creation tools • Ontology Editors • Ontology Differencing tools • Ontology modularization tools (clustering, etc.) • Ontology Export • Ontology Visualization (e.g., graph visualization) • Version management • Access control

  35. SOA: Service Oriented Architecture • Very popular • Permit distributed implementations • Two major alternatives: • REST (Representational State Transfer) • Built on HTTP (get, put, delete, post operators) • URL/URI addresses for all objects • SOAP/WSDL • Based on XML Remote Procedure Calls
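
The REST style described here, with a URI for every object and HTTP verbs as the only operators, can be sketched without a web server as a function from (method, path) to (status, body). The `/ontologies/<name>` URI layout and the function name `handle` are assumptions for illustration; POST is left out to keep the sketch short.

```python
def handle(store, method, path, body=None):
    """Apply an HTTP-style request to an in-memory ontology store.

    Returns a (status code, response body) pair.
    """
    prefix = "/ontologies/"
    if path == "/ontologies":  # the collection resource
        return (200, sorted(store)) if method == "GET" else (405, None)
    if not path.startswith(prefix) or not path[len(prefix):]:
        return 404, None

    name = path[len(prefix):]  # each ontology has its own URI
    if method == "GET":
        return (200, store[name]) if name in store else (404, None)
    if method == "PUT":  # create or replace the ontology at this URI
        created = name not in store
        store[name] = body
        return (201 if created else 200), None
    if method == "DELETE":
        if name in store:
            del store[name]
            return 204, None
        return 404, None
    return 405, None
```

Wiring this function behind any HTTP server is what the following slide means by REST requiring little more than an HTTP server and parsers.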

  36. REST vs SOAP • REST • Simple to implement • Requires little more than: • HTTP server • XML parsers • SOAP • Much more software complexity • Lots of software tooling from commercial vendors • Better security?

  37. My advice on REST vs. SOAP: Use REST.

  38. Ontology Repository Related Standards • ISO/IEC 11179 Metadata Registries (version 3.0 of Part 3) • OMG ODM Ontology Definition Metamodel • ISO 13250 Topic Maps • XML Topic Maps Specification (topicmaps.org) • W3C OWL recommendations • W3C RDF recommendations

  39. Ontology Related Standards • ISO/IEC 24707 Common Logic • ISO TC 37 Terminology Services Standards • W3C SKOS Simple Knowledge Organization System Reference • ISO/IEC 19763 Metamodel Framework for Interoperability (Ontology metadata)

  40. Recapitulation • Ontology Repositories support storage, search, retrieval of multiple ontologies and ontology integration • Macro-level & Micro-level support and search pose similar problems • A common ontology representation is desirable, but difficult • Multiple ontology representations and ontology integration are the most difficult issues.

  41. Acknowledgements • This work was supported by NSF IPA agreement with LBNL, IRD support. • My earlier work on ontology repositories at LBNL was supported by EPA and DOD. • The author would like to thank Joel Sachs, Mark Musen, Natasha Noy, Eric Neumann, Bob MacGregor, Cliff Joslyn, Kevin Keck, Elisa Kendall, Mala Mehrotra, Dan Abadi, Deb McGuinness, et al. for their remarks to me about knowledge representation, ontology repositories and ontology mappings.

  42. Contact Information • Frank Olken • National Science Foundation • 4201 Wilson Blvd., Suite 1125 • Arlington, VA 22230 • Email: folken@nsf.gov • Tel: 703-292-8930 (receptionist) • Tel: 703-292-7350 (direct)