Information Artifact Ontology: General Background

Presentation Transcript


  1. Information Artifact Ontology: General Background Barry Smith

  2. Slides http://ncorwiki.buffalo.edu/index.php/STIDS_2013

  3. Barry Smith – who am I? • Director: National Center for Ontological Research (Buffalo) • Founder: Ontology for the Intelligence Community (OIC, now STIDS) conference series • Ontology work for: • NextGen (Next Generation) Air Transportation System • National Nuclear Security Administration, DoE • Joint-Forces Command Joint Warfighting Center • Army Net-Centric Data Strategy Center of Excellence • Army Intelligence and Information Warfare Directorate (I2WD) • and for many national and international biomedical research and healthcare agencies

  4. I2WD Ontology Team • Ron Rudnicki, CUBRC, University at Buffalo • Dr. Tatiana Malyuta, NY City College of Technology of CUNY, Data Tactics Corp. • David Salmen, Data Tactics Corp. • LCOL Dr. William Mandrick, Data Tactics Corp.

  5. In the olden days people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Paris, etc., etc.

  6. On June 22, 1799, in Paris, everything changed

  7. International System of Units (SI)

  8. Making data (re-)usable through standard terminologies • Standards provide • common structure and terminology • single data source for review (less redundant data) • Standards allow • use of common tools and techniques • common training • single validation of data

  9. One successful part of the solution to this problem = Ontologies: controlled vocabularies (nomenclatures) plus definitions of terms in a logical language. Standardized (logically defined) terms in an ontology are the equivalent of standardized units in the SI.

  10. Ontologies • are computer-tractable representations of types in specific areas of reality • are more or less general (upper and lower ontologies) • upper = organizing ontologies • lower = domain ontology modules

  11. Linked Open Data are not enough

  12. Links are inconsistently defined; ontologies are full of redundancies

  13. Towards coordination of modular non-redundant ontologies

  14. Environments Environment Ontology (EnvO)

  15. OBO Foundry approach extended into other domains

  16. Horizontal Integration of Big Intelligence Data: The Role of Ontology in the Era of Big Data • T. Malyuta, Ph.D., New York City College of Technology, NY, NY • B. Smith, Ph.D., University at Buffalo, Buffalo, NY • R. Rudnicki, CUBRC, Buffalo, NY

  17. http://ncorwiki.buffalo.edu/index.php/Main_Page#Documents

  18. Big Data Problem • Wikipedia defines Big Data as “…a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.” • Gartner defines Big Data with three ‘V’s: • Volume • Velocity (of production and analysis) • Variety • Recently a fourth ‘V’ – Veracity – was added • This means that Big Data are beyond our control (as opposed to those complex and big systems with diverse and changing data where the complexity is known)

  19. Big Data Solution – Agility • Dimensions of agility • Storage paradigms that accommodate massive volumes of heterogeneous data • Data processing paradigms that can deal with the massive volumes of heterogeneous data coming onstream • Dynamic data stores that can easily accommodate diverse and a priori unknown data types and semantics • Methods and tools that leverage dynamic and diverse content

  20. The Problem of Horizontal Integration of Big Intelligence Data • HI =def. the ability to exploit multiple data sources as if they are one • Recognized issues for HI with existing approaches • Data silos • Lexicon/semantics silos • Requirement for HI of Big Intelligence Data – Agile Semantic Interoperability • A strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need • Ontology allows an incremental approach, with a big bang from the very first buck (as we showed on the project described below) • Ontology can provide the needed agility

  21. Agile Semantic Interoperability • A good solution has to be • Able to grow incrementally • Able to be developed in a distributed manner • Without losing consistency • Independent of particular implementations, and data producers and consumers • Applicable to data in an agile manner • We call our solution: ‘semantic enhancement’ (SE) of data

  22. Explication vs. Annotation • Explication of general terms used in source intelligence artifacts and in data models, terminologies and doctrinal publications which provide typologies of intelligence-related IAs to semantically enhance data in a way that enables computational integration and reasoning • Annotation of the instance-level information captured by such IAs to aid retrieval of information about specific persons, groups, events, documents, images, and so forth

  23. SE • SE is realized with the help of ontologies that are used to explicate data models and annotate data instances • Vocabulary of ontologies used for explications and annotations provides agile horizontal integration • Ontologies, by virtue of their nature and organization, provide semantic enhancement of data [Figure: hierarchy of terms: Education, Skill, Technical Education, ComputerSkill, ProgrammingSkill, SQL, Java, C++]

  24. The Meaning of ‘Enhancement’ • Semantic enhancement/enrichment of data = arm’s-length approach (no change to data) – through simple explication we associate an entire knowledge system with a database field • enables analytics to process data, e.g. about computer skills, “vertically” along the Skill hierarchy, as well as “horizontally” via relations between Skill and Education • and further… while data in the database does not change, its analysis can become richer and richer as our understanding of reality changes • For this richness to be leveraged by different communities, persons, and applications, it needs to have the properties mentioned above and be constructed in accordance with the principles of SE
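The "vertical" and "horizontal" processing described on this slide can be illustrated with a minimal Python sketch. The term names come from the Skill slide, but the parent/child arrangement, the `acquired_through` relation, the `skill_cd` field and the sample rows are all assumptions made for illustration; they are not taken from the project.

```python
# Toy is_a hierarchy (child -> parent). Term names come from the Skill slide;
# the exact parent/child arrangement is an assumption.
IS_A = {
    "ComputerSkill": "Skill",
    "ProgrammingSkill": "ComputerSkill",
    "SQL": "ProgrammingSkill",
    "Java": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "TechnicalEducation": "Education",
}

# Hypothetical "horizontal" relation linking a Skill type to an Education type.
ACQUIRED_THROUGH = {"ComputerSkill": "TechnicalEducation"}

# Explication: the (made-up) database field skill_cd is declared to hold
# terms of the Skill ontology. The rows themselves are never modified.
SKILL_FIELD = "skill_cd"
ROWS = [
    {"person": "p1", "skill_cd": "Java"},
    {"person": "p2", "skill_cd": "SQL"},
    {"person": "p3", "skill_cd": "Welding"},  # term not covered by the toy ontology
]


def ancestors(term):
    """Return the term followed by all of its superclasses, nearest first."""
    chain = [term]
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def people_with(skill_type):
    """Vertical query: persons whose recorded skill falls under skill_type."""
    return sorted(
        row["person"] for row in ROWS if skill_type in ancestors(row[SKILL_FIELD])
    )


def related_education(skill_term):
    """Horizontal query: Education type related to the skill or a superclass."""
    for term in ancestors(skill_term):
        if term in ACQUIRED_THROUGH:
            return ACQUIRED_THROUGH[term]
    return None


if __name__ == "__main__":
    print(people_with("ProgrammingSkill"))
    print(related_education("Java"))
```

The point of the sketch is that the richer queries come entirely from the ontology and the field-to-ontology association; the stored rows are read but never changed.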

  25. SE Principles • Create a Shared Semantic Resource (SSR) of ontologies to be used for explication and annotation • Establish an agile strategy for building ontologies within this SSR, and apply and extend these ontologies to explicate and annotate new source data as they come onstream • Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups • How to manage collaboration?

  26. Achieving the Goal • Methodology of incremental distributed ontology development • A common ontology architecture incorporating a common, domain-neutral, upper-level ontology (BFO) • A shared governance and change management process • A simple, repeatable process for ontology development • An ontology registry • A process of intelligence data capture through explication of source data models

  27. Main Methodological Points • Ontological realism • Based on Doctrine / Science • Involves SMEs in label selection and definition • Thoroughly tested in many projects • Arm’s-length process, with minimal disturbance to existing data and data semantics • Reference ontologies – capture generic content and are designed for aggressive reuse in multiple different types of context: single reference ontology for each domain of interest • Application ontologies – are tied to specific local applications • An application ontology is created by combining local content with generic content taken from relevant reference ontologies • Still interoperable because based on a common set of reference ontologies * Barry Smith and Werner Ceusters, “Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies”, Applied Ontology, 5 (2010), 139–188.

  28. Arm’s-length Process • Focusing on the terms (labels, acronyms, codes) used in our source data • Where multiple distinct terms {t1, …, tn} are used in separate data sources with one and the same meaning, they are associated with a single preferred label drawn from a standard set of such labels • All the separate data items associated with the {t1, …, tn} are thereby linked together through the corresponding preferred labels • Preferred labels form the basis for the ontologies we build [Figure: SE ontology labels linked to heterogeneous contents XYZ, KLM, ABC]
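The linking step on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the source names, source terms, preferred labels and values are all hypothetical, and `link_by_preferred_label` is a made-up helper, not part of any SE tooling.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific terms {t1, ..., tn} to one
# preferred label drawn from a standard set of labels.
PREFERRED_LABEL = {
    "POB": "place of birth",
    "BirthPlace": "place of birth",
    "birth_loc": "place of birth",
    "NAT": "nationality",
}

# Made-up records from three separate sources; they are read, never modified.
SOURCES = {
    "source_xyz": [{"POB": "Paris"}],
    "source_klm": [{"BirthPlace": "Buffalo", "NAT": "US"}],
    "source_abc": [{"birth_loc": "Zurich", "shoe_size": "42"}],
}


def link_by_preferred_label(sources, preferred_label):
    """Group data items from all sources under their preferred label.

    Returns {preferred label: [(source name, source term, value), ...]}.
    Terms without a preferred label are left unlinked.
    """
    linked = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            for term, value in record.items():
                label = preferred_label.get(term)
                if label is not None:
                    linked[label].append((source_name, term, value))
    return dict(linked)


if __name__ == "__main__":
    for label, items in link_by_preferred_label(SOURCES, PREFERRED_LABEL).items():
        print(label, items)
```

Because the mapping lives outside the sources, adding a new source only requires extending `PREFERRED_LABEL`; no existing record needs to change.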

  29. Reference and Application Ontologies • Reference Ontology • vehicle =def. an object used for transporting people or goods • tractor =def. a vehicle that is used for towing • crane =def. a vehicle that is used for lifting and moving heavy objects • vehicle platform =def. means of providing mobility to a vehicle • wheeled platform =def. a vehicle platform that provides mobility through the use of wheels • tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks • Application Definitions • artillery vehicle =def. a vehicle designed for the transport of one or more artillery weapons • wheeled tractor =def. a tractor that has a wheeled platform • tracked tractor =def. a tractor that has a tracked platform • artillery tractor =def. an artillery vehicle that is a tractor • wheeled artillery tractor =def. an artillery tractor that has a wheeled platform
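A rough Python sketch of how the definitions above combine reference and application content. Each term is encoded as its parent terms (the genus in its definition) plus an optional platform; this encoding, and the simple `subsumes` check, are assumptions made for illustration and are not a description of the reasoner or ontology language used on the project. Differentiae such as "used for towing" are not modeled.

```python
# Each entry: term -> (parent terms named in the definition, platform or None).
REFERENCE = {
    "vehicle": ([], None),
    "tractor": (["vehicle"], None),
    "crane": (["vehicle"], None),
    "vehicle platform": ([], None),
    "wheeled platform": (["vehicle platform"], None),
    "tracked platform": (["vehicle platform"], None),
}

# Application terms are built only from reference terms and each other.
APPLICATION = {
    "artillery vehicle": (["vehicle"], None),
    "wheeled tractor": (["tractor"], "wheeled platform"),
    "tracked tractor": (["tractor"], "tracked platform"),
    "artillery tractor": (["artillery vehicle", "tractor"], None),
    "wheeled artillery tractor": (["artillery tractor"], "wheeled platform"),
}

DEFS = {**REFERENCE, **APPLICATION}


def signature(term):
    """Return (all superclasses of term, all platforms term is defined to have)."""
    parents, platform = DEFS[term]
    classes = set()
    platforms = {platform} if platform else set()
    for parent in parents:
        parent_classes, parent_platforms = signature(parent)
        classes |= {parent} | parent_classes
        platforms |= parent_platforms
    return classes, platforms


def subsumes(general, specific):
    """True if every instance of `specific` is an instance of `general`."""
    if general == specific:
        return True
    g_classes, g_platforms = signature(general)
    s_classes, s_platforms = signature(specific)
    if general in s_classes:  # stated directly or inherited
        return True
    # Inferred: only terms defined purely by parents plus a platform
    # can be recognized from their definition alone.
    fully_defined = DEFS[general][1] is not None
    return fully_defined and g_classes <= s_classes and g_platforms <= s_platforms


if __name__ == "__main__":
    print(subsumes("wheeled tractor", "wheeled artillery tractor"))
    print(subsumes("tracked tractor", "wheeled artillery tractor"))
```

In this toy encoding, a wheeled artillery tractor is recognized as a wheeled tractor even though no definition states that directly, which is the kind of interoperability that comes from basing application terms on a common set of reference terms.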

  30. Illustration of Ontology Types (Toy Example) [Figure: hierarchy of Vehicle, Artillery Vehicle, Tractor, Wheeled Tractor, Artillery Tractor, Wheeled Artillery Tractor; legend: black = reference ontologies, red = application ontologies]

  31. Role of Reference Ontologies • Normalized • Maintains a set of consistent ontologies • Eliminates redundancy • Modular • A set of plug-and-play ontology modules • Enables distributed consistent development • Surveyable

  32. SE Architecture • The Upper Level Ontology (ULO) in the SE hierarchy must be maximally general (no overlap with domain ontologies) • The Mid-Level Ontologies (MLOs) introduce successively less general and more detailed representations of types which arise in successively narrower domains until we reach the Lowest Level Ontologies (LLOs) • The LLOs are maximally specific representations of the entities in a particular one-dimensional domain

  33. Challenges to HI • Too many lexicons • The scope of the domain: signal, sensor, image, … intelligence about … the whole world • Difficult to conduct governance and management of ontology development to ensure consistent evolution • Lack of expertise • Complexity of the ontology development and application process

  34. Preventing Failure • The method we use offers solutions to some of the common reasons for failure • Lack of Consensus • Realism offers an objective standard for settling disputes over terminology. Ontology development becomes an empirical science instead of an exercise in the publication of dialects • Governance helps to resolve conflicts and achieve consensus • High Maintenance • Arm’s length implementation places no additional overhead onto applications • Parochialism • Architecture and methodology prevent development of vocabularies that apply only to a single perspective • Poor Quality • Experience prevents common mistakes in vocabularies that cause downstream problems with search and analytics

  35. Preventing Failure (cont.) • Agile ontology development • Methodology and architecture • Growing SSR • Agile ontology application • Incremental • Semi-automated where possible • Even if not as fast as some want it to be • It is still faster than creating a physical store, which will be just another silo and will still need to be integrated with the rest of the data • Once a data collection is semantically enhanced, it is integrated, without any additional effort, with all data that have been or will be semantically enhanced

  36. What is Next… • IAO-Intel: An Information Artifact Ontology for the Intelligence Community (BS) • A Survey of DSGS-A Ontology Work and Explicating and Annotating Processes (R. Rudnicki) • Email Ontology – illustration of the methodology of ontology design and of the IAO-Intel (D. Salmen and W. Mandrick)

  37. References • Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent, Milan Patel, “Horizontal Integration of Warfighter Intelligence Data: A Shared Semantic Resource for the Intelligence Community”, STIDS Conference, 2012. • Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny Parent, Shouvik Bardhan, Jamie Johnson, “Ontology for the Intelligence Analyst”, Crosstalk: The Journal of Defense Software Engineering, 2012. • David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry Smith, “Integration of Intelligence Data through Semantic Enhancement”, STIDS Conference, 2011.
