
Information Ontologies for the Intelligence Communities A Survey of DCGS-A Ontology Work


Presentation Transcript


  1. Information Ontologies for the Intelligence Communities: A Survey of DCGS-A Ontology Work. Ron Rudnicki, November 12, 2013

  2. Topics • The DCGS-A ontology suite • Standard operating procedures and ontology quality assurance • Annotation vs. Explication • How the DCGS-A ontologies are being used for the explication of data models

  3. The DCGS-A Ontology Suite

  4. Motives for Ontology Development Part of a Big Data solution • Multiple formats including free text, semi-structured and structured • Some “surprise” data sets are made available a short time prior to system testing • Data sets will change along with the domain of interest • Data cannot be collected into a single store • Must provide cross-source searching and analytics • Need to maintain the provenance of data

  5. Contribution of the Ontologies Design choices affect the outcome • Common Upper Level Ontology – The ontologies extend from a common upper level ontology • Delineated Content - Each ontology has a clearly specified and delineated content that does not overlap with any other ontology • Composable Content – Classes in the ontologies represent entities at a level of granularity that can be composed in various ways to map to terms in sources

  6. Integration Through a Common Upper Level Ontology Encourages uniform representations of domains • Provides common patterns within the target ontology for mappings from the sources • Easier to include new sources of data • Enables more uniformity between queries • Easier to transition to domains of interest [Diagram: under Entity, the classes Object and Quality; Organization and Physical Artifact (Objects) are bearers of Quality of Organization and Quality of Physical Artifact via has_quality] CUBRC - Proprietary

  7. Integration Through Delineated Content Each class in the target ontologies is defined in one place • Facilitates locating a class within the target ontologies • Provides better recall in queries • Less likely to overlook relevant data [Diagram: under Entity and Object, Organization and Physical Artifact are each located_at a Spatial Location]

  8. Integration Through Composition of Classes • Granular classes better accommodate mappings from various perspectives on the same domain without loss of information [Diagram: Data Source 1 and Data Source 2 map onto composed classes: Manufacturer manufactures Car and prescribes Model; Car has quality Length of Wheelbase, which is nominally measured by Full Size, Mid Size, and Compact]

  9. High Level Depiction of Domain Provides Coverage of Domain of Human Activity [Diagram: People & Organizations perform Actions that take place in Natural & Artificial Environments and use Artifacts; all are distinguished by Time and Attributes]

  10. Developed Using a Top-Down Bottom-Up Strategy Partial List of Data Sources Used • Treasury Office of Foreign Assets Control – Specially Designated Nationals and Blocked Persons • NCTC – Worldwide Incidents Tracking System • UMD – Global Terrorism Database • RAND – Database of Worldwide Terrorism Incidents • LDM version .60 (TED) • VMF PLI • DCGS-A Global Graph • DCGS-A Event Reporting • BFT Report (CCRi test data) • CidneSigact (CCRi test data) • Long War Journal • Harmony Documents from CTC at West Point • Threats Open Source Intelligence Gateway

  11. Based Upon Standards Partial List of Doctrine and Standards Used • DOD Dictionary of Military and Associated Terms (JP 1-02) • JC3IEDM • Counterinsurgency (FM 3-24) • Operations (FM 3-0) • Multinational Operations (JP 3-16) • International Standard Industrial Classification of all Economic Activities Rev.4 (ISIC4) • Universal Joint Task List (CJCSM 3500.04C) • Weapon Technical Intelligence (WTI) Improvised Explosive Device (IED) Lexicon • Information Artifact Ontology (IAO) • Phenotype and Trait Ontology (PATO) • Foundational Model of Anatomy (FMA) • Region Connection Calculus (RCC-8) • Allen Time Calculus • Wikipedia

  12. Current DCGS-A Ontology Architecture

  13. Ontology Metrics

  14. Standard Operating Procedures and Ontology Quality Assurance

  15. Semantic Conformance Testing Semantic Smuggling • An importing ontology reuses a term from another and adds to its content in some way • adds an axiom to some upper-level term. • the imported class inherits content from parent classes of the importing ontology • Corrective action • request that the curators of the ontology that is the source of the class add the content • If not possible, then plan for revision of import architecture • the importing ontology should introduce a subtype of the term to which the content could then be added.

  16. Semantic Conformance Testing Multiple Inheritance • Defining a class to be a subtype of more than one superclass • Corrective action • remove any subclass assertions that are false (e.g. Bank subClassOf Organization, Bank subClassOf Facility) • refactor superclasses into disjoint classes • write axioms so that the multiple inheritance exists in the inferred hierarchy rather than the asserted hierarchy
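A check like this can be automated over the asserted hierarchy. Below is a minimal sketch, assuming subclass assertions have been extracted into (child, parent) pairs; the class names are illustrative, not taken from the DCGS-A suite:

```python
from collections import defaultdict

# Toy asserted hierarchy illustrating the Bank example (illustrative names).
subclass_assertions = [
    ("Bank", "Organization"),
    ("Bank", "Facility"),
    ("Hospital", "Facility"),
]

# Group each class with the set of superclasses asserted for it.
supers = defaultdict(set)
for child, parent in subclass_assertions:
    supers[child].add(parent)

# Flag any class asserted to be a subtype of more than one superclass.
multiple_inheritance = {c: sorted(ps) for c, ps in supers.items() if len(ps) > 1}
print(multiple_inheritance)  # {'Bank': ['Facility', 'Organization']}
```

In practice the same query is expressible in SPARQL with GROUP BY / HAVING over rdfs:subClassOf, leaving the curator to decide which of the three corrective actions applies.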

  17. Semantic Conformance Testing Taxonomy Overloading • Extending an ontology by introducing terms as child terms of a higher-level ontology using another relation (e.g. part of, is narrower in meaning than) • Corrective action • Place the terms into their appropriate place in the taxonomy

  18. Semantic Conformance Testing Containment • a term from a lower level is not a subclass of any class of the ontologies it imports • containment requires that the domain covered by a lower-level ontology be circumscribed by the domain covered by the higher-level ontology from which it extends. • Corrective action • Add the class (or an appropriate superclass) to the appropriate higher-level ontology • Import a higher-level ontology that does provide a superclass
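The containment condition can also be tested mechanically. A minimal sketch, assuming illustrative class names and checking only direct parents (a full check would follow the transitive closure of subClassOf):

```python
# Classes provided by the imported higher-level ontology (illustrative names).
upper_classes = {"Entity", "Object", "Organization", "Artifact"}

# Direct parent asserted for each class the lower-level ontology introduces.
lower_subclass_of = {
    "Bank": "Organization",
    "Wheelbase": "CarPart",  # parent is not in the imported upper ontology
}

# A containment violation: a lower-level class whose parent is not
# circumscribed by the higher-level ontology it extends from.
violations = [c for c, parent in lower_subclass_of.items()
              if parent not in upper_classes]
print(violations)  # ['Wheelbase']
```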

  19. Semantic Conformance Testing Conflation • an ontology includes information model assertions that are not true of the domain • e.g. carrying over a not null constraint as in every person must have an email address • Corrective action • Make needed modifications to axiom (generally the source of such violations) so that it conforms to the domain • e.g. every person that has purchased from amazon.com must have an email address

  20. Semantic Conformance Testing Logic of Terms • a class is a set-theoretic combination of other classes • Corrective action • Add the class as a new type (College or University => Higher Education Organization)

  21. Calculating Value of Ontology Terms Provide some basis for class inclusion/exclusion • The content of ontologies used in an enterprise will be the subject of debate and possibly disagreement • Having one or more metrics that are proven measures of value would help resolve such disagreements • Current methods are often applied to ontologies in their entirety (e.g. Swoogle); fewer are designed to evaluate the value of individual ontology classes and properties

  22. Calculating Value of Ontology Terms Statistical Methods Supplemented by Weightings • A purely statistical method applied to an ontology as a graph will undervalue isolated terms that are of importance in a domain • Importance is at least a function of amount of use and criticality • Usage is tractable to definition; criticality less so

  23. Annotation vs. Explication

  24. Mappings Value and Assessment • Many of the purposes for which ontologies are built will be realized only to the degree to which they are linked to data • One component of mapping is an act of translation and should be assessed on the degree of equivalence between source and target • Another component of mapping is an implementation and should be assessed on performance criteria such as costs and scalability • Techniques and technologies vary* *An introductory overview can be found at: http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport_01082009.pdf

  25. Mappings Subtypes • Hashtags – the subjective assignment of uncurated keywords to a source • Annotations – rule based assignment of curated terms to a source • Machine maps – automated, structure-based translation of source into target vocabulary • Definitions – rule based expansion of source terms into types and differentiating attributes • Explications – rule based translation of all semantic content (including that which is implicit) of a source using terms and relations of the ontology Term Mappings Assertion Mappings

  26. Mappings Pros and Cons • Term mappings • Can be automated • Enable faceted queries (Select “JFK” as type Airport) • Can result in significant loss of information • Not reusable • Assertion mappings • Manual process that does not scale • Requires extensive knowledge of the target ontology • Enables navigational queries • Improves integration of data sources • Can result in significant carry over of source information • Not reusable

  27. Assessing Current Mapping Methods • No ideal instances… [Chart: mapping methods arranged by cost and fidelity, from Hashtags (low time/money, lossy translation) through Annotations, Machine Maps, and Definitions to Explications (high time/money, lossless translation)]

  28. Examples of Mappings • A Source of Data About Cities

  29. Explication of the Source as an End Point [Diagram: City is designated_by a City Name and by Coordinates; City is part_of a State, which is designated_by a State Name; an Area delimits the City, which has_quality that Area; a City Government participates_in an Act Of Incorporation that occurs_on an Incorporation Date]

  30. Explication Implementation Example A Portion of a D2RQ File Mapping Birth Place and Date

map:PersonBirth rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth" ;
    d2rq:class event:Birth ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirth/@@TreasuryPerson.id|urlify@@" .

map:PersonBirthTemporalInterval rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth Temporal Interval" ;
    d2rq:class span:TemporalRegion ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth Temporal Interval" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirthTemporalIdentifier/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.dateofbirthlist_uid|urlify@@" .

map:PersonBirthTemporalIntervalIdentifier rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth Temporal Interval Identifier" ;
    d2rq:class airs:TemporalRegionIdentifier ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth Temporal Interval Identifier" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirthTemporalIdentifier/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.dateofbirthlist_uid|urlify@@" .

map:PersonBirthTemporalIntervalIdentifierBearer rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth Temporal Interval Identifier Bearer" ;
    d2rq:class airs:TemporalRegionIdentifierBearer ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth Temporal Interval Identifier Bearer" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirthTemporalIdentifierBearer/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.dateofbirthlist_uid|urlify@@" .

map:PersonBirthGeospatialLocation rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth Geospatial Location" ;
    d2rq:class geo:GeospatialLocation ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth Geospatial Location" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirthGeospatialLocation/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.placeofbirthlist_uid|urlify@@" .

  31. Explication Current Method • The full mapping of birth place and date consists of 16 such blocks • The full mapping of the entire table consists of 150 such blocks • If the ontologies change, so must the mappings • Common patterns in the ontologies make some re-use possible by adding placeholders to portions of maps and replacing them with specific values for the source at hand. • Applications exist or are under development to auto-generate initial mappings that a human can then edit
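The placeholder approach can be sketched with ordinary string templating. The skeleton below is illustrative, not the actual DCGS-A tooling; only the filled-in values come from the D2RQ example above:

```python
from string import Template

# A reusable skeleton for one D2RQ ClassMap block; $-names are placeholders
# to be replaced with specific values for the source at hand.
block = Template("""map:$name rdf:type d2rq:ClassMap ;
    rdfs:label "$label" ;
    d2rq:class $target_class ;
    d2rq:dataStorage map:$storage ;
    d2rq:uriPattern "$pattern" .""")

# Instantiate the pattern for the Person Birth mapping.
rendered = block.substitute(
    name="PersonBirth",
    label="Person Birth",
    target_class="event:Birth",
    storage="KDD-02-B-Treasury-SDN",
    pattern="treasurydata_PersonBirth/@@TreasuryPerson.id|urlify@@",
)
print(rendered)
```

An auto-generation tool would emit one such rendered block per class in the explication, leaving a human to edit the results.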

  32. Explication Current Method • The improvements are source and implementation specific • What works for structured sources mapped in D2RQ can’t be reused for sources mapped in other languages (R2RML, EDOAL) • Separate mappings would be needed for sources expressed in XML, HTML or free text • Another solution is needed

  33. How the DCGS-A Ontologies are Being Used for the Explication of Data Models

  34. Start with Machine Made Assertion Mappings • Type to Type mapping (e.g. table column to class) • Relationships between types expressed using a default generic object property • Meta-data about the source entity (e.g. table name, column name, element name) is mapped to annotation properties (rdfs:label)
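The three steps above can be sketched as follows. This is a minimal illustration over plain tuples, assuming a hypothetical `machine_map` helper and a generic `related_to` property; the real system maps into OWL classes and rdfs:label annotation properties:

```python
GENERIC = "related_to"  # default generic object property (illustrative name)

def machine_map(table, columns):
    """Type-to-type mapping: the table and each column become a class;
    components are linked to the container by one generic property, and
    source metadata (table/column names) is kept as labels."""
    triples = [
        (table, "rdf:type", "owl:Class"),
        (table, "rdfs:label", table),          # source metadata as annotation
    ]
    for col in columns:
        cls = f"{table}_{col}"
        triples.append((cls, "rdf:type", "owl:Class"))
        triples.append((cls, "rdfs:label", col))   # source metadata as annotation
        triples.append((table, GENERIC, cls))      # generic container-component link
    return triples

triples = machine_map("City", ["Name", "Coordinates", "Area"])
print(len(triples))  # 2 + 3 columns x 3 triples = 11
```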

  35. Machine Made Assertion Mapping as a Starting Point [Diagram: City linked to Name, Coordinates, Area, State, and Incorporation Date] Class mappings created by associating the container with the components with a generic property

  36. Current Content of Ontologies is not Well Used • Ontologists are trained to associate subclass and equivalence axioms to classes • OWL reasoners don’t expand the graph by creating instances based upon these axioms • OWL reasoners are resource expensive and often result in unimpressive output • Not much control can be exerted upon which inferences an OWL reasoner performs

  37. Create a Library of Rules Change the relationship and type of the name of a city

CONSTRUCT {
    ?city ex:designated_by ?cityname .
    ?cityname rdf:type ex:CityName .
}
WHERE {
    ?city rdf:type ex:City .
    ?cityname rdf:type ex:Name .
    ?city ?related_to ?cityname .
    FILTER NOT EXISTS { ?city ex:designated_by ?cityname . }
}

  38. Create a Library of Rules Delete the original relationship and type

DELETE {
    ?city ?related_to ?name .
    ?name rdf:type ex:Name .
}
WHERE {
    ?city ?related_to ?name .
    ?name rdf:type ex:Name .
    ?city ex:designated_by ?name .
    ?name rdf:type ex:CityName .
}
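The net effect of the CONSTRUCT/DELETE pair can be sketched over a toy triple store, using plain tuples in place of a SPARQL engine (the `ex:` names follow the slides; the generic property name is illustrative):

```python
# Starting point: a machine-made mapping with a generic relationship
# and the generic ex:Name type.
triples = {
    ("city_1", "rdf:type", "ex:City"),
    ("name_1", "rdf:type", "ex:Name"),
    ("city_1", "ex:related_to", "name_1"),
}

# CONSTRUCT step: where a City is generically related to a Name and not yet
# designated_by it, add the specific relationship and the specific type.
for s, p, o in list(triples):
    if (p != "rdf:type"
            and (s, "rdf:type", "ex:City") in triples
            and (o, "rdf:type", "ex:Name") in triples
            and (s, "ex:designated_by", o) not in triples):
        triples.add((s, "ex:designated_by", o))
        triples.add((o, "rdf:type", "ex:CityName"))

# DELETE step: once the specific assertions exist, drop the generic
# relationship and the generic type.
for s, p, o in list(triples):
    if (p == "ex:related_to"
            and (s, "ex:designated_by", o) in triples
            and (o, "rdf:type", "ex:CityName") in triples):
        triples.discard((s, "ex:related_to", o))
        triples.discard((o, "rdf:type", "ex:Name"))

print(sorted(triples))
```

Only the generic edge and type are replaced; the remaining triples are the explicated form the rules aim for.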

  39. The Effect of Such Rules on Translated Data [Graph of the translated data: city_1 is designated_by city_name_1 (has_text_value "Tampa") and coordinates_1 (has_value 27 56’50”N 82 27’31”W); city_1 is part_of state_1, which is designated_by state_name_1 (has_value "Florida"); area_1 (has_value 170.6 sq. mi.) delimits city_1, which has_quality area_1; city_government_1 is_output_of act_of_incorporation_1, which occurs_on July 15, 1887]

  40. Benefits of Rule Library • No need to write different rules for different source formats • Changes to the ontology affect a single rule rather than some (possibly large) number of mappings • Allows mappings from source to target to be simple and possibly fully automated • Writing of rules can be performed by SMEs • Fine grained control of which rules are executed • by user group • above a stated level of priority (weighting)
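The fine-grained control described above can be sketched as a small rule registry. The field names and rules are illustrative, standing in for SPARQL rules tagged with a user group and a priority weighting:

```python
# A library of rules, each tagged with a user group and a priority weighting.
# The "run" callables stand in for executing a SPARQL rule against a graph.
rules = [
    {"name": "retype_city_name", "group": "geo", "priority": 9,
     "run": lambda g: g + ["city names retyped"]},
    {"name": "retype_org_name", "group": "org", "priority": 5,
     "run": lambda g: g + ["org names retyped"]},
    {"name": "low_value_rule", "group": "geo", "priority": 1,
     "run": lambda g: g + ["rarely needed"]},
]

def execute(graph, rules, group=None, min_priority=0):
    """Run only the rules matching the user group and at or above the
    stated priority, highest priority first."""
    selected = [r for r in rules
                if (group is None or r["group"] == group)
                and r["priority"] >= min_priority]
    for rule in sorted(selected, key=lambda r: -r["priority"]):
        graph = rule["run"](graph)
    return graph

print(execute([], rules, group="geo", min_priority=5))  # ['city names retyped']
```

Because selection happens at execution time, the same library serves every source format, and a change to the ontology touches one rule rather than every mapping.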
