An ontological approach to the document access problem of insider threat
1 / 27

An Ontological Approach to the Document Access Problem of Insider Threat - PowerPoint PPT Presentation

  • Uploaded on

An Ontological Approach to the Document Access Problem of Insider Threat. ISI 2005 , (May 20). Boanerges Aleman-Meza 1 Phillip Burns 2 Matthew Eavenson 1 Devanand Palaniswami 1 Amit P. Sheth 1. (1) LSDIS Lab, Computer Science Dept., University of Georgia, USA

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'An Ontological Approach to the Document Access Problem of Insider Threat' - knox

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
An ontological approach to the document access problem of insider threat

An Ontological Approach to the Document Access Problem ofInsider Threat

ISI 2005, (May 20)

Boanerges Aleman-Meza1

Phillip Burns2

Matthew Eavenson1

Devanand Palaniswami1

Amit P. Sheth1

(1) LSDIS Lab, Computer Science Dept.,

University of Georgia, USA

(2) CTA – Computer Technology Associates


Objective approach
Objective & Approach

  • Determine if (classified) documents reviewed an IC analyst satisfy his/her “need to know”

    • Characterization of “need to know” w.r.t. ontology

    • Characterizing document content in terms of ontology

    • Discovering weighted semantic relationships between document content and “need to know”

Characterizing need to know using an a semantic approach using ontology
Characterizing “Need to Know” using an a Semantic Approach (using Ontology)

  • Requires domain ontology

    • models important concepts & relationships of domain (schema), captures factual knowledge (instances)

  • Relate analyst’s need to know to concepts & relationships in ontology

    • e.g. terrorist organization, funding sources, facilitators, members, methods

Need to know context of investigation
“Need to know” = context of investigation

26,489 entities

34,513 (explicit) relationships

Add relationship to context

Characterizing document content in terms of ontology semantic annotation
Characterizing document content in terms of ontology “Semantic Annotation”

  • Correlate words/phrases from document with entities/relationships in ontology

    • Entity identification

  • Meta-data added to document (from associated ontological knowledge)

  • Active area of research but practically useful technology now available

  • Constrained to content of ontology

  • Semantic relationships between document need to know
    Semantic Relationships between Document & “Need to Know” “Semantic Annotation”

    • Semantic associations: relationships between document concepts & “need to know” concepts are discovered and ranked

    • Ranking based on multiple factors

      • no. of links, types of links, location in ontology, …

    • Ranking indicates degree of semantic “closeness”

      • and therefore, how related document is to “need to know”

    Documents ranking
    Documents “Semantic Annotation”Ranking

    • Highly relevant

    • Closely related

    • Ambiguous

    • Not relevant

    • Undeterminable

    Research content
    Research Content “Semantic Annotation”

    • Discovery & Ranking of semantic semantic associations

    • Characterizing “need to know” in terms of ontological concepts & relationships

    • Meta-data annotation of data and (semi-structured & unstructured) documents

      • correlation of document content & concepts in ontology

    Research challenges
    Research Challenges “Semantic Annotation”

    In this project we are addressing:

    • Discovery of Semantic Associations per entity per document

    • Input/Visualization/Management of Context of Investigation

    • Scalability on number of documents & ontology size

      • Performs well with thousand documents

    • Ranking of documents

    Ranking of documents relevance
    Ranking of Documents Relevance “Semantic Annotation”

    “Closely related entities are more relevant than distant entities”

    E = {e | e  Document }

    Ek = {f | distance(f, eE) = k }

    Components of document relevance

    1. “Semantic Annotation”

    Entities belong

    to classes in the


    type(entity)  Context

    Entities match a list of entities

    of interest (in the Context)

    entity Entities-List


    Components of Document Relevance




    Relationship  [Class]

    Context of Investigation

    (specific entities)

    • Abu Abdallah

    • Turkmenistan

    • Konduz Province

    Schematic of ontological approach to the legitimate access problem

    Semagix Freedom “Semantic Annotation”

    Semagix Freedom

    Schematic of Ontological Approach to the Legitimate Access Problem

    Conclusions “Semantic Annotation”

    • New Semantic Approach to the challenging problem

    • Viability demonstrated on a small scale

    • Significant new research that builds upon the latest Semantic Platform

    • Many applications of this approach: vendor vetting, knowledge discovery, ….

    Acknowledgements “Semantic Annotation”

    • Semagix provided technology to populate ontology using knowledge extraction, and (semi-)automatic metadata extraction from documents (Freedom toolkit).

    • NSF-funded projects provided core research: "Semantic Association Identification and Knowledge Discovery for National Security Applications" (Grant No. IIS-0219649) and "Semantic Discovery: Discovering Complex Relationships in Semantic Web" (Grant No. IIS-0325464)

    References “Semantic Annotation”

    • 1. B. Aleman-Meza, C. Halaschek, I.B. Arpinar, A. Sheth, Context-Aware Semantic Association

    • Ranking. Proceedings of Semantic Web and Databases Workshop, Berlin, September 7-

    • 8 2003, pp. 33-50

    • 2. B. Aleman-Meza, C. Halaschek, A. Sheth, I.B. Arpinar, and G. Sannapareddy. SWETO:

    • Large-Scale Semantic Web Test-bed. Proceedings of the 16th International Conference on

    • Software Engineering and Knowledge Engineering (SEKE2004): Workshop on Ontology in

    • Action, Banff, Canada, June 21-24, 2004, pp. 490-493

    • 3. R. Anderson and R. Brackney. Understanding the Insider Threat. Proceedings of a March

    • 2004 Workshop. Prepared for the Advanced Research and Development Activity (ARDA).


    • 4. K. Anyanwu and A. Sheth ρ-Queries: Enabling Querying for Semantic Associations on the

    • Semantic Web The Twelfth International World Wide Web Conference, Budapest, Hungary,

    • 2003, pp. 690-699

    • 5. K. Anyanwu, A. Maduko, A. Sheth, SemRank: Ranking Complex Relationship Search Results

    • on the Semantic Web, In Proceedings of the 14th International World Wide Web Conference,

    • Japan 2005 (accepted, to appear)

    • 6. K. Anyanwu, A. Maduko, A. Sheth, J. Miller. Top-k Path Query Evaluation in Semantic

    • Web Databases. (submitted for publication), 2005

    • 7. C. Halaschek, B. Aleman-Meza, I.B. Arpinar, A. Sheth Discovering and Ranking Semantic

    • Associations over a Large RDF Metabase Demonstration Paper, VLDB 2004, 30th International

    • Conference on Very Large Data Bases, Toronto, Canada, 30 August - 3 September,

    • 2004

    • 8. B. Hammond, A. Sheth, and K. Kochut, Semantic Enhancement Engine: A Modular Document

    • Enhancement Platform for Semantic Applications over Heterogeneous Content, in

    • Real World Semantic Web Applications, V. Kashyap and L. Shklar, Eds., IOS Press, December

    • 2002, pp. 29-49

    References cont
    References (cont) “Semantic Annotation”

    • 9. M. Rectenwald, K. Lee, Y. Seo, J.A. Giampapa, and K. Sycara. Proof of Concept System for

    • Automatically Determining Need-to-Know Access Privileges: Installation Notes and User

    • Guide. Technical Report CMU-RI-TR-04-56, Robotics Institute, Carnegie Mellon University,

    • October, 2004.


    • 04_3.pdf

    • 10. C. Rocha, D. Schwabe, M.P. Aragao. A Hybrid Approach for Searching in the Semantic

    • Web, In Proceedings of the 13th International World Wide Web, Conference, New York,

    • May 2004, pp. 374-383.

    • 11. M.A. Rodriguez, M.J. Egenhofer, Determining Semantic Similarity Among Entity Classes

    • from Different Ontologies, IEEE Transactions on Knowledge and Data Engineering 2003

    • 15(2):442-456

    • 12. A. Sheth, C. Bertram, D. Avant, B. Hammond, K. Kochut, and Y. Warke. Managing Semantic

    • Content for the Web. IEEE Internet Computing, 2002. 6(4):80-87

    • 13. A. Sheth, B. Aleman-Meza, I.B. Arpinar, C. Halaschek, C. Ramakrishnan, C. Bertram, Y.

    • Warke, D. Avant, F.S. Arpinar, K. Anyanwu, and K. Kochut. Semantic Association Identification

    • and Knowledge Discovery for National Security Applications. Journal of Database

    • Management, Jan-Mar 2005, 16 (1):33-53

    • 14. Boanerges Aleman-Meza, Phillip Burns, Matthew Eavenson,Devanand Palaniswami, Amit Sheth. An Ontological Approach to the Document Access Problem of Insider Threat

    Semantic annotation
    Semantic Annotation “Semantic Annotation”

    • Document searched for entity names (or synonyms) contained in ontology

    • Then document entities are annotated with additional information from corresponding entities in ontology including named relationships to other entities

    • Following chart is example

      • Highlighted text are entities found corresponding to concepts in ontology

      • XML is corresponding meta-data annotation

    Relevance measures for documents relating document content to ia need to know
    Relevance Measures for Documents “Semantic Annotation”(relating document content to IA “need to know”

    • Relevance engine input

      • the set of semantically annotated documents

      • the context of investigation for the assignment

      • the ontology schema represented in RDFS, and the ontology instances represented in RDF

    • Relevance measure function used to verify whether the entity annotations in the annotated document can be fit into the entity classes, entity instances, and/or keywords specified in the context of investigation.

    The big picture
    The Big Picture “Semantic Annotation”

    Sweto ontology schema visualization
    SWETO – Ontology Schema Visualization “Semantic Annotation”

    See SemDis project of LSDIS Lab, University of Georgia

    Relevance measures for documents relating document content to ia need to know cont
    Relevance Measures for Documents “Semantic Annotation”(relating document content to IA “need to know” (cont)

    • Documents classified as:

      • Highly relevant

        • Document entities directly related

      • Closely related

        • Document entities related through strong semantic associations

      • Ambiguous

        • Document entities related through weak semantic associations

      • Not relevant

        • Document entities not related to “need to know”

      • Undeterminable

        • Document entities not found in ontology

    Ia context of investigation characterization of need to know
    IA Context of Investigation “Semantic Annotation”(characterization of “Need to Know”)

    We define the context of investigation as a combination of the following:

    • A set of entity classes and relationships, and/or a negation of a set of entity classes and relationships

    • A set of entity instance names, and/or a negation of a set of entity instance names

    • A set of keyword values that might appear at any attribute of the populated instance data, and/or a negation of a set of keyword values

    Context of investigation cont
    Context of Investigation (cont) “Semantic Annotation”

    • Goal is to capture, at a high level, the types of entities, (or relationships), that are considered important.

    • Relationships can be constrained to be associated with specified class types

      • E.G. It can be specified that a relation ‘affiliated with’ is part of the context only when it is connected with an entity that belongs to a specific class, say, ‘Terror Organization’

    Ranking of documents relevance1
    Ranking of Documents Relevance “Semantic Annotation”

    Four groups of document-ranking:

    • Not Related Documents

      • unable to determine relation to context

  • Ambiguously Related Documents

    • some relationship exists to the context

  • Somehow Related Documents

    • Entities are closely related to the context

  • Highly Related Documents

    • Entities are a direct match to the context

      Cut-off values determine grouping of documents w.r.t. relevance

  • These are customizable cut-off values (more control and more meaningful parameters compared to say automatic classification or statistical approaches)

    “Inspection” of a document is possible via (a) original document or (b) original document with highlighted entities