Data Fusion and Semantic Web: Meta-Models of Distributed Data and Decision Fusion . Project Report

Vladimir Gorodetski, Oleg Karsaev, Vladimir Samoilov. Intelligent System Laboratory of the St. Petersburg Institute for Informatics and Automation

Presentation Transcript


  1. Data Fusion and Semantic Web: Meta-Models of Distributed Data and Decision Fusion. Project Report. Vladimir Gorodetski, Oleg Karsaev, Vladimir Samoilov, Intelligent System Laboratory of the St. Petersburg Institute for Informatics and Automation. E-mail: {gor, ok, samovl}@mail.iias.spb.su http://space.iias.spb.su/ai/english/gorodetski.htm

  2. Title of the Project: “Autonomous Information Collection, Knowledge Discovery Techniques and Software Tool Prototype for Knowledge-Based Data Fusion”. Project from the European Office of Aerospace Research and Development (EOARD), AFRL/IF (USA) (December 2000 - December 2003)

  3. Outline of the Project Presentation 1. Outline of the Data and Information Fusion problems 2. Project research objectives 3. Examples of case studies and applications used 4. Ontology-centered meta-model of data sources 5. Meta-model of decision fusion 6. Multi-agent architecture 7. Conclusion

  4. Tasks and Applications of Data and Information Fusion. Application fields: critical areas of human society, including security, life support, security of critical state infrastructures, large-scale logistics, natural and man-made disasters, etc. Examples of applications: • Assessment and prediction of situations, • Resource management and rescue operation planning in large-scale natural and man-made disasters, • Decision making and planning of rescue operations in systems like US 911, • Situational awareness and prediction of terrorist intents, and planning of anti-terrorist activity, • Military situation assessment, • Safeguarding of critical plants such as nuclear power stations, electrical power grids, etc.

  5. Information Fusion: Definition. “…data fusion is a formal framework in which means and tools for the alliance of data originating from different sources are expressed. It aims at obtaining information of greater quality; the exact definition of “greater quality” will depend on the application” (JDL-Joint Directors of Laboratories model, USAF). [Diagram of the JDL model: distributed data sources (Sensor 1, Sensor 2, …, Sensor N) and distributed information sources; Level 0: pre-processing of sensor data; Level 1: object assessment; Level 2: situation assessment; Level 3: impact assessment; Level 4: process refinement; Level 5: user refinement; Data Base Management System (Support DB, Fusion DB); human-computer interface. Further labels: sensor management, resource management. Areas of current and future research projects are highlighted in yellow on the slide. Source: Erik Blash, Fusion-2002, July 2002, Annapolis, USA.]

  6. Project Research Objectives. Development of a DF software tool providing support for the design (first of all, for learning!) and implementation of a broad spectrum of DF applications, in particular providing support for: • Development of ontology-based meta-models of data sources, a meta-model of decision fusion and a conceptual model of the DF software tool, • Development of a multi-agent architecture, and • Design and implementation of a broad spectrum of applications.

  7. Examples of case studies and applications used in the Project. Case studies: - KDD Cup99 dataset: preprocessed relational data specifying an intrusion detection task, http://kdd.ics.uci.edu/databases/kddcup99.html - Landsat Multi-Spectral Scanner image dataset, http://www.dfc-grss.org/data/grss_dfc_0010.zip - STULONG dataset: Longitudinal Study of Atherosclerosis Risk Factors, http://euromise.vse.cz/challenge/en/projekt/index.php Application to be used in debugging and validation of MAS DK-DF: - Intrusion detection learning system (project also funded by EOARD/AFRL)

  8. Subtasks of the Project matching the Semantic Web Mining area. 1. Design and implementation of a meta-model of data sources, made necessary by the heterogeneity and distribution of the data to be fused. 2. Design and implementation of a meta-model of distributed learning.

  9. Multiplicity of Data Sources Presenting User’s Activity in Intrusion Detection System. [Diagram. Host-based sources: System programs 1, 2 and 3 (log of all user logins/logouts and system startups and shutdowns; log of all login failures; log of commands run by users plus resource); auditing subsystem of OS (filtered OS audit trail); Mail, DNS, Telnet, HTTP and FTP services (Mail, DNS, Telnet, HTTP and FTP logs). Each log is passed through an SPP (statistical processing program) to produce a statistical data set. Network-based sources: network traffic, i.e. network packets with IP, ICMP and UDP/TCP headers and FTP, HTTP, SMTP, DNS and TELNET data, captured by TCPDUMP (WINDUMP) and passed through an SPP to produce Tcpdump statistical data.]

  10. Interrelation of Semantic Web and Ontology-oriented Research within the Project. The Semantic Web considers the development and standardization of ontology specification languages (XML, RDF, DAML+OIL), ontology-based query languages, ontology editors, etc. Semantic Web Mining considers specific problems of ontology design technology for (Web-based) Data Mining systems. Any DF system technology presupposes (Web-based) distributed Data Mining and KDD, which is why it is a sub-area of Semantic Web Mining. The design of ontology-based Data and Information Fusion systems poses a number of specific technological problems. The most important of them is a technology for the distributed design of a distributed ontology.

  11. What is distributed design of distributed ontology? Meta-model = Ontology + Data source models at meta-level supporting a unified view of data of particular sources. [Diagram: ontology-based meta-model of data sources. Meta-level: “KDD Master” agent, meta-data manager, data sources meta-model. Source level: each sensor feeds a data source, and each data source has a Data Source Manager and a data source management agent.]

  12. Tower of DF application ontology components. [Diagram: the DF system ontology consists of the DF problem ontology, the shared component of the application ontology, and the private components of the application ontology of data sources 1, 2, …, k.]

  13. Distributed Ontology and Protocols for Distributed Ontology Design. [Diagram. Meta-level: “KDD Master” agent (meta-level KDD agent) holding the problem and shared components of the application ontology. Source level: data sources 1, 2, 3, …, k, each with a DS management agent and a KDD agent, and each holding the shared component of the application ontology plus its own private component. Agents 1, 2, 3, …, k and the “KDD Master” agent are linked by protocols and functions.]

  14. Particular Tasks to Be Solved on the Basis of the Meta-model of Data Sources. • Providing monosemantic understanding of the terminology used in data specification by distributed analysts; • Solving the entity identification problem; • Providing consistency of data representation (in case the same attributes are represented differently in different data sources); • Providing a gateway that makes the distributed databases accessible from the ontology, enabling interaction between the ontology and the distributed databases; and several other tasks.

  15. Meta-model of Data Sources: Ontology + Protocols => Monosemantic understanding of terminology. Monosemantic understanding of terminology among DF system components is provided by a shared vocabulary that the distributed entities of the DF system use for communication. This excludes different names for the same entities and their properties in different sources, and identical names for different entities in different data sources, thus ensuring the integrity and consistency of the shared vocabulary. Protocols support distributed collaborative design of a coherent ontology by distributed analysts.
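
The two naming conflicts that the shared vocabulary is meant to exclude can be illustrated with a minimal Python sketch. The source names, local attribute names and shared concept names below are hypothetical and are not taken from the project; the sketch only shows how such conflicts could be detected once every source declares which shared concept each local term refers to.

```python
from collections import defaultdict

# Hypothetical declarations: local term -> shared ontology concept, per source.
SOURCE_TERMS = {
    "tcpdump": {"src_ip": "SourceAddress", "dur": "ConnectionDuration"},
    "os_audit": {"host": "SourceAddress", "dur": "SessionDuration"},
}

def find_conflicts(source_terms):
    """Return (synonyms, homonyms) found across all sources.

    synonyms: shared concept -> different local names used for it
    homonyms: local name -> different shared concepts it denotes
    """
    names_by_concept = defaultdict(set)
    concepts_by_name = defaultdict(set)
    for terms in source_terms.values():
        for local_name, concept in terms.items():
            names_by_concept[concept].add(local_name)
            concepts_by_name[local_name].add(concept)
    synonyms = {c: sorted(n) for c, n in names_by_concept.items() if len(n) > 1}
    homonyms = {n: sorted(c) for n, c in concepts_by_name.items() if len(c) > 1}
    return synonyms, homonyms

synonyms, homonyms = find_conflicts(SOURCE_TERMS)
```

Here `synonyms` reports one concept named differently in two sources, and `homonyms` reports one local name that means different things in different sources; both are cases the analysts would have to resolve in the shared vocabulary.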

  16. Example of Application Ontology: High-level Part of Intrusion Detection Domain Ontology. [Diagram: concepts linked by "part of" and "subclass of" relationships; the diagram also marks notions of the micro-layer and notions of lower levels. Concepts shown: Network attack (A); Reconnaissance (R); Implantation and threat realization (I); Collection of Information (CI); Applications and Banners Enumeration (ABE); Identification of services (IS); Users and Groups Enumeration (UE); Identification of OS (IO); Identification of hosts (IH); Resource Enumeration (RE); Getting Access to Resources (GAR); Gaining Additional Data (GAD); Escalating Privilege; Creating Back Doors (CBD); Covering Tracks (CT); Threat Realization (TR); Denial of Service (DOS); Confidentiality destruction (CD); Integrity destruction (ID); Network Ping Sweeps; Port Scanning; Proxy scanning; TCP connect scan; TCP SYN scan; TCP FIN scan; TCP Null scan; TCP XmasTree scan; Half scan (HS); UDP scan; Dumb host scan (DHS); Scanning 'FTPBounce' (SFB).]

  17. The Simplest ("top-down") Meta-protocol for Collaborative Ontology Design. [Sequence diagram. Participants: application domain expert; meta-data description agent; for each source 1, …, N, a local source expert and a data preparation agent. Steps: forming the basic variant of the ontology; sending the basic variant to each source; analysis of the suggested basic variant at each source; modifying and expanding the ontology; synchronization of modifications by the basic protocol.]

  18. Ontology Synchronization Protocol Represented in Terms of UML Sequence Diagram. Legend: 1. Local source expert 2. Local source data managing agent 3. Local source ontology 4. Local source: buffer of temporary changes 5. KDD master (meta-data description agent) 6. Shared ontology 7. Meta-level agent: buffer of temporary changes 8. Application expert (meta-level) 9. Local source determining the modified ontology part. [Messages shown in the diagram: current state reading; request for required ontology descriptions; unconfirmed changes buffer query; forming the current representation of the ontology; representation of the current state of the ontology; recording the changes; changes of the ontology; sending current changes to the shared ontology; periodic request for suggested changes; verification of changes; confirmation/rejection of suggested changes; introducing changes; adding changes to the ontology; deletion of verified changes.]
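
The buffer-then-verify idea behind this protocol can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's protocol: the class names, the `approve` callback standing in for the meta-level application expert, and the structural check on the parent concept are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Change:
    source: str    # local source that proposed the change
    concept: str   # new concept to add
    parent: str    # existing concept it is attached to

@dataclass
class SharedOntology:
    # concept -> parent concept (None for the root); hypothetical starting state
    concepts: dict = field(default_factory=lambda: {"NetworkAttack": None})
    # buffer of unconfirmed changes sent by the local sources
    pending: list = field(default_factory=list)

    def propose(self, change):
        """A local source sends its current change to the shared ontology."""
        self.pending.append(change)

    def verify_pending(self, approve):
        """Confirm or reject every buffered change, then empty the buffer."""
        accepted, rejected = [], []
        for change in self.pending:
            structurally_ok = (change.parent in self.concepts
                               and change.concept not in self.concepts)
            if structurally_ok and approve(change):
                self.concepts[change.concept] = change.parent
                accepted.append(change)
            else:
                rejected.append(change)
        self.pending.clear()  # verified changes are deleted from the buffer
        return accepted, rejected

onto = SharedOntology()
onto.propose(Change("source-1", "Reconnaissance", "NetworkAttack"))
onto.propose(Change("source-2", "PortScanning", "UnknownParent"))
accepted, rejected = onto.verify_pending(approve=lambda change: True)
```

The first proposal is added to the shared ontology; the second is rejected because its parent concept is not yet known, which is the kind of inconsistency the verification step is there to catch.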

  19. Meta-model of Data Sources: Entity Identification Problem. Explanation of the Entity Identification Problem. [Diagram: Data Source 1, Data Source 2, Data Source 3.]

  20. Demonstration of Entity Identification Problem: Intrusion Detection Application. [Diagram. Host-based sources: System programs 1, 2 and 3 (log of all user logins/logouts and system startups and shutdowns; log of commands run by users plus resource); auditing subsystem of OS (filtered OS audit trail); Mail service (Mail log); FTP service (FTP log). Network-based sources: TCPDUMP (WINDUMP). For every source an SPP produces one set of statistical data per connection, from Connection 1 to Connection N (Case 1 to Case N). Each connection is a sequence of network packets with IP headers, TCP headers (SYN, ACK, FIN) and SMTP or FTP data.]

  21. A Technique for the Entity Identification Problem. • In the DF problem ontology, the notion of an entity identifier ("ID entity") is introduced for each instance of an object to be classified. This entity identifier plays the role of the primary key of the instance (by analogy with the primary key of a table). • For each such identifier, a rule is defined as a component of the shared part of the application ontology; the rule can be used to calculate the value of the instance key. A rule is a function whose arguments are chosen from the set of attributes of this entity. A rule is defined for each local data source to connect the entity identifier uniquely with the local primary key in this source. This rule specifies: • how to derive the local primary key of an instance from the entity identifier value; • how to derive the entity identifier value from the value of the local primary key of an instance of the source.
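
A minimal Python sketch of such two-way key rules follows. The source names, key formats and attributes are hypothetical and chosen only for illustration; the point is that each source carries one rule in each direction, so records describing the same connection can be brought together through the entity identifier.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class KeyRule:
    to_entity_id: Callable[[object], str]   # local primary key -> entity identifier
    to_local_key: Callable[[str], object]   # entity identifier -> local primary key

# Hypothetical rules: tcpdump keys are integers, FTP log keys look like "C0017".
RULES = {
    "tcpdump": KeyRule(
        to_entity_id=lambda key: f"conn-{key}",
        to_local_key=lambda eid: int(eid.split("-")[1]),
    ),
    "ftp_log": KeyRule(
        to_entity_id=lambda key: f"conn-{int(key[1:])}",
        to_local_key=lambda eid: f"C{int(eid.split('-')[1]):04d}",
    ),
}

# Hypothetical per-source tables indexed by their local primary keys.
SOURCES = {
    "tcpdump": {17: {"syn_count": 3}},
    "ftp_log": {"C0017": {"failed_logins": 2}},
}

def fuse(entity_id):
    """Collect the attributes of one entity from every source that knows it."""
    merged = {}
    for name, table in SOURCES.items():
        local_key = RULES[name].to_local_key(entity_id)
        merged.update(table.get(local_key, {}))
    return merged

record = fuse("conn-17")
```

Calling `fuse("conn-17")` looks up key `17` in the first table and key `"C0017"` in the second, and returns the union of their attributes.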

  22. Meta-model of Data Sources: Diversity of Measurement Scales of the Same Attributes in Different Data Sources. Let X be an attribute in the application ontology that is measured differently in different sources. • In the shared component of the application ontology, the type and the measurement unit of the attribute X are determined. The specification of attribute X within the shared part of the application ontology is selected by experts during negotiations according to a synchronization protocol. • In every source where X is present, an expression is defined for this attribute through which it can be converted into the same scale as in all the other sources. This allows attribute values to be used at the meta-level regardless of the data source from which they originated.
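
As a small illustration, assume X is a connection duration whose shared unit was agreed to be seconds, while the sources record it in milliseconds, minutes and seconds respectively. These sources and units are hypothetical; the sketch only shows one conversion expression per source.

```python
# Unit agreed for attribute X in the shared component (hypothetical choice).
SHARED_UNIT = "seconds"

# One conversion expression per source, mapping the local scale to the shared one.
TO_SHARED = {
    "tcpdump": lambda milliseconds: milliseconds / 1000.0,
    "os_audit": lambda minutes: minutes * 60.0,
    "mail_log": lambda seconds: float(seconds),
}

def to_shared_scale(source, value):
    """Convert a locally measured value of X into the shared scale."""
    return TO_SHARED[source](value)

durations = [
    to_shared_scale("tcpdump", 2500),
    to_shared_scale("os_audit", 0.5),
    to_shared_scale("mail_log", 12),
]
```

After conversion all three values are expressed in seconds and can be compared or combined at the meta-level.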

  23. Meta-model of Data Sources: Interaction of Ontology and Databases of Sources. The task arises because application ontology entities are specified in terms of ontology notions, while their instances are represented in terms of a database language. To provide interaction between the ontology and the databases of the sources (accessibility of data requested in ontology terms), a special gateway is developed. [Diagram: three-level hierarchy of access to the database objects. Application (DF problem ontology, DF application ontology); client-gateway (DF problem ontology, DF application ontology, local source data properties); access via VIEW objects to the database objects of the local data source.]
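
The idea of reaching database objects through VIEW objects named in ontology terms can be sketched with Python's standard `sqlite3` module. The table, view and attribute names are hypothetical, and an in-memory SQLite database merely stands in for a local data source; the project's actual gateway is not described in enough detail on the slide to reproduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Local data source: a table that uses its own column names and units.
conn.execute(
    "CREATE TABLE tcp_conn (cid INTEGER PRIMARY KEY, dur_ms INTEGER, n_syn INTEGER)"
)
conn.execute("INSERT INTO tcp_conn VALUES (17, 2500, 3)")

# VIEW object: exposes the same rows under ontology attribute names.
conn.execute(
    "CREATE VIEW Connection AS "
    "SELECT cid AS entity_id, dur_ms / 1000.0 AS duration_s, n_syn AS syn_count "
    "FROM tcp_conn"
)

# Hypothetical fragment of the application ontology known to the gateway.
ONTOLOGY_ATTRIBUTES = {"Connection": {"entity_id", "duration_s", "syn_count"}}

def query(concept, attributes):
    """Answer a request phrased in ontology terms by reading the matching view."""
    if concept not in ONTOLOGY_ATTRIBUTES:
        raise KeyError(f"unknown concept: {concept}")
    unknown = set(attributes) - ONTOLOGY_ATTRIBUTES[concept]
    if unknown:
        raise KeyError(f"unknown attributes for {concept}: {sorted(unknown)}")
    # Names are checked against the ontology above before being placed in SQL.
    sql = f"SELECT {', '.join(attributes)} FROM {concept}"
    return conn.execute(sql).fetchall()

rows = query("Connection", ["entity_id", "duration_s"])
```

The caller never sees `tcp_conn`, `dur_ms` or `n_syn`; it asks for `Connection.duration_s` and the view performs the renaming and unit conversion.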

  24. Meta-model of Distributed Learning. Components of the meta-model of distributed learning: • Meta-model of decision making and of combining the decisions of multiple base-level classifiers; • Model of distributed data management (allocation of training and testing data sets for learning particular classifiers; management of the computation of meta-data for upper-level example-based learning, etc.); • Approaches and formal techniques used for combining decisions.
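
The slide does not name the combination techniques used in the project. As one common example of combining base-level decisions, the following Python sketch applies weighted voting; the classifier names, labels and weights are hypothetical.

```python
from collections import Counter

def combine_by_weighted_vote(decisions, weights):
    """Return the label with the largest total weight.

    decisions: base classifier name -> label it predicted
    weights:   base classifier name -> trust in that classifier (default 1.0)
    Ties are broken alphabetically so the result is deterministic.
    """
    totals = Counter()
    for source, label in decisions.items():
        totals[label] += weights.get(source, 1.0)
    return max(sorted(totals), key=lambda label: totals[label])

decisions = {"tcpdump": "attack", "os_audit": "normal", "ftp_log": "attack"}
fused = combine_by_weighted_vote(decisions, {"os_audit": 1.5})
fused_high_trust = combine_by_weighted_vote(decisions, {"os_audit": 3.0})
```

With a weight of 1.5 the single dissenting classifier is outvoted; with a weight of 3.0 its decision prevails, which shows how the meta-level can encode different degrees of trust in the base-level classifiers.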

  25. Conclusion: Future work. 1. Development of a sophisticated ontology editor supporting the distributed design of a distributed ontology. 2. Further design and implementation of the Data Fusion System software tool for the development and implementation of particular distributed applications in the Data Fusion area.

  26. Thank you! For more information and related publications please contact: E-mail: gor@mail.iias.spb.su http://space.iias.spb.su/ai/english/gorodetski.htm Acknowledgement: This research is funded by AFRL/IF (EOARD), 1999-2003
