
Ontology Mapping Tool for Diabetes By Madhuri Gopal


Presentation Transcript


  1. Ontology Mapping Tool for Diabetes By Madhuri Gopal

  2. Topics covered: • Project overview • Design Principles • Technology Stack • Approach and Methodology • Execution Framework • Modules Covered • Results

  3. Project Overview: Background • The aim of the project is to overcome semantic heterogeneity on the WWW by using ontology mapping techniques that find the semantic correspondences between similar elements of two ontologies. • We aim to map ontologies created from standard documents in the diabetes medical domain. • Our approach will enable better decision support for queries on these documents. Challenges in the existing systems • Identification of a safer drug regimen requires searching through a space of indicated regimens that outnumbers the pages Google searches 1000 to 1. • A single criterion is insufficient to guide the selection of a safer regimen. • Clinical data is gathered and stored in a fragmented way. • There is no formal, standardized knowledge representation of clinical data.

  4. Design Principles Open/Closed Principle: Software entities like classes, modules and functions should be open for extension but closed for modification. Dependency Inversion Principle: a) High-level modules should not depend on low-level modules. Both should depend on abstractions. b) Abstractions should not depend on details. Details should depend on abstractions. Interface Segregation Principle: Clients should not be forced to depend upon interfaces that they don't use.

  5. Design Principles contd… Single Responsibility Principle: A class should have only one reason to change. Liskov Substitution Principle: Derived types must be completely substitutable for their base types.

  6. Technology Stack The architecture followed is a 2 tier architecture. Front-End : Java Back-end : Ontology(.owl files)

  7. Development Hardware Processor: Intel(R) Core™ 2 Duo CPU T6400 @ 2.00 GHz Memory (RAM): 4 GB System type: 32-bit Operating System Tools used Protégé - Ontology creation (Stanford open source tool) PDPTools - Neural network simulator (Stanford open source tool)

  8. Approach and Methodology • Software prototyping (incremental prototyping) methodology is used for development. • The final product is built as separate prototypes. • At the end, the separate prototypes are merged into an overall design. • Steps are: a) Identification of basic requirements b) Development of the initial prototype c) Review of the prototype d) Revision and enhancement of the prototype

  9. Execution Framework • Eclipse IDE is used as the execution framework. • All the required plugins (jar files) from protégé/plugins/edu.stanford.smi.protegex.owl and the OWL API (open source API) are included in the build path of the Java project to access the ontology built using Protégé (Stanford open source tool). • The IAC neural network is implemented using the PDPTools suite of neural network software (Stanford tool for Parallel Distributed Processing), which runs in Matlab. All required inputs are passed from the Java environment through a connection between Eclipse and Matlab.

  10. Overall Architecture

  11. Modules covered 1) Creation of diabetes ontologies from the American Association of Clinical Endocrinologists (benchmark document) and from Wikipedia. 2) Name similarity matrix calculated for all terms in both ontologies using the Levenshtein distance formula (dynamic programming technique). 3) Profile similarity matrix calculated using term frequency - inverse document frequency (tf-idf, a statistical data mining algorithm). 4) Conversion of ontology terms to a vector space model and computation of the cosine similarity matrix.

  12. Modules covered contd… 5) Structural similarity matrix for calculating the structural similarity between ontologies using basic structural features such as depth from root, number of children and number of instances. 6) Similarity aggregator for aggregating the name similarity, profile similarity and structural similarity. 7) Harmony function estimation for filtering out the most useful similarities and eliminating erroneous similarities. 8) IAC neural network algorithm that solves a constraint satisfaction problem to improve the mapping between the two ontologies.

  13. Ontology Creation- Using Protégé

  14. Ontology 1

  15. Ontology 2

  16. Ontology Mapping

  17. Ontology Mapping Input: 2 homogeneous ontologies O1 and O2 expressed in a formal ontology language (OWL/RDF). Output: a 4-tuple M(e1i, e2j, r, s), where: M is the mapping; e1i is an element in O1; e2j is an element in O2; r is the mapping between e1i and e2j; s is the confidence measure of the mapping, normalized to [0, 1].

  18. IR Based Similarity Generator Input: Ontologies O1 ,O2 Output : 3 similarity matrices that contain similarity scores for each pair of elements in ontologies. Similarity Matrices : • Name Similarity • Profile Similarity • Structural Similarity

  19. Name Similarity This is calculated based on the edit distance between the names (ids) of the elements: NameSim(e1i, e2j) = 1 - EditDist(e1i, e2j) / max(l(e1i), l(e2j)) where: EditDist is the Levenshtein distance between the element names; l(e1i) and l(e2j) are the lengths of the strings e1i and e2j.
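The name similarity formula can be sketched in a few lines. The project itself was written in Java; the Python below is only an illustrative sketch, and the function names (`levenshtein`, `name_sim`, `name_sim_matrix`) and the handling of two empty names are assumptions, not taken from the slides.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def name_sim(e1: str, e2: str) -> float:
    """NameSim = 1 - EditDist / max(len(e1), len(e2))."""
    longest = max(len(e1), len(e2))
    if longest == 0:
        return 1.0  # assumption: two empty names count as identical
    return 1.0 - levenshtein(e1, e2) / longest


def name_sim_matrix(names1, names2):
    """One row per element of O1, one column per element of O2."""
    return [[name_sim(a, b) for b in names2] for a in names1]
```

For two ontologies with m and n element names, `name_sim_matrix` returns an m-by-n list of scores in [0, 1].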

  20. Sample Output for two Ontologies with 6 elements each

  21. Name similarity matrix of dimension 37*26

  22. Profile Similarity: The profile similarity is defined in 3 steps: • Profile Enrichment • Profile Propagation • Profile Mapping

  23. Profile Enrichment and Propagation • Profile of a class Class ID + Comments + Properties Profiles + Instances Profiles • Profile of a property Property ID + Property Domain + Property Range • Profile of an instance Instance ID+ Descriptive information

  24. Profile Mapping • Cosine similarity between the profiles of the 2 elements e1i and e2j is calculated in a vector space model: ProfileSim(e1i, e2j) = (Ve1i · Ve2j) / (|Ve1i| |Ve2j|) where: Ve1i and Ve2j are the 2 vectors representing the profiles of elements e1i and e2j respectively.
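A minimal Python sketch of profile mapping follows, assuming each profile has already been flattened into a list of tokens. The slides do not give the exact tf-idf weighting, so the smoothed idf used here, `log((1 + n) / (1 + df)) + 1`, is an assumption, as are the function names.

```python
import math
from collections import Counter


def tfidf_vectors(profiles):
    """profiles: list of token lists. Returns one {term: weight} dict each."""
    n = len(profiles)
    df = Counter()
    for tokens in profiles:
        df.update(set(tokens))  # document frequency counts each term once
    vectors = []
    for tokens in profiles:
        tf = Counter(tokens)
        vectors.append({
            # assumption: relative term frequency times a smoothed idf
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty profile matches nothing
    return dot / (norm_u * norm_v)
```

The smoothing keeps terms shared by every profile from getting zero weight, which matters when only a handful of profiles are compared.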

  25. Property domain range of Ontology1

  26. Property Domain Range of Ontology 2

  27. Cosine Similarity Matrix

  28. Structural similarity • This is applicable to classes alone, as only they have hierarchical information: StructSim(e1i, e2j) = ∑k (1 - diffk(e1i, e2j)) / N where: e1i and e2j are 2 class elements in the ontologies O1 and O2 respectively; N is the total number of structural features; diffk(e1i, e2j) denotes the difference for feature k: diffk(e1i, e2j) = |sfk(e1i) - sfk(e2j)| / max(sfk(e1i), sfk(e2j)) where: sfk(e1i) and sfk(e2j) denote the value of structural feature k of each element.
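The structural score can be sketched as below, with each class described by a tuple of feature values such as (depth from root, number of children, number of instances). Taking the absolute value of the feature difference, so that the score stays in [0, 1], and treating two zero-valued features as identical are both assumptions of this sketch.

```python
def feature_diff(x, y):
    """Normalized difference between two values of one structural feature."""
    biggest = max(x, y)
    if biggest == 0:
        return 0.0  # assumption: two zero-valued features are identical
    return abs(x - y) / biggest  # assumption: absolute difference


def struct_sim(features1, features2):
    """Average of (1 - diff_k) over the N structural features."""
    if len(features1) != len(features2):
        raise ValueError("both classes need the same features")
    n = len(features1)
    return sum(1.0 - feature_diff(x, y)
               for x, y in zip(features1, features2)) / n
```

For example, classes with features (2, 4, 0) and (2, 2, 0) agree on depth and instance count and differ by half on children, so the score is (1 + 0.5 + 1) / 3.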

  29. Identical Ontologies Similarity Calculation

  30. Structural Similarity Matrix

  31. Harmony • Harmony estimates the importance and reliability of different similarities. Harmony (h) = #s_max / min(#e1 ,#e2) where : #s_max - number of pairs of elements having the highest similarity in both the row and column in the similarity matrix. #ei - number of elements of ontology Oi
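A short Python sketch of the harmony estimate, assuming the similarity matrix is a list of rows with one row per element of O1. How ties are handled is not stated in the slides; this sketch counts every tied cell, which is an assumption.

```python
def harmony(sim):
    """h = #s_max / min(#e1, #e2) for a similarity matrix given as rows."""
    rows, cols = len(sim), len(sim[0])
    row_max = [max(row) for row in sim]
    col_max = [max(sim[i][j] for i in range(rows)) for j in range(cols)]
    # Count cells that are the highest value in both their row and column.
    # Assumption: ties are all counted, so a matrix with many equal
    # maxima can score above 1.
    s_max = sum(1
                for i in range(rows)
                for j in range(cols)
                if sim[i][j] == row_max[i] and sim[i][j] == col_max[j])
    return s_max / min(rows, cols)
```

A matrix whose best matches are all mutual (for instance an identity matrix) gets harmony 1, while a matrix where several rows compete for the same column scores lower.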

  32. Similarity matrices Harmony Estimation

  33. Adaptive Similarity Aggregator Input: Individual similarity matrices Output: Aggregated similarity matrix FinalSim(e1i, e2j) = (∑k hk * Simk(e1i, e2j)) / n where: hk is the harmony of the kth similarity matrix; n is the total number of similarity matrices.
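The aggregation step is a harmony-weighted average, sketched below in Python. The function name and the list-of-rows matrix layout are assumptions for illustration; the harmony values are passed in so the sketch stands on its own.

```python
def aggregate(matrices, harmonies):
    """FinalSim = sum_k(h_k * Sim_k) / n, computed cell by cell."""
    if len(matrices) != len(harmonies):
        raise ValueError("one harmony value is needed per matrix")
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(h * m[i][j] for h, m in zip(harmonies, matrices)) / n
             for j in range(cols)]
            for i in range(rows)]
```

With the three matrices from the similarity generator, the call would look like `aggregate([name, profile, structural], [h_name, h_profile, h_structural])`, where all six names are placeholders.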

  34. Final Aggregated Similarity Matrix

  35. IAC neural Network With Constraint Satisfaction
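The slides do not show how the network is wired, and the original ran in PDPTools on Matlab. The Python below is only a rough sketch of the standard interactive activation and competition (IAC) update rule under stated assumptions: each unit stands for one candidate mapping, its external input is the aggregated similarity, and negative weights link candidates that compete for the same element. The parameter values are illustrative.

```python
def iac_step(a, weights, ext, max_a=1.0, min_a=-0.2, rest=-0.1, decay=0.1):
    """One synchronous IAC update of all unit activations."""
    out = [max(x, 0.0) for x in a]  # only positive activations are sent
    new = []
    for i, ai in enumerate(a):
        net = ext[i] + sum(weights[i][j] * out[j] for j in range(len(a)))
        if net > 0:
            delta = (max_a - ai) * net - decay * (ai - rest)
        else:
            delta = (ai - min_a) * net - decay * (ai - rest)
        new.append(min(max_a, max(min_a, ai + delta)))  # keep in range
    return new


def settle(weights, ext, steps=100, rest=-0.1):
    """Start every unit at rest and iterate until the budget runs out."""
    a = [rest] * len(ext)
    for _ in range(steps):
        a = iac_step(a, weights, ext)
    return a
```

With two candidates that inhibit each other, `settle([[0, -1], [-1, 0]], [0.4, 0.2])` leaves the better-supported candidate active and pushes the other below zero, which is the filtering behaviour the constraint satisfaction step is meant to provide.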

  36. Architecture [Figure: node rows H11, H12, H1n; H21, H22, H2n; H31, H32, H3n, with Synapsis 1 and Synapsis 2 between successive rows]

  37. Neural Networks Constraint Satisfaction Sample Output

  38. Thank You
