
 Speeding Up Batch Alignment of Large Ontologies Using MapReduce

 Speeding Up Batch Alignment of Large Ontologies Using MapReduce. Uthayasanker Thayasivam and Prashant Doshi, Dept. of Computer Science, University of Georgia. Introduction. Ontology: formalizes the knowledge of a domain by defining concepts and the properties that relate them.


Presentation Transcript


  1.  Speeding Up Batch Alignment of Large Ontologies Using MapReduce. Uthayasanker Thayasivam and Prashant Doshi, Dept. of Computer Science, University of Georgia

  2. Introduction Ontology: formalizes the knowledge of a domain by defining concepts and the properties that relate them

  3. Introduction: Ontology Alignment


  6. Problem Definition: Ontology Alignment The ontology alignment problem: find a set of correspondences between two ontologies O1 = <V1, E1, L1> and O2 = <V2, E2, L2>. • Ontology • V: Set of labeled vertices • E: Set of edges • Set of ordered 2-subsets of V • L: Mapping from each edge to its label • A correspondence m_aα between x_a ∈ O1 and y_α ∈ O2 • Relation • Confidence
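
The definitions on this slide can be sketched as plain data structures. This is a minimal illustration only, not the authors' implementation; the class and field names (`Ontology`, `Correspondence`, `relation`, `confidence`) and the sample entities are hypothetical, and the confidence range [0, 1] is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """O = <V, E, L>: labeled vertices, ordered vertex pairs, edge labels."""
    vertices: set[str] = field(default_factory=set)
    edges: set[tuple[str, str]] = field(default_factory=set)
    edge_labels: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_edge(self, source: str, target: str, label: str) -> None:
        # An edge is an ordered pair of vertices; L maps it to its label.
        self.vertices.update((source, target))
        self.edges.add((source, target))
        self.edge_labels[(source, target)] = label


@dataclass(frozen=True)
class Correspondence:
    """A correspondence between entity x of O1 and entity y of O2."""
    x: str
    y: str
    relation: str      # e.g. "=" for equivalence (illustrative value)
    confidence: float  # assumed here to lie in [0, 1]


o1 = Ontology()
o1.add_edge("Paper", "Document", "subClassOf")
o2 = Ontology()
o2.add_edge("Article", "Publication", "subClassOf")

# An alignment is a set of correspondences between O1 and O2.
alignment = {Correspondence("Paper", "Article", "=", 0.9)}
```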

  7. Ontology Alignment Challenges • Improving the Alignment Quality • Structural & lexical disparity • Improving the Alignment Efficiency • Quickly producing quality alignment • Improving the Scalability [Figure: axis labels Efficiency / Quality, Resources, Ontology Sizes]

  8. Space of Alignments & Alignment between many-to-many one-to-many one-to-one Alignment Space Size: Evaluating An Alignment: Cartesian Product of entities

  9. Space of Alignments & Alignment between Bipartite graph many-to-many one-to-many one-to-one Alignment Space Size: Evaluating An Alignment: Cartesian Product of entities
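
The formulas on slides 8 and 9 did not survive in this transcript. As a hedged illustration based on standard counting arguments (not the slides' own formulas), the sketch below computes the number of candidate pairs in the Cartesian product, the number of many-to-many alignments (any subset of those pairs), and the number of one-to-one alignments (partial matchings in a complete bipartite graph). The function names are hypothetical.

```python
from math import comb, factorial


def candidate_pairs(n: int, m: int) -> int:
    """Size of the Cartesian product of n entities in O1 and m in O2."""
    return n * m


def many_to_many_alignments(n: int, m: int) -> int:
    """Every subset of the candidate pairs is a many-to-many alignment."""
    return 2 ** (n * m)


def one_to_one_alignments(n: int, m: int) -> int:
    """Partial matchings: choose k entities on each side and pair them up."""
    return sum(comb(n, k) * comb(m, k) * factorial(k)
               for k in range(min(n, m) + 1))


print(candidate_pairs(3, 4))
print(many_to_many_alignments(2, 2))
print(one_to_one_alignments(2, 2))
```

Even for small ontologies the many-to-many space grows exponentially in the number of candidate pairs, which is why the later slides focus on reducing or partitioning the space.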

  10. Large Ontology Matching • Reduction of alignment space • Early pruning of dissimilar element pairs • aflood (Hanif and Masaki ‘09) • Partition based matching • Falcon-AO (Jian et al. ‘05) • Parallel matching • MapPSO (Bock and Hettenhausen ‘10) • VDoc+ (Zhang ‘12) [Figure: O1, O2 with partitions P11, P12, P13, P21, P22, P23; 4 blocks]

  11. Batch Alignment of Large Ontologies Approach allows any alignment algorithm to be utilized on a MapReduce architecture • Scalability is challenging • OAEI 2012 - Very Large Biomedical Ontology Track • 8 out of 21 tools completed • Ontology repositories (e.g., NCBO at Stanford) • Batch alignment of ontologies • New ontologies posted • Ontologies get updated

  12. Contributions: Batch Alignment of Large Ontologies A general and novel approach to speed up batch alignment of large ontologies using MapReduce • No impact on alignment quality for some algorithms • Benefits ontology repositories

  13. MapReduce Framework

  14. MapReduce Framework • Key -> Value • Key -> <Value1, Value2> • Key -> Output Value (output) • Key identifies a subproblem

  15. MapReduce Framework [Figure: O1 with parts O11, O21, O31; O2 with parts O12, O22]


  19. Mapper & Reducer Algorithms MAP • ← parse the Value in the record • emit() • emit(,) REDUCE • ← align using an alignment algorithm • emit(,)
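
The symbols in the MAP and REDUCE pseudocode were lost in this transcript. The sketch below is one hedged reading of the idea: the mapper parses each record and emits it under its subproblem key, and the reducer aligns the two ontology parts that share a key. It simulates the shuffle phase in memory instead of calling Hadoop. The record layout (`"O1|entity,entity"`), the function names, and the toy `exact_label_aligner` are all assumptions for illustration; any alignment algorithm could be passed in its place.

```python
from collections import defaultdict


def map_record(key, value):
    """MAP: parse the value of one record and emit (key, parsed part)."""
    ontology_id, entity_csv = value.split("|", 1)
    entities = [e for e in entity_csv.split(",") if e]
    yield key, (ontology_id, entities)


def exact_label_aligner(part1, part2):
    """Toy stand-in for a real alignment algorithm: match equal labels."""
    labels2 = {e.lower(): e for e in part2}
    return [(e, labels2[e.lower()], 1.0) for e in part1 if e.lower() in labels2]


def reduce_subproblem(key, values, aligner=exact_label_aligner):
    """REDUCE: align the two ontology parts that share a subproblem key."""
    parts = dict(values)  # {"O1": [...], "O2": [...]}
    yield key, aligner(parts["O1"], parts["O2"])


def run_local(records, aligner=exact_label_aligner):
    """Simulate the shuffle phase in memory: group map output by key."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_record(key, value):
            grouped[out_key].append(out_value)
    results = {}
    for key in sorted(grouped):
        for out_key, alignment in reduce_subproblem(key, grouped[key], aligner):
            results[out_key] = alignment
    return results


# Hypothetical input: two subproblems of one ontology pair.
records = [
    ("pair1-sub1", "O1|Paper,Author"),
    ("pair1-sub1", "O2|author,Reviewer"),
    ("pair1-sub2", "O1|Conference"),
    ("pair1-sub2", "O2|Conference,Workshop"),
]
results = run_local(records)
print(results)
```

Because each key identifies an independent subproblem, the reducers can run on different nodes without coordinating with each other.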

  20. Identifying Alignment Subproblems Entities from one cluster are predominantly in correspondence with entities in one other cluster • Approach: Hamdi et al. 2010 • Identify anchors: entity pairs with identical names or labels • Cluster concepts around the anchors • Using structural neighborhood
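
The two steps above (find anchors, then cluster around them) can be sketched as follows. This is a simplified illustration and not the algorithm of Hamdi et al.: anchors are taken to be case-insensitive label matches, and "structural neighborhood" is approximated by assigning each concept to its nearest anchor with a multi-source breadth-first search. All names and the sample data are hypothetical.

```python
from collections import deque


def find_anchors(labels1, labels2):
    """Anchors: entity pairs with identical (case-insensitive) labels."""
    index2 = {label.lower(): entity for entity, label in labels2.items()}
    return [(e1, index2[label.lower()])
            for e1, label in sorted(labels1.items())
            if label.lower() in index2]


def cluster_around(anchors, neighbors):
    """Assign every reachable concept to its nearest anchor (multi-source BFS)."""
    cluster_of = {a: a for a in anchors}
    queue = deque(anchors)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in cluster_of:
                cluster_of[nxt] = cluster_of[node]
                queue.append(nxt)
    return cluster_of


labels1 = {"a1": "Person", "a2": "Author", "a3": "Paper", "a4": "Abstract"}
labels2 = {"b1": "person", "b2": "Reviewer", "b3": "paper"}
anchors = find_anchors(labels1, labels2)

# Undirected structural neighborhood of O1 (adjacency lists).
neighbors1 = {"a1": ["a2"], "a2": ["a1"], "a3": ["a4"], "a4": ["a3"]}
clusters1 = cluster_around([e1 for e1, _ in anchors], neighbors1)
print(anchors)
print(clusters1)
```

Clusters that grow from the two sides of the same anchor form one alignment subproblem, which is what makes each subproblem small enough to hand to a single reducer.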

  21. Merging Subproblem Alignments Crisscross mappings • Correspondence1: • Correspondence2: • & • is a subclass of and is a subclass of → inconsistent • We remove the one with the lower confidence score while merging. Redundant mappings • Correspondence1: • Correspondence2: • & • is a subclass of → inconsistent • We remove
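
The entity symbols on this slide were lost, so the sketch below relies on the usual reading of a crisscross mapping: x1 maps to y1 and x2 maps to y2, while x1 is a subclass of x2 and y2 is a subclass of y1. It keeps the higher-confidence correspondence of each such pair, as the slide describes. The function name, the tuple layout, and the sample subclass relations are assumptions; redundant mappings are not handled here because their removal rule is truncated in the transcript.

```python
def remove_crisscross(correspondences, is_subclass1, is_subclass2):
    """Drop the lower-confidence member of every crisscross pair.

    correspondences: list of (x, y, confidence) tuples
    is_subclass1(a, b): True if a is a subclass of b in O1
    is_subclass2(a, b): True if a is a subclass of b in O2
    """
    # Visit correspondences from highest to lowest confidence so the
    # stronger one of any conflicting pair is always the one kept.
    ordered = sorted(correspondences, key=lambda c: c[2], reverse=True)
    removed = set()
    for i, (x1, y1, _) in enumerate(ordered):
        if i in removed:
            continue
        for j in range(i + 1, len(ordered)):
            if j in removed:
                continue
            x2, y2, _ = ordered[j]
            crossing = ((is_subclass1(x1, x2) and is_subclass2(y2, y1)) or
                        (is_subclass1(x2, x1) and is_subclass2(y1, y2)))
            if crossing:
                removed.add(j)
    return [c for k, c in enumerate(ordered) if k not in removed]


# Hypothetical subclass relations, chosen only to produce a crossing.
sub1 = {("Student", "Person")}   # Student is a subclass of Person in O1
sub2 = {("Human", "Learner")}    # Human is a subclass of Learner in O2

merged = remove_crisscross(
    [("Student", "Learner", 0.6), ("Person", "Human", 0.9), ("Course", "Class", 0.8)],
    lambda a, b: (a, b) in sub1,
    lambda a, b: (a, b) in sub2,
)
print(merged)
```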

  22. Performance Evaluation • Datasets • Conference track from OAEI (120 pairs) • Large ontologies from OAEI (SNOMED, NCI, ... 5 pairs) • New biomedical ontology testbed (50 pairs from NCBO) • Algorithms • Falcon-AO • Optima+ • LogMap • YAM++ • Compare F-measure & runtime • Default setup on a single node • MapReduce setup using Hadoop (12 nodes each with 24 2GB & 2GHz Intel Xeon processors)
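
For reference, the F-measure used to compare alignment quality is the harmonic mean of precision and recall against a reference alignment. The sketch below shows the standard computation; the function name and the sample correspondences are hypothetical and are not taken from the evaluation.

```python
def f_measure(found, reference):
    """Return (precision, recall, F-measure) of an alignment vs. a reference."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up example: 3 correspondences found, 2 of them in a reference of 4.
found = {("a", "x"), ("b", "y"), ("c", "q")}
reference = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
precision, recall, f1 = f_measure(found, reference)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Comparing this score between the single-node run and the MapReduce run indicates whether partitioning changed the output of a given algorithm.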

  23. Results – 3 Datasets [Figure panels: Conference, Large OAEI, Biomedical]

  24. Results – Large OAEI ontologies • Other Datasets • LogMap & YAM++: • Tradeoff is in the alignment quality • Falcon-AO & Optima+: • No change in output • Conference Track • No partitioning • No change in output

  25. Speedup with # of nodes in the Hadoop cluster

  26. Discussion • First inter-matcher parallelization approach • Especially using MapReduce • Exhibits significant speedup for batch alignment • Some algorithms may see a small reduction in alignment quality due to the partitioning • Significant speedup for a single ontology pair • Falcon-AO, Optima+ & YAM++ • Any alignment algorithm can fit in our framework

  27. Thank you Questions ?

  28. Parallel Alignment of Large Ontologies on a Computing Cluster • Current Divide and Conquer Approaches • Heavily rely on structure • Size-based partitioning techniques are not effective • Current Parallel Matching Algorithms • Parallelize the process within the algorithms • Do not support a multi-node cluster architecture
