 Speeding Up Batch Alignment of Large Ontologies Using MapReduce

Uthayasanker Thayasivam and Prashant Doshi

Dept. of Computer Science

University of Georgia

Introduction

Ontology: formalize the knowledge of a domain by means of defining concepts and properties that relate them

Problem Definition: Ontology Alignment

The ontology alignment problem:

find a set of correspondences between two ontologies O1 = <V1, E1, L1> and O2 = <V2, E2, L2>.

  • Ontology
    • V: Set of Labeled Vertices
    • E: Set of Edges
      • Set of ordered pairs of vertices from V
    • L: Mapping from each edge to its label
  • A correspondence m_aα between x_a ∈ O1 and y_α ∈ O2
    • Relation
    • Confidence
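
The definitions above can be sketched as plain data structures. This is only an illustrative sketch: the class names `Ontology` and `Correspondence`, the field names, and the toy entities are assumptions made here, not names from the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # an ordered pair of vertices from V


@dataclass
class Ontology:
    """O = <V, E, L>: labeled vertices, edges, and a label for each edge."""
    vertices: Set[str] = field(default_factory=set)            # V
    edges: Set[Edge] = field(default_factory=set)              # E
    edge_labels: Dict[Edge, str] = field(default_factory=dict)  # L

    def add_edge(self, source: str, target: str, label: str) -> None:
        # Adding an edge also registers both endpoints as vertices.
        self.vertices.update((source, target))
        self.edges.add((source, target))
        self.edge_labels[(source, target)] = label


@dataclass(frozen=True)
class Correspondence:
    """A correspondence between x in O1 and y in O2."""
    x: str
    y: str
    relation: str      # e.g. "=" for equivalence (hypothetical encoding)
    confidence: float  # assumed here to lie in [0, 1]


# Toy example with hypothetical entities.
o1 = Ontology()
o1.add_edge("Author", "Person", "subClassOf")
o2 = Ontology()
o2.add_edge("Writer", "Human", "subClassOf")
alignment = {Correspondence("Author", "Writer", "=", 0.9)}
```

An alignment is then simply a set of `Correspondence` objects between the two ontologies.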
Ontology Alignment Challenges

  • Improving the Alignment Quality
    • Structural & lexical disparity
  • Improving the Alignment Efficiency
    • Quickly producing quality alignment
  • Improving the Scalability

[Figure labels: Efficiency / Quality, Resources, Ontology Sizes]

Space of Alignments

Alignment between O1 & O2: bipartite graph

  • many-to-many
  • one-to-many
  • one-to-one

Alignment Space Size:

Evaluating An Alignment: Cartesian Product of entities
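
To give a feel for how quickly the alignment space grows, here is a small counting sketch. It rests on assumptions that are not taken from the slides: an alignment is treated as a set of edges in the bipartite graph between the entities of O1 and O2, the empty alignment is counted, and "one-to-many" is read as "each entity of O2 is matched to at most one entity of O1". The function names are hypothetical.

```python
from math import comb, factorial


def many_to_many_count(n1: int, n2: int) -> int:
    """Every subset of the n1 * n2 candidate pairs is an alignment."""
    return 2 ** (n1 * n2)


def one_to_many_count(n1: int, n2: int) -> int:
    """Each of the n2 entities picks one of n1 partners or stays unmatched."""
    return (n1 + 1) ** n2


def one_to_one_count(n1: int, n2: int) -> int:
    """Partial matchings: choose k entities on each side, then pair them up."""
    return sum(
        comb(n1, k) * comb(n2, k) * factorial(k)
        for k in range(min(n1, n2) + 1)
    )


# Even two tiny ontologies with 10 entities each give a huge space.
sizes = {
    "many-to-many": many_to_many_count(10, 10),
    "one-to-many": one_to_many_count(10, 10),
    "one-to-one": one_to_one_count(10, 10),
}
```

Under these assumptions the many-to-many space for two 10-entity ontologies already has 2^100 members, which is why exhaustive search over alignments is not an option for large ontologies.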

Large Ontology Matching

[Figure labels: O1 (P11, P12, P13), O2 (P21, P22, P23), 4 blocks]

  • Reduction of alignment space
    • Early pruning of dissimilar element pairs
      • aflood (Hanif and Masaki ‘09)
    • Partition-based matching
      • Falcon-AO (Jian et al. ‘05)
  • Parallel matching
    • MapPSO (Bock and Hettenhausen ‘10)
    • VDoc+ (Zhang ‘12)
Batch Alignment of Large Ontologies

Approach allows any alignment algorithm to be utilized on a MapReduce architecture

  • Scalability is challenging
    • OAEI 2012 - Very Large Biomedical Ontology Track
      • 8 out of 21 tools completed
  • Ontology repositories (e.g., NCBO at Stanford)
    • Batch alignment of ontologies
      • New ontologies posted
      • Ontologies get updated
Contributions: Batch Alignment of Large Ontologies

General & novel approach to speed up batch alignment of large ontologies using MapReduce

  • No impact on alignment quality for some algorithms
  • Benefits ontology repositories
MapReduce Framework

  • Map: Key -> Value
  • Shuffle: Key -> <Value1, Value2>
  • Reduce: Key -> Output Value (output)
  • Key identifies a subproblem

[Figure: MapReduce data flow; labels O1, O11, O21, O31 and O2, O12, O22]
Mapper & Reducer Algorithms

MAP

  • ← parse the Value in the record
  • emit()
  • emit(,)

REDUCE

  • ← align using an alignment algorithm
  • emit(,)
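
The symbols in the MAP and REDUCE steps above did not survive extraction, so the following is only a rough sketch of the general idea, simulated in plain Python rather than on Hadoop. The record format (a tab-separated subproblem key and a comma-separated list of entity names), the function names, and the `toy_align` matcher are all assumptions made for illustration; any real alignment algorithm would take the place of `toy_align`.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Match = Tuple[str, str, float]  # (entity in part 1, entity in part 2, confidence)


def map_record(line: str) -> Iterable[Tuple[str, str]]:
    """MAP: parse the value in the record and emit (subproblem key, part)."""
    key, part = line.split("\t", 1)
    yield key, part


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_subproblem(
    key: str, parts: List[str], align: Callable[[str, str], List[Match]]
) -> Tuple[str, List[Match]]:
    """REDUCE: align the two parts of one subproblem and emit the result."""
    part1, part2 = parts
    return key, align(part1, part2)


def toy_align(part1: str, part2: str) -> List[Match]:
    """Placeholder matcher: identical names match with confidence 1.0."""
    names2 = set(part2.split(","))
    return [(name, name, 1.0) for name in part1.split(",") if name in names2]


def run(lines: List[str], align: Callable[[str, str], List[Match]]) -> Dict[str, List[Match]]:
    pairs = [kv for line in lines for kv in map_record(line)]
    return dict(reduce_subproblem(k, v, align) for k, v in shuffle(pairs).items())


records = ["sub1\tHeart,Lung", "sub1\tLung,Liver", "sub2\tCell", "sub2\tCell,Gene"]
result = run(records, toy_align)
```

Because each key identifies one subproblem, every reducer call is independent of the others, which is what lets the framework spread the calls across nodes.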
Identifying Alignment Subproblems

Entities from one cluster are predominantly in correspondence with entities in one other cluster

  • Approach: Hamdi et al. 2010
  • Identify anchors: entity pairs with identical names or labels
  • Cluster concepts around the anchors
    • Using structural neighborhood
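
The two steps listed above (find anchors, then cluster around them) can be sketched as follows. This is not the method of Hamdi et al.; it is a simplified stand-in that assumes exact label equality for anchors and uses a multi-source breadth-first search as one possible reading of "structural neighborhood". All names and the toy data are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def find_anchors(labels1: Dict[str, str], labels2: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return entity pairs (e1, e2) whose labels are identical."""
    by_label: Dict[str, List[str]] = {}
    for e2, label in labels2.items():
        by_label.setdefault(label, []).append(e2)
    return [
        (e1, e2)
        for e1, label in labels1.items()
        for e2 in by_label.get(label, [])
    ]


def cluster_around(seeds: Iterable[str], neighbors: Dict[str, List[str]]) -> Dict[str, str]:
    """Assign each reachable entity to the seed that reaches it first (BFS)."""
    owner = {seed: seed for seed in seeds}
    queue = deque(owner)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in owner:
                owner[nxt] = owner[node]
                queue.append(nxt)
    return owner


# Toy ontologies: entity id -> label, plus an undirected neighborhood for O1.
labels1 = {"a1": "Heart", "a2": "Valve", "a3": "Lung", "a4": "Alveolus"}
labels2 = {"b1": "Heart", "b2": "Lung", "b3": "Bronchus"}
neighbors1 = {"a1": ["a2"], "a2": ["a1"], "a3": ["a4"], "a4": ["a3"]}

anchors = find_anchors(labels1, labels2)
clusters1 = cluster_around([e1 for e1, _ in anchors], neighbors1)
```

Running the same clustering on O2 with the anchor's other side as seed yields paired clusters, and each pair can then be treated as one alignment subproblem.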
Merging Subproblem Alignments

Crisscross mappings

  • Correspondence1:
  • Correspondence2:
  • &
  • is a subclass of and is a subclass of

 inconsistent

  • We remove the one with the lower confidence score while merging.

Redundant mappings

  • Correspondence1:
  • Correspondence2:
  • &
  • is a subclass of

 inconsistent

  • We remove
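
The crisscross rule can be illustrated with a short sketch. Since the symbols on the slide were lost, the definition used here is an assumption: two correspondences (x1, y1) and (x2, y2) crisscross when x1 is a subclass of x2 in O1 while y2 is a subclass of y1 in O2. The greedy strategy, the function names, and the toy entities are likewise illustrative; the redundant-mapping rule is left out because its removal criterion is not recoverable from the text.

```python
from typing import Iterable, List, Set, Tuple

Corr = Tuple[str, str, float]  # (entity in O1, entity in O2, confidence)
SubclassPairs = Set[Tuple[str, str]]  # (subclass, superclass)


def is_crisscross(c1: Corr, c2: Corr, sub1: SubclassPairs, sub2: SubclassPairs) -> bool:
    """True when the subclass direction in O1 is reversed in O2."""
    (x1, y1, _), (x2, y2, _) = c1, c2
    return ((x1, x2) in sub1 and (y2, y1) in sub2) or (
        (x2, x1) in sub1 and (y1, y2) in sub2
    )


def merge(alignments: Iterable[List[Corr]], sub1: SubclassPairs, sub2: SubclassPairs) -> List[Corr]:
    """Union the subproblem alignments, dropping the lower-confidence
    member of every crisscross pair (greedy, highest confidence first)."""
    merged = sorted(
        (c for alignment in alignments for c in alignment),
        key=lambda c: c[2],
        reverse=True,
    )
    kept: List[Corr] = []
    for corr in merged:
        if not any(is_crisscross(corr, k, sub1, sub2) for k in kept):
            kept.append(corr)
    return kept


# Toy data: Student is a subclass of Person in O1, Human of Learner in O2.
sub1 = {("Student", "Person")}
sub2 = {("Human", "Learner")}
part_a = [("Student", "Learner", 0.9)]
part_b = [("Person", "Human", 0.6), ("Paper", "Article", 0.8)]
final = merge([part_a, part_b], sub1, sub2)
```

Processing correspondences in descending confidence means that whenever a pair conflicts, the one already kept is the more confident of the two.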
Performance Evaluation

  • Datasets
    • Conference track from OAEI (120 pairs)
    • Large ontologies from OAEI (SNOMED, NCI, ... 5 pairs)
    • New biomedical ontology testbed (50 pairs from NCBO)
  • Algorithms
    • Falcon-AO, Optima+, LogMap, YAM++
  • Compare F-measure & runtime
    • Default setup on a single node
    • MapReduce setup using Hadoop (12 nodes each with 24 2GB & 2GHz Intel Xeon processors)
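
For reference, F-measure is the harmonic mean of precision and recall computed against a reference alignment. The sketch below shows the standard calculation; the function name and the toy correspondences are hypothetical, and correspondences are compared as plain entity pairs, ignoring relation and confidence.

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]


def f_measure(found: Iterable[Pair], reference: Iterable[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of `found` against `reference`."""
    found_set, reference_set = set(found), set(reference)
    correct = len(found_set & reference_set)
    precision = correct / len(found_set) if found_set else 0.0
    recall = correct / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: 2 of the 4 found pairs are in the 3-pair reference alignment.
found = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")]
reference = [("a", "b"), ("c", "d"), ("i", "j")]
precision, recall, f1 = f_measure(found, reference)
```

In this toy case precision is 2/4 and recall is 2/3, so the F-measure is 4/7.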
Results – 3 Datasets

[Figure panels: Conference, Large OAEI, Biomedical]
Results – Large OAEI ontologies
  • Other Datasets
    • LogMap & YAM++:
      • Tradeoff is in the alignment quality
    • Falcon-AO & Optima+:
      • No change in output
  • Conference Track
    • No partitioning
      • No change in output
Discussion
  • First inter-matcher parallelization approach
    • Especially using MapReduce
  • Exhibits significant speedup for batch alignment
    • Some algorithms may see a small reduction in alignment quality due to the partitioning
  • Significant speedup for a single ontology pair
    • Falcon-AO, Optima+ & YAM++
  • Any alignment algorithm can fit in our framework
Thank you

Questions?

Parallel Alignment of Large Ontologies on a Computing Cluster
  • Current Divide and Conquer Approaches
    • Heavily rely on structure
    • Size based partitioning techniques are not effective
  • Current Parallel Matching algorithms
    • Parallelize the process within the algorithms
    • Do not support multi-node cluster architectures