TOSS: An Extension of TAX with Ontologies and Similarity Queries

Edward Hung, Yu Deng, V.S. Subrahmanian

Department of Computer Science

University of Maryland, College Park

SIGMOD, Paris, France, June, 2004

Outline
  • Introduction
  • Ontologies and Integration
  • Similarity Enhanced Ontology (SEO)
  • TOSS Algebra
  • Implementation and Experiments
  • Related Work
Introduction
  • [Jagadish et al., TAX: A Tree Algebra for XML, in DBPL, 2001]
    • one of the best algebras developed for XML databases
[Figure: SIGMOD and DBLP data, labeled “Problems!”]

Problems
  • Lack of lexical semantics in answering queries
    • Find papers written by “J. Ullman”:
      • J.D. Ullman? Jeffrey Ullman?
    • Find papers with at least one author from “U.S. government”:
      • U.S. Census Bureau? U.S. Army?
  • High precision, poor recall
  • Quality = (recall × precision)^{1/2}
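The quality measure above can be sketched in a few lines of Python. This is only an illustration: the function names, the set-based inputs, and the convention of returning 0 for an empty set are assumptions, not part of the slides.

```python
import math


def precision_recall(returned: set, relevant: set) -> tuple[float, float]:
    """Precision and recall of a returned answer set against the relevant set.

    Assumption (not from the slides): an empty denominator yields 0.0.
    """
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def quality(recall: float, precision: float) -> float:
    """Geometric mean of recall and precision."""
    return math.sqrt(recall * precision)


# Hypothetical query: exact matching finds only one of four relevant papers.
p, r = precision_recall({"p1"}, {"p1", "p2", "p3", "p4"})
print(p, r, quality(r, p))
```

With exact matching, precision stays at 1.0 while recall drops to 0.25, so the quality is only 0.5; this is the "high precision, poor recall" situation described above.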
Our approach
  • Goal: extend and enhance the semantics of TAX to return high-quality answers using ontologies and similarity measures
  • Capture inter-term lexical relationships in an ontology and integrate the ontologies of different DBs
  • Use existing similarity measures to enhance the integrated ontology
  • TOSS: extend the TAX algebra to query with ontologies and similarity
Motivating Examples and TAX
  • DBLP and SIGMOD bibliographies in XML
  • TAX
    • selection
    • projection
    • product
Pattern tree
  • Selection
Ontology
  • Suppose Σ is some finite set of strings and S is some set. An ontology w.r.t. Σ is a partial mapping Θ from Σ to hierarchies for S
    • S = {article, author, title}
    • Σ = {part_of}
    • ≤_H = {(author, article), (title, article)}
    • Θ(part_of) = (H, ≤_H)
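As a rough illustration of this definition, the example ontology can be written as a Python dictionary from relation names to hierarchies. The representation (a pair of node set and edge set), the assumption that the node set H equals S, and the helper name `below` are all choices made for this sketch, not taken from the slides.

```python
# Terms of the example; the sketch assumes the hierarchy's node set H equals S.
S = {"article", "author", "title"}

# The ordering from the example: author and title are each below article.
leq_H = {("author", "article"), ("title", "article")}

# The ontology is a partial mapping: only part_of has a hierarchy here.
theta = {"part_of": (S, leq_H)}


def below(theta: dict, relation: str, x: str, y: str) -> bool:
    """True if x is below or equal to y in the hierarchy for `relation`.

    Follows edges transitively; returns False when the ontology is
    undefined for `relation` or when a term is not in the hierarchy.
    """
    if relation not in theta:
        return False
    nodes, edges = theta[relation]
    if x not in nodes or y not in nodes:
        return False
    seen, stack = set(), [x]
    while stack:
        current = stack.pop()
        if current == y:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(b for (a, b) in edges if a == current)
    return False


print(below(theta, "part_of", "author", "article"))
print(below(theta, "part_of", "article", "author"))
```

Treating the mapping as a dictionary makes its partiality explicit: a relation name such as isa simply has no entry.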
Ontology Integration
  • [Figure: the SIGMOD and DBLP ontologies, together with the IC (interoperation constraints)]
  • [Figure: hierarchy graph associated with SIGMOD and DBLP]
  • [Figure: fusion of the ontologies of SIGMOD and DBLP]
Similarity Enhanced Ontology
  • A string similarity measure d_S is any function which takes two strings X, Y and returns a non-negative real number such that
    • ∀X, d_S(X, X) = 0
    • ∀X, Y, d_S(X, Y) = d_S(Y, X)
  • Any string similarity measure can be used. For example: Levenshtein distance, which assigns a unit cost to every edit operation.
    • d_S(“relation”, “relational”) = 2
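A minimal sketch of the Levenshtein distance mentioned above, using the standard dynamic-programming formulation with unit costs for insertion, deletion, and substitution. The function name is an assumption; the slides do not say how TOSS computes the distance.

```python
def levenshtein(x: str, y: str) -> int:
    """Edit distance between x and y with unit cost per edit operation."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty string
        for j, cy in enumerate(y, start=1):
            substitution = prev[j - 1] + (0 if cx == cy else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(levenshtein("relation", "relational"))  # 2, matching the slide's example
```

The function satisfies both conditions of the definition: the distance from a string to itself is 0, and the result is symmetric in its arguments.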
Similarity Enhanced Ontology
  • A similarity measure is any function which takes nodes A, B as input and returns a non-negative real number such that
    • d(A, B) = min_{X∈S, Y∈T} d_S(X, Y), where d_S is a string similarity measure and S, T are the sets of strings contained in nodes A, B.
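The node-level measure can be sketched as a minimum over all pairs of strings from the two nodes. In this sketch the string measure is passed in as a parameter; the `case_insensitive_mismatch` function is a deliberately simple stand-in chosen for illustration (it is 0 on itself and symmetric, so it meets the definition), and any other string similarity measure could be supplied instead.

```python
from itertools import product
from typing import Callable, Iterable


def node_distance(
    s: Iterable[str],
    t: Iterable[str],
    d_s: Callable[[str, str], float],
) -> float:
    """d(A, B): the smallest d_s(X, Y) over X in S and Y in T.

    S and T are the string sets of nodes A and B; both must be non-empty.
    """
    return min(d_s(x, y) for x, y in product(s, t))


def case_insensitive_mismatch(x: str, y: str) -> float:
    """Toy string measure (an assumption): 0 if equal ignoring case, else 1."""
    return 0.0 if x.lower() == y.lower() else 1.0


node_a = {"J. Ullman", "Jeffrey Ullman"}
node_b = {"j. ullman"}
node_c = {"V.S. Subrahmanian"}

print(node_distance(node_a, node_b, case_insensitive_mismatch))
print(node_distance(node_a, node_c, case_insensitive_mismatch))
```

Taking the minimum means two nodes count as close as soon as any one string in the first is close to any one string in the second.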
Similarity Enhanced Ontology
  • Suppose H is an integrated hierarchy, d is a similarity measure and ε ≥ 0. (H’, μ) is a similarity enhancement of H w.r.t. d, ε iff H’ is a hierarchy and μ is a function from H to 2^{H’} such that:
    • the original partial orderings in H are preserved, and no unwarranted orderings are included
    • all nodes mapped into the same node are similar to each other (within the threshold ε)
    • two strings are similar iff they are jointly present in some node in (H’, μ)
    • there is no redundant node whose string set is a subset of that of some other node
Similarity Enhanced Ontology
  • [Figure: an example ontology and its similarity enhancement]
Similarity Enhanced Ontology
  • (H, d, ) is similarity consistent iff there exists a similarity enhancement of H w.r.t. d, .
  • Theorem
    • If (H, d, ) is similarity consistent, then all similarity enhancements of H are equivalent.
TOSS Algebra
  • A simple selection condition has the form X op Y
    • op ∈ {=, ≠, <, ≤, >, ≥, ~, instance_of, isa, part_of, subtype_of, above, below}, and X, Y are terms, i.e., attributes (tag, content), types, or typed values v:τ with v ∈ dom(τ).
  • A selection condition is a simple selection condition or a conjunction/disjunction of two selection conditions
TOSS Algebra
  • The pattern tree to find the titles of all papers in DBLP related to Microsoft (independently of the field in which Microsoft appears):

    #1.tag = inproceedings &
    #2.tag = title &
    #3.tag part_of inproceedings &
    #3.content ~ “Microsoft”
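To make the condition above concrete, here is a hedged Python sketch that evaluates it over a tiny XML fragment. Everything in it is an assumption made for illustration: the sample data, the hypothetical `PART_OF` table standing in for the ontology, the reading of ~ as "some word within Levenshtein distance ε", and the assumption that #2 and #3 are children of #1. It does not reproduce the TOSS implementation, which is built on Xindice in Java.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from DBLP.
SAMPLE = """
<dblp>
  <inproceedings><author>Microsoft Research</author><title>Query Tuning</title></inproceedings>
  <inproceedings><author>J. Ullman</author><title>Views</title></inproceedings>
  <inproceedings><author>A. Author</author><title>Microsft Access Internals</title></inproceedings>
</dblp>
"""

# Hypothetical part_of relation: tag -> the tag it is part of.
PART_OF = {"author": "inproceedings", "title": "inproceedings"}


def levenshtein(x: str, y: str) -> int:
    """Unit-cost edit distance (standard dynamic programming)."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (0 if cx == cy else 1)))
        prev = curr
    return prev[-1]


def similar(content: str, target: str, eps: int) -> bool:
    """Assumed reading of '~': some word of content is within eps of target."""
    return any(levenshtein(w.lower(), target.lower()) <= eps
               for w in content.split())


def titles_related_to(root: ET.Element, target: str, eps: int) -> list[str]:
    titles = []
    for paper in root.iter("inproceedings"):        # #1.tag = inproceedings
        title = paper.find("title")                 # #2.tag = title
        if title is None:
            continue
        for field in paper:                         # candidate for #3
            if (PART_OF.get(field.tag) == "inproceedings"   # #3.tag part_of inproceedings
                    and similar(field.text or "", target, eps)):  # #3.content ~ target
                titles.append(title.text)
                break
    return titles


root = ET.fromstring(SAMPLE)
print(titles_related_to(root, "Microsoft", 1))
```

With ε = 1 the sketch also returns the paper whose title contains the misspelling "Microsft", whereas with ε = 0 only the exact occurrence in an author field matches.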

TOSS Algebra
  • To ensure that an embedding is correct w.r.t. a semistructured DB with an associated similarity enhanced ontology,
    • we define a selection condition to be well-typed if X and Y have a least common supertype τ and there exists a function to convert their types to τ.
    • we define (1) the type and value of a term w.r.t. a mapping h, and (2) the satisfaction of a selection condition
  • We extend the following algebraic operations: selection, projection, product, union, intersection, difference.
Implementation and Experiments
  • TOSS system implemented in Java
  • built on top of Xindice DBMS
  • Experiments:
    • Recall and precision
    • Scalability
      • selection
      • join
Recall and Precision
  • [marker lost] = TAX
  • X = TOSS (ε = 2)
  • + = TOSS (ε = 3)
Quality of Answers
  • [marker lost] = TAX
  • X = TOSS (ε = 2)
  • + = TOSS (ε = 3)
  • Quality = (recall × precision)^{1/2}
Related Work
  • Wiederhold et al. [ICOT’94, EDBT’00,…]
    • ontology algebra (LISP-style logical statements)
      • IC (interoperation constraints) are not considered
    • A concept similar to IC is considered in EDBT’00, but their integrated ontologies were not concise.
    • In addition, we deal with XML documents.
Related Work
  • [Jagadish et al., TAX: A Tree Algebra for XML, in DBPL, 2001]
    • algebra to query XML documents
    • ontology is not used
  • [Al-Khalifa et al., Querying structured text in an XML database, in SIGMOD 2003]
    • IR-style queries to find relevant results, with weighting and ranking support at run time
    • We use ontologies and similarity measures; we consider integration of ontologies and precompute SEO.

Questions and Answers

Thank you very much!