
TOSS: An Extension of TAX with Ontologies and Similarity Queries

Edward Hung, Yu Deng, V.S. Subrahmanian. Department of Computer Science, University of Maryland, College Park. SIGMOD, Paris, France, June 2004.


Presentation Transcript


  1. TOSS: An Extension of TAX with Ontologies and Similarity Queries • Edward Hung, Yu Deng, V.S. Subrahmanian • Department of Computer Science, University of Maryland, College Park • SIGMOD, Paris, France, June 2004

  2. Outline • Introduction • Ontologies and Integration • Similarity Enhanced Ontology (SEO) • TOSS Algebra • Implementation and Experiments • Related Work

  3. Introduction • [Jagadish et al., TAX: A Tree Algebra in XML, in DBPL, 2001] • one of the best algebras developed for XML databases

  4. Problems! • [Figure panels: SIGMOD, DBLP]

  5. Problems • Lack of lexical semantics in answering queries • Find papers written by “J. Ullman”: • J.D. Ullman? Jeffrey Ullman? • Find papers with at least one author from the “U.S. government”: • U.S. Census Bureau? U.S. Army? • High precision, poor recall • $\text{Quality} = (\text{recall} \times \text{precision})^{1/2}$
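
The quality measure on this slide can be illustrated with a minimal sketch. The function name `answer_quality` and the author-name sets are hypothetical, chosen only to mirror the “J. Ullman” example; they are not data from the talk.

```python
import math


def answer_quality(returned, relevant):
    """Return (precision, recall, quality) for a set of returned answers.

    quality = sqrt(precision * recall), as defined on the slide.
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, math.sqrt(precision * recall)


# Hypothetical ground truth: every spelling that denotes the same author.
relevant = {"J. Ullman", "J.D. Ullman", "Jeffrey Ullman", "Jeffrey D. Ullman"}
# An exact-match query only finds the literal spelling.
exact_match = {"J. Ullman"}

precision, recall, quality = answer_quality(exact_match, relevant)
print(precision, recall, quality)
```

Under these assumed sets, exact matching is perfectly precise but finds only one of four relevant spellings, so the geometric mean drops well below the precision.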

  6. Our approach • Goal: extend and enhance the semantics of TAX to return high quality answers using ontology and similarity measures • capture inter-term lexical relationships by ontology and integrate ontologies of different DBs • use existing similarity measures to enhance the integrated ontology • TOSS: extend TAX algebra to query with ontology and similarity

  7. Motivating Examples and TAX • DBLP and SIGMOD bibliographies in XML • TAX • selection • projection • product

  8. DBLP

  9. Pattern tree • Selection

  10. Pattern tree • Selection

  11. Pattern tree • Selection

  12. Architecture

  13. Architecture

  14. Ontology • Suppose Σ is some finite set of strings and S is some set. An ontology w.r.t. Σ is a partial mapping Θ from Σ to hierarchies for S • S = {article, author, title} • Σ = {part_of} • ≤H = {(author, article), (title, article)} • Θ(part_of) = (H, ≤H)
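
As a rough illustration of this definition, the example ontology can be written as a dictionary from relation names to hierarchies. This is only a sketch: it assumes the node set H of the hierarchy equals S, represents the order by its covering pairs, and the helper `is_below` is a hypothetical name.

```python
# Node set S and the covering pairs of the order <=_H from the slide.
S = {"article", "author", "title"}
leq_H = {("author", "article"), ("title", "article")}

# The ontology is a partial mapping: only some relation names have a hierarchy.
ontology = {"part_of": (S, leq_H)}


def is_below(ontology, relation, x, y):
    """True if x <= y in the hierarchy for `relation`.

    Follows the covering pairs upward from x, so the test uses the
    reflexive-transitive closure of the stored pairs.
    """
    if relation not in ontology:  # partial mapping: relation undefined
        return False
    nodes, pairs = ontology[relation]
    if x not in nodes or y not in nodes:
        return False
    seen, frontier = {x}, [x]
    while frontier:
        current = frontier.pop()
        if current == y:
            return True
        for lower, upper in pairs:
            if lower == current and upper not in seen:
                seen.add(upper)
                frontier.append(upper)
    return False


print(is_below(ontology, "part_of", "author", "article"))
print(is_below(ontology, "part_of", "article", "author"))
```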

  15. Ontology Integration SIGMOD DBLP

  16. Ontology Integration SIGMOD DBLP IC (interoperation constraints)

  17. Ontology Integration Hierarchy graph associated with SIGMOD and DBLP

  18. Ontology Integration Fusion of ontologies of SIGMOD and DBLP

  19. Architecture

  20. Similarity Enhanced Ontology • A string similarity measure $d_S$ is any function which takes two strings X, Y and returns a non-negative real number such that • $\forall X,\ d_S(X, X) = 0$ • $\forall X, Y,\ d_S(X, Y) = d_S(Y, X)$ • Any string similarity measure can be used. For example: Levenshtein distance, which assigns a unit cost to every edit operation. • $d_S(\text{“relation”}, \text{“relational”}) = 2$
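
The Levenshtein example on this slide can be checked with a short dynamic-programming sketch. The function name `levenshtein` is an assumption for illustration; the talk does not say how the measure was implemented.

```python
def levenshtein(x: str, y: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty prefix of y
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(levenshtein("relation", "relational"))
```

The function returns 0 for identical strings and is symmetric in its arguments, so it satisfies both conditions required of a string similarity measure.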

  21. Similarity Enhanced Ontology • A similarity measure is any function which takes nodes A, B as input and returns a non-negative real number such that • $d(A, B) = \min_{X \in S,\, Y \in T} d_S(X, Y)$, where $d_S$ is a string similarity measure and S, T are the sets of strings contained in nodes A, B.
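
A node-level measure of this form can be sketched by taking the minimum of a string measure over all pairs of strings. Nodes are modelled here as non-empty Python sets of strings, and `edit_distance` is a compact recursive stand-in for whichever string measure is chosen; both are assumptions made for illustration.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance (recursive form, suitable for short strings)."""
    if not x or not y:
        return len(x) + len(y)
    if x[0] == y[0]:
        return edit_distance(x[1:], y[1:])
    return 1 + min(
        edit_distance(x[1:], y),      # delete from x
        edit_distance(x, y[1:]),      # insert into x
        edit_distance(x[1:], y[1:]),  # substitute
    )


def node_distance(a, b, d_s=edit_distance):
    """d(A, B): the smallest string distance over all pairs (X in A, Y in B).

    Assumes both nodes contain at least one string.
    """
    return min(d_s(x, y) for x, y in product(a, b))


print(node_distance({"relation", "table"}, {"relational", "model"}))
```

Because the minimum is taken over all pairs, two nodes that share any string are at distance 0.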

  22. Similarity Enhanced Ontology • Suppose H is an integrated hierarchy, d is a similarity measure and $\varepsilon \geq 0$. $(H', \mu)$ is a similarity enhancement of H w.r.t. d, $\varepsilon$ iff H' is a hierarchy and $\mu$ is a function from H to $2^{H'}$ such that: • the original partial orderings in H are preserved, and no unwarranted orderings are included • all nodes mapped into the same node are similar to each other (by the threshold $\varepsilon$) • two strings are similar iff they are jointly present in some node in $(H', \mu)$ • there is no redundant node whose string set is a subset of that of some other node
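
The last three conditions suggest one way to picture the nodes of an enhancement: as maximal groups of pairwise-similar terms. The sketch below is not the construction from the talk. It ignores the ordering condition entirely, treats each original node as a single string, uses Levenshtein distance as the measure, and enumerates maximal cliques of the "within threshold" graph with a basic Bron-Kerbosch search. All function names are hypothetical.

```python
def levenshtein(x, y):
    """Unit-cost edit distance."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cx != cy)))
        prev = curr
    return prev[-1]


def similarity_groups(terms, eps, d_s=levenshtein):
    """Maximal sets of terms that are pairwise within distance eps.

    Every pair inside a group is similar, every similar pair shares some
    group, and no group is a subset of another (groups are maximal cliques).
    """
    terms = list(terms)
    adj = {t: {u for u in terms if u != t and d_s(t, u) <= eps} for t in terms}
    groups = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            groups.append(frozenset(clique))  # cannot be extended: maximal
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(terms), set())
    return sorted(sorted(g) for g in groups)


terms = ["relation", "relational", "relations", "model"]
print(similarity_groups(terms, eps=2))
print(similarity_groups(terms, eps=1))
```

Raising the threshold merges more terms into the same group, which is the effect the threshold has on the enhanced hierarchy.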

  23. Similarity Enhanced Ontology An example ontology Its similarity enhancement

  24. Similarity Enhanced Ontology • $(H, d, \varepsilon)$ is similarity consistent iff there exists a similarity enhancement of H w.r.t. d, $\varepsilon$. • Theorem • If $(H, d, \varepsilon)$ is similarity consistent, then all similarity enhancements of H are equivalent.

  25. Architecture

  26. TOSS Algebra • A simple selection condition has the form X op Y, where • $\mathit{op} \in \{=, \neq, <, \leq, >, \geq, \sim, \texttt{instance\_of}, \texttt{isa}, \texttt{part\_of}, \texttt{subtype\_of}, \texttt{above}, \texttt{below}\}$, and X, Y are terms, i.e., attributes (tag, content), types, or typed values $v{:}\tau$ with $v \in \mathrm{dom}(\tau)$. • A selection condition is either a simple selection condition or a conjunction/disjunction of two selection conditions

  27. TOSS Algebra • The pattern tree to find the titles of all papers in DBLP related to Microsoft (independently of the field in which Microsoft appears): #1.tag = inproceedings & #2.tag = title & #3.tag part_of inproceedings & #3.content ~ “Microsoft”
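
A much simplified reading of this pattern tree can be tried on a toy document. The sketch below is not the TOSS algebra: it interprets `part_of` as "is a descendant element of", interprets `~` as "some whitespace-separated token is within edit distance eps of the keyword", and runs on a made-up XML fragment. The function names and sample records are all hypothetical.

```python
import xml.etree.ElementTree as ET


def levenshtein(x, y):
    """Unit-cost edit distance."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cx != cy)))
        prev = curr
    return prev[-1]


def similar(text, keyword, eps):
    """Assumed meaning of `~`: some token of text is within eps of keyword."""
    return any(levenshtein(tok.lower(), keyword.lower()) <= eps
               for tok in text.split())


def titles_related_to(root, keyword, eps):
    titles = []
    for paper in root.iter("inproceedings"):   # node #1
        title = paper.find("title")            # node #2
        if title is None:
            continue
        # node #3: any element below the paper whose content matches
        if any(node is not paper and node.text
               and similar(node.text, keyword, eps)
               for node in paper.iter()):
            titles.append(title.text)
    return titles


# Hypothetical sample data, not taken from DBLP.
SAMPLE = """<dblp>
  <inproceedings><author>A. Author</author>
    <title>Query tuning in Microsoft SQL Server</title></inproceedings>
  <inproceedings><author>B. Writer</author><title>Tree algebras</title>
    <note>Work done at Microsft Research</note></inproceedings>
  <inproceedings><author>C. Person</author>
    <title>Ontology fusion</title></inproceedings>
</dblp>"""

root = ET.fromstring(SAMPLE)
print(titles_related_to(root, "Microsoft", eps=1))
```

With a threshold of 0 only the literal match is returned; a threshold of 1 also picks up the misspelled "Microsft" in the second record, regardless of which field it appears in.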

  28. TOSS Algebra • To ensure that an embedding is correct w.r.t. a semistructured DB with an associated similarity enhanced ontology, • we define a selection condition to be well-typed if X and Y have a least common supertype $\tau$ and there exists a function to convert their types to $\tau$. • we define (1) the type and value of a term w.r.t. a mapping h, and (2) the satisfaction of a selection condition • We extend the following algebraic operations: selection, projection, product, union, intersection, difference.

  29. Implementation and Experiments • TOSS system implemented in Java • built on top of Xindice DBMS • Experiments: • Recall and precision • Scalability • selection • join

  30. Recall and Precision • Plot legend: TAX • X = TOSS ($\varepsilon = 2$) • + = TOSS ($\varepsilon = 3$)

  31. Quality of Answers

  32. Quality of Answers • Plot legend: TAX • X = TOSS ($\varepsilon = 2$) • + = TOSS ($\varepsilon = 3$) • $\text{Quality} = (\text{recall} \times \text{precision})^{1/2}$

  33. Related Work • Wiederhold et al. [ICOT’94, EDBT’00,…] • ontology algebra (LISP-style logical statements) • IC (interoperation constraints) are not considered • A concept similar to IC is considered in EDBT’00, but their integrated ontologies were not concise. • In addition, we deal with XML documents.

  34. Related Work • [Jagadish et al., TAX: A Tree Algebra in XML, in DBPL, 2001] • algebra to query XML documents • ontologies are not used • [Al-Khalifa et al., Querying structured text in an XML database, in SIGMOD 2003] • IR-style queries to find relevant results, with weighting and ranking support at run time • We use ontologies and similarity measures; we consider integration of ontologies and precompute the SEO.

  35. Questions and Answers Thank you very much!
