
Cooperative Query Answering for Semistructured Data

Cooperative Query Answering for Semistructured Data. Speakers: Chuan Lin & Xi Zhang. By Michael Barg and Raymond K. Wong. Outline: Motivations, Overview, Basic Concepts, Cooperative Query Processing, Experiment. Motivations: XML data with the same semantic content can have very different structures.


Presentation Transcript


  1. Cooperative Query Answering for Semistructured Data • Speakers: Chuan Lin & Xi Zhang • By Michael Barg and Raymond K. Wong

  2. Outline • Motivations • Overview • Basic Concepts • Cooperative Query Processing • Experiment

  3. Motivations • XML data • same semantic content • very different structures

  4. Example: same semantics, different structures • User Query: “insurance claims” related to “smoking” for “woman” • Court Transcript (figure: tree with nodes plaintiff, woman, insurance claim, smoking) • Insurance Record (figure: tree with nodes insurance claim, insurer, smoking, woman)

  5. Motivations • No exact query result • Data: personnel (figure: tree with nodes sales manager, assistant sales manager, salesman, salesman, Joe, Bob, phone number, phone number) • User Query: “phone number” of “Bob” • User Query: Who is the new “sales manager”?

  6. Overview • Goal: • Return approximate answers for XML queries • “Approximate”: semantically and structurally similar to the query • Solution: • Return a set of results • Ranked by an overall score • Score: indicates how well the subgraph containing the result satisfies the query criteria

  7. Basic Concepts: Query Tree • Query: /restaurant[.//Soho]/phone_number • Query Tree (figure: nodes restaurant, soho, and phone_number, with phone_number marked as the result term (r) and each edge end labeled h for head or t for tail) • For each edge: • “head”: the end closer to the nearest result term • “tail”: the other end • In case of a tie, “head” is the end closer to the root

  8. Basic Concepts: Converging Order • Order of edges considered in query processing • Converge on a result term
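
The head/tail rule from slide 7 and the idea of a converging order can be sketched in a few lines of Python. This is only an illustration: the function names (`bfs_dist`, `orient_edges`) are hypothetical, and the slides do not define the converging order precisely, so the sketch assumes it means "process the edges farthest from the result term first".

```python
from collections import deque

def bfs_dist(adj, sources):
    """Number of edges from each node to the nearest source node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def orient_edges(edges, root, result_terms):
    """Return (head, tail) pairs, farthest from the result term first.

    edges: (parent, child) pairs of the query tree.
    The ordering is an assumption; the slides only say the edges
    "converge on a result term".
    """
    adj = {}
    for parent, child in edges:
        adj.setdefault(parent, []).append(child)
        adj.setdefault(child, []).append(parent)
    to_result = bfs_dist(adj, list(result_terms))
    to_root = bfs_dist(adj, [root])
    oriented = []
    for u, v in edges:
        # Head: closer to a result term; on a tie, closer to the root.
        key_u = (to_result[u], to_root[u])
        key_v = (to_result[v], to_root[v])
        oriented.append((u, v) if key_u < key_v else (v, u))
    oriented.sort(key=lambda e: to_result[e[0]], reverse=True)
    return oriented

# Query tree for /restaurant[.//Soho]/phone_number
query_edges = [("restaurant", "soho"), ("restaurant", "phone_number")]
order = orient_edges(query_edges, "restaurant", {"phone_number"})
```

Under these assumptions, the edge (restaurant, soho) is handled first with restaurant as its head, and the edge ending at the result term phone_number is handled last.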

  9. Basic Concepts: Similarity • Semantically similar topologies (figure: five topologies, (a) to (e), that place restaurant and soho in different arrangements, some with the additional nodes shopping_center, eating_places, and address)

  10. Basic Concepts: Similarity (cont.) • Deviation Proximity (DP) • Measures how far one structure deviates from a desired structure • Given: • ra: data node with value a • rb: data node with value b • Q(a,b): query tree edge • DP: the distance from the actual position of rb to the nearest position, r’b, that satisfies the topological relationship specified by Q(a,b) • Topological relationship: parent-child or ancestor-descendant

  11. Deviation Proximity • Q(restaurant, soho) requires a parent-child relationship • (figure: five example topologies over restaurant, soho, shopping_center, eating_places, and address, each marked with the nearest satisfying position soho’) • DP(restaurant, soho) for the five topologies: 0, 1, 2, 3, 3

  12. Deviation Proximity • Q(restaurant, soho) requires an ancestor-descendant relationship • (figure: five example topologies over restaurant, soho, shopping_center, eating_places, and address, each marked with the nearest satisfying position soho’) • DP(restaurant, soho) for the five topologies: 0, 0, 2, 3, 3
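
One way to read the DP definition on slide 10 is as a tree distance: count the edges from rb to the closest position that would be a child (or a descendant) of ra. The sketch below follows that reading. The `Node` class, the function names, and the four small trees in the usage note are all hypothetical, and the slides do not spell out the exact formula, so treat this as an interpretation rather than the authors' algorithm.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Node:
    label: str
    parent: Optional["Node"] = None

def ancestors(n):
    """Return [n, parent(n), ..., root]."""
    chain = []
    while n is not None:
        chain.append(n)
        n = n.parent
    return chain

def tree_distance(a, b):
    """Number of edges on the path between a and b in the same tree."""
    index_from_a = {id(n): i for i, n in enumerate(ancestors(a))}
    for j, n in enumerate(ancestors(b)):
        if id(n) in index_from_a:
            return index_from_a[id(n)] + j
    raise ValueError("nodes are in different trees")

def depth_below(a, b):
    """Edges from a down to b if b is a proper descendant of a, else None."""
    for d, n in enumerate(ancestors(b)):
        if n is a:
            return d if d > 0 else None
    return None

def deviation_proximity(ra, rb, relation):
    """Assumed reading of DP: edges from rb to the nearest satisfying position."""
    if relation not in ("parent-child", "anc-desc"):
        raise ValueError("unknown relation: " + relation)
    d = depth_below(ra, rb)
    if d is not None:
        # rb already lies below ra; only parent-child penalizes extra depth.
        return 0 if relation == "anc-desc" else d - 1
    # Otherwise the nearest satisfying position is a child of ra.
    return tree_distance(ra, rb) + 1
```

With this reading, four simple shapes (soho as a child of restaurant, soho two levels below restaurant, soho as the parent of restaurant, and the two as siblings) give 0, 1, 2, 3 for parent-child and 0, 0, 2, 3 for ancestor-descendant. That agrees with the pattern of values listed on slides 11 and 12, although which figure corresponds to which shape is an assumption.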

  13. Cooperative Query Processing • Input: a Query Tree QT, an XML Document Tree DT • Output: ordered list of <r_result_term, score> • Two components: • Structural proximity calculation • Progressive score

  14. Cooperative Query Processing (cont.) • Progressively matching edges in QT with DT • Consider edges in converging order • For each edge QT(a,b), where a is head and b is tail, get a list of <ra, score> • ra is a node in DT with value a • score is the progressive score of ra w.r.t the nearest rb • use graph encoding to calculate structural proximity of ra and rb

  15. Structural Proximity Calculation • Encodings and Compressed Arrays • Compact • Preserve relationship to a larger graph • Facilitate distance calculations • Proximity Searching

  16. Encodings and Compressed Arrays • Basic Concepts: • Common Node • Terminal Node • Annotated Node • Path representation • Representing Single Path • Representing Multiple Paths • Representing Multiple Elements • Compressed Arrays • Each encoding is a path/multi-path for a node/a set of nodes

  17. Encodings and Compressed Arrays

  18. Representing Single Path 1.1.1  y1 1.2.1.1.1.1  y2

  19. Representing Multiple Paths 1.3  B .B.2.1.1  C .3  C .C.2  y3

  20. Representing Multiple Elements .A.1.1y1 1 A .2.1.1.1.1 y2 .3  B.B.2.1.1  C.3  C.C.2  y3

  21. Compressed Arrays

  22. Drawback of Encoding • 1A.A.1B.B.1D.2E.?.2C.C.1F.2G

  23. Proximity Searching • Multi-Element Comparison • Input: • A compressed array, caN, containing the multi-element encoding of the Near Set • A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, EF • Output: • dist, the length of the shortest path from EF to the closest element in the Near Set

  24. Proximity Searching (figure: worked example showing MinDist = 5, MinDist = 4, MinDist = 2)
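
The input/output contract of the proximity search can be illustrated with a much simpler stand-in than the compressed arrays. The sketch below assumes the document is a tree and that each element is identified by its root path written as a tuple of child positions, for example `(1, 2, 1)`. It is not the multi-element comparison from the slides: `path_distance` and `min_dist` are hypothetical names, and the graph case with several paths per element is not covered.

```python
def path_distance(p, q):
    """Edges between two tree nodes given their root paths."""
    common = 0
    for x, y in zip(p, q):
        if x != y:
            break
        common += 1
    # Climb from p to the deepest shared ancestor, then descend to q.
    return (len(p) - common) + (len(q) - common)

def min_dist(find_path, near_paths):
    """Shortest distance from the Find Set element to any Near Set element."""
    return min(path_distance(find_path, p) for p in near_paths)

# Hypothetical root paths; they are not taken from the slide figures.
near_set = [(1, 1, 1), (1, 2, 1, 1, 1), (1, 3)]
dist = min_dist((1, 2, 1), near_set)
```

Here the three candidate distances are 4, 2, and 3, so `dist` is 2. The encodings on the slides aim to make this kind of comparison compact for whole sets of elements at once.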

  25. Progressive Score • Accumulative Deviation Proximity (DP) • Calculated from structural proximity • Boolean operator at Query Tree branches (figure: two query trees in which a has children b and c) • prog(a) = prog(b) + prog(c) • prog(a) = min(prog(b), prog(c))
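
The two formulas on this slide can be combined into a small recursive sketch. The transcript does not say which Boolean operator goes with which formula, so the code assumes that the sum belongs to an AND branch and the minimum to an OR branch. `QNode`, its `dp` and `op` fields, and the convention that each node carries the DP of the edge to its parent are also assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QNode:
    label: str
    dp: int = 0          # assumed: DP of the edge joining this node to its parent
    op: str = "and"      # assumed: Boolean operator applied to the children
    children: List["QNode"] = field(default_factory=list)

def prog(node):
    """Progressive score: accumulated DP, combined at each branch."""
    if not node.children:
        return node.dp
    scores = [prog(child) for child in node.children]
    combined = sum(scores) if node.op == "and" else min(scores)
    return node.dp + combined

b = QNode("b", dp=1)
c = QNode("c", dp=3)
score_and = prog(QNode("a", op="and", children=[b, c]))
score_or = prog(QNode("a", op="or", children=[b, c]))
```

With prog(b) = 1 and prog(c) = 3, the AND branch accumulates both deviations (4), while the OR branch keeps only the better alternative (1).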

  26. Experiment • XML: (figure: sample document) • Query: //restaurant/soho • Query Result: <soho, 2>, <soho, 3>, <soho, 4>

  27. Thank you!

  28. Questions & Answers
