280 likes | 376 Views
This paper discusses cooperative query answering for semistructured data, focusing on returning approximate answers for XML queries by assessing semantic and structural similarities. It covers basic concepts like query trees, converging order, similarity, and deviation proximity. The approach involves ordered lists of results ranked by overall score and uses graph encodings for proximity calculations.
E N D
Cooperative Query Answering for Semistructured Data Speakers: Chuan Lin & Xi Zhang By Michael Barg and Raymond K. Wong
Outline • Motivations • Overview • Basic Concepts • Cooperative Query Processing • Experiment
Motivations • XML data • same semantic content • very different structures
Court Transcript: plaintiff User Query: woman “insurance claims” related to “smoking” for “woman” insurance claim smoking Insurance Record: insurance claim insurer smoking woman Example: same semantics, diff structures
Data: personnel User Query: “phone number” of “Bob” Who is the new “sales manager” sales manager salesman assistant sales manager salesman Joe Bob phone number phone number Motivations • No exact query result
Overview • Goal: • Return approximate answers for XML queries • “approximate”: semantic + structural similar • Solution: • Return a set of results • ranked by an overall score • score: indicates how well the subgraph containing the result satisfies the query criteria.
Basic Concepts: Query Tree Query: /restaurant[.//Soho]/phone_number Query Tree: Result Term restaurant t h h soho t phone_number r For each edge: “head”: the end which is closer to nearest result term “end”: the other end In case of tie, “head” is the end closer to root
Basic Concepts: Converging Order • Order of edges considered in query processing • Converge on a result term
shopping_ center restaurant soho soho restaurant restaurant soho soho restaurant eating_ places address restaurant soho (a) (b) (c) (d) (e) Basic Concepts: Similarity • Semantically similar topologies
Basic Concepts: Similarity (cont.) • Deviation Proximity (DP) • Measure how far one structure deviates from a desired structure • Given: • ra: data node with value a • rb: data node with value b • Q(a,b): query tree edge • DP: the actual position of rb to the nearest position, r’b, which satisfies the topological relationship specified by Q(a,b) • Topological relationship: parent-child, ancestor-descendent
restaurant soho soho eating_ places restaurant Deviation Proximity Q (restaurant, soho) requires parent-child relationship shopping_ center restaurant soho restaurant soho address restaurant (soho’) (soho’) soho (soho’) (soho’) (soho’) DP(restauarent, soho): 0 1 2 3 3
restaurant soho soho eating_ places restaurant Deviation Proximity Q (restaurant, soho) requires anc-desc relationship shopping_ center restaurant soho restaurant soho address restaurant (soho’) soho (soho’) (soho’) (soho’) (soho’) DP(restauarent, soho): 0 0 2 3 3
Cooperative Query Processing • Input: a Query Tree QT, an XML Document Tree DT • Output: ordered list of <rresult_term, score> • Cooperative Query Processing • Structural proximity calculation • Progressive Score
Cooperative Query Processing (cont.) • Progressively matching edges in QT with DT • Consider edges in converging order • For each edge QT(a,b), where a is head and b is tail, get a list of <ra, score> • ra is a node in DT with value a • score is the progressive score of ra w.r.t the nearest rb • use graph encoding to calculate structural proximity of ra and rb
Structural Proximity Calculation • Encodings and Compressed Arrays • Compact • Preserve relationship to a larger graph • Facilitate distance calculations • Proximity Searching
Encodings and Compressed Arrays • Basic Concepts: • Common Node • Terminal Node • Annotated Node • Path representation • Representing Single Path • Representing Multiple Paths • Representing Multiple Elements • Compressed Arrays • Each encoding is a path/muti-path for a node/a set of nodes
Representing Single Path 1.1.1 y1 1.2.1.1.1.1 y2
Representing Multiple Paths 1.3 B .B.2.1.1 C .3 C .C.2 y3
Representing Multiple Elements .A.1.1y1 1 A .2.1.1.1.1 y2 .3 B.B.2.1.1 C.3 C.C.2 y3
Drawback of Encoding • 1A.A.1B.B.1D.2E.?.2C.C.1F.2G
Proximity Searching • Multi-Element Comparison • Input: • A compressed array, caN, containing the multi-element encoding of the Near Set. • A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, EF. • output: • dist, the shortest path from EF to the closest element in Near Set
Proximity Searching MinDist=5 MinDist = 4 MinDist = 2
Progressive Score • Accumulative Deviation Proximity (DP) • Calculated from structural proximity • Boolean operator at Query Tree branches a a b b c c prog(a) = prog(b)+prog(c) prog(a) = min (prog(b),prog(c))
Experiment XML: Query: //restaurant/soho Query Result: <soho, 2> <soho, 3> <soho, 4>