
Supporting Top-K Keyword Search in XML Databases

Supporting Top-K Keyword Search in XML Databases. ICDE 2010. Outline: Introduction, Motivation, Preliminaries, Join-based Algorithm, Join-based Top-k Algorithm, Experiments, Conclusions. Introduction: LCA (Lowest Common Ancestor).


Presentation Transcript


  1. Supporting Top-K Keyword Search in XML Databases ICDE 2010

  2. Outline • Introduction • Motivation • Preliminaries • Join-based Algorithm • Join-based Top-k Algorithm • Experiments • Conclusions

  3. Introduction • LCA: Lowest Common Ancestor

  4. Introduction • LCA: Lowest Common Ancestor

  5. Motivation • The naive LCA-based semantics is straightforward, but leads to exponential computation and result size. • Two keywords, {XML} and {data}: with m nodes in the list for XML and n nodes in the list for data, the total number of LCAs is m*n. • Existing algorithms focus on efficiency and cannot provide effective support for top-k processing.
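The m*n blow-up can be illustrated with a minimal sketch. It assumes nodes are identified by Dewey-style labels (such as 1.1.2, the form used on later slides), so that the LCA of two nodes is the longest common prefix of their labels. The occurrence lists and the function names `lca` and `naive_lcas` are hypothetical and not taken from the slides.

```python
from itertools import product

def lca(a, b):
    """Longest common prefix of two Dewey labels given as tuples of ints."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

def naive_lcas(list_xml, list_data):
    """Compute the LCA of every pair of nodes: m * n computations."""
    return [lca(a, b) for a, b in product(list_xml, list_data)]

# Hypothetical occurrence lists (assumption, for illustration only).
list_xml = [(1, 1, 1), (1, 1, 2, 1), (1, 2)]   # m = 3
list_data = [(1, 1, 2, 2), (1, 4)]             # n = 2

pairs = naive_lcas(list_xml, list_data)
distinct = sorted(set(pairs))
print(len(pairs))   # one LCA computation per pair
print(distinct)     # many pairs collapse onto the same ancestor
```

Even in this tiny case most pairs collapse onto the root, which is why enumerating every pair is wasteful.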

  6. Preliminaries 1. Query Semantics • k-keyword query • : the list of nodes directly • : the LCA of nodes • ELCA semantics: the result is the set of nodes that contain at least one occurrence of all of the query keywords, either in their labels or in the labels of their descendant nodes, after excluding the occurrences of the keywords in subtrees that already contain at least one occurrence of all the query keywords

  7. Cont. • SLCA: a subset of the LCAs such that no LCA in the subset is the ancestor of another LCA. • LCA: 1.1, 1.1.2, 1, 1.3.4, 1.3 • SLCA: 1.1.2, 1.3.4 • ELCA: 1.1.2, 1.3.4, 1
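The SLCA filter on this slide can be checked with a short sketch. It takes the LCA list given above as input and keeps only the LCAs that are not an ancestor of another LCA. The helper names `parse`, `is_ancestor` and `slca` are illustrative assumptions.

```python
def parse(label):
    """Turn a Dewey label such as '1.3.4' into a tuple of ints."""
    return tuple(int(part) for part in label.split("."))

def is_ancestor(a, b):
    """True if node a is a proper ancestor of node b (a is a proper prefix of b)."""
    return len(a) < len(b) and b[:len(a)] == a

def slca(lcas):
    """Keep the LCAs that are not an ancestor of any other LCA in the list."""
    return [v for v in lcas if not any(is_ancestor(v, w) for w in lcas)]

# LCA list from the slide.
lcas = [parse(s) for s in ["1.1", "1.1.2", "1", "1.3.4", "1.3"]]
result = [".".join(map(str, v)) for v in slca(lcas)]
print(result)  # ['1.1.2', '1.3.4']
```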

  8. Cont. 2. Ranking Function

  9. Cont. • d(·): a decreasing function

  10. Join-based Algorithm 1. Node encoding

  11. Join-based Algorithm 2. Algorithm • Two lists of nodes

  12. Cont. (2,3) join (1): no match

  13. Cont. (3,5,6) join (1,2,4): no match

  14. Cont. (2,3,4,5) join (1,2,4) => (2,4) matched. The nodes numbered 2 and 4 at level 3 are the lowest ELCAs => erased

  15. Cont. (2,3) join (1): no match

  16. Cont. (1,1) join (1): matched => the root is an ELCA. The number 1 corresponds to two nodes (1.2.3 and 1.3.5.6); output one of them
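The bottom-up, column-by-column join walked through on these slides can be approximated with the sketch below. It is a simplified illustration and not the algorithm from the slides: it assumes plain Dewey labels and uses label prefixes as column values, instead of the per-level node numbers of the slides' encoding, and it erases the rows under every node that contains all keywords, so that the result follows the ELCA definition. The input lists and the name `elca_by_level_join` are hypothetical.

```python
def elca_by_level_join(keyword_lists):
    """keyword_lists: one list of Dewey labels (tuples of ints) per keyword."""
    full = [set(lst) for lst in keyword_lists]        # never erased
    remaining = [set(lst) for lst in keyword_lists]   # rows still usable
    max_level = max(len(label) for lst in keyword_lists for label in lst)
    results = []
    for level in range(max_level, 0, -1):             # deepest column first
        def column(labels):
            # Column value at this level: the label prefix of that length.
            return {lab[:level] for lab in labels if len(lab) >= level}
        # Nodes at this level whose subtree contains every keyword.
        contains_all = set.intersection(*(column(f) for f in full))
        # Join of the remaining rows: these nodes are ELCAs.
        matched = set.intersection(*(column(r) for r in remaining))
        results.extend(sorted(matched))
        # Erase rows under any node that already contains all keywords.
        remaining = [{lab for lab in r if lab[:level] not in contains_all}
                     for r in remaining]
    return results

# Hypothetical lists, chosen so the LCAs are 1.1, 1.1.2, 1, 1.3.4, 1.3.
list_a = [(1, 1, 1), (1, 1, 2, 1), (1, 3, 4, 1), (1, 2)]
list_b = [(1, 1, 2, 2), (1, 3, 4, 2), (1, 3, 5), (1, 4)]
elcas = elca_by_level_join([list_a, list_b])
print(elcas)  # [(1, 1, 2), (1, 3, 4), (1,)]
```

On this input the sketch returns 1.1.2, 1.3.4 and the root, which agrees with the ELCA example on slide 7. Nodes 1.1 and 1.3 are skipped because their only occurrences of one keyword lie inside a subtree that was already matched.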

  17. Cont.

  18. Cont. Score(1.3.4.5.3.1.1) is greater than Score(1.3.5.6). But in the 4th column, 0.5*d(3) may be greater than or equal to 0.44

  19. Cont.

  20. Cont. Assume d( ): Join columns 5 and 4: no result

  21. Cont. Column 3: number 2 is matched. Its score is 0.73 + 0.41 = 1.14. The threshold of the unseen results in column 3 is max{0.7+0.3, 0.5+0.4} = 1

  22. Cont. Consider the unseen results in the other columns: columns 1 and 2 do not contain sequence s; ignore. Consider column 2: the maximum scores are 0.7*0.9 and 0.5*0.9, so the threshold is 0.63 + 0.45 = 1.08 < 1.14. Therefore, node 2 at level 3 can be output.
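The stopping test on slides 21 and 22 comes down to a comparison between the score of the matched result and an upper bound on every result not yet seen. The sketch below only replays the arithmetic from the slides. The function name `safe_to_output` is an assumption, and the factor 0.9 is taken from the products shown on slide 22 without assuming a particular definition of d.

```python
def safe_to_output(candidate_score, thresholds):
    """True when no unseen result can score higher than the candidate."""
    return all(candidate_score >= t for t in thresholds)

# Numbers copied from the slides.
candidate = 0.73 + 0.41                # matched node 2 at level 3
column3 = max(0.7 + 0.3, 0.5 + 0.4)    # bound on unseen results in column 3
column2 = 0.7 * 0.9 + 0.5 * 0.9        # bound on unseen results in column 2

print(round(candidate, 2), round(column3, 2), round(column2, 2))
print(safe_to_output(candidate, [column3, column2]))  # True
```

Because 1.14 exceeds both bounds, the matched node can be returned before the remaining columns are joined.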

  23. Experiments

  24. Cont.

  25. Cont.

  26. Cont.

  27. Conclusions • 1. The join-based algorithm performs well for high-frequency keywords. • 2. The join-based top-k algorithm performs well when the keywords are highly correlated.
