Query Ranking in Probabilistic XML Data

Lijun Chang, Jeffrey Xu Yu, Lu Qin. Published in EDBT 2009. Presented by Yongxin Tong. Outline: Motivations, Related Work, Problem Definition, Main Solution, Experimental Results.


Presentation Transcript


  1. Query Ranking in Probabilistic XML Data. Lijun Chang, Jeffrey Xu Yu, Lu Qin. Published in EDBT 2009. Presented by Yongxin Tong.

  2. Outline • 1) Motivations • 2) Related Work • 3) Problem Definition • 4) Main Solution • 5) Experimental Results

  4. Motivations • Twig queries are a fundamental topic in traditional XML data management.

  5. Motivations • Uncertainty: Data Cleaning, Information Extraction, Sensor Nets, Moving Objects, … • Ranking over Uncertain Data • Probabilistic XML (PXML) Database

  6. Motivations • Probabilistic XML (PXML) Database • Ranking over Uncertain Data • Twig Query in Certain XML Data • Query Ranking in Probabilistic XML

  8. Related Work (1) • Twig Query in Certain XML Data • First work: TwigStack (Bruno et al., SIGMOD 2002) • Other representative works: Twig2Stack (Chen et al., VLDB 2006), TwigList (Qin et al., DASFAA 2007) • Probabilistic XML Data • Probabilistic XML data models: ProTDB (Nierman, VLDB 2002), PXML (Hung et al., ICDE 2003) • Comparison of probabilistic XML data models: PrXML (Kimelfeld et al., SIGMOD 2008/2009, VLDB Journal 2009) • Matching twigs in probabilistic XML: MatchingTwigs (Kimelfeld et al., VLDB 2007), Extending Holistic Join Algorithm (Li et al., ICDE 2009) • Ranking Query over Uncertain Data (under the X-Relation Model)

  9. Related Work (2) • Ranking tuples: from certain data to uncertain data

  10. Related Work (3) • Possible Worlds under the X-Relation Model • Note that t2 and t4 are mutually exclusive

  11. Related Work (4) • Global Top-k Semantics (Zhang et al., DBRank 2008) • Given k = 2 • Result = {t3 (0.8), t2 (0.5)}

  12. Related Work (4) • PT-k Semantics (Hua et al., SIGMOD 2008) • Given k = 2 and threshold = 0.5 • Result = {t3 (0.8), t2 (0.5)}

  13. Related Work (5) • U-Topk Semantics (Soliman et al., ICDE 2007) • Given k = 2 • Result = {t2, t3} (0.3) or {t3, t4} (0.3)

  14. Related Work (4) • U-kRanks Semantics (Soliman et al., ICDE 2007) • Given k = 2 • Result = {t1 (0.4), t3 (0.5)}
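
The four ranking semantics on slides 11 to 14 can be compared by brute-force enumeration of possible worlds. The sketch below is illustrative only: the x-relation table on the slides is an image and is not in this transcript, so the probabilities, the score order t1 > t2 > t3 > t4, and all names in the code are assumptions, chosen so that the output agrees with the results quoted on the slides.

```python
from itertools import product

# Hypothetical x-relation (assumption, not taken from the slides' table).
PROB = {"t1": 0.4, "t2": 0.5, "t3": 1.0, "t4": 0.5}
ORDER = ["t1", "t2", "t3", "t4"]          # decreasing score order
RULES = [["t1"], ["t2", "t4"], ["t3"]]    # tuples in one rule are mutually exclusive
K = 2

def possible_worlds():
    """Yield (tuples present in score order, probability) for each world."""
    per_rule = []
    for rule in RULES:
        options = [((t,), PROB[t]) for t in rule]
        none_prob = 1.0 - sum(PROB[t] for t in rule)
        if none_prob > 1e-12:              # the rule may produce no tuple
            options.append(((), none_prob))
        per_rule.append(options)
    for combo in product(*per_rule):
        present = {t for chosen, _ in combo for t in chosen}
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield [t for t in ORDER if t in present], prob

topk_prob = {t: 0.0 for t in ORDER}        # Pr(t is among the top-k)
rank_prob = {t: [0.0] * K for t in ORDER}  # Pr(t is at rank i)
list_prob = {}                             # Pr(top-k list equals a given list)
for world, p in possible_worlds():
    top = world[:K]
    for i, t in enumerate(top):
        topk_prob[t] += p
        rank_prob[t][i] += p
    if len(top) == K:
        list_prob[tuple(top)] = list_prob.get(tuple(top), 0.0) + p

global_topk = sorted(ORDER, key=lambda t: -topk_prob[t])[:K]
pt_k = [t for t in ORDER if topk_prob[t] >= 0.5 - 1e-9]      # threshold 0.5
u_kranks = [max(ORDER, key=lambda t: rank_prob[t][i]) for i in range(K)]
best = max(list_prob.values())
u_topk = sorted(s for s, p in list_prob.items() if abs(p - best) < 1e-9)

print(global_topk, pt_k, u_kranks, u_topk)
```

Enumeration is exponential in the number of rules, so this is only useful for checking small examples by hand.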

  16. Problem Definition (1): PrXML{ind, mux} Model

  17. Problem Definition (2): Possible World Semantics • Figure: (a) PXML tree, (b) a possible world • Probability of the possible world: Pr(PW) = 0.3 × 0.8 × 0.5 × 0.3 = 0.036
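
Possible world semantics for a PrXML{ind, mux} tree can be sketched as follows. The tree on the slide is a figure and is not in this transcript, so the tree below, its labels, and the helper names (`ordinary`, `ind`, `mux`, `worlds`, `distribution`) are all hypothetical. The edge probabilities were picked only so that one world has the same product, 0.3 × 0.8 × 0.5 × 0.3, as the slide. An `ind` node keeps each child independently; a `mux` node keeps at most one child.

```python
def ordinary(label, *children):
    return ("ord", label, children)

def ind(*edges):            # edges are (probability, child) pairs
    return ("ind", None, edges)

def mux(*edges):
    return ("mux", None, edges)

def combine(parts):
    """Cross product of independent distributions over sets of labels."""
    result = [(frozenset(), 1.0)]
    for part in parts:
        result = [(a | b, p * q) for a, p in result for b, q in part]
    return result

def worlds(node):
    """List of (labels of ordinary nodes present, probability)."""
    kind, label, body = node
    if kind == "ord":
        below = combine([worlds(c) for c in body])
        return [(labels | {label}, p) for labels, p in below]
    if kind == "ind":       # each child kept or dropped independently
        parts = [[(s, p * q) for s, q in worlds(c)] + [(frozenset(), 1.0 - p)]
                 for p, c in body]
        return combine(parts)
    # mux: exactly one child is kept, or none with the remaining probability
    out = [(s, p * q) for p, c in body for s, q in worlds(c)]
    rest = 1.0 - sum(p for p, _ in body)
    if rest > 1e-12:
        out.append((frozenset(), rest))
    return out

def distribution(node):
    """Merge identical worlds and return {labels: probability}."""
    dist = {}
    for labels, p in worlds(node):
        dist[labels] = dist.get(labels, 0.0) + p
    return dist

# Hypothetical PXML tree (assumption): r has an ind node over a, b, e;
# b has a mux node over c and d.
tree = ordinary("r", ind(
    (0.3, ordinary("a")),
    (0.8, ordinary("b", mux((0.5, ordinary("c")), (0.4, ordinary("d"))))),
    (0.7, ordinary("e")),
))
dist = distribution(tree)
# World {r, a, b, c}: a kept, b kept, c chosen, e dropped.
print(dist[frozenset({"r", "a", "b", "c"})])
```

The probabilities of all possible worlds sum to 1, which is a quick sanity check for any hand-built tree.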

  18. Problem Definition (3): Top-k Probability of a Twig Query • Types of twig query • Node query: //A • Path query: //A//B • Tree query: //A[.//C]//B • Top-k probability of a twig query: Given a PXML tree, TP, and a twig query, Q, evaluating Q while ignoring the distribution nodes in TP yields a result set M = {φ1, φ2, …, φN}. Given a score function, M can be sorted. Over the set of all possible worlds of TP, the top-k probability of each twig query result φi is the probability that φi is among the top-k answers.
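
The defining formula on this slide is not in the transcript. One way to write a definition consistent with the surrounding text is given below; the names tkp and top_k are assumptions introduced here, and pwd(TP) denotes the set of possible worlds of TP as on slide 19.

```latex
\[
\mathrm{tkp}(\varphi_i)
  \;=\; \sum_{\substack{pw \,\in\, pwd(T_P) \\ \varphi_i \,\in\, \mathrm{top}_k(pw)}} \Pr(pw)
\]
```

Here top_k(pw) stands for the k highest-scoring answers of Q that exist in the possible world pw.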

  19. Problem Definition (4): Top-k PXML Ranking • Top-k Probabilistic XML Ranking (PXML-Rank): Let TP be a PXML tree with possible worlds pwd(TP). A PXML-Rank query, (Q, k), is specified by a twig query, Q, and a positive number, k, against TP. It ranks the answers, φi, that satisfy the twig query Q by their top-k probabilities.

  20. Problem Definition (4): Top-k PXML Ranking • Global Top-k Semantics (Zhang et al., DBRank 2008) • Given k = 2 • Result = {t3 (0.8), t2 (0.5)}

  21. Problem Definition (4): Top-k PXML Ranking • Top-k Probabilistic XML Ranking (PXML-Rank): Let TP be a PXML tree with possible worlds pwd(TP). A PXML-Rank query, (Q, k), is specified by a twig query, Q, and a positive number, k, against TP. It ranks the answers, φi, that satisfy the twig query Q by their top-k probabilities. • This gives the relationship between PXML-Rank and Global Top-k.

  23. Main Solution (1): The Challenge of PXML-Rank • The dynamic-programming method under the X-Relation Model: given the tuples in decreasing score order, {t1, t2, …, tN}, and the value of k, the key task is to compute the probability that exactly j of these tuples appear (1 <= j <= k). • The biggest challenge for PXML-Rank
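
The dynamic programming mentioned for the X-Relation model can be sketched as below. This is a simplified illustration, not the presenters' algorithm: it assumes all tuples are independent (no mutual-exclusion rules), and the function name and the input probabilities are hypothetical. The array `r` holds, for the tuples processed so far, the probability that exactly j of them appear.

```python
def topk_probabilities(probs, k):
    """Top-k probability of each tuple.

    probs: existence probabilities in decreasing score order.
    Assumes the tuples are mutually independent.
    """
    # r[j] = Pr(exactly j of the tuples seen so far appear), j = 0..k-1
    r = [1.0] + [0.0] * (k - 1)
    result = []
    for p in probs:
        # The tuple is in the top-k if it appears and fewer than k
        # higher-ranked tuples appear.
        result.append(p * sum(r))
        # Fold this tuple into r, highest j first so r[j - 1] is still old.
        for j in range(k - 1, 0, -1):
            r[j] = r[j] * (1.0 - p) + r[j - 1] * p
        r[0] *= 1.0 - p
    return result

# Hypothetical input: three independent tuples, k = 2.
print(topk_probabilities([0.4, 0.5, 1.0], 2))
```

Each tuple costs O(k) work, so the whole pass is O(N·k). In a PXML tree the answers are correlated through shared ancestors and mux nodes, so this recurrence cannot be applied directly.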

  24. Main Solution (2): Overview of the Solution • Method for computing the top-k probability of φi: • 1) Dynamic programming in the local subtree whose root is φi • 2) Computing the j-th probability of φi: P_{i,j} • 3) Top-k probability of φi

  25. Main Solution (3): An Example • Given a PXML tree, TP, a twig query, Q, and k = 3, we show how to compute the top-k probability of e6. • First, sort the results in decreasing score order: M = {e1, e2, e3, e4, e5, e6, e7}, and let M(e6) = {e1, e2, e3, e4, e5} be the answers ranked above e6. • Second, compute: Pr(e6 appears in the top-3 answers) = Pr(e6 appears and at most 2 answers in M(e6) appear)
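
The last step of this derivation is cut off in the transcript. A form consistent with the final line of the worked example (the probability that e6 appears, multiplied by a sum of three terms) would be the following; reading each term as conditional on e6 appearing is an assumption.

```latex
\[
\Pr(e_6 \text{ in the top-3})
  \;=\; \Pr(e_6 \text{ appears}) \cdot \sum_{j=0}^{2} r_{e_6,\,j},
\qquad
r_{e_6,\,j} = \Pr\bigl(\text{exactly } j \text{ answers of } M(e_6) \text{ appear} \,\bigm|\, e_6 \text{ appears}\bigr)
\]
```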

  26. Main Solution (3): An Example • According to the shrinking root of the current node, we can cut the PXML tree into three parts. • Then we compute each case in turn. • Case 1 (j = 0) • First part: 1 - Pr(mux1, e1) - Pr(mux1, e2) = 1 - 0.4 - 0.5 = 0.1 • Second part: e4 and its children cannot appear, Pr(e4 is absent) = 1 - 0.7 = 0.3 • Thus, for j = 0, r_{e6,0} = 0.1 × 0.3 = 0.03

  27. Main Solution (3): An Example • Case 2 (j = 1) • First part: Pr(e1 or e2 appears) = 0.4 + 0.5 = 0.9 • Second part: only e4 appears and its children do not, Pr(only e4 appears) = 0.7 × (1 - 0.9) × (1 - 0.8) = 0.014 • Thus, for j = 1, r_{e6,1} = 0.9 × (1 - 0.7) + 0.1 × 0.014 = 0.2714 • Case 3 (j = 2) • r_{e6,2} = 0.1 × 0.182 + 0.9 × 0.014 + 0 × 0.3 = 0.0308 • Top-3 probability of e6 = Pr(e6 appears) × (r_{e6,0} + r_{e6,1} + r_{e6,2}) = 0.24
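
The three cases of the example amount to convolving the per-part count distributions. The sketch below recomputes the values for j = 0, 1, 2 from the numbers quoted on the slides. It assumes the two children of e4 are independent with probabilities 0.9 and 0.8, which is what the slide's arithmetic suggests but is not stated in the transcript; the function and variable names are hypothetical.

```python
def convolve(a, b, k):
    """Distribution of the sum of two independent counts, kept for 0..k-1."""
    out = [0.0] * k
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i + j < k:
                out[i + j] += pa * pb
    return out

K = 3

# First part: mux over e1 (0.4) and e2 (0.5), so at most one of them appears.
first_part = [1 - 0.4 - 0.5, 0.4 + 0.5, 0.0]

# Second part: e4 appears with 0.7; its children are assumed independent.
p_e4, c1, c2 = 0.7, 0.9, 0.8
second_part = [
    1 - p_e4,                                   # e4 absent: 0 answers
    p_e4 * (1 - c1) * (1 - c2),                 # only e4: 1 answer
    p_e4 * (c1 * (1 - c2) + (1 - c1) * c2),     # e4 and one child: 2 answers
]

# r[j] = Pr(exactly j answers ranked above e6 appear), j = 0, 1, 2
r = convolve(first_part, second_part, K)
print([round(x, 4) for x in r])   # [0.03, 0.2714, 0.0308]
```

Multiplying the sum of these three values by Pr(e6 appears) gives the top-3 probability of e6; that probability is read off the figure and is not in the transcript, so the final multiplication is left out here.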

  28. Main Solution (4): Algorithms

  29. Main Solution (4): Algorithms (continued)

  31. Experimental Results • Datasets: two real datasets and a synthetic dataset • Real datasets: DBLP and Xmondial • Synthetic dataset: XMark • Varying the percentage of distribution nodes for DBLP

  32. Experimental Results • Varying the number of twig query answers • Varying the k value of Top-k
