1 / 35

Optimal Top-k Generation of Attribute Combinations based on Ranked Lists

Optimal Top-k Generation of Attribute Combinations based on Ranked Lists. Jiaheng Lu , Renmin University of China Joint work with Pierre Senellart, Chunbin Lin, Xiaoyong Du, Shan Wang, and Xinxing Chen. Motivation & Problem Statement.

Download Presentation

Optimal Top-k Generation of Attribute Combinations based on Ranked Lists

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Optimal Top-k Generation of Attribute Combinations based on Ranked Lists Jiaheng Lu, Renmin University of China Joint work with Pierre Senellart, Chunbin Lin, Xiaoyong Du, Shan Wang, and Xinxing Chen

  2. Motivation & Problem Statement Goal: Select a combination of three players including forward, center, and guard positions. Game ID Score Limitation: overlook team spirit select players with the highest score in each group Methods calculate the average scores of players across all games … …

  3. Motivation & Problem Statement consider their combined scores in the same game Select the top-k combinations according to top-m aggregate scores Tuple aggregation function Instance aggregation function Top-k,m Problem

  4. Motivation & Problem Statement Top-1,2: select the top-1combination of players according to top-2 aggregate scores for games where they played together. F2C1G1 is the best combination, since (21.51 + 18.76) is the highest overall score.

  5. Difference between top-k queries and top-k,m queries Top-k,m Top-k Return the top-k combinations of attributes Return the top-ktuples Cannot be transformed into a SQL Can be transformed into a SQL

  6. Application XML keyword refinement Example Q = {DB;UC Irvine; 2002} Groups: G1 = {"DB"; "database"}, G2={"UCI";"UC Irvine"} G3 = {"2002"}. Consider a top-1,2 query Answer: Q’={DB, UCI, 2002}

  7. Application (Cont.) Evidence combination mining in medical databases Package recommendation systems …

  8. Outline • Motivation & Problem Statement • Top-k,m Query Processing • Experimental Results • Conclusion

  9. Top-k,m Query Processing Access Model: Sorted Accesses (a, 9.0) (i, 8.8) (b, 8.7) (c, 7.9) (c, 8.7) (a, 7.5) (d, 7.4) (f, 6.9) …… …… (i, 5.3) (d, 4.7)

  10. Top-k,m Query Processing Access Model: Random Accesses (a, 9.0) (i, 8.8) (b, 8.7) (c, 7.9) (c, 8.7) (a, 7.5) (d, 7.4) (f, 6.9) …… …… (i, 5.3) (d, 4.7)

  11. Top-k,m Query Processing Baseline Method: ETA Compute top-m tuples for each combination Calculate aggregate score for each combination Return the top-k combinations Threshold Algorithm (TA)

  12. Top-k,m Query Processing Upper and Lower bounds Algorithm: ULA Lower Bound Upper Bound Consider top-m seen match instances Consider threshold value and top-m match instances Compute the upper and lower bounds for each combination Termination condition: k combinations meet the hit-condition

  13. Upper and Lower bounds Algorithm: ULA Top-1,2 (G5,7.8) (G1, 9.3) (G2, 7.9) (G4,8.0) (G2, 8.3) (G11,7.3) (G1,7.0) (G8,7.3) (G8, 3.0) (G4, 1.8) (G11, 4.2) (G2, 4.4) (G4, 2.6) (G2, 1.5) (G5, 3.3) (G1, 2.3) A1B1

  14. Upper and Lower bounds Algorithm: ULA U: 34.4 A1B1 L: 32.5 (G5,7.8) (G1, 9.3) (G2, 7.9) (G4,8.0) U: 34.6 (G2, 8.3) (G11,7.3) (G1,7.0) (G8,7.3) A1B2 L: 22.2 U: 31.4 A2B1 L: 20.5 (G8, 3.0) (G4, 1.8) (G11, 4.2) (G2, 4.4) U: 31.6 (G4, 2.6) (G2, 1.5) (G5, 3.3) (G1, 2.3) A2B2 L: 9.8 (G1, 9.3+7.0), (G2, 7.9+8.3) L: 32.5 A1B1 Threshold value=7.8+7.9=15.7, 15.7*2=31.4 U: 31.4 A2B1

  15. Upper and Lower bounds Algorithm: ULA U: 32.5 A1B1 L: 32.5 (G5,7.8) (G1, 9.3) (G2, 7.9) (G4,8.0) U: 31.2 (G2, 8.3) (G11,7.3) (G1,7.0) (G8,7.3) A1B2 L: 22.2 (G8, 3.0) (G4, 1.8) (G11, 4.2) (G2, 4.4) (G4, 2.6) (G2, 1.5) (G5, 3.3) (G1, 2.3) A1B1

  16. Can we run fast?

  17. Optimization heuristics (1) Pruning combinations without computing the bounds (A3,B2) is dominated by (A2,B1) 6.3<7.1 and 8.0<8.2

  18. Optimization heuristics (2) Reducing the number of accesses Avoiding both sorted and random accesses for specific lists (A1,B1)and(A1,B2) cannot be part of answers, all sorted accesses and random accesses on list A1 are unnecessary.

  19. Optimization heuristics (3) Reducing the number of accesses Reducing random accesses across two lists (A1,B1,C1)and(A1,B1,C2) cannot be part of answers, random accesses between A1 and B1 are unnecessary.

  20. Optimization heuristics (4) Reducing the number of accesses Eliminating random accesses for specific tuples Random access from Le to Lt for tuple x is useless

  21. Top-k,m Query Processing Compute upper and lower bounds for unterminated combinations Until k combinations meet hit-condition Prune dominated combinations ULA+ Terminate combinations by reducing number of accesses

  22. Interesting theoretical results Optimality properties Instance Optimality • If wild guesses are not allowed, and the size of each group is treated as a constant, then ULA and ULA+ are instance-optimal. for every instance there exist two constants a and b such that cost(A) <= a*cost(A’) + b The upper bound of the optimality ratio is tight

  23. Interesting theoretical results (Cont.) Optimality properties No Instance Optimal Algorithms • If wild guesses are allowed, Then there is no deterministic algorithm that is instance-optimal.

  24. Outline • Motivation & Problem Statement • Top-k,m Query Processing • Experimental Results • Conclusion

  25. Experimental Results Experimental Setup Language: Java; OS: Windows XP;CPU: 2.0GHz; Disk:320GB Data sets

  26. Experimental Results Experimental results on NBA and YQL datasets ULA+ outperforms ETA by 1-2 orders of magnitude both in running time and access number.

  27. Experimental Results Performance of optimization to reduce combinations More than 60% combinations are pruned without computing their bounds

  28. Experimental Results Performance of different optimizations Combination of all optimizations has the most powerful pruning capability.

  29. Experimental Results Experimental results on XML DBLP dataset XULA and XULA+ perform better than XETA and scale well in both running time and number of accesses.

  30. Related Works U. Güntzer etc, VLDB2000 S. Nepal etc, ICDE1999 Fagin etc, PODS 2001 Top-k with both random and sorted accesses R. Fagin etc, JCSS2003 N. Mamoulis etc, TDS2007 Fagin etc, PODS 2001 Top-k with only sorted accesses

  31. Related Works N. Bruno etc, ICDE2002 K. C. C. Chang etc, SIGMOD2002 Top-k with sorted access on restricted lists Top-k with no need for exact aggregate score I. F. Ilya etc, VLDB2002 C. Li etc, SIGMOD2006 M. L. Yiu etc, DKE2008 Ad-hoc top-k queries

  32. Related Works Top-k Package recommendation T. Deng, W, Fan and F. Geerts, On the Complexity of Package Recommendation Problems PODS 2012

  33. Outline • Motivation & Problem Statement • Top-k,m Query Processing • Experimental Results • Conclusion

  34. Conclusion Propose a new problem called top-k,m query evaluation Developed a family of efficient algorithms, including ULA and ULA+ Study the optimality properties of our algorithms Apply top-k,m query to the context of XML keyword query refinement

  35. Optimal Top-k Generation of Attribute Combinations based on Ranked Lists

More Related