Keyword Search in Graphs: Finding r-cliques

Mehdi Kargar, Aijun An. York University, Toronto, Canada. VLDB’11.


Presentation Transcript


  1. Keyword Search in Graphs: Finding r-cliques. Mehdi Kargar, Aijun An. York University, Toronto, Canada.

  2. VLDB’11, Keyword Search in Graphs: Finding r-cliques. Overview
     • Keyword Search in Graphs/Relational Databases
     • r-clique Definition
     • Challenges in Finding r-cliques
     • Approximation Algorithm for Finding r-cliques
     • Enumerating Top-k r-cliques in Polynomial Delay
     • Empirical Results
     • Conclusion

  3. Keyword Search in Graphs/Relational Databases
     • Keyword search is a well-known mechanism for retrieving relevant information from a set of documents.
     • Google is a familiar example!
     • What about structured data, such as XML documents or relational databases?
     • Current enterprise search engines over structured data require:
       • Knowledge of the schema
       • Knowledge of a query language
       • Knowledge of the role of the keywords
     • Do users have all of the above knowledge?
     • The answer is NO!

  4. Keyword Search in Graphs/Relational Databases
     • Users need a simple system that receives some keywords as input and returns, as output, a set of nodes that together cover all or part of the input keywords.
     • Relational databases can be modeled using graphs:
       • Tuples are the nodes of the graph.
       • Foreign key relationships are the edges that connect two nodes (tuples) to each other.
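The tuple-and-foreign-key model can be sketched in a few lines of Python. This is only an illustration: the tuple identifiers, the `text` field, and the helper names `build_graph` and `content_nodes` are hypothetical, and the toy rows are loosely inspired by the Mondial example rather than taken from the dataset.

```python
# Hypothetical toy tuples; identifiers and text are illustrative, not real Mondial rows.
tuples = {
    "city:1": {"table": "Cities", "text": "New York"},
    "country:1": {"table": "Countries", "text": "United States"},
    "country:2": {"table": "Countries", "text": "Canada"},
    "org:1": {"table": "Organizations", "text": "United Nations"},
    "mem:1": {"table": "Memberships", "text": ""},
}

# Hypothetical foreign key references, each linking two tuples.
foreign_keys = [
    ("city:1", "country:1"),  # the city is located in the country
    ("org:1", "city:1"),      # the organization is hosted in the city
    ("mem:1", "org:1"),       # the membership row refers to the organization
    ("mem:1", "country:2"),   # ... and to the member country
]


def build_graph(tuples, foreign_keys):
    """Return an undirected adjacency map: one node per tuple, one edge per foreign key."""
    adjacency = {node: set() for node in tuples}
    for a, b in foreign_keys:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def content_nodes(tuples, keyword):
    """Return the nodes whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return {node for node, row in tuples.items() if needle in row["text"].lower()}


graph = build_graph(tuples, foreign_keys)
```

In this toy graph, "New York" and "Canada" are connected only through the organization and membership tuples, which are non-content nodes for that query.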

  5. Example: Search in Relational Databases. [Figure: part of the Mondial dataset, showing the tables Cities, Organizations, Countries, and Memberships.]

  6. New York is located in the United States. Keywords: “New York”, “United States”. [Figure: the tables Cities, Organizations, Countries, and Memberships.]

  7. New York hosts the UN, and Canada is a member. Keywords: “New York”, “Canada”. [Figure: the tables Cities, Organizations, Countries, and Memberships.]

  8. Previous Approaches
     • Most previous work finds minimal connected trees that contain all or part of the input keywords.
     • Such a tree is called a Steiner tree.
     • Recently, methods that produce subgraphs have been proposed. They might provide more informative answers.
     • One recent approach is the multi-center community (ICDE 2009).
     • So, what is the problem with previous approaches?

  9. Problems with Previous Approaches
     • Some content nodes in an answer might be far away from each other.
     • This means that weak relationships among content nodes might exist.
     • There is no guarantee on the closeness of the nodes.
     • Since all keywords are equally important, all of them should be close to each other. They should also be equally important in the ranking function.
     • While searching for answers, current methods explore both content and non-content nodes.
     • This might lead to poor performance.

  10. r-cliques
      • To solve the problems of previous approaches, we propose to find r-cliques.
      • An r-clique is a set of content nodes that together contain all of the input keywords and in which the shortest distance between each pair of nodes is no longer than r.
      • Weight of an r-clique: suppose that the nodes of an r-clique are denoted as {v1, v2, …, vn}. The weight of the r-clique is defined as: [formula not preserved in the transcript]
      • dist(vi, vj) is the shortest distance between vi and vj.
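The definition can be checked mechanically. The sketch below assumes that the weight is the sum of dist(vi, vj) over all pairs of nodes in the r-clique; that is an assumption, since the formula on the slide is missing from the transcript. The node names, the `DIST` table, and the `KEYWORDS` map are hypothetical.

```python
from itertools import combinations

# Hypothetical precomputed shortest distances between three content nodes.
DIST = {
    frozenset({"a", "b"}): 2,
    frozenset({"a", "c"}): 3,
    frozenset({"b", "c"}): 4,
}

# Hypothetical keywords held by each content node.
KEYWORDS = {"a": {"james"}, "b": {"john"}, "c": {"jack"}}


def dist(u, v):
    """Shortest distance between two nodes, looked up in the toy table."""
    return 0 if u == v else DIST[frozenset({u, v})]


def is_r_clique(nodes, query, r):
    """True if the nodes cover every query keyword and every pair is within distance r."""
    covered = set().union(*(KEYWORDS[n] for n in nodes))
    close = all(dist(u, v) <= r for u, v in combinations(nodes, 2))
    return set(query) <= covered and close


def weight(nodes):
    """Assumed weight: the sum of shortest distances over all node pairs."""
    return sum(dist(u, v) for u, v in combinations(nodes, 2))
```

With these toy distances, the three nodes form an r-clique for r = 4 but not for r = 3, because the pair (b, c) is at distance 4.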

  11. Benefits of Finding r-cliques
      • Using r-cliques as the answers for keyword search in graphs avoids the problems of previous approaches.
      • All of the content nodes are reasonably close to each other.
      • The weight function evaluates all of the content nodes equally.
      • The algorithm for finding r-cliques (discussed later) concentrates on the content nodes rather than on all of the nodes in the graph, so it is faster and more efficient.
      • For presenting the relationships, the final answer has fewer irrelevant nodes than a multi-center community.

  12. An Example. Input keywords: James, John, Jack. [Figure: example graph.]

  13. [Figure: two candidate answers for the example.] One answer: r-clique weight 12, tree weight 8, community weight 8. The other answer: r-clique weight 14, tree weight 7, community weight 7.

  14. Challenges in Finding r-cliques
      • Problem 1: Given a distance threshold r, a graph G, and a set of input keywords, find an r-clique in G whose weight is minimum.
      • Theorem: Problem 1 is NP-hard.
        • Proved in the paper by reduction from 3-satisfiability (3-SAT).
        • Solution: an approximation algorithm with a guaranteed ratio.
      • The total number of answers is exponential in the number of input keywords.
        • It is not efficient to generate all answers and then sort them.
        • Solution: enumerating answers with polynomial delay.

  15. What We Need …
      • Producing r-cliques in ranked order
        • r-cliques with lower weights should be presented before ones with higher weights.
      • Producing top-k r-cliques efficiently with a bound on the approximation ratio
        • Each r-clique must be generated efficiently, in polynomial time.
        • There must be a bound on the quality of a generated r-clique.
        • The weight of a generated r-clique should be within some factor of the current optimal solution.
      • Generating all the r-cliques if needed
        • No r-clique should be missed.

  16. Heuristic and Approximate Order
      • Heuristic order: the answer is expected to be close to the optimal answer, but there is no guarantee.
      • Approximate order: the answer is close to the optimal answer with a provable guarantee. This is the desired choice.

  17. Enumerating in Approximate Order
      • Lawler’s technique is used for finding the top-k answers.
      • In each iteration, the next r-clique is generated by finding the top answer under constraints.
      • Two problems must be solved:
        1. What are the constraints?
        2. How can the top answer be found efficiently under the constraints?

  18. Overview of the System [flowchart]
      • Input: the keywords and the value of k.
      • Find the best answer with no constraints.
      • Insert the best r-clique, together with its search space, into the priority queue.
      • Fetch the best r-clique from the priority queue and print it.
      • Have the top-k answers already been printed, OR is the priority queue empty? If YES, terminate.
      • If NO, divide the search space of the top answer into sub-spaces.
      • Find the best r-clique in each sub-space under its associated constraints.
      • Insert each answer, together with its search space, into the priority queue, and go back to the fetch step.

  19. Constraints and Search Space
      • Let’s do it using an example!
      • Suppose that the input keywords are {k1, k2, k3, k4}.
      • Ci = {set of nodes that contain keyword ki}.
      • The search space that contains the best r-clique can be represented as C1 × C2 × C3 × C4 (the whole search space).
      • Assume that the best r-clique is (v1, v2, v3, v4), where vi is a node containing keyword ki.
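One common way to apply Lawler's technique to a product space like this is sketched below. It is an assumption about the partition, since the slide's own example table is not preserved: sub-space i keeps the first i choices of the best answer fixed and excludes its choice for position i. The names `divide`, `find_best`, and `top_k` are hypothetical, `find_best` is an exact brute-force stand-in rather than the approximation algorithm from the talk, and the weight is assumed to be the sum of pairwise distances.

```python
import heapq
from itertools import combinations, product


def divide(space, answer):
    """Lawler-style split: sub-space i fixes answer[:i] and excludes answer[i]."""
    subspaces = []
    for i in range(len(space)):
        fixed = [[answer[j]] for j in range(i)]
        reduced = [v for v in space[i] if v != answer[i]]
        if reduced:  # an empty candidate list means the sub-space has no answers
            subspaces.append(fixed + [reduced] + [list(c) for c in space[i + 1:]])
    return subspaces


def find_best(space, dist, r):
    """Brute-force stand-in: lowest-weight tuple whose pairs are all within r, or None."""
    best = None
    for combo in product(*space):
        pairs = list(combinations(combo, 2))
        if all(dist(u, v) <= r for u, v in pairs):
            w = sum(dist(u, v) for u, v in pairs)
            if best is None or (w, combo) < best:
                best = (w, combo)
    return best


def top_k(space, dist, r, k):
    """Return up to k answers in increasing weight, using a priority queue of sub-spaces."""
    results, queue, tick = [], [], 0
    first = find_best(space, dist, r)
    if first:
        heapq.heappush(queue, (first[0], tick, first[1], space))
    while queue and len(results) < k:
        w, _, answer, sub = heapq.heappop(queue)
        results.append((w, answer))
        for s in divide(sub, answer):
            best = find_best(s, dist, r)
            if best:
                tick += 1  # unique tie-breaker so the heap never compares sub-spaces
                heapq.heappush(queue, (best[0], tick, best[1], s))
    return results


# Toy example: nodes are points on a line and distance is the absolute difference.
space = [[0, 10], [1, 12], [3, 11]]
line_dist = lambda u, v: abs(u - v)
answers = top_k(space, line_dist, r=5, k=3)
```

Because the sub-spaces are disjoint and together cover everything except the answer just printed, no candidate is produced twice and none is skipped. In the toy example only two tuples satisfy r = 5, so `answers` holds two entries even though k = 3.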

  20. Finding the Best Approximate r-clique
      • Step 1: For every content node n in the search space and every keyword ki, find the closest node in the search space that contains ki.
      • Step 2: For every content node n, calculate the sum, over all keywords ki, of the distances from n to the holder of ki found in Step 1.
      • Step 3: Find the content node with the minimum sum of distances among all content nodes.
      • Step 4: Return the set of content nodes (one holder per keyword) that gives the minimum sum of distances.
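The four steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `best_approx_r_clique` is a hypothetical name, `dist` is assumed to be a precomputed shortest-distance function with dist(n, n) = 0, and the sketch only requires each chosen holder to be within r of the center node. By the triangle inequality the chosen holders are then within 2r of each other, so a strict pairwise check against r would need an extra step.

```python
def best_approx_r_clique(space, dist, r):
    """space: candidate lists C_1..C_l. Returns (sum_of_distances, answer) or None."""
    best = None
    centers = [n for candidates in space for n in candidates]
    for center in centers:
        picks, total = [], 0
        for candidates in space:
            # Step 1: closest holder of this keyword to the center node.
            nearest = min(candidates, key=lambda v: dist(center, v))
            d = dist(center, nearest)
            if d > r:
                break  # this center cannot reach one of the keywords within r
            picks.append(nearest)
            total += d  # Step 2: accumulate the sum of distances
        else:
            # Step 3: keep the center with the minimum sum of distances.
            if best is None or total < best[0]:
                best = (total, tuple(picks))
    return best  # Step 4: the holders chosen for the best center


# Toy example: nodes are points on a line and distance is the absolute difference.
space = [[0, 10], [1, 12], [3, 11]]
line_dist = lambda u, v: abs(u - v)
result = best_approx_r_clique(space, line_dist, r=5)
```

In the toy example the best center is node 11, whose nearest holders are 10, 12, and 11 itself, giving a sum of distances of 2. Only content nodes are visited, which matches the claim that the search ignores non-content nodes.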

  21. Properties of the Approximation Algorithm
      • Only content nodes are searched when finding the best answer in the search space.
      • The approximation ratio of the algorithm is 2.
        • The weight of the answer is at most twice the weight of the optimal answer.
        • The proof can be found in the paper.

  22. Presenting r-cliques to the User
      • To show the relationship between the nodes in an r-clique, a Steiner tree is found and presented to the user.
      • Example keywords (DBLP dataset): “Parallel”, “Algorithm”, “Optimization”, “Graph”.
      • [Figure: the r-clique answer and the community answer for this query. Node labels in the figure include the papers “Distributed Parallel Algorithm For Nonlinear Optimization Without Derivatives”, “A Binding Number Computation of Graph”, and “A New Non-interior Continuation Method for Second-Order Cone Programming”, and the authors Xuping Zhang, Guoping He, and Congying Han. One node is labeled “Irrelevant”.]

  23. Experimental Results
      • The r-clique method is compared with the multi-center community method (called com-k).
      • Our approximation algorithm is called poly-delay-k.
      • Two datasets are used: DBLP and IMDb.
      • The input keywords and parameters are the same as in the community paper.

  24. Running Time. [Figures: running time on the DBLP dataset and on the IMDb dataset.]

  25. Quality of the Answers. [Figure: DBLP dataset.]

  26. Search Accuracy from a User Study
      • Top-k precision: the percentage of the top-k answers that are relevant to the query.
      • The users are asked to evaluate the answers using two methods.
      • In the first method, scores (0-1) are assigned to the nodes of an answer, and their average is used as the precision.
      • In the second method, the whole answer is evaluated and a single score is assigned to it.
      • The results of the two methods are similar.
      • [Figure: DBLP dataset.]
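The two scoring methods can be written down as small helpers. The function names and the example scores below are hypothetical and are not results from the study; the sketch only assumes that each judgment is a number between 0 and 1.

```python
def answer_score_from_nodes(node_scores):
    """First method: average the 0-1 scores assigned to the nodes of one answer."""
    return sum(node_scores) / len(node_scores)


def top_k_precision(answer_scores, k):
    """Average score of the first k answers (the fraction relevant when scores are 0 or 1)."""
    top = answer_scores[:k]
    return sum(top) / len(top) if top else 0.0


# Hypothetical judgments for a ranked list of four answers (second method: one score each).
whole_answer_scores = [1, 1, 0, 1]
precision_at_4 = top_k_precision(whole_answer_scores, 4)
```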

  27. Conclusion
      • A novel and efficient approach for keyword search in graphs has been proposed.
      • All of the content nodes are reasonably close to each other.
      • An approximation algorithm with a bounded guarantee has been proposed.
      • Only content nodes are explored during the search process.
      • A Steiner tree with as few middle nodes as possible is generated to reveal the relations among content nodes.

  28. Thank you! Any Questions?
