Finding Skyline Nodes in Large Networks

Arijit Khan*, Vishwakarma Singh*, Jian Wu#. *Computer Science, University of California, Santa Barbara, USA. #College of Computer Science, Zhejiang University, China. {arijitkhan, vsingh}@cs.ucsb.edu, wujian2000@zju.edu.cn


Presentation Transcript


  1. Finding Skyline Nodes in Large Networks
     Arijit Khan*, Vishwakarma Singh*, Jian Wu#
     *Computer Science, University of California, Santa Barbara, USA
     #College of Computer Science, Zhejiang University, China
     {arijitkhan, vsingh}@cs.ucsb.edu, wujian2000@zju.edu.cn

  2. Motivation
     Query in LinkedIn Network: If John is interested in Big Data, Cloud Computing, and MapReduce, who are the top-5 people John should ask about these topics?
     • Evaluation Metrics:
       • Distance from the query node (John).
       • Coverage of the query topics (Big Data, Cloud Computing, MapReduce).

  3. Homogeneous Approach?
     Query in LinkedIn Network: If John is interested in Big Data, Cloud Computing, and MapReduce, who are the top-5 people John should ask about these topics?
     • Score = λ · Distance + (1 - λ) · Coverage
     • How to get λ?
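The difficulty of choosing λ can be illustrated with a minimal sketch. It is not from the slides: the three nodes, their hop distances, and the normalizations (closeness = 1 / hops instead of raw distance, coverage = fraction of query topics covered) are assumptions chosen so that a higher score is better. The top-ranked node changes with λ.

```python
# Hypothetical data: hop distance from the query node and number of
# query topics covered (out of 3). Values are assumptions for illustration.
nodes = {
    "u1": {"hops": 1, "covered": 1},
    "u4": {"hops": 2, "covered": 3},
    "u7": {"hops": 3, "covered": 3},
}
NUM_TOPICS = 3


def score(node, lam):
    """Weighted score; closeness replaces distance so that higher is better."""
    closeness = 1.0 / nodes[node]["hops"]
    coverage = nodes[node]["covered"] / NUM_TOPICS
    return lam * closeness + (1.0 - lam) * coverage


def best(lam):
    """Return the node with the highest score for a given lambda."""
    return max(nodes, key=lambda n: score(n, lam))


if __name__ == "__main__":
    for lam in (0.9, 0.1):
        print(lam, best(lam))
```

With λ = 0.9 the nearby single-topic node wins; with λ = 0.1 the farther node covering all topics wins, so the result depends entirely on a parameter with no obvious setting.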

  4. Weighted Set Cover?
     • Find nodes with the smallest aggregate distance from the query node, such that they cover all query topics.
     • Ignores some interesting nodes.
     • Cannot rank the results.
     [Figure: example network with query node u0 = q and query topics Q = {a, b, c}. Node labels: u1: a, u2: b, u3: c, u4: abc, u5: a, u6: cd, u7: abc, u8: de.]

  5. Graph Skyline
     • Dominance on Coverage: u >c v
       • The query topics covered by node u are a proper superset of the query topics covered by node v.
     • Dominance on Distance: u >d v
       • The distance of u from q is less than that of v from q.
     • Dominance: u > v
       • (1) u >c v and u ≥d v; or
       • (2) u ≥c v and u >d v.
     • Graph Skyline: A node is a skyline node if it is not dominated by any other node in the network.
     [Figure: example network with query node u0 = q and query topics Q = {a, b, c}. Node labels: u1: a, u2: b, u3: c, u4: abc, u5: a, u6: cd, u7: abc, u8: de.]
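The dominance rules above can be sketched directly in code. This is a brute-force illustration, not the authors' algorithm. The node labels follow the example figure; the hop distances (u1 to u3 at 1, u4 to u6 at 2, u7 and u8 at 3) are an assumption read off the figure's row layout.

```python
QUERY = frozenset("abc")

# Node labels from the example figure.
LABELS = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
          "u5": "a", "u6": "cd", "u7": "abc", "u8": "de"}

# Assumed hop distances from q (inferred from the figure rows).
HOPS = {"u1": 1, "u2": 1, "u3": 1, "u4": 2,
        "u5": 2, "u6": 2, "u7": 3, "u8": 3}


def coverage(node):
    """Query topics covered by a node."""
    return frozenset(LABELS[node]) & QUERY


def dominates(u, v):
    """u > v: at least as good on both criteria, strictly better on one."""
    cu, cv = coverage(u), coverage(v)
    at_least_as_good = cu >= cv and HOPS[u] <= HOPS[v]
    strictly_better = cu > cv or HOPS[u] < HOPS[v]
    return at_least_as_good and strictly_better


def skyline():
    """Nodes that no other node dominates."""
    return sorted(v for v in LABELS
                  if not any(dominates(u, v) for u in LABELS if u != v))


if __name__ == "__main__":
    print(skyline())
```

Under these assumptions the skyline is u1, u2, u3, and u4: the three nearest single-topic nodes are incomparable with each other, and u4 is the closest node covering all three topics.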

  6. Ranking of Skyline Nodes
     • Too many skyline nodes, so rank them.
     • Dominance Count (DC): number of nodes dominated by a skyline node. [Lin et al., ICDE '07]
     • Higher Dominance Count => more pruning from the candidate set.
     • Example: 1. DC(u4) = {u5, u6, u7}; 2. DC(u1) = {u5}; 3. DC(u2) = ∅; 4. DC(u3) = ∅.
     • Problem Statement: Given a query node and a set of query topics in a network, find the top-k skyline nodes with maximum dominance count.
     [Figure: example network with query node u0 = q and query topics Q = {a, b, c}.]
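A brute-force version of the ranking can serve as a reference point. The sketch below is an illustration under assumptions: hop distances are read off the figure rows, and only nodes covering at least one query topic are treated as candidates (inferred from the slide's example, which does not list u8). With these assumptions the sketch reports that u3 dominates u6 (same query coverage c, larger distance), whereas the slide lists DC(u3) = ∅; the figure's edges are not recoverable from this transcript, so that value should be treated as unverified.

```python
QUERY = frozenset("abc")
LABELS = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
          "u5": "a", "u6": "cd", "u7": "abc", "u8": "de"}
# Assumed hop distances from q (inferred from the figure rows).
HOPS = {"u1": 1, "u2": 1, "u3": 1, "u4": 2,
        "u5": 2, "u6": 2, "u7": 3, "u8": 3}


def coverage(node):
    return frozenset(LABELS[node]) & QUERY


# Assumption: only nodes covering some query topic are candidates.
CANDIDATES = [n for n in LABELS if coverage(n)]


def dominates(u, v):
    cu, cv = coverage(u), coverage(v)
    return (cu >= cv and HOPS[u] <= HOPS[v]
            and (cu > cv or HOPS[u] < HOPS[v]))


def dominated_by(u):
    """Candidates that u dominates."""
    return sorted(v for v in CANDIDATES if v != u and dominates(u, v))


def top_k_skyline(k):
    """Skyline nodes ranked by dominance count, ties broken by name."""
    sky = [u for u in CANDIDATES
           if not any(dominates(w, u) for w in CANDIDATES if w != u)]
    ranked = sorted(sky, key=lambda u: (-len(dominated_by(u)), u))
    return [(u, len(dominated_by(u))) for u in ranked[:k]]


if __name__ == "__main__":
    print(top_k_skyline(2))
```

This costs a pairwise comparison over all candidates, which is what the Query DAG approach in the following slides is designed to avoid.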

  7. Algorithm
     • Construct a Query DAG.
     • Three variables are associated with each DAG node: Count (C), Dominance (D), Traversal (T).
     • Naïve Complexity: O(n2r)
     • Complexity with Preprocessing: O(nr2)
     [Figure: input network (u0 = q, Q = {a, b, c}) and its Query DAG. Count values: C(abc) = 2; C(ab) = C(ac) = C(bc) = 0; C(a) = 2, C(b) = 1, C(c) = 2. D and T are not yet set.]
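The Count variable can be illustrated with a short sketch: each DAG node is a non-empty subset of the query topics, and C is the number of network nodes whose query coverage is exactly that subset. The function names and the string encoding of subsets are assumptions made for this example; the labels follow the example figure.

```python
from collections import Counter
from itertools import combinations

QUERY = "abc"
LABELS = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
          "u5": "a", "u6": "cd", "u7": "abc", "u8": "de"}


def covered(label_set, query):
    """Query topics covered by a label set, as a sorted string such as 'ac'."""
    return "".join(sorted(set(label_set) & set(query)))


def count_variables(labels, query):
    """C for every non-empty subset of the query topics."""
    exact = Counter(covered(lab, query) for lab in labels.values())
    items = sorted(query)
    dag_nodes = ["".join(c)
                 for k in range(1, len(items) + 1)
                 for c in combinations(items, k)]
    return {s: exact.get(s, 0) for s in dag_nodes}


if __name__ == "__main__":
    print(count_variables(LABELS, QUERY))
```

Node u6 (labels cd) is counted under c because only its query topics matter, and u8 (labels de) is not counted at all.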

  8. Query DAG Construction
     • Preprocessing: For each label, build a sorted list of the nodes that contain the label.
     • Online Query DAG Construction: Incremental DAG construction.
     [Figure: input network (u0 = q, Q = {a, b, c}) and the Query DAG after labels a and b have been processed (ab: u4, u7; a: u1, u5; b: u2), with the sorted list for c (u3, u4, u6, u7) still to be merged.]

  9. Query DAG Construction (cont.)
     • Preprocessing: For each label, build a sorted list of the nodes that contain the label.
     • Online Query DAG Construction: Consider the labels and their sorted lists in order.
     [Figure: input network and the Query DAG after label c has been processed (abc: u4, u7; a: u1, u5; b: u2; c: u3, u6).]

  10. Query DAG Construction (cont.)
     • Preprocessing: For each label, build a sorted list of the nodes that contain the label.
     • Online Query DAG Construction: Consider the labels and their sorted lists in order.
     [Figure: input network and the completed Query DAG with nodes abc, ab, ac, bc, a, b, c (abc: u4, u7; a: u1, u5; b: u2; c: u3, u6; ab, ac, bc empty).]
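The two phases can be sketched as an inverted index followed by a merge. This is an illustration under assumptions, not the slides' incremental procedure: lists are sorted by node identifier, and the online step uses a k-way merge (`heapq.merge`) over the query labels' lists instead of processing one label at a time. Function names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import groupby

LABELS = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
          "u5": "a", "u6": "cd", "u7": "abc", "u8": "de"}


def build_index(labels):
    """Preprocessing: label -> list of nodes containing it, sorted by node id."""
    index = defaultdict(list)
    for node in sorted(labels):
        for lab in labels[node]:
            index[lab].append(node)
    return index


def dag_buckets(index, query):
    """Online step: group nodes by the exact set of query topics they cover."""
    # One sorted stream of (node, label) pairs per query label.
    streams = [[(node, lab) for node in index.get(lab, [])]
               for lab in sorted(query)]
    buckets = defaultdict(list)
    merged = heapq.merge(*streams)  # k-way merge, ordered by node id
    for node, group in groupby(merged, key=lambda pair: pair[0]):
        topics = "".join(sorted(lab for _, lab in group))
        buckets[topics].append(node)
    return dict(buckets)


if __name__ == "__main__":
    print(dag_buckets(build_index(LABELS), "abc"))
```

Only the lists of the query labels are touched at query time, so nodes such as u8 that carry no query topic are never examined.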

  11. Find Dominance Variable
     • Perform a topological ordering of the DAG nodes to evaluate the Dominance variable (D) of each DAG node.
     • D = number of network nodes whose query coverage is a subset of, or equal to, the DAG node's topic set.
     • Naïve Complexity: O(n2r)
     • Complexity by Topological Ordering: O(3^r)
     [Figure: input network (u0 = q, Q = {a, b, c}) and its Query DAG. Dominance values: D(abc) = 7; D(ab) = 3, D(ac) = 4, D(bc) = 3; D(a) = 2, D(b) = 1, D(c) = 2. T is not yet set.]
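The Dominance variable can be computed from the Count values alone. The sketch below sums C over all non-empty subsets of each DAG node; it is a direct summation rather than the slides' topological procedure, whose details are not given here. Summed over all DAG nodes, the work is at most 3^r subset lookups. The C values are those of the running example.

```python
from itertools import combinations

# Count values of the running example (Q = {a, b, c}).
C = {"a": 2, "b": 1, "c": 2, "ab": 0, "ac": 0, "bc": 0, "abc": 2}


def nonempty_subsets(topics):
    """All non-empty subsets of a topic string, each as a sorted string."""
    items = sorted(topics)
    return ["".join(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


def dominance_variables(count):
    """D(S) = sum of C(S') over every non-empty S' that is a subset of S."""
    return {s: sum(count[sub] for sub in nonempty_subsets(s)) for s in count}


if __name__ == "__main__":
    print(dominance_variables(C))
```

For example, D(ac) = C(a) + C(c) + C(ac) = 2 + 2 + 0 = 4, and D(abc) adds up all seven Count values to give 7.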

  12. Find Traversal Variable
     • Perform a Breadth First Search (BFS) starting from the query node.
     • T = number of nodes not dominated by distance, i.e., nodes the BFS has already reached.
     • Complexity by BFS: O(n + e)
     [Figure: input network and Query DAG after the BFS has visited hop h = 2. Traversal values: T(abc) = 1; T(ab) = T(ac) = T(bc) = 0; T(a) = 2, T(b) = 1, T(c) = 2.]
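A minimal sketch of the traversal step follows. The adjacency list is hypothetical: the transcript does not preserve the figure's edges, so the edges below were chosen only to reproduce the hop levels suggested by the figure (u1 to u3 at hop 1, u4 to u6 at hop 2, u7 and u8 at hop 3).

```python
from collections import Counter, deque

QUERY = "abc"
LABELS = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
          "u5": "a", "u6": "cd", "u7": "abc", "u8": "de"}

# Hypothetical undirected edges consistent with the assumed hop levels.
ADJ = {
    "q": ["u1", "u2", "u3"],
    "u1": ["q", "u4", "u5"],
    "u2": ["q", "u4"],
    "u3": ["q", "u6"],
    "u4": ["u1", "u2", "u7"],
    "u5": ["u1"],
    "u6": ["u3", "u8"],
    "u7": ["u4"],
    "u8": ["u6"],
}


def bfs_hops(adj, source):
    """Hop distance of every reachable node from the source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def traversal_counts(dist, labels, query, max_hop):
    """T(S): nodes with query coverage exactly S reached within max_hop hops."""
    counts = Counter()
    for node, d in dist.items():
        topics = "".join(sorted(set(labels.get(node, "")) & set(query)))
        if topics and 0 < d <= max_hop:
            counts[topics] += 1
    return counts


if __name__ == "__main__":
    hops = bfs_hops(ADJ, "q")
    print(dict(traversal_counts(hops, LABELS, QUERY, 2)))
```

After hop 2 this reproduces the T values shown for the example: one node under abc, two under a, one under b, and two under c.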

  13. Find Skyline Nodes
     • Store the DAG nodes in a Lookup Table, with a Skyline Bit for each DAG node.
     • Helps to prune non-skyline nodes directly.
     [Figure: input network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table after the BFS has visited hop h = 1.]

  14. Find Skyline Nodes (cont.)
     • Store the DAG nodes in a Lookup Table, with a Skyline Bit for each DAG node.
     • Helps to prune non-skyline nodes directly.
     [Figure: input network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table after the BFS has visited hop h = 2.]
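One plausible reading of the Skyline Bit is sketched below; the exact table layout in the slides is not recoverable, so the semantics here are an assumption. Once a node covering S has been visited, the bits of S and all its subsets are set, because any node found at a later hop with such coverage is dominated. Within a hop, a node is also skipped if another node at the same hop covers a proper superset. Hop levels are assumed from the figure rows.

```python
from itertools import combinations

QUERY = "abc"
# Query coverage per node and assumed BFS levels (hop 1, 2, 3).
COVERAGE = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
            "u5": "a", "u6": "c", "u7": "abc", "u8": ""}
LEVELS = [["u1", "u2", "u3"], ["u4", "u5", "u6"], ["u7", "u8"]]


def nonempty_subsets(topics):
    items = sorted(topics)
    return ["".join(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


def skyline_by_bits(levels, coverage, query):
    """Return (skyline nodes in visit order, number of hops processed)."""
    bit = {s: False for s in nonempty_subsets(query)}
    skyline, hops_processed = [], 0
    for level in levels:
        if all(bit.values()):
            break  # every remaining node is dominated
        hops_processed += 1
        seen = {coverage[n] for n in level if coverage[n]}
        for node in level:
            s = coverage[node]
            if not s or bit[s]:
                continue  # no query topic, or dominated by an earlier hop
            if any(set(s) < set(other) for other in seen):
                continue  # dominated on coverage at the same hop
            skyline.append(node)
        for s in seen:  # mark everything this hop dominates from now on
            for sub in nonempty_subsets(s):
                bit[sub] = True
    return skyline, hops_processed


if __name__ == "__main__":
    print(skyline_by_bits(LEVELS, COVERAGE, QUERY))
```

In the example all bits are set after hop 2, so hop 3 (u7, u8) is never examined.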

  15. Dominance Count of Skyline Nodes
     • DC(u4) = D(abc) - T(abc) - T(ab) - T(ac) - T(bc) - T(a) - T(b) - T(c) - 1 = 3
     • Top-k Buffer to store the top-k skyline nodes.
     [Figure: input network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table at hop h = 2. Values used for u4: D(abc) = 7; T(abc) = T(ab) = T(ac) = T(bc) = 0; T(a) = T(b) = T(c) = 1.]
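The formula for DC(u4) generalizes to any skyline node with coverage S: subtract from D(S) the T values of S and all its subsets, then subtract 1 for the node itself. The sketch below follows the slide's formula as written, using the D values of the running example and the T values before hop 2 is counted; like the formula, it assumes no other node at the same hop has identical coverage.

```python
from itertools import combinations

# Dominance variables of the running example.
D = {"a": 2, "b": 1, "c": 2, "ab": 3, "ac": 4, "bc": 3, "abc": 7}

# Traversal variables before each hop is counted (missing keys mean 0).
T_BEFORE_HOP1 = {}
T_BEFORE_HOP2 = {"a": 1, "b": 1, "c": 1}


def nonempty_subsets(topics):
    items = sorted(topics)
    return ["".join(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


def dominance_count(topics, d, t):
    """DC = D(S) - sum of T(S') over all non-empty S' within S - 1."""
    already_visited = sum(t.get(sub, 0) for sub in nonempty_subsets(topics))
    return d[topics] - already_visited - 1


if __name__ == "__main__":
    print("DC(u4) =", dominance_count("abc", D, T_BEFORE_HOP2))
    print("DC(u1) =", dominance_count("a", D, T_BEFORE_HOP1))
```

For u4 this evaluates to 7 - 3 - 1 = 3, and for u1 (visited at hop 1, coverage a) to 2 - 0 - 1 = 1, matching the sets {u5, u6, u7} and {u5} given earlier.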

  16. Pruning and Early Termination
     • Top-k Pruning: Prune a DAG node whose Dominance variable is smaller than the smallest Dominance Count in the top-k buffer.
     • Early Termination: Stop when the Skyline Bits of all entries in the Lookup Table are 1.
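The pruning rule can be sketched with a min-heap as the top-k buffer. This is an illustration, not the authors' implementation: the function `offer` is hypothetical, and the candidate order in the usage example is chosen to show the pruning, not to mirror BFS order. The D and DC values are those of the running example.

```python
import heapq


def offer(buffer, k, node, d_value, compute_dc):
    """Try to add a skyline node to the top-k min-heap.

    d_value is the Dominance variable of the node's DAG entry; since the
    dominance count cannot exceed it, the exact count is computed only
    when the node might still enter the buffer.
    Returns False if the node was pruned without computing its count.
    """
    if len(buffer) == k and d_value < buffer[0][0]:
        return False  # top-k pruning
    dc = compute_dc()
    if len(buffer) < k:
        heapq.heappush(buffer, (dc, node))
    elif dc > buffer[0][0]:
        heapq.heapreplace(buffer, (dc, node))
    return True


if __name__ == "__main__":
    # (node, D of its DAG entry, dominance count); illustrative order.
    candidates = [("u4", 7, 3), ("u1", 2, 1), ("u2", 1, 0)]
    buffer = []
    for node, d_value, dc in candidates:
        kept = offer(buffer, 1, node, d_value, lambda dc=dc: dc)
        print(node, "evaluated" if kept else "pruned")
    print(sorted(buffer, reverse=True))
```

Keeping the smallest buffered count at the heap root makes the pruning test a constant-time comparison per DAG node.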

  17. Experimental Results
     • DBLP: 0.7M Nodes, 3M Edges, 10 Node Labels (distinct).
     • 5 Query Topics.

  18. Efficiency
     • DBLP: 185M Nodes, 90M Edges, 1000 Node Labels (distinct).
     • 5 Query Topics, Top-5 Result Nodes.

  19. Conclusion and Future Work
     • Efficient algorithm to find the top-k skyline nodes in a large attributed network.
     • Experimental evaluation on real and synthetic datasets is required.
     • Time complexity is linear in the number of nodes and edges in the network. Distance-based indexing might improve the efficiency.
     • A top-k skyline set, instead of top-k skyline nodes, might be more effective.

  20. Questions
     Thank You!