
GRAIL: Scalable Reachability Index for Large Graphs

H. Yıldırım, V. Chaoji, and M. J. Zaki, "GRAIL: a scalable index for reachability queries in very large graphs," The VLDB Journal—The International Journal on Very Large Data Bases, vol. 21, pp. 509-534, 2012.


Presentation Transcript


  1. GRAIL: Scalable Reachability Index for Large Graphs • H. Yıldırım, V. Chaoji, and M. J. Zaki, "GRAIL: a scalable index for reachability queries in very large graphs," The VLDB Journal—The International Journal on Very Large Data Bases, vol. 21, pp. 509-534, 2012. • Some slides are taken from the original paper’s presentation. • Soheila Abrishami

  2. INTRODUCTION • Problem Definition • Given a directed graph $G$ and two nodes $u$ and $v$, a reachability query asks whether there exists a path from $u$ to $v$ in $G$. If $u$ can reach $v$, we denote it as $u \to v$. • Applications • The Semantic Web is composed of RDF/OWL data; reachability queries can be used to infer relationships among the objects. • In network biology, reachability querying plays a role in protein-protein interaction networks and gene networks.

  3. DAG • The problem of reachability on directed graphs can be reduced to reachability on directed acyclic graphs (DAGs). • DAG • Each node represents a strongly connected component of the original graph, and each edge indicates that one component can reach another.

  4. DAG… a: A directed graph. b: Corresponding DAG

  5. DAG… • To answer whether node $u$ can reach node $v$ in G: • Look up their corresponding strongly connected components, $S_u$ and $S_v$ respectively, which are nodes in G′. If $S_u = S_v$, then $u$ reaches $v$ (and $v$ reaches $u$). If $S_u \neq S_v$, then we pose the question whether $S_u$ can reach $S_v$ in G′. • Reachability queries on the original graph can be answered on the DAG, and thus we will discuss methods for reachability only on DAGs.
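The reduction on slides 3 to 5 can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the function name `condense`, the integer node ids `0..n-1`, and the choice of Kosaraju's two-pass algorithm for finding strongly connected components are all assumptions made here.

```python
from collections import defaultdict

def condense(n, edges):
    """Return (comp, dag_edges): comp[v] is the SCC id of node v and
    dag_edges is the set of edges between distinct SCCs (the DAG G')."""
    adj, radj = defaultdict(list), defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        radj[b].append(a)

    # Pass 1: iterative DFS on G, recording nodes in order of finish time.
    seen, order = [False] * n, []
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack = [(root, iter(adj[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish time;
    # each sweep collects exactly one strongly connected component.
    comp, count = [-1] * n, 0
    for root in reversed(order):
        if comp[root] != -1:
            continue
        comp[root] = count
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in radj[node]:
                if comp[prev] == -1:
                    comp[prev] = count
                    stack.append(prev)
        count += 1

    dag_edges = {(comp[a], comp[b]) for a, b in edges if comp[a] != comp[b]}
    return comp, dag_edges

# Hypothetical graph: cycle 0 -> 1 -> 2 -> 0, cycle 3 <-> 4, bridge 2 -> 3.
comp, dag_edges = condense(5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)])
```

A query on the original graph then starts with `comp[u] == comp[v]`; only when the components differ does it become a reachability question on the much smaller DAG described by `dag_edges`.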

  6. Trade-off between Query Time and Index Size

  7. Interval labeling • The reachability problem on trees can be solved effectively by interval labeling: the index is constructed in linear time and space, and queries are answered in constant time. • Definition • Each node $u$ is assigned an interval label $L_u = [s_u, e_u]$. • A reachability query can be answered by comparing the corresponding intervals, i.e., $u$ can reach $v$ if and only if $L_v \subseteq L_u$.

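  8. Interval labeling… • Consider the min-post-labeling method for trees: • $e_u = \mathrm{post}(u)$ denotes the rank of node $u$ in a post-order traversal of the tree • $s_u = \min\{s_x \mid x \in \mathrm{children}(u)\}$ if $u$ is not a leaf, and $s_u = e_u$ if $u$ is a leaf • The figure on this slide (not reproduced in this transcript) shows the min-post-labeling for an example tree. • For example, node 9 is reachable from a node whose interval contains $L_9$, whereas node 7 is not reachable from a node whose interval does not contain $L_7$.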
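As a rough illustration of min-post-labeling on a tree, the sketch below labels a small hypothetical tree and answers queries by interval containment. The tree, the node names, and the function names `min_post_label` and `reaches` are invented for this example; the tree is not the one from the slide's figure. Recursion is used for brevity and would need replacing for deep trees.

```python
def min_post_label(children, root):
    """Return label[u] = (s_u, e_u), where e_u is the post-order rank of u
    and s_u is the smallest rank found in the subtree rooted at u."""
    label = {}
    rank = 1

    def visit(u):
        nonlocal rank
        start = None
        for c in children.get(u, []):
            visit(c)
            s = label[c][0]
            start = s if start is None else min(start, s)
        end = rank          # post-order rank: assigned after all children
        rank += 1
        # A leaf has no children, so its interval is [e_u, e_u].
        label[u] = (end if start is None else start, end)

    visit(root)
    return label

def reaches(label, u, v):
    """On a tree, u reaches v exactly when L_v is contained in L_u."""
    su, eu = label[u]
    sv, ev = label[v]
    return su <= sv and ev <= eu

# Hypothetical tree: a has children b and c; b has children d and e.
tree = {"a": ["b", "c"], "b": ["d", "e"]}
label = min_post_label(tree, "a")
# label == {"d": (1, 1), "e": (2, 2), "b": (1, 3), "c": (4, 4), "a": (1, 5)}
```

Here `reaches(label, "a", "e")` is true because [2, 2] lies inside [1, 5], while `reaches(label, "b", "c")` is false because [4, 4] does not lie inside [1, 3].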

  9. Generalize the interval labeling to a DAG • Ensure that a node is not visited more than once; a node keeps the post-order rank of its first visit. • Interval containment of nodes in a DAG is not exactly equivalent to reachability: the interval of a node can be contained in the interval of another node that cannot reach it (such pairs are called false positives, or exceptions).

  10. The GRAIL Approach • GRAIL uses min-post-labeling directly on the directed acyclic graph. • GRAIL employs multiple min-post intervals that are obtained via random graph traversals. The symbol d denotes the number of intervals kept per node. • The key idea is fast elimination of query node pairs for which non-reachability can be determined from the intervals alone.

  11. The GRAIL Approach • Definition • In GRAIL, for a given node $u$, the new label is given as $L_u = L_u^1, L_u^2, \ldots, L_u^d$, where $L_u^i$ is the interval label obtained from the $i$-th (random) traversal of the DAG, and $1 \le i \le d$, where $d$ is the dimension or number of intervals. We say that $L_v$ is contained in $L_u$, denoted as $L_v \subseteq L_u$, if and only if $L_v^i \subseteq L_u^i$ for all $i \in [1, d]$. If $L_v \not\subseteq L_u$, then we can conclude that $u \not\to v$, as per the proposed lemma. • Two main issues in GRAIL • i) how to compute the d random interval labels while indexing • ii) how to deal with exceptions while querying

  12. Index Construction • The index construction step in GRAIL is very straightforward: the desired number of post-order interval labels is generated by simply changing the visitation order of the children randomly during each depth-first traversal. • The best strategy is to stop labeling after a small number of dimensions (such as 5), accepting a reduced number of exceptions, rather than trying to eliminate all exceptions.
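The construction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_grail_index`, the dictionary-based adjacency list, the `seed` parameter, and the use of recursion (unsuitable for very deep graphs) are all choices made here for brevity.

```python
import random

def build_grail_index(nodes, adj, d, seed=0):
    """Return labels[u] = list of d intervals (s, e), one per random
    depth-first traversal of the DAG given by adj."""
    rng = random.Random(seed)
    has_parent = {v for u in nodes for v in adj.get(u, [])}
    roots = [u for u in nodes if u not in has_parent]
    labels = {u: [] for u in nodes}

    for _ in range(d):
        interval = {}
        rank = 1

        def visit(u):
            nonlocal rank
            if u in interval:       # already labeled: keep the first visit
                return
            kids = list(adj.get(u, []))
            rng.shuffle(kids)       # random visitation order of the children
            start = None
            for c in kids:
                visit(c)
                s = interval[c][0]
                start = s if start is None else min(start, s)
            end = rank              # post-order rank of u in this traversal
            rank += 1
            interval[u] = (end if start is None else start, end)

        order = list(roots)
        rng.shuffle(order)
        for r in order:
            visit(r)
        for u in nodes:
            labels[u].append(interval[u])
    return labels

# Hypothetical DAG: a -> b, a -> c, b -> d, c -> d, c -> e.
nodes = ["a", "b", "c", "d", "e"]
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
labels = build_grail_index(nodes, adj, d=3)
```

Whatever the random orders turn out to be, every reachable pair satisfies containment in all d dimensions; extra traversals can only remove false positives, never introduce false negatives.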

  13. Reachability Queries • To answer reachability queries between two nodes, $u$ and $v$, GRAIL adopts a two-pronged approach. GRAIL first checks whether $L_v \not\subseteq L_u$. If so, we can immediately conclude that $u \not\to v$, by Lemma 1. On the other hand, if $L_v \subseteq L_u$, nothing can be concluded immediately, since the index can have false positives, i.e., exceptions. • Keeping explicit exception lists per node does not scale to very large graphs. The default approach in GRAIL is to use a “smart” DFS, with recursive containment-check-based pruning, to answer queries. • Querying takes $O(d)$ time if $L_v \not\subseteq L_u$, and the worst-case complexity is $O(n + m)$ for the DFS, where $n$ and $m$ are the numbers of nodes and edges.
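The two-pronged query can be sketched as below. The function names `contained` and `reachable`, the example DAG, and its labels are assumptions made for illustration; the labels are what a single min-post traversal (d = 1) produces on this DAG when `b` is visited before `c`, and they deliberately contain one false positive.

```python
def contained(lv, lu):
    """True if every interval of v lies inside the matching interval of u."""
    return all(su <= sv and ev <= eu for (sv, ev), (su, eu) in zip(lv, lu))

def reachable(adj, labels, u, v):
    """Answer u -> v using the interval check first, then a pruned DFS."""
    if not contained(labels[v], labels[u]):
        return False                 # definite non-reachability, O(d)
    if u == v:
        return True
    stack, seen = [u], {u}
    while stack:
        node = stack.pop()
        for c in adj.get(node, []):
            if c == v:
                return True
            # Prune: a child whose label does not contain L_v cannot reach v.
            if c not in seen and contained(labels[v], labels[c]):
                seen.add(c)
                stack.append(c)
    return False

# Hypothetical DAG: a -> b, a -> c, b -> d, c -> d, c -> e.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
# One min-post interval per node (d = 1), from visiting b before c.
labels = {"a": [(1, 5)], "b": [(1, 2)], "c": [(1, 4)],
          "d": [(1, 1)], "e": [(3, 3)]}
```

With these labels, `reachable(adj, labels, "b", "e")` is rejected by the interval check alone. The pair (`c`, `b`) is an exception: [1, 2] lies inside [1, 4] even though `c` cannot reach `b`, so the query falls through to the DFS, which prunes both children of `c` and returns false.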

  14. Experiments • Large graphs: 700K to 25M nodes with average degrees from 0.5 to 5 • Only GRAIL and GRIPP scale on these datasets • While GRIPP’s index size is smaller than GRAIL’s, its index construction can be up to 40 times slower than GRAIL’s

  15. Experiments… • GRAIL outperforms GRIPP by orders of magnitude • GRAIL is faster than BFS-L (the best among the search-based methods) by 3–40 times on the denser graph • [Results on slide: query times (ms)]

  16. Conclusion • GRAIL: a lightweight indexing scheme • Easy to implement and scalable • Based on interval labeling • Able to index very large and dense graphs on which existing methods fail

  17. Thank you! Questions?
