1 / 13

Towards Graph Containment Search and Indexing

Towards Graph Containment Search and Indexing. Chen Chen 1 , Xifeng Yan 2 , Philip S. Yu 2 , Jiawei Han 1 , Dong-Qing Zhang 3 , Xiaohui Gu 2 1 University of Illinois at Urbana-Champaign 2 IBM T.J. Watson Research Center 3 Thomson Research

Download Presentation

Towards Graph Containment Search and Indexing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Towards Graph Containment Search and Indexing Chen Chen 1, Xifeng Yan 2, Philip S. Yu 2, Jiawei Han 1, Dong-Qing Zhang 3, Xiaohui Gu 2 1 University of Illinois at Urbana-Champaign 2IBM T.J. Watson Research Center 3Thomson Research VLDB ’07, September 23-28, 2007, Vienna, Austria 2007. 12. 28 Summarized by Dongjoo Lee, IDS Lab., Seoul National University Presented by Dongjoo Lee, IDS Lab., Seoul National University

  2. Contents • Graph Search • cIndex Basic Framework • Indexing Features • Indexing Model • cIndex-Basic • Hierarchical Indexing Models • cIndex-BottomUp, cIndex-TopDown • Index Maintenances • Experiments • Discussion

  3. Graph Search SCAN Answers of Graph Containment Search Answers of Graph Search Sample Database Sample Query q gc Subgraph q Supergraph gc q Subgraph isomorphism • Traditional Graph Search • Find all supergraphs of a query graph • Graph Containment Search • Find all subgraphs of a query graph

  4. Subgraph Indexing & Pruning f1 q1 ga q2 f2 gb q3 f3 gc SCAN f4 Queries Sample Database Indexed Subgraphs Feature-Graph Matrix • Traditional Graph Search - Inclusion Logic Given a query graph q and a database graph g  D, if a feature f  q and f  g, then q  g . That is, if feature f is in q then the graphs not having f are pruned. • Graph Containment Search - Exclusion Logic If a feature f  q and f  g, then g  q. That is, if feature f is not in q then the graphs having f are pruned.

  5. Basic Search Framework Search Time  Let’s reduce these • Off-line index construction • Generate and select a feature set F from the graph database D • f ∈ F, Df = {g | f ⊆ g, g ∈D} • Search • Test indexed features in F against the query q which returns all f q, and compute the candidate query answer set, Cq = D − fDf (f  q, f ∈ F). • Verification • Check each graph g in the database set Cq to see whether g is really a subgraph of q

  6. Redundancy-Aware Feature Selection X X q1 q2 X X X X X q3 f1 f4 f2 f3 • Select minimal features that cover many graphs that other features didn’t cover from frequent features • Expected number of reduced subgraph isomorphism tests • Jf = np(1-p’) -1 • n : |D| • p : frequency of f in D • p’ : probability that a query graph having f • p  p’=> Maximum at p = ½ • Frequent subgraph mining algorithm • FSG, GASTON, gSpan

  7. Maximum Coverage With Cost Definition 6. (Maximum Coverage With Cost). Given a set of subsets S = {S1, S2, …, Sm} of the universal set U = {1, 2, … , n} and a cost parameter  associated with any Si ∈ S, find a subset T of S such that |∪Si∈TSi|−|T| is maximized. U = {1, 2, 3, 4, 5, 6, 7, 8, 9} S = {{4,5,6}, {1,2,4,5},{4,5,7,8}, {1,4,7}} f1 f2 f3 f4  = 3 Set Cover Problem http://en.wikipedia.org/wiki/Set_cover_problem 1 2 3 4 5 6 7 8 9 This is NP-complete problem Contrast Graph Matrix The greedy feature selection process can approximate the optimal index with K features within a ratio of 1 − 1/e. [D. S. Hochbaum, editor. “Approximation Algorithms for NP-Hard Problems”, 1997]

  8. cIndex-Basic 1 2 3 4 5 6 7 8 9 • Time complexity : O(|F0||D||L|) ----> O(|F0||Ds||L|) • Space usage : O(|F0||D||L|) ----> O(|F0||D| + |F0||L|) Reduced by sampling Reduced by virtualization

  9. Hierarchical Indexing Models cIndex-BottomUp cIndex-TopDown

  10. Index Maintenance • “ostrich” strategy • Stick with the same set of selected features and the same hierarchical structure built • Sampling strategy • Periodically take small samples (Fs, Ds, Ls) and construct sample index Is • if performance of legacy index I is much worse than Is then reconstruct index • else take ostrich strategy

  11. Experiments (1) • Chemical Descriptor Search • a model graph database usually includes a set of fundamental substructures, called descriptors. These descriptors, shared by specific groups of known molecules, often indicate particular chemical and physical properties. Given a molecule, fast searching for its “descriptor” substructures can help researchers to quickly predict its attributes. • Comparison • SCAN • FB • Extract features using gIndex and apply features as Basic Framework • cIndex-Basic, cIndex-BottomUp, cIndex-TopDown

  12. Experiments (2) Subgraph Isomorphism Test Numbers Query Processing Time Index Maintenance Effectiveness of Data Space Reduction

  13. Experiments (3) Performance of Hierarchical Indices Scalability of Hierarchical Indices Index Size

More Related