Graph Indexing: A Frequent Structure-based Approach

Graph Indexing: A Frequent Structure-based Approach. Advisor: Prof. 曾新穆. Team members: 李彥寬, 洪世敏, 丁鏘巽, 黃冠霖, 詹博丞. Date: 2013/11/14. Outline: Ch1 Introduction, Ch2 Preliminaries, Ch3 Frequent Fragment, Ch4 Discriminative Fragment, Ch5 gIndex, Ch6 Experimental Result, Improvement, Maintenance.

Presentation Transcript


  1. Graph Indexing: A Frequent Structure-based Approach. Advisor: Prof. 曾新穆; Team members: 李彥寬, 洪世敏, 丁鏘巽, 黃冠霖, 詹博丞; Date: 2013/11/14

  2. Outline • Ch1 Introduction • Ch2 Preliminaries • Ch3 Frequent Fragment • Ch4 Discriminative Fragment • Ch5 gIndex • Ch6 Experimental Result • Improvement • Maintenance

  3. Ch1 Introduction

  4. Ch1 Introduction • The classical graph query problem • Given a graph database $D$ and a graph query $q$, find all the graphs in $D$ that contain $q$ as a subgraph.
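
For reference, the query problem can be answered without any index by testing every graph in $D$ directly. The Python sketch below is only an illustration under stated assumptions: graphs are undirected with labeled vertices and unlabeled edges, stored in a hypothetical `Graph` class, and containment is checked with a plain backtracking subgraph-isomorphism search. It shows the per-graph cost that an index tries to avoid; it is not part of gIndex.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical representation: vertex id -> label, plus undirected edges."""
    labels: dict[int, str]
    edges: set[frozenset[int]] = field(default_factory=set)


def is_subgraph(q: Graph, g: Graph) -> bool:
    """Backtracking test: is there an injective, label-preserving vertex map
    from q into g that sends every edge of q onto an edge of g?"""
    order = list(q.labels)

    def extend(mapping: dict[int, int]) -> bool:
        if len(mapping) == len(order):
            return True
        u = order[len(mapping)]
        used = set(mapping.values())
        for v, label in g.labels.items():
            if v in used or label != q.labels[u]:
                continue
            # Each query edge between u and an already-mapped vertex must exist in g.
            if all(frozenset((mapping[w], v)) in g.edges
                   for w in mapping if frozenset((w, u)) in q.edges):
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]
        return False

    return extend({})


def naive_query(db: dict[str, Graph], q: Graph) -> set[str]:
    """Sequential scan: D_q = {g in D | q is a subgraph of g}."""
    return {gid for gid, g in db.items() if is_subgraph(q, g)}


db = {
    "g1": Graph({0: "C", 1: "C", 2: "O"}, {frozenset((0, 1)), frozenset((1, 2))}),
    "g2": Graph({0: "C", 1: "N"}, {frozenset((0, 1))}),
}
query = Graph({0: "C", 1: "O"}, {frozenset((0, 1))})
print(naive_query(db, query))  # {'g1'}
```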

  5. Ch1 Introduction • Build graph index • A path-based index is inefficient: too many paths

  6. Ch1 Introduction • Build graph index • A graph-based index is suitable: only one result

  7. Ch2 Preliminaries

  8. Ch2 Preliminaries • The graph feature set is denoted by $F$. For any graph feature $f \in F$, $D_f$ is the set of graphs containing $f$: $D_f = \{g \mid f \subseteq g, g \in D\}$.
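
A minimal sketch of how the sets $D_f$ could be materialized as an inverted index, one entry per feature. The function name `build_feature_index` and its `contains` parameter are hypothetical; `contains` stands for whatever subgraph-containment test is available. The usage lines replace real graphs with sets of edge strings and containment with set inclusion, purely to keep the example short.

```python
from typing import Callable, Hashable, Iterable, Mapping, TypeVar

G = TypeVar("G")
Feat = TypeVar("Feat", bound=Hashable)


def build_feature_index(
    db: Mapping[str, G],
    features: Iterable[Feat],
    contains: Callable[[Feat, G], bool],
) -> dict[Feat, set[str]]:
    """For every feature f, compute D_f = {g in D | f is a subgraph of g}.

    `contains(f, g)` stands for any subgraph-containment test; it is a
    parameter here so the sketch stays independent of a graph library.
    """
    return {f: {gid for gid, g in db.items() if contains(f, g)} for f in features}


# Toy stand-in for graphs: a graph is a frozenset of edge descriptions and
# "containment" is plain set inclusion (NOT real subgraph isomorphism).
db = {
    "g1": frozenset({"C-C", "C-O"}),
    "g2": frozenset({"C-C"}),
    "g3": frozenset({"C-N"}),
}
index = build_feature_index(db, [frozenset({"C-C"}), frozenset({"C-O"})], lambda f, g: f <= g)
print(sorted(index[frozenset({"C-C"})]))  # ['g1', 'g2']
```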

  9. Ch2 Preliminaries • Query processing consists of two substeps: • (1) Search: compute the candidate answer set $C_q = \bigcap_{f} D_f$ ($f \subseteq q$ and $f \in F$); each graph in $C_q$ contains all of $q$'s features in the feature set. Therefore, the actual answer set $D_q$ is a subset of $C_q$. • (2) Verification: check each graph $g$ in $C_q$ to verify whether $q$ is really a subgraph of $g$.
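
The two substeps can be sketched as a filter-and-verify routine. This is an illustrative Python sketch, not gIndex's code: `answer_query`, its parameters, and the toy set-based graphs in the usage lines are assumptions, and the indexed features contained in $q$ are taken as already known.

```python
from typing import Callable, Hashable, Iterable, Mapping, TypeVar

G = TypeVar("G")


def answer_query(
    q: G,
    db: Mapping[str, G],
    index: Mapping[Hashable, set[str]],
    features_in_q: Iterable[Hashable],
    contains: Callable[[G, G], bool],
) -> tuple[set[str], set[str]]:
    """Filter-and-verify query processing; returns (C_q, D_q).

    `features_in_q` are the indexed features f with f ⊆ q (assumed to be
    found beforehand); `contains(a, b)` is a subgraph-containment test.
    """
    # (1) Search: C_q = intersection of D_f over the indexed features of q.
    #     With no usable feature, every graph stays a candidate.
    candidates = set(db)
    for f in features_in_q:
        candidates &= index[f]
    # (2) Verification: run the expensive containment test only on C_q.
    answers = {gid for gid in candidates if contains(q, db[gid])}
    return candidates, answers


# Toy data: graphs as frozensets of edge descriptions, containment = set inclusion.
db = {
    "g1": frozenset({"C-C", "C-O", "O-H"}),
    "g2": frozenset({"C-C", "C-O"}),
    "g3": frozenset({"C-N"}),
}
cc, co = frozenset({"C-C"}), frozenset({"C-O"})
index = {cc: {"g1", "g2"}, co: {"g1", "g2"}}
c_q, d_q = answer_query(frozenset({"C-C", "C-O", "O-H"}), db, index, [cc, co], lambda a, b: a <= b)
print(sorted(c_q), sorted(d_q))  # ['g1', 'g2'] ['g1']
```

Because verification only ever removes graphs, the returned $D_q$ is always a subset of $C_q$.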

  10. Ch2 Preliminaries • Cost analysis • Query response time: $T_{search} + |C_q| \times T_{iso\_test}$ • The index size is approximately proportional to the size of the feature set, $|F|$.
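
The cost model is a one-line formula; the sketch below evaluates it for two hypothetical indexes (all timings are invented for illustration, in an arbitrary time unit) to show why shrinking $|C_q|$ usually matters more than a slightly faster lookup.

```python
def estimated_response_time(t_search: float, candidate_count: int, t_iso_test: float) -> float:
    """Cost model: T_search + |C_q| * T_iso_test.
    All inputs are caller-supplied estimates in the same time unit."""
    return t_search + candidate_count * t_iso_test


# Hypothetical numbers only: a slower index lookup that yields a much smaller
# candidate set can still answer the query faster.
print(estimated_response_time(t_search=2.0, candidate_count=1000, t_iso_test=0.5))  # 502.0
print(estimated_response_time(t_search=5.0, candidate_count=50, t_iso_test=0.5))    # 30.0
```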

  11. Ch3 Frequent Fragment

  12. Ch3 Frequent Fragment • Example with minSup = 2: the frequent fragments are indexed

  13. Ch3 Frequent Fragment • If query Q is frequent, we can easily find it, because Q itself is indexed

  14. Ch3 Frequent Fragment If query Q is not frequent?

  15. Ch3 Frequent Fragment Find the frequent subgraphs of Q!

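  16. Ch3 Frequent Fragment • We find all of q's subgraphs and sort them in support-decreasing order: $f_1, f_2, \dots, f_n$ • There is a boundary between $f_i$ and $f_{i+1}$ such that $|D_{f_i}| \ge minSup$ and $|D_{f_{i+1}}| < minSup$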

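  18. Ch3 Frequent Fragment • Since all frequent fragments are indexed, compute the candidate answer set by $C_q = \bigcap_{j=1}^{i} D_{f_j}$, whose size is at most $|D_{f_i}|$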
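
The boundary argument can be traced on toy data. In this hypothetical sketch, `supports` maps each subgraph of $q$ to its $D_f$; in a real system only the frequent subgraphs would be available (because only they are indexed), so the infrequent entry is included only to show where the boundary falls.

```python
from functools import reduce


def boundary_candidates(
    supports: dict[str, set[str]], min_sup: int, all_ids: set[str]
) -> tuple[list[str], set[str]]:
    """supports maps each subgraph f of q to D_f (toy data).
    Returns (f_1..f_i above the boundary, C_q)."""
    # Sort q's subgraphs in support-decreasing order: f_1, f_2, ..., f_n.
    ordered = sorted(supports, key=lambda f: len(supports[f]), reverse=True)
    # The boundary is the first position whose support drops below minSup.
    i = next((k for k, f in enumerate(ordered) if len(supports[f]) < min_sup), len(ordered))
    frequent = ordered[:i]  # f_1..f_i: frequent, hence indexed
    # C_q = D_{f_1} ∩ ... ∩ D_{f_i}; with no frequent subgraph, all graphs remain.
    c_q = reduce(set.intersection, (supports[f] for f in frequent), set(all_ids))
    return frequent, c_q


supports = {"a": {"g1", "g2", "g3", "g4"}, "b": {"g1", "g2", "g3"},
            "c": {"g2", "g3"}, "d": {"g3"}}
frequent, c_q = boundary_candidates(supports, min_sup=2, all_ids={"g1", "g2", "g3", "g4", "g5"})
print(frequent, sorted(c_q))  # ['a', 'b', 'c'] ['g2', 'g3']
```

Since every $D_{f_j}$ in the intersection contains $C_q$, the result can never be larger than $|D_{f_i}|$.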

  21. Ch3 Frequent Fragment • If minSup is high, $|C_q|$ may be too large! • If minSup is low, there are too many frequent fragments to index

  22. Ch3 Frequent Fragment • Advantages of the size-increasing support function $\psi(l)$: (1) It produces fewer frequent fragments than the lowest uniform minSup would. (2) Large fragments contain many small subgraphs, so low-support large fragments may still be indexed through them. • However, the set of small fragments may become too large, because $\psi(l)$ is low for small $l$! We will design a distillation procedure in the next section.

  23. Ch3 Frequent Fragment • Typical setting of $\psi(l)$: the support threshold is low for small fragments and increases with size $l$; mining continues until it exhausts all fragments up to size maxL
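
The exact formula for $\psi(l)$ is not reproduced here, so the sketch below uses a hypothetical shape with the same qualitative behavior: a threshold of 1 for small fragments that rises to a maximum support $\Theta$ (`theta`) at size maxL. The names `make_psi`, `frequent_under_psi`, and the parameter `small` are assumptions made for illustration.

```python
import math
from typing import Callable


def make_psi(max_l: int, theta: int, small: int = 3) -> Callable[[int], int]:
    """Hypothetical size-increasing support function (not the slide's formula):
    threshold 1 for fragments of size <= small, then rising linearly to theta
    at size max_l (rounded up)."""
    def psi(l: int) -> int:
        if l <= small:
            return 1
        return math.ceil(1 + (theta - 1) * (l - small) / (max_l - small))
    return psi


def frequent_under_psi(
    fragments: dict[str, tuple[int, int]], psi: Callable[[int], int], max_l: int
) -> list[str]:
    """fragments maps a name to (size l, support |D_x|); keep x iff
    l <= maxL and |D_x| >= psi(l)."""
    return [x for x, (l, sup) in fragments.items() if l <= max_l and sup >= psi(l)]


psi = make_psi(max_l=10, theta=100)
print([psi(l) for l in (1, 3, 4, 10)])  # [1, 1, 16, 100]
```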

  24. Ch4 Discriminative Fragment

  25. Ch4 Discriminative Fragment • Do we need to index every frequent fragment? • Suppose there are two frequent fragments $x$ and $y$ such that $x$ is a supergraph of $y$. If $x$ is not more discriminative than $y$ (e.g., $D_x = D_y$), then $x$ is redundant.

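  26. Ch4 Discriminative Fragment • We use the discriminative ratio $\gamma = \frac{|\bigcap_i D_{f_i}|}{|D_x|}$, where $f_i \subset x$ and $f_i \in F$, to decide whether $x$ is discriminative.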

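  27. Ch4 Discriminative Fragment • Properties of $\gamma$: $\gamma \ge 1$, because $D_x \subseteq D_{f_i}$ for every $f_i \subset x$. When $\gamma = 1$, $x$ is completely redundant. The larger $\gamma$ is, the more discriminative $x$ is. $\gamma$ depends on the fragments that are already in the feature set, so discriminative fragments have to be mined relative to the current feature set.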
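
A short sketch of the discriminative ratio, assuming the $D_f$ sets are plain Python sets of graph ids. When no feature of $x$ is indexed yet, the intersection is taken to be the whole database (the empty fragment is contained in every graph). The function names and the `gamma_min` parameter are illustrative.

```python
def discriminative_ratio(d_x: set[str], d_subfeatures: list[set[str]], all_ids: set[str]) -> float:
    """gamma = |∩_i D_{f_i}| / |D_x|, where the f_i are the indexed features
    contained in x. With no such f_i the intersection is the whole database."""
    if not d_x:
        raise ValueError("D_x must be non-empty (x is assumed frequent)")
    inter = set(all_ids)
    for d_f in d_subfeatures:
        inter &= d_f
    return len(inter) / len(d_x)


def is_discriminative(d_x: set[str], d_subfeatures: list[set[str]],
                      all_ids: set[str], gamma_min: float) -> bool:
    """x is kept as a feature when its ratio reaches the threshold gamma_min."""
    return discriminative_ratio(d_x, d_subfeatures, all_ids) >= gamma_min


print(discriminative_ratio({"g1"}, [{"g1", "g2"}], {"g1", "g2", "g3"}))        # 2.0
print(discriminative_ratio({"g1", "g2"}, [{"g1", "g2"}], {"g1", "g2", "g3"}))  # 1.0
```

The first call mirrors the 2 / 1 = 2.0 arithmetic of slide 28 with made-up graph ids; the second shows a completely redundant fragment ($\gamma = 1$).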

  28. Ch4 Discriminative Fragment • If we set a minimum discriminative ratio $\gamma_{min}$, we get the discriminative fragments shown above. Since fragment (b) is a subgraph of fragment (c), the discriminative ratio of fragment (c) is 2 / 1 = 2.0.

  29. Ch5 gIndex

  30. Ch5 gIndex 5.1 Discriminative fragment selection 5.2 Index construction 5.3 Search

  31. 5.1 Discriminative fragment selection
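
A hedged sketch of a level-wise selection consistent with Ch3 and Ch4: fragments are processed by increasing size $l$, and a fragment enters the feature set $F$ only if it is frequent under $\psi(l)$ and its discriminative ratio against the features already in $F$ reaches $\gamma_{min}$. The input format (each fragment listed with its size, $D_x$, and its proper subgraphs among the candidates) is a simplifying assumption; a real implementation would obtain these from a frequent-subgraph miner.

```python
from typing import Callable

# Hypothetical input: name -> (size l, D_x, names of x's proper subgraphs among the fragments)
Fragments = dict[str, tuple[int, set[str], list[str]]]


def select_discriminative(
    fragments: Fragments,
    all_ids: set[str],
    psi: Callable[[int], int],
    gamma_min: float,
    max_l: int,
) -> list[str]:
    """Level-wise selection: a fragment of size l enters the feature set F
    if it is frequent (|D_x| >= psi(l)) and discriminative (gamma >= gamma_min)
    with respect to the features already in F."""
    selected: list[str] = []
    for l in range(max_l + 1):
        chosen_so_far = set(selected)  # F after processing all sizes < l
        for x, (size, d_x, subgraphs) in fragments.items():
            if size != l or not d_x or len(d_x) < psi(l):
                continue  # other level, or not frequent under psi
            # D of the empty fragment is the whole database.
            inter = set(all_ids)
            for f in subgraphs:
                if f in chosen_so_far:
                    inter &= fragments[f][1]
            if len(inter) / len(d_x) >= gamma_min:
                selected.append(x)
    return selected


frags = {
    "a": (1, {"g1", "g2", "g3", "g4"}, []),
    "b": (1, {"g1", "g2"}, []),
    "c": (2, {"g1"}, ["a", "b"]),
    "d": (2, {"g1", "g2"}, ["b"]),
}
print(select_discriminative(frags, {"g1", "g2", "g3", "g4"}, lambda l: 1, 1.5, max_l=2))  # ['b', 'c']
```

Because a proper subgraph is always smaller, features chosen at the same level never affect one another, which is why the snapshot of $F$ is taken once per level.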

  32. 5.2 Index construction 5.2.1 Graph Sequentialization 5.2.2 gIndex Tree 5.2.3 Remark on gIndex Tree Size 5.2.4 gIndex Tree Implementation

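  33. 5.2.1 Graph Sequentialization • Adjacency matrices • DFS code • Discovery time: each vertex is numbered $i$ in the order a depth-first search discovers it • Forward edge: $(i, j)$ with $i < j$ • Backward edge: $(i, j)$ with $i > j$ • DFS code: the sequence of edges, each written as a 5-tuple $(i, j, l_i, l_{(i,j)}, l_j)$, where $l$ denotes vertex and edge labels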
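
To make the 5-tuple notation concrete, the sketch below builds one DFS code for a small connected graph: vertices are numbered by discovery time, and each edge becomes $(i, j, l_i, l_{(i,j)}, l_j)$. It visits neighbours in ascending id order and emits a vertex's backward edges before its forward edges; it does not search for the minimum DFS code that serves as a canonical label. The function name and the edge labels `s`/`d` are hypothetical.

```python
def dfs_code(
    labels: dict[int, str],
    edge_labels: dict[frozenset[int], str],
    start: int,
) -> list[tuple[int, int, str, str, str]]:
    """Return one DFS code of a connected graph without self-loops: a list of
    5-tuples (i, j, l_i, l_(i,j), l_j), where i and j are discovery times.
    Forward (tree) edges have i < j; backward edges have i > j.
    This is one of many DFS codes, not the minimum (canonical) one."""
    adj: dict[int, list[int]] = {v: [] for v in labels}
    for e in edge_labels:
        u, v = tuple(e)
        adj[u].append(v)
        adj[v].append(u)
    for nbrs in adj.values():
        nbrs.sort()

    disc: dict[int, int] = {}  # vertex -> discovery time
    emitted: set[frozenset[int]] = set()
    code: list[tuple[int, int, str, str, str]] = []

    def visit(v: int) -> None:
        disc[v] = len(disc)
        # Backward edges to already-discovered vertices come first,
        # ordered by the target's discovery time.
        back = sorted((w for w in adj[v] if w in disc and frozenset((v, w)) not in emitted),
                      key=disc.__getitem__)
        for w in back:
            e = frozenset((v, w))
            emitted.add(e)
            code.append((disc[v], disc[w], labels[v], edge_labels[e], labels[w]))
        # Then forward edges to undiscovered neighbours, recursing depth-first.
        for w in adj[v]:
            if w not in disc:
                e = frozenset((v, w))
                emitted.add(e)
                code.append((disc[v], len(disc), labels[v], edge_labels[e], labels[w]))
                visit(w)

    visit(start)
    return code


labels = {0: "C", 1: "C", 2: "O"}
edges = {frozenset((0, 1)): "s", frozenset((1, 2)): "s", frozenset((0, 2)): "d"}
print(dfs_code(labels, edges, start=0))
# [(0, 1, 'C', 's', 'C'), (1, 2, 'C', 's', 'O'), (2, 0, 'O', 'd', 'C')]
```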

  34. 5.2.2 gIndex Tree • Root • Level 0: graphs with only one vertex and no edge • …

  35. 5.2.3 Remark on gIndex Tree Size • Discriminative features on one path of the tree: levels 0, 1, 2, …, K-1

  36. 5.2.4 gIndex Tree Implementation • Using a hash table • If two graphs $g$ and $g'$ are isomorphic, then $h(g) = h(g')$
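
Any function of a canonical form gives isomorphic graphs the same hash value, because isomorphic graphs share the same canonical form. The sketch below uses a brute-force canonical form (minimum over all vertex orderings), which is correct but exponential, so it only suits tiny fragments; a practical implementation would use a canonical label such as the minimum DFS code instead. Using the canonical tuple directly as a Python `dict` key lets the dictionary play the role of the hash table; `canonical_form` is a hypothetical name.

```python
from itertools import permutations


def canonical_form(labels: dict[int, str], edges: set[frozenset[int]]) -> tuple:
    """Brute-force canonical label: the smallest (vertex-label sequence,
    sorted edge list) over all vertex orderings. Isomorphic graphs get equal
    forms. Runs in O(n!) time, so it is only usable for tiny fragments."""
    best = None
    for perm in permutations(labels):
        pos = {v: i for i, v in enumerate(perm)}
        key = (
            tuple(labels[v] for v in perm),
            tuple(sorted(tuple(sorted(pos[v] for v in e)) for e in edges)),
        )
        if best is None or key < best:
            best = key
    return best


path_a = ({0: "C", 1: "C", 2: "O"}, {frozenset((0, 1)), frozenset((1, 2))})
path_b = ({7: "O", 8: "C", 9: "C"}, {frozenset((7, 8)), frozenset((8, 9))})  # same shape, renumbered
table = {canonical_form(*path_a): "C-C-O fragment"}
print(table.get(canonical_form(*path_b)))  # C-C-O fragment
```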

  37. 5.3 Search 5.3.1 Apriori Pruning 5.3.2 Maximum Discriminative Fragments

  38. 5.3.1 Apriori Pruning • If a fragment is not in the gIndex tree, we need not check its supergraphs any more. • A hash table H is used to facilitate the Apriori pruning.
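
A sketch of the pruned enumeration, under assumptions: the query's fragments are its connected edge subsets, grown one adjacent edge at a time; fragments are compared through a brute-force canonical form; and the index H is modeled as a Python set of canonical forms. As the slide states, a fragment missing from H is not extended further. Names such as `canon` and `indexed_fragments_of_query` are hypothetical.

```python
from itertools import permutations

Edge = tuple[int, int]


def canon(labels: dict[int, str], edges: frozenset[Edge]) -> tuple:
    """Brute-force canonical form of the subgraph formed by `edges` (tiny graphs only)."""
    verts = sorted({v for e in edges for v in e})
    best = None
    for perm in permutations(verts):
        pos = {v: i for i, v in enumerate(perm)}
        key = (tuple(labels[v] for v in perm),
               tuple(sorted(tuple(sorted((pos[a], pos[b]))) for a, b in edges)))
        if best is None or key < best:
            best = key
    return best


def indexed_fragments_of_query(
    labels: dict[int, str], q_edges: list[Edge], index: set[tuple]
) -> set[tuple]:
    """Enumerate q's connected edge subsets level by level (by edge count) and
    return the canonical forms found in the index H. A fragment that is not
    in H is never extended (Apriori pruning)."""
    found: set[tuple] = set()
    seen: set[frozenset[Edge]] = set()
    level = [frozenset([e]) for e in q_edges]
    while level:
        next_level: list[frozenset[Edge]] = []
        for frag in level:
            if frag in seen:
                continue
            seen.add(frag)
            key = canon(labels, frag)
            if key not in index:
                continue  # pruned: its supergraphs are assumed absent from H too
            found.add(key)
            touched = {v for e in frag for v in e}
            for e in q_edges:  # grow by one adjacent edge, keeping the fragment connected
                if e not in frag and (e[0] in touched or e[1] in touched):
                    next_level.append(frag | {e})
        level = next_level
    return found


labels = {0: "C", 1: "C", 2: "O"}
q_edges = [(0, 1), (1, 2)]
cc = canon(labels, frozenset({(0, 1)}))
co = canon(labels, frozenset({(1, 2)}))
H = {co}  # hypothetical index H: only the C-O edge is indexed
print(indexed_fragments_of_query(labels, q_edges, H) == {co})  # True
```

The C-C-O path is still examined here because it can also be reached from the indexed C-O edge; pruning only stops growth from fragments that are missing from H.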

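  39. 5.3.2 Maximum Discriminative Fragments • If $f \subseteq f'$, then $D_{f'} \subseteq D_f$, so only the maximal discriminative fragments contained in $q$ need to be intersected to compute $C_q$.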
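
Since $D_{f'} \subseteq D_f$ whenever $f \subseteq f'$, dropping non-maximal features cannot change the intersection. The hypothetical sketch below keeps only the maximal features and checks that the candidate set is unchanged; features are modeled as frozensets and the subgraph relation as proper-subset, which is a simplification.

```python
from functools import reduce
from typing import Callable, Hashable, Iterable, Mapping


def maximal_features(
    features: list[Hashable], is_proper_sub: Callable[[Hashable, Hashable], bool]
) -> list[Hashable]:
    """Keep only features of q that are not a proper subgraph of another feature of q."""
    return [f for f in features if not any(is_proper_sub(f, g) for g in features if g != f)]


def candidates(features: Iterable[Hashable], index: Mapping[Hashable, set[str]],
               all_ids: set[str]) -> set[str]:
    """C_q as the intersection of D_f over the given features."""
    return reduce(set.intersection, (index[f] for f in features), set(all_ids))


a, ab, c = frozenset("a"), frozenset("ab"), frozenset("c")
index = {a: {"g1", "g2", "g3"}, ab: {"g1", "g2"}, c: {"g2", "g3"}}
all_ids = {"g1", "g2", "g3", "g4"}
feats = [a, ab, c]
maxi = maximal_features(feats, lambda x, y: x < y)  # a is dropped: a ⊂ ab
print(sorted(candidates(maxi, index, all_ids)))  # ['g2']
```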

  40. Ch6 Experimental Result

  41. Experimental Result • The performance of gIndex is compared with that of GraphGrep • GraphGrep is a path-based approach • Two kinds of datasets are used in the experiments: • one real dataset • a series of synthetic datasets

  42. Dataset • The real dataset is an AIDS antiviral screen dataset containing chemical compounds • The dataset contains 43,905 classified chemical molecules • The synthetic data generator was provided by Kuramochi et al. • It allows the user to specify the number of graphs (D), their average size (T), the number of seed graphs (S), the average size of seed graphs (I), and the number of distinct labels (L)

  43. Experiment Background • Experiments are performed on a 1.5 GHz Intel PC with 1 GB of memory running Red Hat 8.0 • Both GraphGrep and gIndex are compiled with gcc/g++

  44. AIDS Antiviral Screen Dataset

  45. Experimental Result • The index size of gIndex is at least 10 times smaller than that of GraphGrep • Two salient properties of gIndex: its index size is small and stable

  46. Experimental Result • The size of the candidate answer set $C_q$ is $|C_q|$ • AVG($|D_q|$) is the lower bound of AVG($|C_q|$) • An algorithm achieving this lower bound matches the queries in the graph dataset precisely

  47. Experimental Result • Q4 (smaller query answer sets): queries in Q4 are more likely to be path-structured

  48. Experimental Result (larger query answer sets)

  49. Experimental Result

  50. Experimental Result The scalability of gIndex
