
An Efficient Algorithm for Discovering Frequent Subgraphs

This paper presents an efficient algorithm for discovering frequent subgraphs in structural pattern datasets, particularly in the domains of biology and chemistry. The algorithm uses graph isomorphism, canonical labeling, and vertex invariants such as vertex degrees and labels to compute subgraph frequencies efficiently. Experimental results demonstrate its effectiveness on real and synthetic datasets.


Presentation Transcript


  1. An Efficient Algorithm for Discovering Frequent Subgraphs • Michihiro Kuramochi and George Karypis • ICDM, 2001 • Presenter: 蔡明瑾

  2. Introduction • Structural patterns • Biology, chemistry • Chemical compounds • Graph • Vertex: item • Edge: relation between items • Undirected, connected, labeled graph [Figure: example labeled graph]

  3. Graph Isomorphism • G1(V1, E1) and G2(V2, E2) are isomorphic if they are topologically identical to each other. • There is a mapping from V1 to V2 such that each edge in E1 is mapped to an edge in E2 and vice versa. [Figure: two isomorphic labeled graphs on vertices v0, v1, v2]
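The definition on slide 3 can be checked by brute force on small graphs. The following is a minimal sketch, not the method used in the paper: it tries every vertex mapping and accepts one that preserves vertex labels and maps the labeled edge set of one graph exactly onto the other. The representation (a list of vertex labels plus a dict from `frozenset({u, v})` to an edge label) and the sample graphs `g1`, `g2`, `g3` are assumptions made for illustration.

```python
from itertools import permutations

def is_isomorphic(vlabels1, edges1, vlabels2, edges2):
    """Brute-force isomorphism test for small undirected labeled graphs.

    vlabels*: list of vertex labels, indexed by vertex id.
    edges*:   dict mapping frozenset({u, v}) -> edge label.
    """
    n = len(vlabels1)
    if n != len(vlabels2) or len(edges1) != len(edges2):
        return False
    for perm in permutations(range(n)):  # perm[u] is the image of vertex u
        # The mapping must preserve vertex labels.
        if any(vlabels1[u] != vlabels2[perm[u]] for u in range(n)):
            continue
        # Map every edge of G1; the result must equal the edge set of G2,
        # which covers the "and vice versa" direction because perm is a bijection.
        mapped = {frozenset(perm[u] for u in e): lab for e, lab in edges1.items()}
        if mapped == edges2:
            return True
    return False

# Hypothetical sample graphs: a triangle with vertex labels b, a, a.
g1 = (["b", "a", "a"],
      {frozenset((0, 1)): "x", frozenset((0, 2)): "x", frozenset((1, 2)): "y"})
g2 = (["a", "b", "a"],
      {frozenset((0, 1)): "x", frozenset((0, 2)): "y", frozenset((1, 2)): "x"})
g3 = (["a", "b", "a"],
      {frozenset((0, 1)): "x", frozenset((0, 2)): "x", frozenset((1, 2)): "y"})

print(is_isomorphic(*g1, *g2))
print(is_isomorphic(*g1, *g3))
```

Here `g2` is `g1` with its vertices renumbered, while `g3` puts the `y` edge between a `b` vertex and an `a` vertex, so no label-preserving mapping exists. Trying all |V|! mappings is only practical for very small graphs.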

  4. Canonical labeling • Adjacency list • Vertex order v0 = b, v1 = a, v2 = a: code = baaxxy • Vertex order v0 = a, v1 = b, v2 = a: code = abaxyx [Figure: the same graph under two vertex orderings]

  5. Canonical labeling • Different permutations of the vertices lead to different codes. • |V|! permutations • The largest code is chosen as the canonical label
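The codes on slides 4 and 5 can be reproduced with a short sketch. It assumes a code is the vertex labels in the chosen order followed by the upper-triangular adjacency entries read column by column, that a missing edge is written as `"0"`, and that the canonical label is the lexicographically largest code; the function names and the graph representation are illustrative, not taken from the paper.

```python
from itertools import permutations

def code(order, vlabels, edges):
    """Code of a graph for one vertex ordering.

    Vertex labels in the given order, followed by the upper-triangular
    adjacency entries read column by column ("0" marks a missing edge;
    that marker is an assumption of this sketch).
    """
    labels = "".join(vlabels[v] for v in order)
    entries = "".join(
        edges.get(frozenset((order[i], order[j])), "0")
        for j in range(len(order))
        for i in range(j)
    )
    return labels + entries

def canonical_label(vlabels, edges):
    """Largest code over all |V|! vertex orderings."""
    return max(code(p, vlabels, edges) for p in permutations(range(len(vlabels))))

# The triangle from the slides: v0 = b, v1 = a, v2 = a,
# with x on both b-a edges and y on the a-a edge.
vlabels = ["b", "a", "a"]
edges = {frozenset((0, 1)): "x", frozenset((0, 2)): "x", frozenset((1, 2)): "y"}

print(code((0, 1, 2), vlabels, edges))  # baaxxy
print(code((1, 0, 2), vlabels, edges))  # abaxyx
print(canonical_label(vlabels, edges))  # baaxxy
```

Because two isomorphic graphs produce the same set of codes, they also share the same largest code, which is what makes the label usable as a graph identifier.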

  6. Vertex invariants • Properties that do not change across isomorphism mappings • Vertex degree • Vertex label • Siblings

  7. Vertex Degrees and Labels • Adjacency matrix • Partition the vertices by degree and label so that every partition contains vertices with the same degree and label

  8. Vertex Degrees and Labels • Vertex order v0 = b, v1 = a, v2 = a: code = baaxxy • Partition by degree: p0 = {v0, v1, v2}: 2 • Partition by degree and label: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)

  9. Vertex Degrees and Labels • Vertices reordered by partition (a, a, b): code = aabyxx • p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b) • Before: 3! orderings; now: 2! × 1!
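The saving from 3! to 2! × 1! orderings can be illustrated as follows. This is a sketch under assumptions: vertices are grouped by the pair (degree, label), the partitions are placed in ascending order of that pair (an arbitrary but fixed choice that happens to put the two `a` vertices first, as on slide 9), and only orderings that permute vertices inside a partition are enumerated. All names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations, product
from math import factorial, prod

def partition_by_invariants(vlabels, edges):
    """Group vertex ids by the invariant pair (degree, label)."""
    degree = defaultdict(int)
    for e in edges:
        for v in e:
            degree[v] += 1
    cells = defaultdict(list)
    for v, lab in enumerate(vlabels):
        cells[(degree[v], lab)].append(v)
    return dict(cells)

def restricted_orderings(cells):
    """Yield orderings that permute vertices only within each partition."""
    keys = sorted(cells)  # fixed partition order (assumption of this sketch)
    for combo in product(*(permutations(cells[k]) for k in keys)):
        yield tuple(v for cell in combo for v in cell)

def num_orderings(cells):
    """Product of the factorials of the partition sizes."""
    return prod(factorial(len(c)) for c in cells.values())

# The triangle from the slides: v0 = b, v1 = a, v2 = a.
vlabels = ["b", "a", "a"]
edges = {frozenset((0, 1)): "x", frozenset((0, 2)): "x", frozenset((1, 2)): "y"}

cells = partition_by_invariants(vlabels, edges)
print(cells)
print(list(restricted_orderings(cells)))
print(num_orderings(cells), "orderings instead of", factorial(len(vlabels)))
```

Restricting the search this way generally yields a different canonical label than the unrestricted maximum, which is acceptable as long as the same partitioning rule is applied to every graph.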

  10. Running example (minsup = 2) • Frequent 1-subgraphs [Figure: input graphs g0, g1, g2 and their frequent 1-subgraphs]

  11. Running example (minsup = 2) [Figure: subgraphs c0, c1, c2, c3]

  12. Running example • Frequent 2-subgraphs [Figure: subgraphs c1, c2, c3, c4]

  13. Frequency computing • Each subgraph keeps an id-list of the graphs that contain it • Intersect the id-lists of the two k-subgraphs • Intersection is frequent → compute the support • Not frequent → prune the candidate
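The id-list step on slide 13 might look like the sketch below. It assumes that each frequent k-subgraph stores the ids of the input graphs containing it, that a candidate whose intersected id-list is smaller than minsup is pruned without further work, and that the surviving ids are then checked with a subgraph test. The `contains` callback and the toy data (graphs as sets of edge names, containment as a subset test) are hypothetical stand-ins; a real implementation would use a subgraph-isomorphism test.

```python
def candidate_id_list(parent_id_lists):
    """Intersect the id-lists of the k-subgraphs that were joined."""
    return set.intersection(*map(set, parent_id_lists))

def support(candidate, parent_id_lists, graphs, minsup, contains):
    """Return the candidate's support, or None if the candidate is pruned.

    contains(graph, candidate) is a caller-supplied containment test
    (hypothetical here).
    """
    ids = candidate_id_list(parent_id_lists)
    if len(ids) < minsup:
        return None  # cannot be frequent, so skip the expensive checks
    hits = {i for i in ids if contains(graphs[i], candidate)}
    return len(hits) if len(hits) >= minsup else None

# Toy data: each "graph" is just a set of edge names.
graphs = {0: {"ab", "bc", "cd"}, 1: {"ab", "bc"}, 2: {"ab", "cd"}}
id_lists = {"ab": [0, 1, 2], "bc": [0, 1], "cd": [0, 2]}

def subset_test(graph, cand):
    return cand <= graph

print(support({"ab", "bc"}, [id_lists["ab"], id_lists["bc"]], graphs, 2, subset_test))
print(support({"bc", "cd"}, [id_lists["bc"], id_lists["cd"]], graphs, 2, subset_test))
```

The second candidate is pruned from the id-lists alone, since only graph 0 appears in both lists.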

  14. Candidate generation • Join two frequent k-subgraphs → (k+1)-candidate subgraphs • The two subgraphs must share the same (k-1)-subgraph as a core • A single join can yield several candidates: • Vertex labeling • Multiple cores • Multiple automorphisms

  15. Vertex labeling

  16. Multiple automorphisms

  17. Multiple cores

  18. Running example [Figure: subgraphs c1, c2, c3, c4 and candidates q0, q1, q2; two candidates are marked "does not satisfy downward closure"]

  19. Experiment • AMD 1.53 GHz • 2 GB main memory • Linux OS • Chemical compound datasets: • PTE (340 graphs): 66 atom types, four bond types, 27 edges/graph on average • DTP (223,644 graphs): 104 atom types, three bond types, 22 edges/graph on average • Synthetic datasets

  20. PTE and DTP

  21. Synthetic datasets

  22. Synthetic datasets: |D| = 10000, |S| = 200, |LE| = 1, minsup = 2%
