Mining frequent subgraphs

COMP 790-90 Seminar

Spring 2011


FFSM: Fast Frequent Subgraph Mining (An Overview)

  • How to solve the graph isomorphism problem?

    • A novel graph canonical form: the Canonical Adjacency Matrix (CAM)

  • How to tackle the subgraph isomorphism problem (NP-complete)?

    • Incrementally maintained embeddings

  • How to enumerate subgraphs?

    • An efficient data structure: the CAM tree

    • Two operations: CAM-join and CAM-extension


Adjacency Matrix

[Figure: a labeled graph P with vertices p1 to p5, and three adjacency matrices of P: M1, M2, and M3.]

  • Every diagonal entry of an adjacency matrix M corresponds to a distinct vertex in G and is filled with the label of that vertex.

  • Every off-diagonal entry in the lower triangle of M corresponds to a pair of vertices in G and is filled with the label of the edge between the two vertices, or with zero if there is no edge.

Note: for an undirected graph, the upper triangle is always a mirror of the lower triangle.
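As a minimal sketch of the two rules above, the following Python builds the lower triangle of an adjacency matrix from a labeled graph and a chosen vertex order. The function name `adjacency_matrix` and the vertex names `v1` to `v5` are illustrative assumptions, not part of the slides. The example graph is read off the code of M1 given later in the deck.

```python
def adjacency_matrix(vertex_labels, edges, order):
    """Return the lower triangle (diagonal included) as a list of rows.

    vertex_labels: dict mapping vertex -> vertex label
    edges: dict mapping frozenset({u, v}) -> edge label (undirected)
    order: list of vertices; position i becomes row/column i
    """
    rows = []
    for i, u in enumerate(order):
        # Off-diagonal entries: edge label, or "0" when there is no edge.
        row = [edges.get(frozenset((u, order[j])), "0") for j in range(i)]
        # Diagonal entry: the label of the vertex itself.
        row.append(vertex_labels[u])
        rows.append(row)
    return rows


# Hypothetical vertex names; labels and edges follow the code of M1.
labels = {"v1": "a", "v2": "b", "v3": "b", "v4": "c", "v5": "d"}
edges = {
    frozenset(("v1", "v2")): "y",
    frozenset(("v1", "v3")): "y",
    frozenset(("v2", "v3")): "x",
    frozenset(("v2", "v4")): "y",
    frozenset(("v3", "v5")): "y",
}

m1 = adjacency_matrix(labels, edges, ["v1", "v2", "v3", "v4", "v5"])
m2 = adjacency_matrix(labels, edges, ["v1", "v2", "v3", "v5", "v4"])
for row in m1:
    print(" ".join(row))
```

Reordering the vertices (as for `m2`) yields a different matrix for the same graph, which is why a canonical form is needed.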


Code

[Figure: adjacency matrices M1 and M3 of graph P (lower triangles shown).]

    M1:
    a
    y b
    y x b
    0 y 0 c
    0 0 y 0 d

    M3:
    b
    x b
    y 0 d
    0 y 0 c
    y y 0 0 a

  • The code of an n × n adjacency matrix M is defined as the sequence of its lower triangular entries (including the diagonal entries) in the order:

    $M_{1,1}\ M_{2,1}\ M_{2,2}\ \ldots\ M_{n,1}\ M_{n,2}\ \ldots\ M_{n,n-1}\ M_{n,n}$

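Code(M1): aybyxb0y0c00y0d > Code(M2): aybyxb00yd0y00c > Code(M3): bxby0d0y0cyy00a, so M1 yields the largest of the three codes (this comparison assumes the symbols rank a > b > c > d > x > y > 0).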

[Figure: adjacency matrix M2 of graph P (lower triangle shown).]

    M2:
    a
    y b
    y x b
    0 0 y d
    0 y 0 0 c

  • The Canonical Adjacency Matrix (CAM) is the adjacency matrix that produces the maximal code under lexicographic order.
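The definitions of code and CAM can be sketched in Python as follows. This is a brute-force illustration that tries every vertex order, so it runs in factorial time and is only suitable for tiny graphs. The symbol order `a > b > c > d > x > y > 0`, the single-character labels, and all function names are assumptions made for this sketch. The slides do not state a symbol order.

```python
from itertools import permutations

# Assumed symbol order, highest first. The slides do not state it.
SYMBOL_ORDER = "abcdxy0"
RANK = {s: len(SYMBOL_ORDER) - i for i, s in enumerate(SYMBOL_ORDER)}


def code(rows):
    """Concatenate the lower-triangular rows: M11 M21 M22 ... Mnn."""
    return "".join(entry for row in rows for entry in row)


def code_key(c):
    """Map a code to a list of ranks so that list comparison is lexicographic."""
    return [RANK[s] for s in c]


def permuted(rows, perm):
    """Lower triangle after reordering vertices; perm[i] is the old index of new row i."""
    def entry(i, j):  # symmetric lookup into the stored lower triangle
        return rows[i][j] if j <= i else rows[j][i]
    return [[entry(perm[i], perm[j]) for j in range(i + 1)] for i in range(len(perm))]


def canonical_adjacency_matrix(rows):
    """Brute force: try every vertex order and keep the one with the maximal code."""
    candidates = (permuted(rows, p) for p in permutations(range(len(rows))))
    return max(candidates, key=lambda m: code_key(code(m)))


# M3, written out row by row from its code bxby0d0y0cyy00a.
M3 = [["b"], ["x", "b"], ["y", "0", "d"], ["0", "y", "0", "c"], ["y", "y", "0", "0", "a"]]
cam = canonical_adjacency_matrix(M3)
print(code(M3), "->", code(cam))
```

Under the assumed symbol order, starting from M3 recovers the code of M1, which is consistent with M1 being the canonical form of the example graph.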


MP Submatrix

[Figure: example adjacency matrices M1 through M6 illustrating maximal proper submatrices.]

  • For an m × m matrix A, an n × n matrix B is A's maximal proper submatrix (MP submatrix) iff B is obtained by removing the last nonzero entry from A.

  • A CAM is connected iff the corresponding graph is connected.

  • Theorem I: A CAM's MP submatrix is a CAM.

  • Theorem II: A connected CAM's MP submatrix is connected.
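A minimal sketch of taking an MP submatrix, under one reading of the definition above. It interprets "last nonzero entry" as the last edge entry in the last row. It also assumes that when this was the row's only edge, the whole row (the now isolated vertex) is dropped, giving n = m - 1, and that otherwise n = m. Both this reading and the function name `mp_submatrix` are assumptions, since the slide does not spell out these details.

```python
def mp_submatrix(rows):
    """Return the MP submatrix of a lower-triangular matrix given as a list of rows."""
    if len(rows) < 2:
        raise ValueError("need at least two vertices")
    head = [list(r) for r in rows[:-1]]
    last = list(rows[-1])
    # Column indices of edge entries in the last row (diagonal excluded).
    edge_cols = [j for j, e in enumerate(last[:-1]) if e != "0"]
    if not edge_cols:
        raise ValueError("last row has no edge entry to remove")
    if len(edge_cols) == 1:
        # Removing the only edge isolates the last vertex: drop its row.
        return head
    # Otherwise clear the last edge entry and keep the row.
    last[edge_cols[-1]] = "0"
    return head + [last]


# M1, written out row by row from its code aybyxb0y0c00y0d.
M1 = [["a"], ["y", "b"], ["y", "x", "b"], ["0", "y", "0", "c"], ["0", "0", "y", "0", "d"]]
chain = [M1]
while len(chain[-1]) > 1:
    chain.append(mp_submatrix(chain[-1]))
chain_codes = ["".join(e for row in m for e in row) for m in chain]
for c in chain_codes:
    print(c)
```

Applying the operation repeatedly walks from M1 down to the single-vertex matrix, one edge at a time.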


CAM Tree: Subgraphs

[Figure: graph P (vertices p1 to p5) and the CAM tree of its subgraphs.]


CAM Tree: Frequent Subgraphs

[Figure: three graphs P (vertices p1 to p5), Q (vertices q1 to q3), and S (vertices s1 to s3), and the CAM tree of their frequent subgraphs at support threshold σ = 2/3.]


How to Enumerate Nodes in a CAM Tree?

  • Two operations to explore the CAM tree:

    • CAM-join

    • CAM-extension

  • Augmenting the CAM tree with suboptimal CAMs

  • Objectives:

    • no false dismissal

    • no redundancy

  • Plus: we want to do this efficiently!


Suboptimal Tree

[Figure: graph P (vertices p1 to p5) and its suboptimal CAM tree, with tree edges marked j or e (presumably CAM-join and CAM-extension).]

A suboptimal CAM is a matrix whose MP submatrix is a CAM.
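The definition can be checked directly on small matrices, as in the sketch below. It is a brute-force illustration, not the method FFSM uses. The symbol order `a > b > c > d > x > y > 0`, the reading of "MP submatrix" (remove the last edge entry of the last row, dropping the row if that was its only edge), and all function names are assumptions made for this example.

```python
from itertools import permutations

# Assumed symbol order, highest first. The slides do not state it.
SYMBOL_ORDER = "abcdxy0"
RANK = {s: len(SYMBOL_ORDER) - i for i, s in enumerate(SYMBOL_ORDER)}


def code_key(rows):
    """Ranks of the entries in code order, so list comparison is lexicographic."""
    return [RANK[e] for row in rows for e in row]


def permuted(rows, perm):
    """Lower triangle after reordering vertices; perm[i] is the old index of new row i."""
    def entry(i, j):
        return rows[i][j] if j <= i else rows[j][i]
    return [[entry(perm[i], perm[j]) for j in range(i + 1)] for i in range(len(perm))]


def is_cam(rows):
    """True if no vertex order gives a larger code (brute force, factorial time)."""
    best = max(code_key(permuted(rows, p)) for p in permutations(range(len(rows))))
    return code_key(rows) == best


def mp_submatrix(rows):
    """Assumed reading: remove the last edge entry; drop the last row if it was the only one."""
    head, last = [list(r) for r in rows[:-1]], list(rows[-1])
    edge_cols = [j for j, e in enumerate(last[:-1]) if e != "0"]
    if not edge_cols:
        raise ValueError("last row has no edge entry to remove")
    if len(edge_cols) == 1:
        return head
    last[edge_cols[-1]] = "0"
    return head + [last]


def is_suboptimal_cam(rows):
    """A matrix is a suboptimal CAM when its MP submatrix is a CAM."""
    return is_cam(mp_submatrix(rows))


# Hypothetical 4-vertex matrix with code aybyxb00yd.
A = [["a"], ["y", "b"], ["y", "x", "b"], ["0", "0", "y", "d"]]
print(is_cam(A), is_suboptimal_cam(A))
```

The matrix `A` is not a CAM itself (attaching d to the first b vertex gives a larger code), but its MP submatrix is, so it still qualifies as a suboptimal CAM.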


Summary

  • Theorem:

    For a graph G, let $C_{K-1}$ ($C_K$) be the set of the suboptimal CAMs of all size-(K-1) (size-K) subgraphs of G (K ≥ 2). Every member of $C_K$ can be enumerated unambiguously either by joining two members of $C_{K-1}$ or by extending a member of $C_{K-1}$.


Experimental Study

  • Predictive Toxicology Evaluation Competition (PTE)

    • Contains 337 compounds

    • Each graph contains 27 nodes and 27 edges on average

  • NIH DTP Anti-Viral Screen Test (DTP CA/CM)

    • Chemicals are classified as Confirmed Active (CA), Confirmed Moderately Active (CM), or Confirmed Inactive (CI).

    • We formed a dataset containing the CA (423) and CM (1083) compounds.

    • Each graph contains 25 nodes and 27 edges on average


Performance (PTE)

[Figure: two performance plots for the PTE dataset, each with x-axis Support Threshold (%).]


Performance (DTP CACM)

[Figure: two performance plots for the DTP CA/CM dataset, each with x-axis Support Threshold (%).]

