

SPIN: Mining Maximal Frequent Subgraphs from Graph Databases

Jun Huan, Wei Wang, Jan Prins, Jiong Yang

KDD 2004


Introduction

  • Graphs model relations among data

    • Inter-disciplinary research

  • Huge number of recurring patterns

  • Goal: mine only maximal frequent subgraphs

    • A frequent subgraph is maximal if none of its supergraphs is frequent
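The definition of maximality above can be illustrated with a minimal sketch. It makes a strong simplifying assumption that is not part of the slides: every pattern is a set of edges over globally unique node names, so "is a subgraph of" reduces to "is a subset of". The function name `maximal_patterns` and the sample data are hypothetical.

```python
def maximal_patterns(frequent):
    """Keep only the patterns that have no frequent proper superset.

    Assumption (not from the slides): each pattern is a set of edges over
    globally unique node names, so the subgraph test is a plain subset test.
    """
    frequent = [frozenset(p) for p in frequent]
    return [p for p in frequent if not any(p < q for q in frequent)]


# Hypothetical set of frequent patterns.
frequent = [
    {("a", "b")},
    {("b", "c")},
    {("a", "b"), ("b", "c")},
    {("c", "d")},
]

for pattern in maximal_patterns(frequent):
    print(sorted(pattern))
```

In a real graph database the subset test would have to be replaced by a subgraph isomorphism test, which is what makes the problem hard.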


Advantages

  • Reduces the total number of mined subgraphs

    • Saves space and analysis effort

  • Reduces mining time

  • Non-maximal frequent subgraphs can be reconstructed from the maximal ones.

  • Maximal frequent subgraphs are of most interest in some applications.


Algorithm

  • Mine all frequent trees from a general graph database.

    • Tree normalization is simpler than graph normalization.

    • In certain applications, most of the frequent subgraphs are actually trees.

    • Use an existing subgraph mining algorithm

    • Mine subtrees from a forest


Algorithm

  • Reconstruct all maximal subgraphs from the mined trees.

    • For each frequent tree T, find all frequent subgraphs whose canonical spanning trees are isomorphic to T

    • Enumerate the equivalence class of a tree T

    • Maximal subgraph mining


Tree-based Equivalence Classes

  • A tree T is a spanning tree of G if T is a subgraph of G and contains all nodes in G.

    • The maximal spanning tree is the canonical spanning tree

  • Group all frequent subgraphs into equivalence classes based on their canonical spanning trees.
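A small sketch of the spanning tree test may help here. It assumes a simple undirected graph (no self-loops or parallel edges) whose edge endpoints all appear in the node list, and it ignores labels; the function name `is_spanning_tree` is hypothetical. Choosing the canonical (maximal) spanning tree would additionally need a total order on trees, which is not sketched.

```python
def is_spanning_tree(graph_nodes, graph_edges, tree_edges):
    """Return True if tree_edges form a spanning tree of the given graph.

    Assumes a simple undirected graph; labels are ignored in this sketch.
    """
    nodes = set(graph_nodes)
    g_edges = {frozenset(e) for e in graph_edges}
    t_edges = {frozenset(e) for e in tree_edges}
    # The tree must use only edges of the graph.
    if not nodes or not t_edges <= g_edges:
        return False
    # A tree on n nodes has exactly n - 1 edges.
    if len(t_edges) != len(nodes) - 1:
        return False
    # With n - 1 edges, reaching every node implies the edges form a tree.
    adjacency = {v: set() for v in nodes}
    for edge in t_edges:
        u, v = edge
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == nodes


nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(is_spanning_tree(nodes, edges, [(1, 2), (2, 3), (3, 4)]))
print(is_spanning_tree(nodes, edges, [(1, 2), (2, 3), (1, 3)]))
```

The first call passes because the three edges connect all four nodes; the second fails because the edges form a cycle on nodes 1 to 3 and never reach node 4.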


Spanning tree


Tree-based Equivalence Classes

[Figure: tree-based equivalence classes of graphs with labels a, b, x, y; one group consists of 12 singletons]


Enumerating Graphs from Trees

  • Candidate edges of a graph G: C = {e1, e2, …, en}

    • If G ⊕ e is frequent, edge e belongs to C (the candidate set)

  • Search space of G: G:C = {G ⊕ y | y ⊆ C}, where G ⊕ y is G with the edges in y added
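The search space can be enumerated directly from its definition, as the sketch below shows. It assumes that graphs are represented as sets of edges and that every subset of the candidate set, including the empty one, contributes one member; the name `search_space` and the sample edges are hypothetical.

```python
from itertools import combinations


def search_space(graph_edges, candidate_edges):
    """Yield the graph extended by every subset y of the candidate edges."""
    base = frozenset(graph_edges)
    candidates = list(candidate_edges)
    for size in range(len(candidates) + 1):
        for y in combinations(candidates, size):
            yield base | frozenset(y)


# Hypothetical tree with two candidate edges.
tree = [(1, 2), (2, 3), (3, 4)]
candidates = [(1, 3), (2, 4)]

for graph in search_space(tree, candidates):
    print(sorted(graph))
```

With n candidate edges the space has 2^n members, which is why the optimizations that follow try to avoid enumerating it in full.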



Optimizations

  • Remove from the search space sets of frequent subgraphs that cannot be maximal

  • Locally maximal: a frequent subgraph G is maximal within its equivalence class

  • Globally maximal: G is maximal among all frequent subgraphs in the graph database

  • Avoid enumerating subgraphs that are not locally maximal.


Bottom-up Pruning

  • G' = G ⊕ C (G with all candidate edges added)

    • If G' is frequent, every other graph in the search space is a subgraph of G' and therefore not maximal
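The pruning rule can be sketched as an early exit placed before the subset enumeration. The sketch assumes a toy database in which node names are globally unique, so that support counting reduces to a subset test; this is not how support is computed on real labeled graphs. The names `support` and `locally_maximal` are hypothetical.

```python
from itertools import combinations


def support(pattern, database):
    """Count database graphs containing the pattern (toy subset test)."""
    return sum(1 for graph in database if pattern <= graph)


def locally_maximal(tree, candidates, database, min_sup):
    """Return the locally maximal frequent graphs in the search space."""
    tree = frozenset(tree)
    full = tree | frozenset(candidates)
    # Bottom-up pruning: if the largest graph is frequent, nothing else
    # in the search space can be maximal, so skip the enumeration.
    if support(full, database) >= min_sup:
        return [full]
    candidates = list(candidates)
    frequent = []
    for size in range(len(candidates) + 1):
        for y in combinations(candidates, size):
            graph = tree | frozenset(y)
            if support(graph, database) >= min_sup:
                frequent.append(graph)
    return [g for g in frequent if not any(g < h for h in frequent)]


# Hypothetical database of three graphs over the same node names.
database = [
    frozenset({(1, 2), (2, 3), (1, 3)}),
    frozenset({(1, 2), (2, 3), (1, 3)}),
    frozenset({(1, 2), (2, 3)}),
]
tree = {(1, 2), (2, 3)}
candidates = [(1, 3)]

print(locally_maximal(tree, candidates, database, min_sup=2))
print(locally_maximal(tree, candidates, database, min_sup=3))
```

With a support threshold of 2 the early exit fires and only the full graph is returned; with a threshold of 3 the full graph is infrequent and the tree itself is the only locally maximal graph.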


Tail Shrink

  • An embedding of G in G' is a subgraph isomorphism f from G to G'

    • Example: two embeddings of L in P

l1->P1, l2->P2, l3->P3, l4->P4

l1->P1, l2->P3, l3->P2, l4->P4
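Embeddings of a small pattern can be listed by brute force, as in the sketch below. The graphs L and P used here are hypothetical stand-ins (the slide's figure is not available); they were chosen so that exactly two embeddings exist, differing in where l2 and l3 are mapped. The function name `embeddings` and the dictionary-based graph encoding are also assumptions.

```python
from itertools import permutations


def embeddings(p_labels, p_edges, h_labels, h_edges):
    """List all label-preserving injective maps from pattern to host.

    Graphs are given as {node: label} and {(u, v): edge_label} for
    undirected edges. Brute force, suitable for tiny graphs only.
    """
    host_edge = {frozenset(e): lab for e, lab in h_edges.items()}
    p_nodes = list(p_labels)
    found = []
    for image in permutations(h_labels, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        # Node labels must be preserved.
        if any(p_labels[v] != h_labels[f[v]] for v in p_nodes):
            continue
        # Every pattern edge must map to a host edge with the same label.
        if all(host_edge.get(frozenset((f[u], f[v]))) == lab
               for (u, v), lab in p_edges.items()):
            found.append(f)
    return found


# Hypothetical pattern L and host P: a 4-cycle in which l2 and l3
# (and P2 and P3) are interchangeable.
L_labels = {"l1": "a", "l2": "b", "l3": "b", "l4": "c"}
L_edges = {("l1", "l2"): "x", ("l1", "l3"): "x",
           ("l2", "l4"): "y", ("l3", "l4"): "y"}
P_labels = {"P1": "a", "P2": "b", "P3": "b", "P4": "c"}
P_edges = {("P1", "P2"): "x", ("P1", "P3"): "x",
           ("P2", "P4"): "y", ("P3", "P4"): "y"}

for f in embeddings(L_labels, L_edges, P_labels, P_edges):
    print(f)
```

The enumeration is factorial in the host size; practical miners extend embeddings incrementally instead of testing every permutation.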



Tail Shrink

  • A candidate edge (i, j, el) is associative to a graph G

    • if it appears in every embedding of G in the graph database

  • If a tree T has a set of associative edges, any maximal frequent graph G that is a supergraph of T must contain all of them.
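The associativity test can be sketched as a check over every embedding in every database graph. The block repeats a compact brute-force embedding helper so that it runs on its own. The names `embeddings` and `is_associative`, the graph encoding, and the sample database are hypothetical; the sketch also treats a pattern with no embeddings as vacuously associative, which is a choice made here and not something the slides state.

```python
from itertools import permutations


def embeddings(p_labels, p_edges, h_labels, h_edges):
    """Brute-force list of label-preserving embeddings (tiny graphs only)."""
    host_edge = {frozenset(e): lab for e, lab in h_edges.items()}
    p_nodes = list(p_labels)
    found = []
    for image in permutations(h_labels, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        if (all(p_labels[v] == h_labels[f[v]] for v in p_nodes)
                and all(host_edge.get(frozenset((f[u], f[v]))) == lab
                        for (u, v), lab in p_edges.items())):
            found.append(f)
    return found


def is_associative(edge, p_labels, p_edges, database):
    """True if candidate edge (i, j, el) occurs in every embedding."""
    i, j, el = edge
    for h_labels, h_edges in database:
        host_edge = {frozenset(e): lab for e, lab in h_edges.items()}
        for f in embeddings(p_labels, p_edges, h_labels, h_edges):
            if host_edge.get(frozenset((f[i], f[j]))) != el:
                return False
    return True


# Hypothetical pattern: a path a-b-c with edge label x.
p_labels = {0: "a", 1: "b", 2: "c"}
p_edges = {(0, 1): "x", (1, 2): "x"}

# Two database graphs close the path with a y-labeled edge; the third does not.
g1 = ({0: "a", 1: "b", 2: "c"}, {(0, 1): "x", (1, 2): "x", (0, 2): "y"})
g2 = ({10: "a", 11: "b", 12: "c"}, {(10, 11): "x", (11, 12): "x", (10, 12): "y"})
g3 = ({20: "a", 21: "b", 22: "c"}, {(20, 21): "x", (21, 22): "x"})

print(is_associative((0, 2, "y"), p_labels, p_edges, [g1, g2]))
print(is_associative((0, 2, "y"), p_labels, p_edges, [g1, g2, g3]))
```

When the closing edge is present in every embedding it can be added to the tree immediately, which is the idea behind removing associative edges from the candidate set.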


Tail Shrink

  • Remove associative edges from the candidate set and add them to T; no maximal subgraph is missed

    • Reduces the search space

    • Prunes the entire equivalence class in certain cases

  • A set of associative edges C of a tree T is lethal if

    • G' = T ⊕ C has a canonical spanning tree different from that of T



External-Edge Pruning

  • Removes an equivalence class without any knowledge of its candidate edges

  • External edge of a graph G: an edge connecting a node in G to a node not in G

  • An external edge (i, el, vl) is associative to a graph G if, for every embedding f of G in a graph G':

    • G' has a node v with the label vl

    • v is connected to the node f(i) by an edge labeled el in G'

    • There is no node j ∈ V[G] such that v = f(j)


Associative external edges


Experiments

  • 2.8 GHz Pentium Xeon

  • 512 KB L2 cache, 2 GB main memory

  • Red Hat Linux 7.3

  • C++ programming language


Synthetic Dataset

D10KT30L200I11V4E4
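The data set name packs the generator parameters into a single string. The sketch below only splits it into letter and value pairs. The slides do not say what the letters mean; a common reading for synthetic graph generators, assumed here and not confirmed by the slides, is D = number of graphs, T = average graph size, L = number of seed patterns, I = average seed size, V = number of node labels, and E = number of edge labels. The function name `parse_dataset_name` is hypothetical.

```python
import re


def parse_dataset_name(name):
    """Split a name such as 'D10KT30L200I11V4E4' into integer parameters.

    A trailing 'K' after a number is read as a factor of 1000.
    """
    params = {}
    for key, digits, kilo in re.findall(r"([A-Z])(\d+)(K?)", name):
        params[key] = int(digits) * (1000 if kilo else 1)
    return params


print(parse_dataset_name("D10KT30L200I11V4E4"))
```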


DTP CA data set


DTP CM data set

