Loading in 5 sec....

SPIN: Mining Maximal Frequent Subgraphs from Graph DatabasesPowerPoint Presentation

SPIN: Mining Maximal Frequent Subgraphs from Graph Databases

- By
**kathy** - Follow User

- 212 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' SPIN: Mining Maximal Frequent Subgraphs from Graph Databases' - kathy

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### SPIN: Mining Maximal Frequent Subgraphs from Graph Databases

Jun Huan, Wei Wang, Jan Prins, Jiong Yang

KDD 2004

Introduction

- Graphs model a relations among data
- Inter-disciplinary research

- Huge number of recurring patterns
- To mining only maximal frequent subgraphs.
- None of its super graphs are frequent

Advantages

- Reducing the total number of mined subgraphs
- Saving space and analysis effort

- Reducing mining time
- Non-maximal frequent subgraph can be reconstructed.
- Maximal frequent subgraphs are of most interest in some appliations.

Algorithm

- Mining all frequent trees from a general graph database.
- Tree normalization is simpler than graph.
- In certain applications, most of the frequent subgraphs are really trees.
- Use current subgraph mining algorithm
- Mining subtrees from a forest

Algorithm

- Reconstruct all maximal subgraphs from the mined trees.
- For each frequent tree T, find all frequent subgraphs whose canonical spanning tree are isomorphic to T
- Enumerate the equvalence class of a tree T
- Maximal subgraph mining

Tree-based Equivalence Classes

- A subtree T is a spanning tree of G if T contains all nodes in G.
- Maximal one: canonical spanning tree

- Group all frequent subgraphs in to equivalence classes based on spanning trees.

y

x

a

b

b

a

b

a

b

b

b

b

b

a

a

y

y

x

y

y

y

y

x

x

x

y

x

a

a

a

a

a

a

a

a

a

a

a

a

x

x

x

y

x

y

y

x

a

a

a

a

a

a

a

a

x

y

a

a

12 singletons groupEnumerating Graphs from Trees

- G C :{e1,e2,…,en}
- If frequent -> edge C (candidate set)

- Search space of G： G:C ={G+y|y 2C}

GO

Optimizations

- Removing a set of frequent subgraphs that can not be maximal from a search space
- Locally maximal：frequent subgraph G is maximal in its equivalence class
- Globally maximal：maximal frequent in a graph database
- Avoid enumerating subgraphs which are notlocally maximal.

Bottom-up Pruning

- G’ = G C
- G’ is frequent : each graph in search space is a subgraph of G’ and not maximal

Tail Shrink

- Embedding of G in G’ is a subgraph isomorphism f from G to G’
- Two embeddings of L in P

l1->P1, l2->P2, l3->P3, l4->P4

l1->P1, l2->P3 ,l3->P2 ,l4->P4

go

Tail Shrink

- candidate edge (i, j, el) is associative to a graph G
- It appears in every embedding of G in a graph databases

- If a tree T contains a set of associative edges, any maximal frequent graph G, a superset of T, must contains all associative edges.

Tail Shrink

- Remove associative edges from candidate sets and augment them to T without missing any maximal ones
- Reducing the search space
- Prune the entire equivalences class in certain cases

- A set of associative edges C of a tree T is lethal
- G’ = T C has a canonical spanning treedifferent from that of T

go

External-Edge Pruning

- Remove one equivalence class without any knowledge about its candidate edges
- External-edge for a graph G: it connects a node in G and a node not in G
- (i, el, vl) is associative to a graph G
- Every embedding f of G in a graph G’, G’ has a node v with the label vl
- v connects to the node f(i) with an edge label el in G’
- Not exist node j V[G] such that v = f(j)

Experiments

- 2.8GHz Pentium Xeon,
- 512KB L2 cache,2GB main memory
- Red Hat Linux 7.3
- C++ Programming language

Synthetic Dataset

D10KT30L200I11V4E4

Download Presentation

Connecting to Server..