# New Algorithms for Enumerating All Maximal Cliques


Kazuhisa Makino (Osaka University, Japan) and Takeaki Uno (National Institute of Informatics, Japan). SWAT 2004, 9 July 2004.



## Background

Enumeration algorithms have recently become an interesting topic:

・There are still many nice unsolved problems (unlike ordinary discrete algorithms)

・The recent increase in computing power makes many enumeration problems practically solvable

⇒ many applications have been appearing, such as genome analysis, data mining, and clustering

・Some (theoretical) algorithms use enumeration as a subroutine (e.g., recognition of perfect graphs)

## Background (cont.)

・My institute has 100 informatics researchers

・At least 5 of them (independently) use implementations of enumeration algorithms

・Suppose there are 100,000 informatics researchers in the world

⇒ 5,000 researchers use enumeration algorithms?

## Problems and Results

Problem 1: for a given graph $G = (V, E)$, enumerate all maximal cliques in $G$

Problem 2: for a given bipartite graph $G = (V_1 \cup V_2, E)$, enumerate all maximal bipartite cliques in $G$

(Problem 2 is a special case of Problem 1)

・We propose algorithms for these problems that reduce the time complexity in both dense and sparse cases

・Computational experiments on random graphs and real-world data

## Difficulty

・Consider branch-and-bound-type enumeration: divide the maximal cliques into two groups, those including $v$ and those not including $v$

・If a group includes no maximal clique, cut off the branch

⇒ Finding a maximal clique that includes none of the vertices of a given set $S$ is NP-complete

⇒ We cannot cut off subproblems (branches) that include no maximal clique

[Figure: branching tree with branches $v_1 \in K$ / $v_1 \notin K$ and, below them, $v_2 \in K$ / $v_2 \notin K$]

## Existing Studies and Ours

・$O(|V||E|)$: Tsukiyama, Ide, Ariyoshi & Shirakawa

・$O(|V||E|)$, lexicographic order: Johnson, Yannakakis & Papadimitriou

・$O(a(G)|E|)$: Chiba & Nishizeki ($a(G)$: arboricity of $G$, with $m/(n-1) \le a(G) \le m^{1/2}$, where $n = |V|$ and $m = |E|$)

・Many heuristic algorithms in data mining for the bipartite case

Ours:

・$O(|V|^{2.376})$ (dense case)

・$O(\Delta^4)$ (sparse case)

・$O((\Delta^*)^4 + |\Theta|^3)$ ($|\Theta|$ vertices have degree $> \Delta^*$)

・$O(\Delta^3)$ (bipartite case)

・$O(\Delta^2)$ (bipartite case, using much memory)

(All bounds are time per maximal clique.)


### Enumeration of Maximal Cliques

・An improved version of the algorithm of Tsukiyama et al.

Idea: construct a route through all maximal cliques and traverse it

・For a maximal clique $K$ of $G = (V, E)$:

$C(K)$: the lexicographically maximum maximal clique including the clique $K$

$K_{\le i}$: the vertices of $K$ with indices $\le i$

$i(K)$: the minimum index $i$ such that $C(K_{\le i}) = K$

parent of a maximal clique $K \neq C(\emptyset)$: $C(K_{\le i(K)-1})$

・The parent is lexicographically larger than $K$ (smaller indices rank higher)

[Figure: lexicographic order examples $\{1,2,3\} > \{1,2,4\}$ and $\{1,3,6\} > \{1,4,5\}$; example graph on vertices 1–11 with a maximal clique $K$ and its index $i(K)$ marked]
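The definitions above can be made concrete in a short Python sketch. It assumes vertices are numbered 1..n as on the slides and the graph is a dict of adjacency sets; the names `closure`, `prefix`, `i_of`, and `parent` are hypothetical, and the greedy index-order scan used for $C(\cdot)$ is one straightforward way to obtain the lexicographically maximum maximal clique, not necessarily the authors' implementation.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set

Graph = Dict[int, Set[int]]  # vertices 1..n mapped to adjacency sets (no self-loops)


def closure(adj: Graph, seed: Iterable[int]) -> FrozenSet[int]:
    """C(S): scan vertices in increasing index order and add every vertex that is
    adjacent to all vertices chosen so far (this favours small indices)."""
    clique = set(seed)
    for v in sorted(adj):
        if v not in clique and all(v in adj[u] for u in clique):
            clique.add(v)
    return frozenset(clique)


def prefix(K: Iterable[int], i: int) -> Set[int]:
    """K_{<=i}: the vertices of K whose index is at most i."""
    return {v for v in K if v <= i}


def i_of(adj: Graph, K: FrozenSet[int]) -> int:
    """i(K): the minimum i with C(K_{<=i}) = K; 0 means K = C(empty set)."""
    for i in range(0, max(adj) + 1):
        if closure(adj, prefix(K, i)) == K:
            return i
    raise ValueError("K is not a maximal clique of the graph")


def parent(adj: Graph, K: FrozenSet[int]) -> Optional[FrozenSet[int]]:
    """Parent C(K_{<=i(K)-1}); None for the root C(empty set)."""
    i = i_of(adj, K)
    return None if i == 0 else closure(adj, prefix(K, i - 1))


# Hypothetical example graph with edges 1-2, 1-3, 2-3, 3-4, 4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(sorted(closure(adj, set())))              # [1, 2, 3]  (the root)
print(sorted(parent(adj, frozenset({3, 4}))))   # [1, 2, 3]
print(sorted(parent(adj, frozenset({4, 5}))))   # [3, 4]
```

In this small example the parent chain $\{4,5\} \to \{3,4\} \to \{1,2,3\}$ ends at the root, and each parent is lexicographically larger than its child.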

### Graph Representation of Relation

・The parent–child relation is acyclic (every parent is lexicographically larger than its child)

⇒ its graph representation forms a tree (the enumeration tree) rooted at $C(\emptyset)$

⇒ visit all maximal cliques by depth-first search

・We need a way to find the children of a maximal clique

### Child of Maximal Clique

$K[i] = C((K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\})$

・$H$ is a child of $K$ only if $H = K[i]$ for some $i > i(K)$

($K[i]$ is a child of $K$ iff the parent of $K[i]$ is $K$)

・$i(K[i]) = i$

・Constructing $K[i]$ takes $O(|E|)$ time

・Constructing its parent takes $O(|E|)$ time ($O(\Delta^2)$ time in the sparse case)

・Doing this for $i = i(K)+1, \dots, |V|$ takes $O(|V||E|)$ time

⇒ enumeration in $O(|V||E|)$ time per maximal clique

[Figure: example graph on vertices 1–11 with a maximal clique $K$, $i(K) = 6$]
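The whole traversal can be sketched in Python under the same assumptions (vertices 1..n, adjacency sets, hypothetical function names). It recomputes every closure from scratch, so it is far slower than the bounds above and only illustrates the parent–child logic; the extra test $i(H) = i$ is a safeguard added in this sketch so that each child is reported for exactly one index.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

Graph = Dict[int, Set[int]]  # vertices 1..n mapped to adjacency sets (no self-loops)


def closure(adj: Graph, seed: Iterable[int]) -> FrozenSet[int]:
    """C(S): greedily add, in index order, each vertex adjacent to all chosen ones."""
    clique = set(seed)
    for v in sorted(adj):
        if v not in clique and all(v in adj[u] for u in clique):
            clique.add(v)
    return frozenset(clique)


def prefix(K: Iterable[int], i: int) -> Set[int]:
    """K_{<=i}: the vertices of K with index at most i."""
    return {v for v in K if v <= i}


def i_of(adj: Graph, K: FrozenSet[int]) -> int:
    """i(K): minimum i with C(K_{<=i}) = K (K must be a maximal clique)."""
    return next(i for i in range(max(adj) + 1) if closure(adj, prefix(K, i)) == K)


def parent(adj: Graph, K: FrozenSet[int]) -> Optional[FrozenSet[int]]:
    """C(K_{<=i(K)-1}), or None for the root C(empty set)."""
    i = i_of(adj, K)
    return None if i == 0 else closure(adj, prefix(K, i - 1))


def children(adj: Graph, K: FrozenSet[int]) -> List[FrozenSet[int]]:
    """Children of K, tested among K[i] = C((K_{<=i} ∩ Γ(v_i)) ∪ {v_i}) for i > i(K)."""
    found = []
    for i in range(i_of(adj, K) + 1, max(adj) + 1):
        if i in K:
            continue  # K[i] is only formed for vertices v_i outside K
        H = closure(adj, (prefix(K, i) & adj[i]) | {i})
        # Accept H when its parent is K; also requiring i(H) = i ensures that
        # each child is produced for exactly one index i.
        if i_of(adj, H) == i and parent(adj, H) == K:
            found.append(H)
    return found


def enumerate_maximal_cliques(adj: Graph) -> List[FrozenSet[int]]:
    """Depth-first traversal of the enumeration tree rooted at C(empty set)."""
    output: List[FrozenSet[int]] = []
    stack = [closure(adj, set())]
    while stack:
        K = stack.pop()
        output.append(K)
        stack.extend(children(adj, K))
    return output


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print([sorted(K) for K in enumerate_maximal_cliques(adj)])  # [[1, 2, 3], [3, 4], [4, 5]]
```

Because every maximal clique other than the root has exactly one parent, the depth-first search reaches each maximal clique once without storing previously found cliques.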

### Characterization of Child

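The parent of $K[i]$ is $K$ ⇔

(1) no $v_j$, $j < i$, is adjacent to all vertices in $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}$, and

(2) no $v_j$, $j < i$, is adjacent to all vertices in $(K_{\le i} \cap \Gamma(v_i)) \cup K_{\le j}$

(1) is not satisfied ⇔ both $K[i]$ and the parent of $K[i]$ include some $v_j \notin K$

(2) is not satisfied ⇔ the parent of $K[i]$ includes some $v_j \notin K$

Example: $K = \{3, 4, 7, 9\}$, $K[10] = \{3, 7, 10\}$, $K_{\le 5} = \{3, 4\}$, $K_{\le 7} \cap \Gamma(v_{10}) = \{3, 7\}$

[Figure: vertices 3, 4, 7, 9, 10 with the set $(K_{\le 10} \cap \Gamma(v_{10})) \cup \{v_{10}\}$ highlighted]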

### Use of Matrix Multiplication

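・Check conditions (1) and (2) by matrix multiplication

(1) no $v_j$, $j < i$, is adjacent to all vertices in $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}$

・$i$th row of the left matrix ⇒ $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}$ (as a 0/1 vector)

・$j$th column of the right matrix ⇒ $\Gamma(v_j)$ (as a 0/1 vector)

・$(i, j)$ cell of the product ⇒ $|((K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}) \cap \Gamma(v_j)|$

⇒ $v_j$ violates (1) iff this cell equals $|(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}|$

Condition (2) can be checked in the same way

⇒ checked in $O(|V|^{2.376})$ time ⇒ the time complexity is $O(|V|^{2.376})$ per maximal clique

⇒ $O((\Delta^*)^4 + |\Theta|^3)$ if partially dense ($\Delta^*$: maximum degree in $V \setminus \Theta$)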
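The following sketch checks condition (1) for every $i$ with $v_i \notin K$ through one matrix product: the rows are 0/1 indicator vectors of $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}$ and the columns are those of $\Gamma(v_j)$. It is illustrative only: `mat_mul` is a plain cubic-time product standing in for a fast matrix multiplication algorithm, condition (2) is not shown, and all names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # vertices 1..n mapped to adjacency sets (no self-loops)


def mat_mul(A: List[List[int]], B: List[List[int]]) -> List[List[int]]:
    """Plain cubic-time product; a fast matrix multiplication would go here."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def condition1(adj: Graph, K: Set[int]) -> Dict[int, bool]:
    """Map each i with v_i not in K to whether condition (1) holds."""
    n = max(adj)
    idx = [i for i in range(1, n + 1) if i not in K]
    # Row for i: 0/1 indicator vector of S_i = (K_{<=i} ∩ Γ(v_i)) ∪ {v_i}.
    left = []
    for i in idx:
        S = {v for v in K if v <= i and v in adj[i]} | {i}
        left.append([1 if v in S else 0 for v in range(1, n + 1)])
    # Column j: 0/1 indicator vector of Γ(v_j), i.e. the adjacency matrix.
    right = [[1 if j in adj[v] else 0 for j in range(1, n + 1)] for v in range(1, n + 1)]
    product = mat_mul(left, right)  # cell (row of i, j) = |S_i ∩ Γ(v_j)|
    result = {}
    for row, i in enumerate(idx):
        size = sum(left[row])
        # (1) fails iff some j < i has v_j adjacent to every vertex of S_i.
        result[i] = all(product[row][j - 1] != size for j in range(1, i))
    return result


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(condition1(adj, {1, 2, 3}))  # {4: True, 5: False}
```

Here $v_4$ passes condition (1) while $v_5$ fails it, because $v_4$ is adjacent to every vertex of $(K_{\le 5} \cap \Gamma(v_5)) \cup \{v_5\} = \{v_5\}$.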

### Sparse Cases

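・If $v_i$ is adjacent to no vertex in $K$:

$K[i] = C((K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}) = C(\{v_i\})$

parent of $K[i]$ $= C(C(\{v_i\})_{< i})$

If $C(\{v_i\})_{< i} = \emptyset$, the parent of $K[i]$ is $K_0$ (the root $C(\emptyset)$)

If $C(\{v_i\})_{< i} \neq \emptyset$, condition (1) is not satisfied

⇒ if $K \neq K_0$, $K[i]$ is not a child of $K$

・Since $|K| \le \Delta + 1$, at most $\Delta(\Delta + 1)$ vertices are adjacent to $K$

・Constructing the parent of each $K[i]$ takes $O(\Delta^2)$ time ($\Delta$: maximum degree)

⇒ $\Delta(\Delta + 1)$ candidates × $O(\Delta^2)$ each $= O(\Delta^4)$ time per maximal clique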
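A small sketch of the sparse-case candidate set: for $K \neq K_0$ only indices $i$ whose vertex is adjacent to $K$ need to be examined, and there are at most $|K| \cdot \Delta \le \Delta(\Delta+1)$ of them. The function names are hypothetical; the graph is assumed to be a dict of adjacency sets.

```python
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # vertices mapped to adjacency sets (no self-loops)


def adjacent_candidates(adj: Graph, K: Iterable[int]) -> Set[int]:
    """Indices i with v_i outside K but adjacent to at least one vertex of K.

    Following the slide, these are the only i worth testing when K != K0.
    """
    members = set(K)
    candidates: Set[int] = set()
    for u in members:
        candidates |= adj[u]
    return candidates - members


def max_degree(adj: Graph) -> int:
    """Δ: the maximum degree of the graph."""
    return max((len(nb) for nb in adj.values()), default=0)


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
cands = adjacent_candidates(adj, {3, 4})
# |cands| <= |K| * Δ <= Δ(Δ + 1): |K| <= Δ + 1 and each vertex has <= Δ neighbours.
print(sorted(cands), max_degree(adj))  # [1, 2, 5] 3
```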

### Bipartite Clique

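・Enumerate maximal bipartite cliques in $G = (V_1 \cup V_2, E)$

(= maximal cliques in $G' = (V_1 \cup V_2, E \cup (V_1 \times V_1) \cup (V_2 \times V_2))$)

⇒ enumerated in $O(|V|^{2.376})$ time each

・But $G'$ is dense even when the bipartite graph $G$ is sparse

⇒ we need some improvements for sparse cases

[Figure: bipartite graph with sides $V_1$ and $V_2$, a vertex $v_i$, and the clique $K[i]$]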
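The reduction to $G'$ takes only a few lines of Python, assuming the bipartite graph is given as adjacency sets and the two sides as vertex sets (the names below are hypothetical). It also shows why $G'$ is dense: each side becomes a complete graph, adding $\binom{|V_1|}{2} + \binom{|V_2|}{2}$ edges regardless of $|E|$.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # vertices mapped to adjacency sets (no self-loops)


def to_clique_graph(adj: Graph, V1: Set[int], V2: Set[int]) -> Graph:
    """G' = (V1 ∪ V2, E ∪ V1×V1 ∪ V2×V2), stored without self-loops."""
    g = {v: set(adj.get(v, set())) for v in V1 | V2}  # copy, so adj is not modified
    for side in (V1, V2):
        for v in side:
            g[v] |= side - {v}  # make each side a complete graph
    return g


def edge_count(g: Graph) -> int:
    return sum(len(nb) for nb in g.values()) // 2


V1, V2 = {1, 2}, {3, 4, 5}
adj = {1: {3, 4}, 2: {4}, 3: {1}, 4: {1, 2}, 5: set()}
g = to_clique_graph(adj, V1, V2)
print(edge_count(adj), edge_count(g))  # 3 7
```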

### Fast Construction of K[i]

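・For any maximal bipartite clique $K$:

$K \cap V_2 = \bigcap_{v \in K \cap V_1} \Gamma(v)$

$K \cap V_1 = \bigcap_{v \in K \cap V_2} \Gamma(v)$

・$K[i] \cap V_1$ for all $i$ can be computed in $O(\Delta^2)$ time

・$K[i]$ for all $i$ can be computed in $O(\Delta^3)$ time

[Figure: bipartite graph with $V_1 = \{1, 2, 3, 4\}$ and $V_2$, showing $K[v_1]$, $K[v_6]$, and $K[i]$ for a vertex $v_i$]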
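The two intersection formulas translate directly into code. In this sketch (hypothetical names, adjacency sets stored for both sides), starting from any $A \subseteq V_1$, one intersection gives the $V_2$ side and a second intersection gives the $V_1$ side of a maximal bipartite clique; an intersection over an empty family is taken to be the whole other side.

```python
from functools import reduce
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # bipartite graph: adjacency sets for both sides


def common_neighbors(adj: Graph, S: Iterable[int], other_side: Set[int]) -> FrozenSet[int]:
    """∩_{v∈S} Γ(v); the intersection over an empty S is the whole other side."""
    return frozenset(reduce(lambda acc, v: acc & adj[v], S, set(other_side)))


def biclique_from(adj: Graph, A: Iterable[int], V1: Set[int],
                  V2: Set[int]) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """Return (K∩V1, K∩V2) with K∩V2 = ∩_{v∈A} Γ(v) and K∩V1 = ∩_{u∈K∩V2} Γ(u)."""
    side2 = common_neighbors(adj, A, V2)
    side1 = common_neighbors(adj, side2, V1)
    return side1, side2


V1, V2 = {1, 2, 3, 4}, {5, 6, 7}
adj = {1: {5, 6}, 2: {5, 6, 7}, 3: {6, 7}, 4: {7},
       5: {1, 2}, 6: {1, 2, 3}, 7: {2, 3, 4}}
print([sorted(s) for s in biclique_from(adj, {1}, V1, V2)])  # [[1, 2], [5, 6]]
```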

### Checking the Parent

・Put small indices on $V_1$ ($1, 2, \dots, |V_1|$) and large indices on $V_2$ ($|V_1|+1, |V_1|+2, \dots$)

・$K[i]$ is a child of $K$ ⇔ $K[i]_{\le i} = K_{\le i}$, which can be checked in $O(\Delta)$ time

⇒ enumerated in $O(\Delta^3)$ time each, or $O(\Delta^2)$ by using memory

## Computational Experiments

・For randomly generated graphs

・Vertex $v_i$ is connected to each of the vertices $v_{i-r}, \dots, v_{i+r}$ with probability 1/2

・Faster than Tsukiyama's algorithm

・The computation time is linear in the maximum degree
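A generator for random test graphs of this kind, as a sketch: the slide does not say whether each pair is sampled once or once from each endpoint, so this version assumes that every pair $\{v_i, v_j\}$ with $0 < |i - j| \le r$ is sampled once with probability 1/2. `banded_random_graph` and its `seed` parameter are hypothetical.

```python
import random
from typing import Dict, Set


def banded_random_graph(n: int, r: int, seed: int = 0) -> Dict[int, Set[int]]:
    """Vertices 1..n; each pair {v_i, v_j} with 0 < |i - j| <= r becomes an edge
    independently with probability 1/2 (an assumed reading of the setup)."""
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in range(i + 1, min(n, i + r) + 1):  # only vertices within distance r
            if rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj


adj = banded_random_graph(1000, 10, seed=1)
print(max(len(nb) for nb in adj.values()))  # never more than 2r = 20
```

Since the maximum degree is at most $2r$, varying $r$ directly controls the parameter $\Delta$ against which the running time is measured.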

## Benchmark Problems

・The problem of finding frequent closed item sets in a database

⇒ equivalent to maximal bipartite clique enumeration

・Data sets used in KDD Cup (a data mining algorithm competition):

| Dataset | Source | Vertices | Average degree |
|---|---|---|---|
| BMS-WebView1 | Web-log data | 60,000 | 2.5 |
| BMS-WebView2 | Web-log data | 80,000 | 5 |
| BMS-POS | POS data | 510,000 | 6 |
| IBM-Artificial | artificial data | 100,000 | 10 |

## Conclusion and Future Work

・Proposed fast algorithms for enumerating

maximal cliques: $O(|V|^{2.376})$, $O(\Delta^4)$, $O((\Delta^*)^4 + |\Theta|^3)$

maximal bipartite cliques: $O(|V|^{2.376})$, $O(\Delta^3)$, $O(\Delta^2)$

・Examined benchmark problems from data mining and showed that our algorithm performs well

Future work:

・Can we improve further? What is the difficulty?

・Can we enumerate other maximal (minimal) graph objects?

・Can we apply matrix multiplication to other enumeration problems?

・What can be enumerated efficiently in practice?

## Frequent Sets

[Figure: bipartite graph between customers (customer1–customer4) and items (beer, nappy, milk)]

Input graph: an item and a customer are connected iff the customer purchased the item

In a maximal bipartite clique:

Customers: have similar preferences

Items: are frequently purchased together

[Agrawal et al. 96, Zaki et al. 02, Pei 00, Han 00, … ]
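To make the correspondence concrete, the sketch below uses hypothetical purchase data (the edges of the original figure are not recoverable) and computes every closed item set by intersecting baskets; each closed item set together with all customers who bought it forms a maximal bipartite clique. This naive approach can take exponential time and is not the algorithm of this talk; with this data the empty item set also appears, as the degenerate clique containing all customers.

```python
from typing import Dict, FrozenSet, Set


def closed_itemsets(purchases: Dict[str, Set[str]]) -> Dict[FrozenSet[str], FrozenSet[str]]:
    """Map each closed item set to the customers who bought every item in it.

    Closed item sets are the intersections of the baskets of non-empty groups
    of customers, i.e. the item sides of maximal bipartite cliques.
    """
    closed: Set[FrozenSet[str]] = set()
    for basket in purchases.values():
        b = frozenset(basket)
        # New intersections: this basket alone, or combined with earlier ones.
        closed |= {b} | {b & c for c in closed}
    return {items: frozenset(c for c, bought in purchases.items() if items <= bought)
            for items in closed}


# Hypothetical purchases, for illustration only.
purchases = {
    "customer1": {"beer", "nappy"},
    "customer2": {"beer", "nappy", "milk"},
    "customer3": {"milk"},
    "customer4": {"nappy", "milk"},
}
for items, customers in closed_itemsets(purchases).items():
    print(sorted(items), sorted(customers))
```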

### Few Large Degree Vertices

・Very few vertices (denoted by $\Theta$) have large degrees; every other vertex has degree at most $\Delta^*$

・Divide the maximal cliques into two groups:

(a) cliques not included in $\Theta$

(b) cliques included in $\Theta$

・(a) can be enumerated in $O((\Delta^*)^4)$ time per clique

・A maximal clique $K$ of the subgraph induced by $\Theta$ is a maximal clique of $G$ ⇔ $K$ is not included in any clique of (a)

⇒ (b) can be enumerated in $O(|\Theta|^3)$ time per clique

⇒ $O((\Delta^*)^4 + |\Theta|^3)$ per maximal clique
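The maximality test for group (b) can be sketched as follows: a maximal clique $K$ of the subgraph induced by $\Theta$ is included in some clique of group (a) exactly when some vertex outside $\Theta$ is adjacent to every vertex of $K$. The degree cut-off passed to `split_by_degree` plays the role of $\Delta^*$; both function names are hypothetical.

```python
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # vertices mapped to adjacency sets (no self-loops)


def split_by_degree(adj: Graph, threshold: int) -> Set[int]:
    """Θ: vertices whose degree exceeds `threshold` (standing in for Δ*)."""
    return {v for v, nb in adj.items() if len(nb) > threshold}


def is_maximal_in_G(adj: Graph, K: Iterable[int], theta: Set[int]) -> bool:
    """For a maximal clique K of the subgraph induced by Θ: K is a maximal clique
    of G iff no vertex outside Θ is adjacent to every vertex of K."""
    members = set(K)
    return not any(members <= adj[u] for u in adj if u not in theta)


adj = {1: {2, 3, 4, 5}, 2: {1, 3, 4, 5}, 3: {1, 2, 4}, 4: {1, 2, 3}, 5: {1, 2}}
theta = split_by_degree(adj, 3)
print(sorted(theta), is_maximal_in_G(adj, {1, 2}, theta))  # [1, 2] False
```

In this example $\{1, 2\}$ is maximal inside $\Theta$ but not in $G$, because vertex 3 (outside $\Theta$) is adjacent to both.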

### Avoid Duplications by Using Memory

・We can avoid duplicates by storing all maximal bipartite cliques

・Since $K \cap V_1 = \Gamma(K \cap V_2)$ (the common neighborhood of $K \cap V_2$), it suffices to store each $K \cap V_1$

1. Take an unprocessed $K$ from memory

2. Generate all $K[i] \cap V_1$

3. Store each $K[i] \cap V_1$ that is not yet in memory

4. Go to 1 if an unprocessed maximal clique remains

⇒ enumerated in $O(\Delta^2)$ time each
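The four steps above can be sketched with a queue of unprocessed sets and a hash set as the memory. This version assumes that the sets generated from a stored $X = K \cap V_1$ are $X \cap \Gamma(v)$ for $v \in V_2$ (an intersection of such sets is again of the form $K \cap V_1$), and it starts from $V_1$, the intersection over the empty family; it may therefore also return degenerate cliques with one empty side. All names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, FrozenSet, Set

Graph = Dict[int, Set[int]]  # bipartite graph: adjacency sets for both sides


def enumerate_v1_sides(adj: Graph, V1: Set[int], V2: Set[int]) -> Set[FrozenSet[int]]:
    """All sets K ∩ V1 (K a maximal bipartite clique, possibly degenerate),
    found by keeping every set produced so far in memory."""
    start = frozenset(V1)                # intersection over the empty family
    memory: Set[FrozenSet[int]] = {start}
    todo: Deque[FrozenSet[int]] = deque([start])  # stored but not yet processed
    while todo:                          # step 4: repeat while something is unprocessed
        X = todo.popleft()               # step 1: take an unprocessed set
        for v in V2:                     # step 2: generate the successor sets
            Y = X & adj[v]
            if Y not in memory:          # step 3: store only new sets
                memory.add(Y)
                todo.append(Y)
    return memory


V1, V2 = {1, 2, 3, 4}, {5, 6, 7}
adj = {1: {5, 6}, 2: {5, 6, 7}, 3: {6, 7}, 4: {7},
       5: {1, 2}, 6: {1, 2, 3}, 7: {2, 3, 4}}
print(sorted(sorted(X) for X in enumerate_v1_sides(adj, V1, V2)))
```

The memory lookup replaces the parent check: a set is expanded only the first time it is produced, at the cost of storing every $K \cap V_1$ found.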