Graph similarity

Institute of Computer Science and Technology, Peking University

Graph Similarity

Instructor: Lei Zou


Outline

  • Maximal Common Subgraph

  • Minimal Edit Distance

  • Graph Similarity Search



Maximal Common Subgraph

Def. 1 (Induced Subgraph). An induced subgraph of a graph G consists of a set S of vertices of G together with all edges of G that have both endpoints in S.

Def. 2 (Maximum Common Induced Subgraph). A graph G12 is a common induced subgraph of graphs G1 and G2 if G12 is isomorphic to an induced subgraph of G1 and to an induced subgraph of G2. A maximum common induced subgraph (MCIS) is a common induced subgraph G12 with the largest possible number of vertices.


[Figure: example graphs over the vertex labels A, B, C, D, with their MCIS marked.]


Def. 3 (Maximum Common Edge Subgraph). An MCES is a subgraph consisting of the largest number of edges common to both G1 and G2.

[Figure: example graphs over the vertex labels A, B, C, D illustrating an MCES.]


Finding Maximal Common Subgraph

  • Maximum clique-based algorithm (for MCIS)

Def. 4 The modular product of two graphs G1 and G2 is defined on the vertex set V(G1) × V(G2), with two vertices (ui, vi) and (uj, vj), where ui ≠ uj and vi ≠ vj, being adjacent whenever

1. ui and vi have the same vertex label, and so do uj and vj; and either

2. (ui, uj) ∈ E(G1) and (vi, vj) ∈ E(G2), or

3. (ui, uj) ∉ E(G1) and (vi, vj) ∉ E(G2).
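The definition of the modular product can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `modular_product`, the dictionary-based graph encoding, and the example graphs are all assumptions made here.

```python
from itertools import combinations

def modular_product(labels1, edges1, labels2, edges2):
    """Build the modular product of two vertex-labeled, undirected graphs.

    labels1/labels2 map vertex -> label; edges1/edges2 are sets of frozensets.
    Returns (nodes, adj): the product vertices and their adjacency sets.
    """
    # Condition 1: only pair up vertices that carry the same label.
    nodes = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    adj = {n: set() for n in nodes}
    for (ui, vi), (uj, vj) in combinations(nodes, 2):
        if ui == uj or vi == vj:
            continue  # a vertex cannot be matched twice
        in1 = frozenset((ui, uj)) in edges1
        in2 = frozenset((vi, vj)) in edges2
        # Conditions 2 and 3: both pairs are edges, or both are non-edges.
        if in1 == in2:
            adj[(ui, vi)].add((uj, vj))
            adj[(uj, vj)].add((ui, vi))
    return nodes, adj

# Hypothetical example: a path A-B-C versus a triangle A-B-C.
labels1 = {"u1": "A", "u2": "B", "u3": "C"}
edges1 = {frozenset(("u1", "u2")), frozenset(("u2", "u3"))}
labels2 = {"v1": "A", "v2": "B", "v3": "C"}
edges2 = {frozenset(("v1", "v2")), frozenset(("v2", "v3")), frozenset(("v1", "v3"))}
nodes, adj = modular_product(labels1, edges1, labels2, edges2)
```

In this example (u1, v1) and (u3, v3) are not adjacent, because u1 and u3 are not connected in the path while v1 and v3 are connected in the triangle.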


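[Figure: graph G1 with vertices u1 to u4 and graph G2 with vertices v1 to v4, both over the labels A, B, C, D, and their modular product (association graph) on the vertices (u1, v1), (u2, v2), (u3, v3), (u4, v4), in which a maximal clique is marked.]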

A maximal clique in the modular product corresponds to a maximal common induced subgraph; a maximum clique corresponds to an MCIS.


  • Def. 5 A clique in a graph G is a subset of vertices such that every pair of vertices in the subset is connected by an edge of G.

    A maximal clique is a clique that cannot be extended by including one more adjacent vertex, that is, a clique that is not contained in the vertex set of a larger clique.

    A maximum clique is a clique of the largest possible size in a given graph. The clique number ω(G) of a graph G is the number of vertices in a maximum clique of G.


[Figure: example graph on the vertices 1 to 5.]

Maximal cliques: (1,2,3) and (1,3,4,5)

A maximum clique: (1,3,4,5)


Finding Maximal Clique

  • Bron–Kerbosch algorithm (1973)

Basic Algorithm:

R = ∅ and P = V(G); // V(G) denotes all vertices in G

FindingMaximalClique(R, P):
    if P is empty:
        report R as a maximal clique
    for each vertex v in P:
        FindingMaximalClique(R ⋃ {v}, P ⋂ N(v)) // N(v) denotes all of v's neighbor vertices

Problem: It may generate duplicate answers, because the same clique is reached once for every order in which its vertices can be picked.

[Figure: example graph on the vertices 1, 2, 3.]


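R = ∅ and P = V(G); // V(G) denotes all vertices in G

FindingMaximalClique(R, P):
    if P is empty:
        report R as a maximal clique
    for each vertex v in P:
        FindingMaximalClique(R ⋃ {v}, P ⋂ N(v)) // N(v) denotes all of v's neighbor vertices
        P = P \ {v};

Problem: It may report some non-maximal cliques. For example, on a triangle with vertices 1, 2, 3 it reports {2, 3} after vertex 1 has been removed from P, although {1, 2, 3} was already reported.

[Figure: example graph on the vertices 1, 2, 3.]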


Bron–Kerbosch algorithm

R = ∅, P = V(G), and X = ∅; // V(G) denotes all vertices in G

FindingMaximalClique(R, P, X):
    if P and X are both empty:
        report R as a maximal clique
    for each vertex v in P:
        FindingMaximalClique(R ⋃ {v}, P ⋂ N(v), X ⋂ N(v)) // N(v) denotes all of v's neighbor vertices
        P = P \ {v};
        X = X ⋃ {v}; // v is fully explored; X remembers it so that a clique v could still extend is not reported as maximal
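The three-set version of the algorithm translates almost line by line into Python. The sketch below is illustrative only: the function name `bron_kerbosch` and the adjacency-dictionary encoding are choices made here, and the example edge set is an assumption, chosen to be consistent with the two maximal cliques (1,2,3) and (1,3,4,5) listed for the five-vertex example.

```python
def bron_kerbosch(adj):
    """Return all maximal cliques of an undirected graph as a list of frozensets.

    adj maps each vertex to the set of its neighbors.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r,
        # x: vertices already fully explored that also extend r.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}   # v must not be tried again in this call
            x = x | {v}   # but it still blocks non-maximal reports

    expand(set(), set(adj), set())
    return cliques

# Assumed edge set for the five-vertex example graph.
adj = {
    1: {2, 3, 4, 5},
    2: {1, 3},
    3: {1, 2, 4, 5},
    4: {1, 3, 5},
    5: {1, 3, 4},
}
cliques = bron_kerbosch(adj)
```

Each maximal clique is reported exactly once: removing v from P avoids duplicates, and adding v to X suppresses sets such as {3, 4, 5} that vertex 1 could still extend.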


  • Theorem. Given a vertex u, consider that all the maximal cliques containing Q ∪ {u} have been generated. Then, every new maximal clique containing Q, but not Q ∪ {u}, must contain at least one vertex q that is not adjacent to u.


  • Backtracking algorithms (e.g., the McGregor algorithm) (for both MCIS and MCES)

    These can be described through a state space representation. Each state s represents a common subgraph of the two graphs under construction, which is part of the MCS to be eventually formed.



Minimal Edit Distance

  • Six edit operations

    • Insert an isolated vertex

    • Delete an isolated vertex

    • Change the label of a vertex

    • Insert an edge between two disconnected vertices

    • Delete an edge between two connected vertices

    • Change the label of an edge

  • Graph Edit Distance:

    • The minimum number of edit operations needed to transform one graph into another (computing it is NP-hard)


Minimal Edit Distance

[Figure: example graphs G1 and G2 over the vertex labels A, B, C, D, with MED(G1, G2) = 4.]


Minimal Edit Distance

Given two graphs G1 and G2, assume that they have the same number of vertices. Define a bijective function f: V(G1) → V(G2). The distance under this function is:


Minimal Edit Distance

The distance between G1 and G2 is defined as

We can prove that

If G1 and G2 have different numbers of vertices, say |V(G1)| < |V(G2)|, we introduce |V(G2)| − |V(G1)| pseudo vertices into G1, and the equation still holds.
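The idea of minimizing a mapping-induced distance over all vertex functions f, with pseudo vertices for padding, can be illustrated by brute force. The sketch below does not reproduce the slides' formulas. It assumes a unit-cost model with labeled vertices and unlabeled edges, in which the cost of a mapping f is the number of label mismatches plus the number of vertex pairs that are an edge in exactly one graph. All names (`mapping_cost`, `brute_force_distance`, `PSEUDO`) are hypothetical.

```python
from itertools import combinations, permutations

PSEUDO = None  # assumed label of a padding (pseudo) vertex; real labels must differ

def mapping_cost(f, labels1, edges1, labels2, edges2):
    """Unit-cost distance induced by one bijection f: V(G1) -> V(G2)."""
    # Vertex part: every label mismatch is one relabel, insert, or delete.
    cost = sum(1 for v in f if labels1[v] != labels2[f[v]])
    # Edge part: every pair that is an edge in exactly one graph costs one.
    for a, b in combinations(f, 2):
        in1 = frozenset((a, b)) in edges1
        in2 = frozenset((f[a], f[b])) in edges2
        cost += int(in1 != in2)
    return cost

def brute_force_distance(labels1, edges1, labels2, edges2):
    """Minimum mapping cost over all bijections, padding the smaller graph."""
    labels1, labels2 = dict(labels1), dict(labels2)
    for i in range(len(labels2) - len(labels1)):
        labels1[("pseudo", 1, i)] = PSEUDO
    for i in range(len(labels1) - len(labels2)):
        labels2[("pseudo", 2, i)] = PSEUDO
    v1 = list(labels1)
    return min(
        mapping_cost(dict(zip(v1, perm)), labels1, edges1, labels2, edges2)
        for perm in permutations(labels2)
    )

# Hypothetical example 1: path A-B-C versus triangle A-B-D.
path = ({"a": "A", "b": "B", "c": "C"},
        {frozenset(("a", "b")), frozenset(("b", "c"))})
tri = ({"x": "A", "y": "B", "z": "D"},
       {frozenset(("x", "y")), frozenset(("y", "z")), frozenset(("x", "z"))})
d_same_size = brute_force_distance(*path, *tri)

# Hypothetical example 2: a single edge A-B versus the path A-B-C.
edge = ({"a": "A", "b": "B"}, {frozenset(("a", "b"))})
d_padded = brute_force_distance(*edge, *path)
```

The search visits all n! bijections, so it is only usable on very small graphs; it mainly serves as a reference against which faster methods can be checked.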


Minimal Edit Distance

[Figure: example graphs G1 and G2 over the vertex labels A, B, C, D.]


Minimal Edit Distance

  • Exact Algorithm (A* algorithm)

    What is the A* algorithm?

    A* uses a best-first search and finds a least-cost path from a given initial node to one goal node (out of one or more possible goals). As A* traverses the graph, it follows the path of lowest estimated total cost, keeping a sorted priority queue of alternate path segments along the way. The estimated total cost of a node x is

    f(x) = g(x) + h(x),

    where g(x) denotes the cost from the starting node to the current node x, and h(x) denotes the "heuristic estimate" (a lower bound) of the distance from x to the goal.


Minimal Edit Distance

Given two graphs G1 and G2 that have the same number m of vertices, consider the following process.

Let N1 and N2 denote the vertices in G1 and G2 that have been matched.

N1 = (v_1, v_2, …, v_n);

N2 = (u_1, u_2, …, u_n);

Let M1 and M2 denote the vertices in G1 and G2 that have not been matched.

M1 = (v_{n+1}, v_{n+2}, …, v_m);

M2 = (u_{n+1}, u_{n+2}, …, u_m).
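A search over partial matchings of this kind can be sketched with A*: g is the cost already incurred among the matched vertices N1 and N2, and h is a lower bound on the cost still to come from the unmatched vertices M1 and M2. The sketch below is an illustration under assumptions, not the lecture's algorithm: it uses unit costs, labeled vertices, and unlabeled edges, and its heuristic (the number of unmatched labels that cannot be paired up) is one simple admissible choice. The name `astar_distance` is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def astar_distance(labels1, edges1, labels2, edges2):
    """A* over partial vertex mappings; assumes |V(G1)| == |V(G2)| and unit costs."""
    v1 = list(labels1)  # G1 vertices are matched in this fixed order
    m = len(v1)

    def h(matched2):
        # Lower bound from M1 and M2: labels that cannot be paired up
        # force at least one relabeling each.
        rest1 = Counter(labels1[v] for v in v1[len(matched2):])
        rest2 = Counter(labels2[u] for u in labels2 if u not in matched2)
        return (m - len(matched2)) - sum((rest1 & rest2).values())

    def step_cost(matched2, u):
        # Cost of extending the mapping with v -> u, where v is the next G1 vertex.
        v = v1[len(matched2)]
        cost = int(labels1[v] != labels2[u])
        for w, x in zip(v1, matched2):
            in1 = frozenset((v, w)) in edges1
            in2 = frozenset((u, x)) in edges2
            cost += int(in1 != in2)
        return cost

    tie = count()  # tie-breaker so the heap never compares vertex tuples
    queue = [(h(()), 0, next(tie), ())]
    while queue:
        _, g, _, matched2 = heapq.heappop(queue)
        if len(matched2) == m:
            return g  # h never overestimates, so the first full mapping is optimal
        for u in labels2:
            if u not in matched2:
                g2 = g + step_cost(matched2, u)
                nxt = matched2 + (u,)
                heapq.heappush(queue, (g2 + h(nxt), g2, next(tie), nxt))
    return None

# Hypothetical example: path A-B-C versus triangle A-B-D.
labels1 = {"a": "A", "b": "B", "c": "C"}
edges1 = {frozenset(("a", "b")), frozenset(("b", "c"))}
labels2 = {"x": "A", "y": "B", "z": "D"}
edges2 = {frozenset(("x", "y")), frozenset(("y", "z")), frozenset(("x", "z"))}
distance = astar_distance(labels1, edges1, labels2, edges2)
```

A tighter heuristic prunes more of the search tree, but it must remain a lower bound; otherwise the first complete mapping taken from the queue is no longer guaranteed to be optimal.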




Comparing Stars

Comparing Stars: On Approximating Graph Edit Distance

Zhiping Zeng, Anthony K.H. Tung, Jianyong Wang, Jianhua Feng, Lizhu Zhou

VLDB 2009


Comparing Stars

  • Problem definition

    Given a graph database D consisting of n graphs, a query graph q, and a distance threshold 𝜏:

    • Approximate full graph search

      • Find all the graphs gi in D such that GED(q, gi) ≤ 𝜏

    • Approximate subgraph search

      • Find all the graphs gi in D such that GED(q, r) ≤ 𝜏 for some subgraph r ⊆ gi


Comparing Stars

  • Main Idea:

    G → star structures (decompose each graph into its star structures, one rooted at every vertex)

  • Star Structure:

    A triple (r, L, l):

    r: the root vertex

    L: the set of leaves

    l: the labeling function
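Decomposing a graph into star structures is straightforward to sketch. In the illustration below a star is reduced to the pair (root label, sorted leaf labels), which drops the vertex identities of the triple (r, L, l) and keeps only what is needed to compare stars; this encoding and the name `star_structures` are assumptions made here.

```python
def star_structures(labels, adj):
    """Decompose a graph into one star per vertex.

    labels maps vertex -> label and adj maps vertex -> set of neighbors.
    Each star is encoded as (root label, tuple of sorted leaf labels).
    """
    return sorted(
        (labels[r], tuple(sorted(labels[u] for u in adj[r])))
        for r in labels
    )

# Hypothetical example: the path B - A - C.
labels = {1: "A", 2: "B", 3: "C"}
adj = {1: {2, 3}, 2: {1}, 3: {1}}
stars = star_structures(labels, adj)
```

The result is a multiset: two vertices with the same label and the same neighbor labels yield two identical entries.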


Comparing Stars

  • Star edit distance


Comparing Stars

  • Given two multisets of star structures S1 and S2 of the same cardinality, let P: S1 → S2 be a bijection.

  • Choosing the bijection P of minimum total cost is an Assignment Problem
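The following sketch shows how a star edit distance and the resulting assignment problem fit together. The formulas are not preserved in this transcript, so the code assumes the form usually attributed to the cited Comparing Stars paper: λ(s1, s2) = T(r1, r2) + ||L1| − |L2|| + (max(|L1|, |L2|) − number of common leaf labels), where T is 0 if the root labels match and 1 otherwise, and the smaller multiset is padded with empty stars. The assignment is solved by brute force over permutations as a stand-in for the Hungarian algorithm. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

EPSILON = ("ε", ())  # assumed padding star: special root label, no leaves

def star_edit_distance(s1, s2):
    """Assumed star edit distance between stars encoded as (root label, leaf labels)."""
    (r1, leaves1), (r2, leaves2) = s1, s2
    t = 0 if r1 == r2 else 1
    common = sum((Counter(leaves1) & Counter(leaves2)).values())
    m = max(len(leaves1), len(leaves2)) - common
    return t + abs(len(leaves1) - len(leaves2)) + m

def mapping_distance(stars1, stars2):
    """Minimum total star edit distance over all bijections (brute force)."""
    s1, s2 = list(stars1), list(stars2)
    s1 += [EPSILON] * (len(s2) - len(s1))
    s2 += [EPSILON] * (len(s1) - len(s2))
    return min(
        sum(star_edit_distance(a, b) for a, b in zip(s1, perm))
        for perm in permutations(s2)
    )

# Hypothetical example: the path B - A - C versus the triangle A - B - C.
path_stars = [("A", ("B", "C")), ("B", ("A",)), ("C", ("A",))]
triangle_stars = [("A", ("B", "C")), ("B", ("A", "C")), ("C", ("A", "B"))]
mu = mapping_distance(path_stars, triangle_stars)
```

Here the two graphs differ by a single edge insertion, and the mapping distance comes out as 4: the stars rooted at B and C each change at a cost of 2. For realistic graph sizes the permutation search would be replaced by a polynomial-time assignment solver.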




Comparing Stars

What is the relationship between GED(g1, g2) and the mapping distance μ(g1, g2)?


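Comparing Stars

A distance function f is a metric if and only if the following conditions hold for all x, y, z:

  • Non-negativity: f(x, y) ≥ 0

  • Identity: f(x, y) = 0 if and only if x = y

  • Symmetry: f(x, y) = f(y, x)

  • Triangle inequality: f(x, z) ≤ f(x, y) + f(y, z)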


Comparing Stars

We can prove that

  • Graph edit distance is a metric (assuming that every edit operation has a non-negative cost).

  • Mapping distance is also a metric.


Comparing Stars

  • Given two graphs g1 and g2, let P = (p1, p2, . . . , pk) be an alignment (a sequence of edit operations) transforming g1 into g2. Accordingly, there is a sequence of graphs

    g1 = h0 → h1 → . . . → hk = g2, where hi−1 → hi indicates that hi is the graph derived by performing pi on hi−1.

    As the mapping distance μ is a metric, the triangle inequality gives:

    μ(g1, g2) ≤ μ(h0, h1) + μ(h1, h2) + . . . + μ(hk−1, hk)


Comparing Stars

What is the relationship between one operation pi and the mapping distance μ(hi−1, hi)?

1. Edge Insertion/Deletion

One edge insertion/deletion affects at most two stars. The cost of each affected star is at most 2.

Thus, μ(hi−1, hi) ≤ 4 due to one edge insertion/deletion.


Comparing Stars

What is the relationship between one operation pi and the mapping distance μ(hi−1, hi)?

2. Vertex Insertion/Deletion

One vertex insertion/deletion affects at most one star. The cost of that star is at most 1.

Thus, μ(hi−1, hi) ≤ 1 due to one vertex insertion/deletion.


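Comparing Stars

What is the relationship between one operation pi and the mapping distance μ(hi−1, hi)?

3. Vertex Relabeling

One relabeling of a vertex v0 affects at most deg(v0) + 1 stars: the star rooted at v0 and the stars rooted at its neighbors. The cost of each affected star is at most 1.

Thus, μ(hi−1, hi) ≤ deg(v0) + 1 due to one vertex relabeling.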


Comparing Stars

Lower Bound
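The lower bound follows by combining the triangle inequality with the per-operation bounds from the preceding slides. The derivation below is a reconstruction, since the slide's own formula is not preserved in this transcript. It writes μ for the mapping distance and assumes that δ bounds the degree of any relabeled vertex, for instance δ = max{δ(g1), δ(g2)}.

```latex
% Assumption: \delta is an upper bound on the degree of any relabeled vertex,
% e.g. \delta = \max\{\delta(g_1), \delta(g_2)\}.
\begin{align*}
\mu(g_1, g_2)
  &\le \sum_{i=1}^{k} \mu(h_{i-1}, h_i)
      && \text{(triangle inequality)} \\
  &\le k \cdot \max\{4,\ \delta + 1\}
      && \text{(per-operation bounds)} \\
\Longrightarrow\quad
\mathrm{GED}(g_1, g_2)
  &\ge \frac{\mu(g_1, g_2)}{\max\{4,\ \delta + 1\}}
      && \text{(taking } k = \mathrm{GED}(g_1, g_2)\text{)}
\end{align*}
```

Because the mapping distance can be computed in polynomial time as an assignment problem, this quotient gives a cheap filter: any graph whose bound already exceeds the threshold can be discarded without computing the exact edit distance.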


Comparing Stars

Upper Bound:

Based on the bipartite graph matching, we can define an upper bound for the edit distance.



Comparing Stars

Experiment datasets

  • Real dataset

    • AIDS antiviral screen compound dataset: 42,687 chemical compounds

  • Synthetic dataset

    • 1000 graphs, average size: 10


GSimJoin

Efficient Graph Similarity Joins with Edit Distance Constraints

Xiang Zhao, Chuan Xiao, Xuemin Lin, and Wei Wang

ICDE 2012


GSimJoin

  • Problem definition

    Given two sets of graphs 𝑅 and 𝑆, a graph similarity join with edit distance threshold 𝜏 returns the pairs of graphs, one from each set, whose graph edit distance is no larger than 𝜏, i.e.,

    { ⟨𝑟, 𝑠⟩ ∣ 𝑔𝑒𝑑(𝑟, 𝑠) ≤ 𝜏, 𝑟 ∈ 𝑅, 𝑠 ∈ 𝑆 }.

  • The paper focuses on the self-join case:

    { ⟨𝑟𝑖, 𝑟𝑗⟩ ∣ 𝑔𝑒𝑑(𝑟𝑖, 𝑟𝑗) ≤ 𝜏 ∧ 𝑟𝑖.𝑖𝑑 < 𝑟𝑗.𝑖𝑑, 𝑟𝑖 ∈ 𝑅, 𝑟𝑗 ∈ 𝑅 }.


GSimJoin

  • Definition (path-based 𝑞-gram): A path-based 𝑞-gram in a graph 𝑟 is a simple path of length 𝑞.
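Path-based q-grams can be enumerated with a small depth-first search. The sketch below is illustrative and rests on assumptions: "length q" is read as q edges (so q = 1 yields the edges), a q-gram is represented by the label sequence of its path, and each undirected path is reported once. The name `path_qgrams` and the example graph are hypothetical, and vertex identifiers are assumed to be mutually comparable.

```python
def path_qgrams(labels, adj, q):
    """Enumerate path-based q-grams as label sequences of simple paths with q edges.

    labels maps vertex -> label and adj maps vertex -> set of neighbors.
    Returns a sorted list, i.e. a multiset of label tuples.
    """
    grams = []

    def extend(path):
        if len(path) == q + 1:
            seq = tuple(labels[v] for v in path)
            # Every undirected path is found from both ends; keep one copy
            # and store the lexicographically smaller reading direction.
            if q == 0 or path[0] < path[-1]:
                grams.append(min(seq, seq[::-1]))
            return
        for u in adj[path[-1]]:
            if u not in path:  # simple path: no repeated vertices
                extend(path + [u])

    for v in labels:
        extend([v])
    return sorted(grams)

# Hypothetical example: the chain C - C - O.
labels = {1: "C", 2: "C", 3: "O"}
adj = {1: {2}, 2: {1, 3}, 3: {2}}
one_grams = path_qgrams(labels, adj, 1)
two_grams = path_qgrams(labels, adj, 2)
```

Larger q makes the q-grams more selective, but a single edit operation then affects more of them, which weakens the filters built on top.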


GSimJoin

Let Q_r denote the multiset of q-grams in a graph r, and let Q_r^u denote the multiset of q-grams in r that contain the vertex u.


GSimJoin

  • Count Filtering: Consider two graphs 𝑟 and 𝑠.

    If 𝑔𝑒𝑑(𝑟, 𝑠) ≤ 𝜏, then 𝑟 and 𝑠 must share at least 𝐿𝐵𝑝𝑎𝑡ℎ common 𝑞-grams.

    For the two preceding example graphs with 𝜏 = 1:

    when q = 1, LB = max(4 − 3, 5 − 3) = 2

    when q = 2, LB = max(5 − 5, 7 − 6) = 1
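The arithmetic in these examples suggests a bound of the form LB = max(|Q_r| − τ·D(r), |Q_s| − τ·D(s)), where D(·) is the largest number of q-grams that a single edit operation can affect in that graph. That reading is inferred from the numbers on the slide and is an assumption, as are the function names in the sketch below.

```python
from collections import Counter

def count_lower_bound(size_r, d_r, size_s, d_s, tau):
    """Assumed bound: LB = max(|Q_r| - tau * D(r), |Q_s| - tau * D(s))."""
    return max(size_r - tau * d_r, size_s - tau * d_s)

def passes_count_filter(qgrams_r, qgrams_s, d_r, d_s, tau):
    """Keep the pair only if the multisets share at least LB q-grams."""
    common = sum((Counter(qgrams_r) & Counter(qgrams_s)).values())
    bound = count_lower_bound(len(qgrams_r), d_r, len(qgrams_s), d_s, tau)
    return common >= bound

# The two cases from the slide, with tau = 1.
lb_q1 = count_lower_bound(4, 3, 5, 3, 1)  # max(4 - 3, 5 - 3)
lb_q2 = count_lower_bound(5, 5, 7, 6, 1)  # max(5 - 5, 7 - 6)
```

A pair that fails the test can be discarded without computing its edit distance; a pair that passes still needs further filtering or exact verification.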


GSimJoin

  • Prefix Filtering

    Sort the 𝑞-grams of 𝑄𝑟 and 𝑄𝑠 in the same global order, and let the 𝑝-prefix of each be its first 𝑝 elements. If |𝑄𝑟 ∩ 𝑄𝑠| ≥ 𝛼, then the (|𝑄𝑟| − 𝛼 + 1)-prefix of 𝑄𝑟 and the (|𝑄𝑠| − 𝛼 + 1)-prefix of 𝑄𝑠 must have at least one common 𝑞-gram.
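The prefix condition can be checked without intersecting the full multisets. The sketch below is a minimal illustration that assumes q-grams are plain strings, uses Python's default string order as the global order, and takes α ≥ 1; the function names are hypothetical.

```python
def prefix(sorted_qgrams, alpha):
    """The (|Q| - alpha + 1)-prefix of a q-gram list sorted in the global order."""
    return sorted_qgrams[: max(len(sorted_qgrams) - alpha + 1, 0)]

def passes_prefix_filter(qgrams_r, qgrams_s, alpha):
    """Necessary condition for |Q_r ∩ Q_s| >= alpha: the two prefixes intersect."""
    prefix_r = prefix(sorted(qgrams_r), alpha)
    prefix_s = prefix(sorted(qgrams_s), alpha)
    return bool(set(prefix_r) & set(prefix_s))

# Hypothetical q-gram multisets written as label strings.
q_r = ["C-C", "C-C", "C-N", "C-O"]
q_s = ["C-C", "C-N", "C-O", "C-S", "N-O"]
kept = passes_prefix_filter(q_r, q_s, 3)

# These two multisets share only one q-gram, so alpha = 2 prunes the pair.
pruned = not passes_prefix_filter(["a", "b", "z"], ["y", "z", "zz"], 2)
```

In a join, only the prefixes need to be indexed, which keeps the inverted index small; ordering q-grams from rare to frequent typically makes the prefixes more selective.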


GSimJoin

  • Minimum Edit Filtering

  • 𝜏 = 1, q = 1, LB = 2 (need at least 2 matches) < 3

    Consider the two mismatching 𝑞-grams in 𝑠, C-O and C-N: they are disjoint, so at least two edit operations are needed to affect both, which exceeds 𝜏.


GSimJoin

  • To handle the general case where 𝑞-grams may overlap

  • Minimum graph edit operation problem:

    Given a multiset of 𝑞-grams 𝑄, find the minimum number of graph edit operations that can affect all the 𝑞-grams in 𝑄.

    The problem is NP-hard (a greedy algorithm is used).


GSimJoin

  • Label Filtering

  • Connect the mismatching components, and compute the minimum edit operations for them.


References

  • John W. Raymond and Peter Willett, Maximum common subgraph isomorphism algorithms for the matching of chemical structures, Journal of Computer-Aided Molecular Design, 16: 521–533, 2002.

  • D. Conte, C. Guidobaldi, and C. Sansone, A Comparison of Three Maximum Common Subgraph Algorithms on a Large Database of Labeled Graphs, GbRPR'03.

  • Etsuji Tomita, Akira Tanaka, Haruhisa Takahashi: The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363(1): 28-42 (2006)


  • Andrew K. C. Wong, Manlai You, S. C. Chan, An Algorithm for Graph Optimal Monomorphism, IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, no. 3, May/June 1990.

  • Xiang Zhao, Chuan Xiao, Xuemin Lin, Wei Wang: Efficient Graph Similarity Joins with Edit Distance Constraints. ICDE 2012: 834-845

  • Zhiping Zeng, Anthony K. H. Tung, Jianyong Wang, Jianhua Feng, Lizhu Zhou: Comparing Stars: On Approximating Graph Edit Distance. PVLDB 2(1): 25-36 (2009)

