Graph Similarity

Institute of Computer Science and Technology of Peking University. Graph Similarity. Instructor: Lei Zou. Outline: Maximal Common Subgraph, Minimal Edit Distance, Graph Similarity Search.


Institute of Computer Science and Technology of Peking University

Graph Similarity

Instructor: Lei Zou

Outline
  • Maximal Common Subgraph
  • Minimal Edit Distance
  • Graph Similarity Search
Maximal Common Subgraph

Def. 1 (Induced Subgraph). An induced subgraph of a graph G consists of a set S of vertices of G together with all edges of G that have both endpoints in S.

Def. 2 (Maximum Common Induced Subgraph). A graph G12 is a common induced subgraph of graphs G1 and G2 if G12 is isomorphic to an induced subgraph of G1 and to an induced subgraph of G2. A maximum common induced subgraph (MCIS) is a common induced subgraph G12 with the largest possible number of vertices.

[Figure: example graphs with vertex labels A, B, C, D and their MCIS.]

Def. 3 (Maximum Common Edge Subgraph). An MCES is a common subgraph of G1 and G2 with the largest possible number of edges.

[Figure: example graphs with vertex labels A, B, C, D illustrating an MCES.]

Finding Maximal Common Subgraph
  • Maximum clique-based algorithm (for MCIS)

Def. 4 The modular product of two graphs G1 and G2 is defined on the vertex set V(G1) × V(G2), with two vertices (ui, vi) and (uj, vj), where ui ≠ uj and vi ≠ vj, being adjacent whenever condition 1 holds together with either condition 2 or condition 3:

1. ui and vi have the same vertex label, and so do uj and vj;

2. (ui, uj) ∈ E(G1) and (vi, vj) ∈ E(G2);

3. (ui, uj) ∉ E(G1) and (vi, vj) ∉ E(G2).
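The definition can be tried out on small graphs. The following is a minimal sketch, not the slides' own code: the function name `modular_product` and the input format (a dict of vertex labels plus an edge list per graph) are assumptions made for illustration. As a simplification, it keeps only product vertices whose two labels agree, since pairs with different labels can never be adjacent to anything.

```python
from itertools import combinations

def modular_product(labels1, edges1, labels2, edges2):
    """Return (nodes, edges) of the label-respecting modular product.

    labels1/labels2 map vertex -> label; edges1/edges2 are undirected edge lists.
    (Hypothetical helper; names and input format are illustrative assumptions.)
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    # Condition 1: keep only pairs (u, v) whose vertex labels agree.
    nodes = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    product_edges = set()
    for (ui, vi), (uj, vj) in combinations(nodes, 2):
        if ui == uj or vi == vj:
            continue  # one vertex cannot be matched to two different vertices
        in1 = frozenset((ui, uj)) in e1
        in2 = frozenset((vi, vj)) in e2
        # Condition 2 (both adjacent) or condition 3 (both non-adjacent).
        if in1 == in2:
            product_edges.add(frozenset(((ui, vi), (uj, vj))))
    return nodes, product_edges

# Path A-B-C versus triangle A-B-C.
g1_labels, g1_edges = {1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]
g2_labels, g2_edges = {"x": "A", "y": "B", "z": "C"}, [("x", "y"), ("y", "z"), ("x", "z")]
nodes, prod_edges = modular_product(g1_labels, g1_edges, g2_labels, g2_edges)
```

In this example (1, x) and (3, z) are not adjacent in the product, because 1 and 3 are not adjacent in the path while x and z are adjacent in the triangle.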

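[Figure: graphs G1 (vertices u1 to u4) and G2 (vertices v1 to v4), with vertex labels A, B, C, D, and their modular product (association graph) on the nodes (u1, v1), (u2, v2), (u3, v3), (u4, v4), which form a maximal clique.]

A maximal clique in the modular product corresponds to a maximal common induced subgraph; a maximum clique corresponds to an MCIS.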

Def. 5 A clique in a graph G is a subset of the vertices of G such that every pair of vertices in the subset is connected by an edge of G.

A maximal clique is a clique that cannot be extended by adding one more adjacent vertex, that is, a clique that is not contained in the vertex set of a larger clique.

A maximum clique is a clique of the largest possible size in a given graph. The clique number ω(G) of a graph G is the number of vertices in a maximum clique of G.

[Figure: a graph on vertices 1 to 5. Maximal cliques: (1, 2, 3) and (1, 3, 4, 5). A maximum clique: (1, 3, 4, 5).]
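The difference between a clique and a maximal clique can be checked mechanically. The sketch below is illustrative only: the helper names are made up, and the edge set is an assumption chosen to be consistent with the cliques listed for the 5-vertex example, since the drawing itself is not available.

```python
from itertools import combinations

def is_clique(adj, vertices):
    """True if every pair of the given vertices is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def is_maximal_clique(adj, vertices):
    """True if `vertices` is a clique and no outside vertex is adjacent to all of it."""
    vs = set(vertices)
    if not is_clique(adj, vs):
        return False
    return not any(vs <= adj[w] for w in adj if w not in vs)

# Assumed edge set: the union of the cliques (1, 2, 3) and (1, 3, 4, 5).
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (1, 5), (3, 4), (3, 5), (4, 5)]
ADJ = {v: set() for v in range(1, 6)}
for a, b in EDGES:
    ADJ[a].add(b)
    ADJ[b].add(a)
```

With this edge set, {1, 3} is a clique but not a maximal one, because vertex 2 (and also 4 and 5) is adjacent to both of its vertices.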

Finding Maximal Clique
  • Bron–Kerbosch algorithm (1973)

Basic algorithm:

R = ∅; P = V(G)  // V(G) denotes all vertices in G

FindMaximalClique(R, P):
    if P is empty:
        report R as a maximal clique
    for each vertex v in P:
        FindMaximalClique(R ∪ {v}, P ∩ N(v))  // N(v) denotes the neighbors of v

Problem: it may report the same clique more than once.

Example (triangle on vertices 1, 2, 3): the search reaches the same clique along two branches.

  • (1) → (1, 2) → (1, 2, 3)
  • (1) → (1, 3) → (1, 3, 2)
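The duplicate problem is easy to reproduce. The following is a small sketch of the basic recursion (function and variable names are illustrative assumptions); on a triangle it reports the single maximal clique once per vertex ordering.

```python
def naive_cliques(adj):
    """Basic recursion without any bookkeeping: reports R whenever P is empty."""
    reported = []

    def expand(r, p):
        if not p:
            reported.append(tuple(r))
        for v in sorted(p):
            # Candidates shrink to the neighbors of v; v itself is never removed
            # from the caller's P, which is what causes the duplicates.
            expand(r + [v], p & adj[v])

    expand([], set(adj))
    return reported

triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
reports = naive_cliques(triangle)
distinct = {frozenset(r) for r in reports}
```

Here `reports` contains all 3! = 6 orderings of (1, 2, 3), while `distinct` holds just one clique.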

Second version: remove each vertex from P once it has been processed.

R = ∅; P = V(G)  // V(G) denotes all vertices in G

FindMaximalClique(R, P):
    if P is empty:
        report R as a maximal clique
    for each vertex v in P:
        FindMaximalClique(R ∪ {v}, P ∩ N(v))  // N(v) denotes the neighbors of v
        P = P \ {v}

Problem: it may report cliques that are not maximal.

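Example (triangle on vertices 1, 2, 3): after (1, 2, 3) is reported, vertex 2 is removed from P, so the branch through vertex 3 stops at the non-maximal clique (1, 3).

  • (1) → (1, 2) → (1, 2, 3)
  • (1) → (1, 3)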

Bron–Kerbosch algorithm

R = ∅; P = V(G); X = ∅  // V(G) denotes all vertices in G

FindMaximalClique(R, P, X):
    if P and X are both empty:
        report R as a maximal clique
    for each vertex v in P:
        FindMaximalClique(R ∪ {v}, P ∩ N(v), X ∩ N(v))  // N(v) denotes the neighbors of v
        P = P \ {v}
        X = X ∪ {v}  // why? X holds the vertices already processed; if one of them is adjacent to every vertex of R, then R is not maximal
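A runnable version of the recursion might look as follows. This is a sketch of the basic Bron–Kerbosch scheme without pivoting; the adjacency-dict input format and the example edge set (the 5-vertex graph whose maximal cliques are (1, 2, 3) and (1, 3, 4, 5)) are assumptions for illustration.

```python
def bron_kerbosch(adj):
    """Enumerate all maximal cliques of a graph given as {vertex: set(neighbors)}."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # nothing can extend R, and R was not seen before
            return
        for v in list(p):                 # iterate over a copy because p is modified below
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)                   # v is done: avoids duplicate reports
            x.add(v)                      # remember v: blocks non-maximal reports

    expand(set(), set(adj), set())
    return cliques

# Assumed example graph: union of the cliques (1, 2, 3) and (1, 3, 4, 5).
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (1, 5), (3, 4), (3, 5), (4, 5)]
adj = {v: set() for v in range(1, 6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

maximal = bron_kerbosch(adj)
maximum = max(maximal, key=len)
```

Each maximal clique is reported exactly once, and a maximum clique is then simply the largest one in the list.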


Theorem. Given a vertex u, consider that all the maximal cliques containing Q ∪ {u} have been generated. Then, every new maximal clique containing Q, but not Q ∪ {u}, must contain at least one vertex q that is not adjacent to u.

Backtracking algorithms (e.g., the McGregor algorithm), for both MCIS and MCES

Such an algorithm can be described through a state space representation. Each state s represents a common subgraph of the two graphs under construction; this common subgraph is part of the MCS that is eventually formed.

Minimal Edit Distance
  • Six edit operations
    • Insert an isolated vertex
    • Delete an isolated vertex
    • Change the label of a vertex
    • Insert an edge between two disconnected vertices
    • Delete an edge between two connected vertices
    • Change the label of an edge
  • Graph Edit Distance:
    • The minimum number of edit operations needed to transform one graph into another (computing it is NP-hard)
[Figure: an edit sequence transforming G1 into G2, with vertex labels A, B, C, D. MED(G1, G2) = 4.]

Given two graphs G1 and G2, assume that they have the same number of vertices. Define a function f: V(G1) → V(G2). The distance under this function is:

[Equation not preserved in the transcript.]

The distance between G1 and G2 is defined as

[Equation not preserved in the transcript.]

We can prove that

[Equation not preserved in the transcript.]

If G1 and G2 have different numbers of vertices, say |V(G1)| < |V(G2)|, we introduce |V(G2)| − |V(G1)| pseudo vertices into G1, and the equation above still holds.
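For very small graphs, the idea of scoring every vertex mapping and taking the minimum can be tried by brute force. The sketch below is not the slides' formula: it assumes unit cost for every edit operation, unlabeled edges, vertex labels that are never `None`, and a one-to-one mapping, and it pads the smaller graph with pseudo vertices (label `None`). The function name is hypothetical.

```python
from itertools import permutations

def ged_brute_force(labels1, edges1, labels2, edges2):
    """Minimum edit cost over all one-to-one vertex mappings (unit costs assumed)."""
    ids1, ids2 = list(labels1), list(labels2)
    n = max(len(ids1), len(ids2))
    # Pad the smaller graph with pseudo vertices, represented by the label None.
    lab1 = [labels1[v] for v in ids1] + [None] * (n - len(ids1))
    lab2 = [labels2[v] for v in ids2] + [None] * (n - len(ids2))
    pos1 = {v: i for i, v in enumerate(ids1)}
    pos2 = {v: i for i, v in enumerate(ids2)}
    e1 = {frozenset((pos1[a], pos1[b])) for a, b in edges1}
    e2 = {frozenset((pos2[a], pos2[b])) for a, b in edges2}
    best = None
    for f in permutations(range(n)):      # f[i] is the image of position i
        # Vertex cost: relabel, insert or delete whenever the two labels differ.
        cost = sum(lab1[i] != lab2[f[i]] for i in range(n))
        # Edge cost: edges of G1 with no counterpart in G2, plus the reverse.
        mapped = {frozenset(f[i] for i in e) for e in e1}
        cost += len(mapped ^ e2)
        if best is None or cost < best:
            best = cost
    return best

path = ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])
triangle = ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3), (1, 3)])
d = ged_brute_force(*path, *triangle)   # one edge insertion
```

The loop visits n! mappings, so this is only practical for a handful of vertices; it is meant as a reference for checking faster methods.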

  • Exact algorithm (A* algorithm)

What is the A* algorithm?

A* uses best-first search to find a least-cost path from a given initial node to a goal node (out of one or more possible goals). As A* traverses the graph, it expands the path with the lowest estimated total cost, keeping a priority queue of alternative path segments along the way. The estimate for a node x is

f(x) = g(x) + h(x)

where g(x) denotes the cost from the starting node to the current node x, and h(x) denotes the heuristic estimate (a lower bound) of the distance from x to the goal.
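A generic A* loop can be sketched as follows. This is a minimal illustration on an explicit weighted graph, not the graph-edit-distance search itself; the function signature and the toy graph are assumptions. The priority of a node is g + h, with h required to be a lower bound on the remaining cost.

```python
import heapq
from itertools import count

def a_star(start, is_goal, neighbors, h):
    """Return the least cost from start to a goal node, or None if unreachable.

    neighbors(node) yields (next_node, step_cost); h(node) must never overestimate.
    """
    tie = count()                                   # tie-breaker so nodes are never compared
    frontier = [(h(start), next(tie), 0, start)]    # entries are (f, tie, g, node)
    best_g = {start: 0}
    while frontier:
        _, _, g, node = heapq.heappop(frontier)
        if is_goal(node):
            return g
        if g > best_g.get(node, float("inf")):
            continue                                # stale queue entry
        for nxt, step in neighbors(node):
            ng = g + step
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), next(tie), ng, nxt))
    return None

# Toy graph (assumed for illustration) with an admissible heuristic.
graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 1), ("t", 5)], "b": [("t", 1)], "t": []}
estimate = {"s": 3, "a": 2, "b": 1, "t": 0}
cost = a_star("s", lambda x: x == "t", lambda x: graph[x], lambda x: estimate[x])
```

For graph edit distance, a search state would be a partial vertex matching, g the cost of the edits fixed so far, and h a lower bound on the cost of matching the remaining vertices.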

Given two graphs G1 and G2 that have the same number m of vertices, consider the following process.

Let N1 and N2 denote the vertices in G1 and G2 that have been matched.

N1 = (v_1, v_2, …, v_n);

N2 = (u_1, u_2, …, u_n);

Let M1 and M2 denote the vertices in G1 and G2 that have not been matched.

M1 = (v_{n+1}, v_{n+2}, …, v_m);

M2 = (u_{n+1}, u_{n+2}, …, u_m).

Comparing Stars

Comparing Stars: On Approximating Graph Edit Distance

Zhiping Zeng, Anthony K.H. Tung, Jianyong Wang, Jianhua Feng, Lizhu Zhou

VLDB 2009

  • Problem definition

Given a graph database D consisting of n graphs

    • Approximate full graph search
      • Find all graphs gi in D such that GED(q, gi) ≤ 𝜏
    • Approximate subgraph search
      • Find all graphs gi in D such that GED(q, r) ≤ 𝜏 for some subgraph r ⊆ gi
  • Main idea: decompose a graph G into star structures.

  • Star structure: a triple (r, L, l), where
    • r: root vertex
    • L: the set of leaves
    • l: labeling function
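The decomposition into stars can be sketched in a few lines. In this illustration (function name and input format are assumptions), each vertex becomes the root of one star, its neighbors are the leaves, and a star is stored at the label level as a pair (root label, sorted leaf labels).

```python
def star_decomposition(labels, edges):
    """Return one star per vertex as (root_label, sorted tuple of leaf labels)."""
    adj = {v: [] for v in labels}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    # The root is the vertex itself; the leaves are its neighbors.
    return [(labels[v], tuple(sorted(labels[w] for w in adj[v]))) for v in labels]

# Path A - B - C.
stars = star_decomposition({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])
```

A graph with n vertices therefore yields a multiset of exactly n stars, which is what makes comparing two graphs reducible to comparing two multisets.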

  • Star edit distance

[Equation not preserved in the transcript.]

  • Given two multisets of star structures S1 and S2, let P: S1 → S2 be a bijection.
  • Choosing the bijection P with the minimum total star edit distance is an assignment problem.
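The assignment problem itself is independent of how the pairwise costs are defined. The sketch below takes an arbitrary square cost matrix (the numbers are made up for illustration, not star edit distances from the paper) and finds the cheapest bijection by trying every permutation. In practice the Hungarian algorithm solves the same problem in polynomial time; brute force is used here only to keep the example short.

```python
from itertools import permutations

def min_cost_bijection(cost):
    """Return (total, perm) minimizing sum(cost[i][perm[i]]) over all bijections."""
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# cost[i][j]: hypothetical distance between star i of S1 and star j of S2.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, mapping = min_cost_bijection(cost)
```

Here the cheapest bijection maps star 0 to star 1, star 1 to star 0, and star 2 to star 2.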
What is the relationship between GED(g1, g2) and the mapping distance?

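A distance function f is a metric if and only if the following conditions hold for all x, y, z:

  • f(x, y) ≥ 0 (non-negativity)
  • f(x, y) = 0 if and only if x = y (identity)
  • f(x, y) = f(y, x) (symmetry)
  • f(x, z) ≤ f(x, y) + f(y, z) (triangle inequality)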

We can prove that

  • Graph edit distance is a metric (assuming that every edit operation has a non-negative cost).
  • Mapping distance is also a metric.
  • Given two graphs g1 and g2, let P = (p1, p2, …, pk) be an alignment transforming g1 into g2. Accordingly, there is a sequence of graphs

g1 = h0 → h1 → … → hk = g2, where h_{i−1} → h_i indicates that h_i is the graph derived by performing p_i on h_{i−1}.

As the mapping distance μ is a metric, the triangle inequality gives:

μ(g1, g2) ≤ μ(h0, h1) + μ(h1, h2) + … + μ(h_{k−1}, h_k)

What is the relationship between one operation pi and the mapping distance?

1. Edge Insertion/Deletion

One edge insertion or deletion affects at most two stars, and the cost for each star is at most 2. Thus one edge insertion/deletion contributes at most 2 × 2 = 4 to the mapping distance.

2. Vertex Insertion/Deletion

One vertex insertion or deletion affects at most one star, and the cost for that star is at most 1. Thus one vertex insertion/deletion contributes at most 1 to the mapping distance.

3. Vertex Relabeling

Relabeling one vertex v0 affects at most deg(v0) + 1 stars.

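Lower Bound

Combining the three cases, one edit operation changes the mapping distance μ by at most max{4, δ + 1}, where δ = max{δ(g1), δ(g2)} is the larger of the two maximum vertex degrees. Hence

GED(g1, g2) ≥ μ(g1, g2) / max{4, δ + 1}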

Upper Bound

Based on bipartite graph matching, we can define an upper bound for the edit distance.

Experiment datasets

  • Real dataset
    • AIDS antiviral screen compounds: 42,687 chemical compounds
  • Synthetic dataset
    • 1000 graphs, average size: 10
GSimJoin

Efficient Graph Similarity Joins with Edit Distance Constraints

Xiang Zhao, Chuan Xiao, Xuemin Lin, and Wei Wang

ICDE 2012

  • Problem definition

Given two sets of graphs 𝑅 and 𝑆, a graph similarity join with edit distance threshold 𝜏 returns the pairs of graphs, one from each set, whose graph edit distance is no larger than 𝜏, i.e.,

{ ⟨𝑟, 𝑠⟩ ∣ 𝑔𝑒𝑑(𝑟, 𝑠) ≤ 𝜏, 𝑟 ∈ 𝑅, 𝑠 ∈ 𝑆 }.

  • The paper focuses on the self-join case:

{ ⟨𝑟𝑖, 𝑟𝑗⟩ ∣ 𝑔𝑒𝑑(𝑟𝑖, 𝑟𝑗) ≤ 𝜏 ∧ 𝑟𝑖.𝑖𝑑 < 𝑟𝑗.𝑖𝑑, 𝑟𝑖 ∈ 𝑅, 𝑟𝑗 ∈ 𝑅 }.

  • Definition (path-based 𝑞-gram): A path-based 𝑞-gram in a graph 𝑟 is a simple path of length 𝑞.

Let Q_r denote the multiset of 𝑞-grams in a graph 𝑟, and let Q_r^u denote the multiset of 𝑞-grams that contain the vertex u.
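Path-based q-grams can be enumerated with a short depth-first search. The sketch below makes some assumptions that are not stated on the slides: path length is counted in edges, edges are unlabeled, a path and its reverse count as one q-gram, and a q-gram is represented by its sequence of vertex labels. Function names and the toy graph are illustrative.

```python
from collections import Counter

def path_qgrams(labels, edges, q):
    """Return all simple paths with q edges, each undirected path listed once."""
    adj = {v: set() for v in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    found = []

    def extend(path):
        if len(path) == q + 1:
            found.append(tuple(path))
            return
        for w in adj[path[-1]]:
            if w not in path:          # simple path: no repeated vertex
                extend(path + [w])

    for v in labels:
        extend([v])
    # Every path is discovered from both ends; keep one canonical direction.
    return sorted({min(p, p[::-1]) for p in found})

def qgram_multiset(labels, paths, u=None):
    """Multiset of label sequences; with u given, only paths through vertex u."""
    grams = Counter()
    for p in paths:
        if u is None or u in p:
            seq = tuple(labels[v] for v in p)
            grams[min(seq, seq[::-1])] += 1
    return grams

labels = {1: "C", 2: "C", 3: "O", 4: "N"}
edges = [(1, 2), (2, 3), (2, 4)]
paths1 = path_qgrams(labels, edges, 1)
q_r = qgram_multiset(labels, paths1)          # all 1-grams
q_r_2 = qgram_multiset(labels, paths1, u=2)   # 1-grams containing vertex 2
```

In this toy graph every 1-gram passes through vertex 2, so a single edit on that vertex could affect all of them.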

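  • Count Filtering: Consider two graphs 𝑟 and 𝑠. If 𝑔𝑒𝑑(𝑟, 𝑠) ≤ 𝜏, then 𝑟 and 𝑠 must share at least 𝐿𝐵𝑝𝑎𝑡ℎ common 𝑞-grams, where

𝐿𝐵𝑝𝑎𝑡ℎ = max(|Q_r| − 𝜏 · D(r), |Q_s| − 𝜏 · D(s)), and D(r) is the largest number of 𝑞-grams of 𝑟 that contain any single vertex.

For the two example graphs with 𝜏 = 1:

  • when q = 1, LB = max(4 − 3, 5 − 3) = 2
  • when q = 2, LB = max(5 − 5, 7 − 6) = 1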
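The worked numbers can be reproduced with a two-line function. This sketch assumes the bound has the form max(|Q_r| − τ·D(r), |Q_s| − τ·D(s)), where D is the largest number of q-grams sharing one vertex; that form is inferred from the arithmetic shown above, and the function names are hypothetical.

```python
def count_filter_lower_bound(size_r, d_r, size_s, d_s, tau):
    """Assumed form of the bound: each edit destroys at most d q-grams of a graph."""
    return max(size_r - tau * d_r, size_s - tau * d_s)

def passes_count_filter(common, size_r, d_r, size_s, d_s, tau):
    """A pair survives the filter only if it shares at least LB q-grams."""
    return common >= count_filter_lower_bound(size_r, d_r, size_s, d_s, tau)

# Sizes and per-vertex maxima read off the worked example (tau = 1).
lb_q1 = count_filter_lower_bound(4, 3, 5, 3, tau=1)
lb_q2 = count_filter_lower_bound(5, 5, 7, 6, tau=1)
```

A pair that shares fewer q-grams than the bound cannot be within edit distance τ and can be discarded without computing the edit distance.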

  • Prefix Filtering

Sort the 𝑞-grams of 𝑄𝑟 and 𝑄𝑠 by the same global ordering, and let the 𝑝-prefix of a multiset be its first 𝑝 elements. If |𝑄𝑟 ∩ 𝑄𝑠| ≥ 𝛼, then the (|𝑄𝑟| − 𝛼 + 1)-prefix of 𝑄𝑟 and the (|𝑄𝑠| − 𝛼 + 1)-prefix of 𝑄𝑠 must have at least one common 𝑞-gram.
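The prefix test can be sketched as follows. The q-grams are plain strings here, the global ordering is simply lexicographic order, and repeated q-grams are tagged with an occurrence index so that the multiset behaves like a set; all of these are assumptions made for illustration, and the names are hypothetical.

```python
from collections import Counter

def ordered_multiset(grams):
    """Sort q-grams globally; tag repeats so equal q-grams stay distinct elements."""
    seen = Counter()
    tagged = []
    for g in sorted(grams):
        tagged.append((g, seen[g]))
        seen[g] += 1
    return tagged

def prefix_filter_pass(grams_r, grams_s, alpha):
    """False means the pair cannot share alpha q-grams and can be pruned."""
    qr, qs = ordered_multiset(grams_r), ordered_multiset(grams_s)
    # Prefix lengths |Q| - alpha + 1, clamped at 0 when alpha exceeds |Q|.
    pr = set(qr[: max(0, len(qr) - alpha + 1)])
    ps = set(qs[: max(0, len(qs) - alpha + 1)])
    return bool(pr & ps)

r = ["AB", "AC", "BC", "CD"]
s = ["AB", "BC", "CE", "DE"]
candidate = prefix_filter_pass(r, s, alpha=2)   # the prefixes share "AB"
```

The filter is only a necessary condition: a pair that passes still has to be verified, but a pair that fails is guaranteed to share fewer than α q-grams.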

  • Minimum Edit Filtering
  • Example: 𝜏 = 1, q = 1, LB = 2 (at least 2 matches are needed) < 3

Consider the two mismatching 𝑞-grams in 𝑠, C-O and C-N. They are disjoint, so at least two edit operations are needed to affect both, which exceeds 𝜏 = 1.

  • To handle the general case, where 𝑞-grams may overlap, define the minimum graph edit operation problem:

Given a multiset of 𝑞-grams 𝑄, find the minimum number of graph edit operations that can affect all the 𝑞-grams in 𝑄.

The problem is NP-hard, so a greedy algorithm is used.

  • Label Filtering
  • Connect the mismatching components, and compute the minimum edit operations for them.
References
  • John W. Raymond and Peter Willett. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. Journal of Computer-Aided Molecular Design, 16: 521–533, 2002.
  • D. Conte, C. Guidobaldi, and C. Sansone. A Comparison of Three Maximum Common Subgraph Algorithms on a Large Database of Labeled Graphs. GbRPR'03.
  • Etsuji Tomita, Akira Tanaka, Haruhisa Takahashi. The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363(1): 28-42 (2006).
  • Andrew K. C. Wong, Manlai You, S. C. Chan. An Algorithm for Graph Optimal Monomorphism. IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, no. 3, May/June 1990.
  • Xiang Zhao, Chuan Xiao, Xuemin Lin, Wei Wang. Efficient Graph Similarity Joins with Edit Distance Constraints. ICDE 2012: 834-845.
  • Zhiping Zeng, Anthony K. H. Tung, Jianyong Wang, Jianhua Feng, Lizhu Zhou. Comparing Stars: On Approximating Graph Edit Distance. PVLDB 2(1): 25-36 (2009).