### Graph Similarity

Institute of Computer Science and Technology of Peking University

Instructor: Lei Zou

Outline

- Maximal Common Subgraph
- Minimal Edit Distance
- Graph Similarity Search

Maximal Common Subgraph

Def. 1 (Induced Subgraph). An induced subgraph is a set S of vertices of a graph G and those edges of G with both endpoints in S.

Def. 2 (Maximum Common Induced Subgraph). A graph G12 is a common induced subgraph of graphs G1 and G2 if G12 is isomorphic to an induced subgraph of G1 and to an induced subgraph of G2. A maximum common induced subgraph (MCIS) is a common induced subgraph G12 with the largest possible number of vertices.

Def. 3 (Maximum Common Edge Subgraph). A maximum common edge subgraph (MCES) is a common subgraph of G1 and G2 with the largest possible number of edges.

[Figure: two example graphs, each with vertices labeled A, B, C, and D.]

Finding Maximal Common Subgraph

- Maximum clique-based algorithm (for MCIS)

Def. 4 (Modular Product). The modular product of two graphs G1 and G2 is defined on the vertex set V(G1) × V(G2), with two vertices (ui, vi) and (uj, vj), where ui ≠ uj and vi ≠ vj, being adjacent whenever

1. ui and vi have the same vertex label, and so do uj and vj, and

2. either (ui, uj) ∈ E(G1) and (vi, vj) ∈ E(G2), or (ui, uj) ∉ E(G1) and (vi, vj) ∉ E(G2).

[Figure: graph G1 (vertices u1–u4) and graph G2 (vertices v1–v4), each labeled A, B, C, D, and their modular product (association graph) on the vertices (u1, v1), (u2, v2), (u3, v3), (u4, v4).]

A maximal clique in the modular product corresponds to a maximal common induced subgraph, and a maximum clique corresponds to a maximum common induced subgraph (MCIS).
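As a minimal sketch of Def. 4, the modular product can be built directly from two small labeled graphs. The function name `modular_product` and the input shapes (a label dictionary plus an edge list per graph) are assumptions made for illustration. The sketch keeps only label-compatible pairs as product vertices, since pairs with different labels would have no incident edges under Def. 4 anyway.

```python
from itertools import combinations


def modular_product(labels1, edges1, labels2, edges2):
    """Return (vertices, edges) of the label-aware modular product.

    labels1/labels2: dict mapping vertex -> label (assumed input shape).
    edges1/edges2: iterables of undirected edges given as 2-tuples.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    # Only pairs with identical labels can be matched to each other.
    vertices = [(u, v) for u in labels1 for v in labels2
                if labels1[u] == labels2[v]]
    edges = set()
    for (ui, vi), (uj, vj) in combinations(vertices, 2):
        if ui == uj or vi == vj:
            continue  # a vertex cannot be matched twice
        in1 = frozenset((ui, uj)) in e1
        in2 = frozenset((vi, vj)) in e2
        if in1 == in2:  # both edges present, or both absent
            edges.add(frozenset(((ui, vi), (uj, vj))))
    return vertices, edges
```

Any clique of the returned graph is a set of pairwise compatible vertex matches, so feeding this graph to a clique enumerator yields common induced subgraphs.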

Def. 5 A clique in a graph G is a subset of vertices in the graph such that each pair of vertices in the subset is connected by an edge in the graph G.

A maximal clique is a clique that cannot be extended by including one more adjacent vertex, that is, a clique that is not a proper subset of a larger clique.

A maximum clique is a clique of the largest possible size in a given graph. The clique number ω(G) of a graph G is the number of vertices in a maximum clique in G.

Finding Maximal Clique

- Bron–Kerbosch algorithm (1973)

Basic Algorithm:

    R = ∅; P = V(G)    // V(G) denotes all vertices of G

    FindingMaximalClique(R, P):
        if P is empty:
            report R as a maximal clique
        for each vertex v in P:
            FindingMaximalClique(R ∪ {v}, P ∩ N(v))    // N(v) denotes the neighbors of v

Problem: It may generate duplicate answers, because the same clique is reached once for every order in which its vertices can be picked.

    R = ∅; P = V(G)    // V(G) denotes all vertices of G

    FindingMaximalClique(R, P):
        if P is empty:
            report R as a maximal clique
        for each vertex v in P:
            FindingMaximalClique(R ∪ {v}, P ∩ N(v))    // N(v) denotes the neighbors of v
            P = P \ {v}

Problem: It may report cliques that are not maximal.

    R = ∅; P = V(G); X = ∅    // V(G) denotes all vertices of G

    FindingMaximalClique(R, P, X):
        if P and X are both empty:
            report R as a maximal clique
        for each vertex v in P:
            FindingMaximalClique(R ∪ {v}, P ∩ N(v), X ∩ N(v))    // N(v) denotes the neighbors of v
            P = P \ {v}
            X = X ∪ {v}    // why? v is already processed; if v can still extend R, then R is not maximal

Theorem. Let Q be the current clique (R above). Given a vertex u, suppose that all the maximal cliques containing Q ∪ {u} have been generated. Then every new maximal clique that contains Q but not Q ∪ {u} must contain at least one vertex q that is not adjacent to u.
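The three-set version (R, P, X) can be sketched in Python as follows. The function name `bron_kerbosch` and the adjacency-dictionary input are assumptions made for illustration; the sketch omits pivoting, which the theorem above would justify by restricting the loop to vertices not adjacent to a chosen pivot u.

```python
def bron_kerbosch(adj):
    """Yield every maximal clique of an undirected graph exactly once.

    adj: dict mapping each vertex to the set of its neighbors
    (assumed symmetric and free of self-loops).
    """
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)  # nothing can extend r: it is maximal
            return
        for v in list(p):  # iterate over a snapshot; p shrinks below
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v is done: never branch on it again at this level
            x = x | {v}  # remember v so that non-maximal sets are rejected

    yield from expand(set(), set(adj), set())
```

For a triangle 1-2-3 with a pendant edge 3-4, the generator produces the two maximal cliques {1, 2, 3} and {3, 4}, each once.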

- Backtracking algorithms (e.g., the McGregor algorithm) (for both MCIS and MCES)

Such an algorithm can be described through a state space representation. Each state s represents a common subgraph of the two graphs that is under construction. This common subgraph is part of the MCS to be eventually formed.

Minimal Edit Distance

- Six edit operations
  - Insert an isolated vertex
  - Delete an isolated vertex
  - Change the label of a vertex
  - Insert an edge between two disconnected vertices
  - Delete an edge between two connected vertices
  - Change the label of an edge
- Graph Edit Distance:
  - The minimum number of edit operations needed to transform one graph into another (computing it is NP-hard)

Given two graphs G1 and G2, assume that they have the same number of vertices. Define a bijection f: V(G1) → V(G2). The distance under this function is:

The distance between G1 and G2 is defined as

We can prove that

If G1 and G2 have different numbers of vertices, say |V(G1)| < |V(G2)|, we add |V(G2)| − |V(G1)| pseudo vertices to G1, and the following equation still holds.
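The idea of measuring a distance under a vertex bijection f and then minimizing over all bijections can be sketched as below. The cost model is an assumption, not the slide's own definition: unit costs, unlabeled edges, one relabel per label mismatch, and one edge deletion or insertion per edge that f fails to preserve. The names `mapping_cost` and `min_mapping_distance` are hypothetical.

```python
from itertools import permutations


def mapping_cost(labels1, edges1, labels2, edges2, f):
    """Unit-cost edit cost induced by a bijection f: V(G1) -> V(G2)."""
    e2 = {frozenset(e) for e in edges2}
    relabels = sum(labels1[u] != labels2[f[u]] for u in labels1)
    mapped = {frozenset((f[a], f[b])) for a, b in edges1}
    # Edges of G1 with no counterpart are deleted;
    # edges of G2 not covered by the image of G1 are inserted.
    return relabels + len(mapped - e2) + len(e2 - mapped)


def min_mapping_distance(labels1, edges1, labels2, edges2):
    """Minimum of mapping_cost over all bijections (tiny graphs only)."""
    v1, v2 = list(labels1), list(labels2)
    assert len(v1) == len(v2), "pad the smaller graph with pseudo vertices first"
    return min(
        mapping_cost(labels1, edges1, labels2, edges2, dict(zip(v1, perm)))
        for perm in permutations(v2)
    )
```

The search is factorial in the number of vertices, so this is only useful for checking small examples by hand.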

- Exact Algorithm (A* algorithm)

What is the A* algorithm?

A* uses a best-first search to find a least-cost path from a given initial node to a goal node (out of one or more possible goals). As A* traverses the graph, it follows the path with the lowest estimated total cost, keeping a priority queue of alternative path segments along the way. Each node x is ranked by

f(x) = g(x) + h(x)

where g(x) denotes the cost from the starting node to the current node x, and h(x) denotes the heuristic estimate (a lower bound) of the distance from x to the goal.

Given two graphs G1 and G2 with the same number m of vertices, consider the following process.

Let N1 and N2 denote the vertices of G1 and G2 that have been matched:

N1 = (v1, v2, …, vn);

N2 = (u1, u2, …, un).

Let M1 and M2 denote the vertices of G1 and G2 that have not been matched:

M1 = (vn+1, vn+2, …, vm);

M2 = (un+1, un+2, …, um).
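The matching process above can be turned into an A* search over partial mappings. The sketch below is illustrative only and rests on assumptions: unit costs, unlabeled edges, vertices of G1 matched in a fixed order, and a simple label-based heuristic h that counts unmatched vertices whose labels cannot be paired. The name `astar_ged` is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def astar_ged(labels1, edges1, labels2, edges2):
    """Unit-cost edit distance of two equal-size graphs via A*."""
    v1, v2 = list(labels1), list(labels2)
    assert len(v1) == len(v2)
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}

    def h(n, used):
        # Lower bound: unmatched vertices (M1, M2) whose labels cannot be
        # paired up need at least one relabel each.
        rest1 = Counter(labels1[v] for v in v1[n:])
        rest2 = Counter(labels2[u] for u in v2 if u not in used)
        return (len(v1) - n) - sum((rest1 & rest2).values())

    tie = count()  # tie-breaker so the heap never compares mappings
    heap = [(h(0, ()), next(tie), 0, ())]  # (g + h, tie, g, matched N2)
    while heap:
        _, _, g, n2 = heapq.heappop(heap)
        n = len(n2)
        if n == len(v1):
            return g  # the first complete mapping popped is optimal
        v = v1[n]  # next vertex of M1 to match
        for u in v2:
            if u in n2:
                continue
            step = int(labels1[v] != labels2[u])
            # Compare edges between the new pair and every matched pair.
            for pv, pu in zip(v1, n2):
                step += (frozenset((v, pv)) in e1) != (frozenset((u, pu)) in e2)
            new = n2 + (u,)
            heapq.heappush(heap, (g + step + h(n + 1, new), next(tie), g + step, new))
    return None
```

Because h never overestimates the remaining cost, the first complete mapping taken from the queue has minimum cost under this cost model.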

Comparing Stars

Comparing Stars: On Approximating Graph Edit Distance

Zhiping Zeng, Anthony K.H. Tung, Jianyong Wang, Jianhua Feng, Lizhu Zhou

VLDB 2009

- Problem definition

Given a graph database D consisting of n graphs and a query graph q:

- Approximate full graph search
  - Find { gi ∈ D | GED(q, gi) ≤ τ }
- Approximate subgraph search
  - Find { gi ∈ D | GED(q, r) ≤ τ for some subgraph r ⊆ gi }

- Main Idea:

G → star structures (decompose a graph G into a multiset of star structures)

- Star Structure:

A star structure is a triple (r, L, l):

r: the root vertex

L: the set of leaves

l: the labeling function
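A minimal sketch of the decomposition: every vertex becomes the root of one star whose leaves are its neighbors. Representing a star as a pair (root label, multiset of leaf labels) is an assumption made for illustration, as is the function name `stars`.

```python
from collections import Counter


def stars(labels, edges):
    """Return one star per vertex as (root label, Counter of leaf labels).

    labels: dict mapping vertex -> label; edges: iterable of 2-tuples.
    """
    adj = {v: set() for v in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return [(labels[r], Counter(labels[x] for x in adj[r])) for r in labels]
```

A graph with n vertices therefore yields a multiset of exactly n stars.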

- Given two multisets of star structures S1 and S2, let P: S1 → S2 be a bijection.
- Choosing the bijection P that minimizes the total distance between matched stars is an assignment problem.

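A distance function f is a metric if and only if the following conditions hold for all x, y, and z:

- Non-negativity: f(x, y) ≥ 0
- Identity: f(x, y) = 0 if and only if x = y
- Symmetry: f(x, y) = f(y, x)
- Triangle inequality: f(x, z) ≤ f(x, y) + f(y, z)

We can prove that

- Graph edit distance is a metric (assuming that every edit operation cost is non-negative).
- Mapping distance is also a metric.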

- Given two graphs g1 and g2, let P = (p1, p2, ..., pk) be an alignment transforming g1 into g2. Accordingly, there is a sequence of graphs

g1 = h0 → h1 → ... → hk = g2, where hi−1 → hi indicates that hi is the graph derived by applying pi to hi−1.

Because the mapping distance μ is a metric, the triangle inequality gives:

μ(g1, g2) ≤ μ(h0, h1) + μ(h1, h2) + ... + μ(hk−1, hk)

What is the relationship between one operation pi and the mapping distance μ(hi−1, hi)?

1. Edge Insertion/Deletion

One edge insertion/deletion affects at most two stars, and the cost of each affected star is at most 2.

Thus, μ(hi−1, hi) ≤ 4 for one edge insertion/deletion.

2. Vertex Insertion/Deletion

One vertex insertion/deletion affects at most one star, and the cost of that star is at most 1.

Thus, μ(hi−1, hi) ≤ 1 for one vertex insertion/deletion.

3. Vertex Relabeling

Relabeling one vertex v0 affects at most deg(v0) + 1 stars.

Lower Bound
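Chaining the per-operation bounds suggests the shape of the lower bound. The derivation below is a reconstruction under stated assumptions rather than the slide's own formula: μ denotes the mapping distance, k is the length of an optimal edit sequence (so k = GED(g1, g2)), each affected star in a relabeling costs at most 1, and δ is assumed to bound the maximum vertex degree of g1, g2, and every intermediate graph.

```latex
\begin{align*}
\mu(g_1, g_2)
  &\le \sum_{i=1}^{k} \mu(h_{i-1}, h_i)
     && \text{(triangle inequality)} \\
  &\le k \cdot \max\{4,\ \delta + 1\}
     && \text{(worst case over the three kinds of operation)} \\
\Longrightarrow\quad
\mathrm{GED}(g_1, g_2) = k
  &\ge \frac{\mu(g_1, g_2)}{\max\{4,\ \delta + 1\}} .
\end{align*}
```

Since the mapping distance comes from an assignment problem, a bound of this form can be computed in polynomial time even though the edit distance itself cannot.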

Upper Bound:

Based on bipartite graph matching, we can define an upper bound for the edit distance.

Experiment datasets

- Real dataset
  - AIDS antiviral screen compounds: 42,687 chemical compounds
- Synthetic dataset
  - 1,000 graphs, average size: 10

GSimJoin

Efficient Graph Similarity Joins with Edit Distance Constraints

Xiang Zhao, Chuan Xiao, Xuemin Lin, and Wei Wang

ICDE 2012

- Problem definition

Given two sets of graphs R and S, a graph similarity join with edit distance threshold τ returns all pairs of graphs, one from each set, whose graph edit distance is no larger than τ, i.e.,

{ ⟨r, s⟩ ∣ ged(r, s) ≤ τ, r ∈ R, s ∈ S }.

- The paper focuses on the self-join case:

{ ⟨ri, rj⟩ ∣ ged(ri, rj) ≤ τ ∧ ri.id < rj.id, ri ∈ R, rj ∈ R }.

- Definition (path-based q-gram): A path-based q-gram in a graph r is a simple path of length q.

Let Qr denote the multiset of q-grams in a graph r, and let Qru denote the multiset of q-grams that contain the vertex u.
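A small sketch of q-gram extraction, taking "length q" to mean q edges. The function name `path_qgrams`, the canonical form (the lexicographically smaller reading direction of the label sequence), and the requirement that vertex identifiers be orderable are all assumptions made for illustration.

```python
from collections import Counter


def path_qgrams(labels, edges, q):
    """Multiset of label sequences of all simple paths with q edges.

    Each undirected path is counted once. Vertex ids must be orderable.
    """
    adj = {v: set() for v in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    grams = Counter()

    def extend(path):
        if len(path) == q + 1:
            if path <= path[::-1]:  # keep one direction per undirected path
                seq = tuple(labels[v] for v in path)
                grams[min(seq, seq[::-1])] += 1
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple path: no repeated vertex
                extend(path + (nxt,))

    for v in labels:
        extend((v,))
    return grams
```

For q = 1 the q-grams are simply the labeled edges, such as C-O or C-N in a molecule graph.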

- Count Filtering: Consider two graphs r and s.

If ged(r, s) ≤ τ, then r and s must share at least LBpath common q-grams.

For the two preceding examples with τ = 1:

When q = 1, LB = max(4 − 3, 5 − 3) = 2

When q = 2, LB = max(5 − 5, 7 − 6) = 1
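The two numeric examples are consistent with a bound of the form max(|Qr| − τ·Dr, |Qs| − τ·Ds), where Dr and Ds are the largest numbers of q-grams that a single edit operation can affect in r and in s. That reading, and the names `count_lower_bound` and `passes_count_filter`, are assumptions; a hedged sketch:

```python
from collections import Counter


def count_lower_bound(n_r, d_r, n_s, d_s, tau):
    """Minimum number of shared q-grams implied by ged(r, s) <= tau.

    n_r, n_s: sizes of the q-gram multisets of r and s.
    d_r, d_s: most q-grams one edit operation can affect (assumed inputs).
    """
    return max(n_r - tau * d_r, n_s - tau * d_s)


def passes_count_filter(qgrams_r: Counter, qgrams_s: Counter, d_r, d_s, tau):
    """Return False when the pair can be pruned without computing ged."""
    shared = sum((qgrams_r & qgrams_s).values())
    lb = count_lower_bound(sum(qgrams_r.values()), d_r,
                           sum(qgrams_s.values()), d_s, tau)
    return shared >= lb
```

With the slide's numbers, `count_lower_bound(4, 3, 5, 3, 1)` gives 2 and `count_lower_bound(5, 5, 7, 6, 1)` gives 1.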

- Prefix Filtering

Sort the q-grams of each multiset in a global order, and let the p-prefix be the first p elements. If |Qr ∩ Qs| ≥ α, then the (|Qr| − α + 1)-prefix of Qr and the (|Qs| − α + 1)-prefix of Qs must have at least one common q-gram.
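A minimal sketch of the prefix test. Python's default sort order stands in for the global ordering, which is an assumption (any fixed total order works for correctness), and the names `prefix` and `may_share_alpha` are hypothetical.

```python
def prefix(sorted_qgrams, alpha):
    """First (n - alpha + 1) elements of an already sorted q-gram list."""
    return sorted_qgrams[: max(len(sorted_qgrams) - alpha + 1, 0)]


def may_share_alpha(qgrams_r, qgrams_s, alpha):
    """Necessary condition for |Qr ∩ Qs| >= alpha: the prefixes intersect.

    qgrams_r, qgrams_s: lists of q-grams (duplicates allowed).
    """
    if alpha <= 0:
        return True  # nothing is required, so nothing can be pruned
    pr = prefix(sorted(qgrams_r), alpha)
    ps = prefix(sorted(qgrams_s), alpha)
    return bool(set(pr) & set(ps))
```

A `False` result safely prunes the pair; a `True` result only marks it as a candidate that still needs further checks.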

- Minimum Edit Filtering
  - τ = 1, q = 1: LB = 2 (at least 2 matches are needed), which is less than 3, so count filtering does not prune this pair.

Consider the two mismatching q-grams in s, C-O and C-N: they are disjoint, so no single edit operation can affect both.

- To handle the general case where q-grams may overlap
  - Minimum graph edit operation problem:

Given a multiset of q-grams Q, find the minimum number of graph edit operations that can affect all the q-grams in Q.

This problem is NP-hard, so a greedy algorithm is used.

- Label Filtering
  - Connect the mismatching components, and compute the minimum number of edit operations for them.

References

- John W. Raymond, Peter Willett: Maximum Common Subgraph Isomorphism Algorithms for the Matching of Chemical Structures. Journal of Computer-Aided Molecular Design 16: 521–533 (2002)
- D. Conte, C. Guidobaldi, C. Sansone: A Comparison of Three Maximum Common Subgraph Algorithms on a Large Database of Labeled Graphs. GbRPR 2003
- Etsuji Tomita, Akira Tanaka, Haruhisa Takahashi: The Worst-Case Time Complexity for Generating All Maximal Cliques and Computational Experiments. Theor. Comput. Sci. 363(1): 28–42 (2006)
- Andrew K. C. Wong, Manlai You, S. C. Chan: An Algorithm for Graph Optimal Monomorphism. IEEE Transactions on Systems, Man, and Cybernetics 20(3), May/June 1990
- Xiang Zhao, Chuan Xiao, Xuemin Lin, Wei Wang: Efficient Graph Similarity Joins with Edit Distance Constraints. ICDE 2012: 834–845
- Zhiping Zeng, Anthony K. H. Tung, Jianyong Wang, Jianhua Feng, Lizhu Zhou: Comparing Stars: On Approximating Graph Edit Distance. PVLDB 2(1): 25–36 (2009)
