
Graph Similarity

北京大学计算机科学技术研究所 (Institute of Computer Science and Technology of Peking University). Graph Similarity. Instructor: Lei Zou. Outline: Maximal Common Subgraph, Minimal Edit Distance, Graph Similarity Search.


Presentation Transcript


  1. 北京大学计算机科学技术研究所 Institute of Computer Science and Technology of Peking University Graph Similarity Instructor: Lei Zou

  2. Outline • Maximal Common Subgraph • Minimal Edit Distance • Graph Similarity Search

  3. Outline • Maximal Common Subgraph • Minimal Edit Distance • Graph Similarity Search

  4. Maximal Common Subgraph Def. 1 (Induced Subgraph). An induced subgraph of a graph G consists of a set S of vertices of G together with all edges of G that have both endpoints in S. Def. 2 (Maximum Common Induced Subgraph). A graph G12 is a common induced subgraph of graphs G1 and G2 if G12 is isomorphic to an induced subgraph of G1 and to an induced subgraph of G2. A maximum common induced subgraph (MCIS) is a common induced subgraph G12 with the largest possible number of vertices.

  5. [Figure: two examples of an MCIS, each shown on a pair of graphs with vertex labels A, B, C, D.]

  6. Def. 3 (Maximum Common Edge Subgraph). An MCES is a subgraph consisting of the largest number of edges common to both G1 and G2. [Figure: example on two graphs with vertex labels A, B, C, D.]

  7. Finding Maximal Common Subgraph • Maximum clique-based algorithm (for MCIS) Def. 4. The modular product of two graphs G1 and G2 is defined on the vertex set V(G1) × V(G2), with two vertices (ui, vi) and (uj, vj), where ui ≠ uj and vi ≠ vj, being adjacent whenever condition 1 holds together with either condition 2 or condition 3: 1. ui and vi have the same vertex label, and so do uj and vj; 2. (ui, uj) ∈ E(G1) and (vi, vj) ∈ E(G2); 3. (ui, uj) ∉ E(G1) and (vi, vj) ∉ E(G2).
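The modular product definition above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `modular_product`, the dictionary-based graph encoding, and the example graphs are assumptions made here. The check that a vertex is not paired twice (ui ≠ uj, vi ≠ vj) is the standard requirement for the product.

```python
from itertools import combinations

def modular_product(labels1, edges1, labels2, edges2):
    """Return (vertices, edges) of the label-aware modular product.

    labels1/labels2 map each vertex to its label; edges1/edges2 are
    collections of undirected edges given as 2-tuples.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    # Condition 1: only pair vertices that carry the same label.
    vertices = [(u, v) for u in labels1 for v in labels2
                if labels1[u] == labels2[v]]
    product_edges = set()
    for (ui, vi), (uj, vj) in combinations(vertices, 2):
        if ui == uj or vi == vj:
            continue  # a vertex may be matched at most once
        in1 = frozenset((ui, uj)) in e1
        in2 = frozenset((vi, vj)) in e2
        # Conditions 2 and 3: adjacent in both graphs, or in neither.
        if in1 == in2:
            product_edges.add(frozenset(((ui, vi), (uj, vj))))
    return vertices, product_edges

# Hypothetical example: a labeled path A-B-C and a labeled triangle A-B-C.
labels_g1 = {"u1": "A", "u2": "B", "u3": "C"}
edges_g1 = [("u1", "u2"), ("u2", "u3")]
labels_g2 = {"v1": "A", "v2": "B", "v3": "C"}
edges_g2 = [("v1", "v2"), ("v2", "v3"), ("v1", "v3")]
verts, prod = modular_product(labels_g1, edges_g1, labels_g2, edges_g2)
```

In this example the pairs (u1, v1) and (u3, v3) are not adjacent in the product, because u1 and u3 are not adjacent in the path while v1 and v3 are adjacent in the triangle.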

  8. Maximal Clique [Figure: two labeled graphs with vertices u1 to u4 and v1 to v4 (labels A, B, C, D) and their modular product (association graph) on the vertices (u1, v1), (u2, v2), (u3, v3), (u4, v4).] A maximal clique in the modular product corresponds to a maximal common induced subgraph.

  9. Def. 5. A clique in a graph G is a subset of vertices such that each pair of vertices in the subset is connected by an edge in G. A maximal clique is a clique that cannot be extended by including one more adjacent vertex, that is, a clique which does not exist exclusively within the vertex set of a larger clique. A maximum clique is a clique of the largest possible size in a given graph. The clique number ω(G) of a graph G is the number of vertices in a maximum clique in G.

  10. Maximal cliques: (1,2,3) and (1,3,4,5). A maximum clique: (1,3,4,5). [Figure: example graph on vertices 1 to 5.]

  11. Finding Maximal Clique • Bron–Kerbosch algorithm (1973) Basic algorithm:
      R = ∅; P = V(G);  // V(G) denotes all vertices in G
      FindingMaximalClique(R, P):
          if P is empty:
              report R as a maximal clique
          for each vertex v in P:
              FindingMaximalClique(R ⋃ {v}, P ⋂ N(v))  // N(v) denotes all neighbors of v
      Problem: it may generate duplicate answers.

  12. Example on a triangle with vertices 1, 2, 3: (1) → (1,2) → (1,2,3) and (1) → (1,3) → (1,3,2), so the same clique is reported more than once.

  13. R = ∅; P = V(G);  // V(G) denotes all vertices in G
      FindingMaximalClique(R, P):
          if P is empty:
              report R as a maximal clique
          for each vertex v in P:
              FindingMaximalClique(R ⋃ {v}, P ⋂ N(v))  // N(v) denotes all neighbors of v
              P = P \ {v};
      Problem: it may report some cliques that are not maximal.

  14. Example on a triangle with vertices 1, 2, 3: (1) → (1,2) → (1,2,3) and (1) → (1,3), so the non-maximal clique (1,3) is reported.
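The two simplified variants can be run on the triangle example to see both problems. The sketch below is an illustration only; the function names `naive` and `shrink_only` and the adjacency-dictionary encoding are assumptions introduced here.

```python
def naive(R, P, adj, out):
    """Basic variant: recurse on every v in P without shrinking P."""
    if not P:
        out.append(tuple(R))
        return
    for v in sorted(P):
        naive(R + [v], P & adj[v], adj, out)

def shrink_only(R, P, adj, out):
    """Second variant: remove v from P after exploring it."""
    if not P:
        out.append(tuple(R))
        return
    P = set(P)  # local copy, so the caller's set is untouched
    for v in sorted(P):
        shrink_only(R + [v], P & adj[v], adj, out)
        P.remove(v)

# Triangle on vertices 1, 2, 3.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}

dupes = []
naive([], set(triangle), triangle, dupes)

partial = []
shrink_only([], set(triangle), triangle, partial)
```

Here `dupes` contains the clique {1, 2, 3} once per vertex ordering, while `partial` contains (1, 2, 3) once but also the non-maximal cliques (1, 3), (2, 3) and (3,).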

  15. Bron–Kerbosch algorithm
      R = ∅; P = V(G); X = ∅;  // V(G) denotes all vertices in G
      FindingMaximalClique(R, P, X):
          if P and X are both empty:
              report R as a maximal clique
          for each vertex v in P:
              FindingMaximalClique(R ⋃ {v}, P ⋂ N(v), X ⋂ N(v))  // N(v) denotes all neighbors of v
              P = P \ {v};
              X = X ⋃ {v};  // why?
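A minimal Python sketch of the Bron–Kerbosch procedure with the exclusion set X follows. The edge list is an assumption chosen to be consistent with the maximal cliques (1,2,3) and (1,3,4,5) listed on slide 10, since the figure itself is not preserved; the function name `bron_kerbosch` is also introduced here.

```python
def bron_kerbosch(R, P, X, adj, out):
    """Report every maximal clique that extends R using vertices of P.

    X holds vertices already explored; a clique is maximal only if no
    vertex of P or X could still be added to it.
    """
    if not P and not X:
        out.append(sorted(R))
        return
    for v in sorted(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}  # v is fully explored at this level
        X = X | {v}  # later branches must not rediscover cliques with v

# Assumed edges: triangle 1-2-3 plus a complete graph on 1, 3, 4, 5.
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (1, 5), (3, 4), (3, 5), (4, 5)]
adj = {v: set() for v in range(1, 6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
maximum = max(cliques, key=len)
```

Adding v to X is what suppresses both problems seen earlier: a branch whose candidate set P is empty but whose X is non-empty could still be extended by an already explored vertex, so its clique is either non-maximal or a duplicate and is not reported.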

  16. Theorem. Given a vertex u, consider that all the maximal cliques containing Q ∪ {u} have been generated. Then, every new maximal clique containing Q, but not Q ∪ {u}, must contain at least one vertex q that is not adjacent to u.

  17. Backtracking algorithms (e.g., the McGregor algorithm; for both MCIS and MCES). Such an algorithm can be described through a state space representation. Each state s represents a common subgraph of the two graphs under construction. This common subgraph is part of the MCS to be eventually formed.

  18. Outline • Maximal Common Subgraph • Minimal Edit Distance • Graph Similarity Search

  19. Minimal Edit Distance • Six edit operations: • Insert an isolated vertex • Delete an isolated vertex • Change the label of a vertex • Insert an edge between two disconnected vertices • Delete an edge between two connected vertices • Change the label of an edge • Graph Edit Distance: the minimum number of edit operations needed to transform one graph into another (computing it is NP-hard)

  20. Minimal Edit Distance [Figure: example with graphs G1 and G2 (vertex labels A, B, C, D), where MED(G1, G2) = 4.]

  21. Minimal Edit Distance Given two graphs G1 and G2, assume that they have the same number of vertices. Define a function f: V(G1) → V(G2). The distance under this function is:

  22. Minimal Edit Distance The distance between G1 and G2 is defined as [equation]. We can prove that [equation]. If G1 and G2 have different numbers of vertices, say |V(G1)| < |V(G2)|, we introduce |V(G2)| − |V(G1)| pseudo vertices into G1, and the equation still holds.
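The slide's equations are not preserved, so the following is only a sketch of the general idea under stated assumptions: every one of the six edit operations costs 1, the cost of a vertex mapping f counts vertex-label and edge mismatches, and the edit distance is taken as the minimum of that cost over all bijections after padding with pseudo vertices. The names `mapping_cost`, `edit_distance` and `PSEUDO` are hypothetical, and the brute-force search is only feasible for very small graphs.

```python
from itertools import permutations

PSEUDO = None  # label of a padding (pseudo) vertex; real labels must differ

def mapping_cost(labels1, edges1, labels2, edges2, f):
    """Unit-cost edit cost induced by the vertex bijection f."""
    # Vertex relabelings, insertions and deletions (pseudo vs. real).
    cost = sum(1 for u in labels1 if labels1[u] != labels2[f[u]])
    # Image of G1's edges under f, keyed by the mapped endpoints.
    mapped = {frozenset((f[a], f[b])): lab
              for (a, b), lab in ((tuple(e), l) for e, l in edges1.items())}
    # Edge insertions, deletions and relabelings.
    for e in set(mapped) | set(edges2):
        if mapped.get(e) != edges2.get(e):
            cost += 1
    return cost

def edit_distance(labels1, edges1, labels2, edges2):
    """Minimum mapping cost over all bijections (exponential time)."""
    labels1, labels2 = dict(labels1), dict(labels2)
    for i in range(len(labels2) - len(labels1)):
        labels1[("pseudo", i)] = PSEUDO
    for i in range(len(labels1) - len(labels2)):
        labels2[("pseudo", i)] = PSEUDO
    e1 = {frozenset(e): lab for e, lab in edges1.items()}
    e2 = {frozenset(e): lab for e, lab in edges2.items()}
    vs1 = list(labels1)
    return min(mapping_cost(labels1, e1, labels2, e2, dict(zip(vs1, perm)))
               for perm in permutations(labels2))

# Hypothetical example: edge A-B versus path A-B-C (edge label "-").
g1_labels = {1: "A", 2: "B"}
g1_edges = {(1, 2): "-"}
g2_labels = {"x": "A", "y": "B", "z": "C"}
g2_edges = {("x", "y"): "-", ("y", "z"): "-"}
d = edit_distance(g1_labels, g1_edges, g2_labels, g2_edges)
```

In the example, one vertex insertion (C) and one edge insertion are needed.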

  23. Minimal Edit Distance [Figure: example graphs G1 and G2 with vertex labels A, B, C, D.]

  24. Minimal Edit Distance • Exact algorithm (A* algorithm). What is the A* algorithm? A* uses best-first search to find a least-cost path from a given initial node to one goal node (out of one or more possible goals). As A* traverses the graph, it follows the path with the lowest estimated total cost f(x) = g(x) + h(x), keeping a sorted priority queue of alternate path segments along the way. Here g(x) denotes the cost from the starting node to the current node x, and h(x) denotes the "heuristic estimate" (a lower bound) of the distance from x to the goal.
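A generic A* search can be sketched as below. This is a plain shortest-path illustration rather than the graph edit distance search itself; the function name `a_star`, the toy graph, and the heuristic values are assumptions, with the heuristic chosen so that it never overestimates the remaining cost.

```python
import heapq

def a_star(start, is_goal, successors, h):
    """Return the least cost from start to a goal node, or None.

    successors(node) yields (next_node, step_cost) pairs and h(node)
    must be a lower bound on the remaining cost to a goal.
    """
    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, node)
    best_g = {start: 0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was found later
        if is_goal(node):
            return g
        for nxt, cost in successors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt))
    return None

# Hypothetical weighted graph and admissible heuristic.
graph = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 5)],
    "B": [("G", 1)],
    "G": [],
}
estimate = {"S": 3, "A": 2, "B": 1, "G": 0}
cost = a_star("S", lambda n: n == "G", lambda n: graph[n], estimate.get)
```

For edit distance computation, a node would instead be a partial vertex mapping, g(x) the cost of the edits implied so far, and h(x) a lower bound on the cost of completing the mapping.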

  25. Minimal Edit Distance Given two graphs G1 and G2 with the same number m of vertices, consider the following process. Let N1 and N2 denote the vertices in G1 and G2 that have been matched: N1 = (v1, v2, …, vn); N2 = (u1, u2, …, un). Let M1 and M2 denote the vertices in G1 and G2 that have not been matched: M1 = (vn+1, vn+2, …, vm); M2 = (un+1, un+2, …, um).

  26. Minimal Edit Distance

  27. Outline • Maximal Common Subgraph • Minimal Edit Distance • Graph Similarity Search

  28. Comparing Stars Comparing Stars: On Approximating Graph Edit Distance Zhiping Zeng, Anthony K.H. Tung, Jianyong Wang, Jianhua Feng, Lizhu Zhou @VLDB09

  29. Comparing Stars • Problem definition: given a graph database D consisting of n graphs, a query graph q, and a threshold τ • Approximate full graph search: find all graphs gi in D such that GED(q, gi) ≤ τ • Approximate subgraph search: find all graphs gi in D such that GED(q, r) ≤ τ for some subgraph r ⊆ gi

  30. Comparing Stars • Main idea: decompose a graph G into star structures • Star structure: a triple (r, L, l), where r is the root vertex, L is the set of leaves, and l is the labeling function
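As a small illustration of the decomposition, the sketch below builds one star per vertex, with the vertex as root and its neighbors as leaves. Representing a star as a pair (root label, sorted tuple of leaf labels) and the function name `stars` are choices made here, not taken from the slides.

```python
def stars(labels, edges):
    """Return one star per vertex as (root_label, sorted leaf labels)."""
    adj = {v: [] for v in labels}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return [(labels[v], tuple(sorted(labels[w] for w in adj[v])))
            for v in labels]

# Hypothetical example: the labeled path A-B-C.
path_labels = {1: "A", 2: "B", 3: "C"}
path_edges = [(1, 2), (2, 3)]
path_stars = stars(path_labels, path_edges)
```

A graph with n vertices yields a multiset of n stars, so two graphs can be compared through their star multisets without solving an isomorphism problem.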

  31. Comparing Stars • Star edit distance

  32. Comparing Stars • Given two multisets of star structures S1 and S2, P: S1 → S2 is a bijection. • Assignment problem
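The slide formulas are not preserved, so the sketch below rests on an assumed form of the star edit distance: a root-label mismatch cost, plus the difference in leaf counts, plus the number of leaf labels that cannot be matched. It pads the smaller multiset with empty stars and solves the assignment problem by brute force over permutations, which only works for tiny inputs; a polynomial assignment solver such as the Hungarian algorithm would normally replace that step. All names (`star_distance`, `mapping_distance`, `EPSILON`) are hypothetical.

```python
from collections import Counter
from itertools import permutations

EPSILON = (None, ())  # padding star: dummy root label, no leaves

def star_distance(s1, s2):
    """Assumed star edit distance between (root_label, leaf_labels) pairs."""
    (r1, l1), (r2, l2) = s1, s2
    root_cost = 0 if r1 == r2 else 1
    common = sum((Counter(l1) & Counter(l2)).values())
    leaf_cost = abs(len(l1) - len(l2)) + max(len(l1), len(l2)) - common
    return root_cost + leaf_cost

def mapping_distance(stars1, stars2):
    """Minimum total star distance over all bijections (brute force)."""
    n = max(len(stars1), len(stars2))
    a = list(stars1) + [EPSILON] * (n - len(stars1))
    b = list(stars2) + [EPSILON] * (n - len(stars2))
    return min(sum(star_distance(x, y) for x, y in zip(a, perm))
               for perm in permutations(b))

# Stars of a labeled path A-B-C and of a labeled triangle A-B-C.
path_s = [("A", ("B",)), ("B", ("A", "C")), ("C", ("B",))]
tri_s = [("A", ("B", "C")), ("B", ("A", "C")), ("C", ("A", "B"))]
mu = mapping_distance(path_s, tri_s)
```

The two example graphs differ by a single edge insertion, which changes two stars by a cost of 2 each under the assumed distance.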

  33. Comparing Stars

  34. Comparing Stars

  35. Comparing Stars What is the relationship between GED(g1, g2) and the mapping distance between g1 and g2?

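  36. Comparing Stars A distance function f is a metric if and only if the following conditions hold for all x, y, z: • Non-negativity: f(x, y) ≥ 0 • Identity: f(x, y) = 0 if and only if x = y • Symmetry: f(x, y) = f(y, x) • Triangle inequality: f(x, z) ≤ f(x, y) + f(y, z)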

  37. Comparing Stars We can prove that • Graph edit distance is a metric (assuming that all edit operation costs are non-negative). • Mapping distance is also a metric.

  38. Comparing Stars • Given two graphs g1 and g2, let P = (p1, p2, …, pk) be an alignment transforming g1 to g2. Accordingly, there is a sequence of graphs g1 = h0 → h1 → … → hk = g2, where hi−1 → hi indicates that hi is the graph derived by performing pi on hi−1. As the mapping distance is a metric, the triangle inequality gives the following: the mapping distance between g1 and g2 is at most the sum of the mapping distances between consecutive graphs hi−1 and hi.

  39. Comparing Stars What is the relationship between one operation pi and the mapping distance? 1. Edge insertion/deletion: one edge insertion or deletion affects at most two stars, and each affected star incurs a cost of at most 2. Thus, the mapping distance between hi−1 and hi is at most 2 × 2 = 4 for one edge insertion or deletion.

  40. Comparing Stars What is the relationship between one operation pi and the mapping distance? 2. Vertex insertion/deletion: one vertex insertion or deletion affects at most one star, with a cost of at most 1. Thus, the mapping distance between hi−1 and hi is at most 1 for one vertex insertion or deletion.

  41. Comparing Stars What is the relationship between one operation pi and the mapping distance? 3. Vertex relabeling: relabeling one vertex v0 affects at most deg(v0) + 1 stars.

  42. Comparing Stars Lower Bound
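The slide's formula is not preserved. One way the per-operation bounds from the preceding slides could combine into a lower bound is sketched below. The symbols are assumptions introduced here: \mu denotes the mapping distance, \delta an upper bound on the vertex degree in the graphs involved, and each affected star is assumed to contribute at most 1 under a vertex relabeling. Only the three operation types discussed above are considered.

```latex
% Per-operation bound, for every step h_{i-1} -> h_i of an optimal edit sequence:
%   edge insertion/deletion   : \mu(h_{i-1}, h_i) \le 4
%   vertex insertion/deletion : \mu(h_{i-1}, h_i) \le 1
%   vertex relabeling         : \mu(h_{i-1}, h_i) \le \delta + 1
\mu(g_1, g_2)
  \;\le\; \sum_{i=1}^{k} \mu(h_{i-1}, h_i)
  \;\le\; k \cdot \max\{4,\ \delta + 1\},
  \qquad k = \mathrm{GED}(g_1, g_2)

\Longrightarrow\quad
\mathrm{GED}(g_1, g_2) \;\ge\; \frac{\mu(g_1, g_2)}{\max\{4,\ \delta + 1\}}
```

Under these assumptions, a pair of graphs can be pruned from a similarity search whenever the right-hand side already exceeds the threshold.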

  43. Comparing Stars Upper bound: based on the bipartite graph matching, we can define an upper bound for the edit distance.

  44. Comparing Stars

  45. Comparing Stars Experiment datasets • Real dataset: AIDS antiviral screen compound dataset, 42,687 chemical compounds • Synthetic dataset: 1000 graphs, average size 10

  46. GSimJoin Efficient Graph Similarity Joins with Edit Distance Constraints. Xiang Zhao, Chuan Xiao, Xuemin Lin, and Wei Wang. @ICDE12

  47. GSimJoin • Problem definition: given two sets of graphs R and S, a graph similarity join with edit distance threshold τ returns pairs of graphs, one from each set, such that their graph edit distance is no larger than τ, i.e., { ⟨r, s⟩ | ged(r, s) ≤ τ, r ∈ R, s ∈ S }. • This paper focuses on the self-join case { ⟨ri, rj⟩ | ged(ri, rj) ≤ τ ∧ ri.id < rj.id, ri ∈ R, rj ∈ R }.

  48. GSimJoin • Definition (path-based 𝑞-gram): A path-based 𝑞-gram in a graph 𝑟 is a simple path of length 𝑞.

  49. GSimJoin Let Qr denote the multiset of q-grams in a graph r, and let Qru denote the multiset of q-grams that contain the vertex u.
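The q-gram multisets can be sketched as follows, assuming that the length of a path counts its edges, that a path and its reverse are the same q-gram, and that a q-gram is identified by its sequence of vertex labels (edge labels are ignored here). The function names and the example graph are hypothetical.

```python
from collections import Counter

def simple_paths(labels, edges, q):
    """All simple paths with q edges, each listed once as a vertex tuple."""
    adj = {v: set() for v in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    found = set()

    def extend(path):
        if len(path) == q + 1:
            # Keep one orientation so a path and its reverse coincide.
            found.add(min(tuple(path), tuple(reversed(path))))
            return
        for w in adj[path[-1]]:
            if w not in path:
                extend(path + [w])

    for v in labels:
        extend([v])
    return sorted(found)

def qgram_multiset(labels, paths):
    """Multiset of label sequences for the given paths."""
    grams = Counter()
    for p in paths:
        seq = tuple(labels[v] for v in p)
        grams[min(seq, seq[::-1])] += 1
    return grams

# Hypothetical chain 1-2-3-4 with labels C, C, O, N and q = 2.
chain_labels = {1: "C", 2: "C", 3: "O", 4: "N"}
chain_edges = [(1, 2), (2, 3), (3, 4)]
paths = simple_paths(chain_labels, chain_edges, 2)
Q_r = qgram_multiset(chain_labels, paths)
Q_r_u = qgram_multiset(chain_labels, [p for p in paths if 2 in p])
```

In this example both 2-grams pass through vertex 2, so Q_r_u equals Q_r for u = 2, while only one of them contains vertex 1.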
