Community Structure in Large Complex Networks Liaoruo Wang and John E. Hopcroft Dept. of Computer Engineering & Computer Science, Cornell University In Proc. 7th Annual Conference on Theory and Applications of Models of Computation (TAMC), June 2010 Presented by Nam Nguyen
Agenda • Motivation • Introduction • Contributions of the paper • Definitions • WHISKER is NP-Complete. • Algorithms.
Motivation • Community structure (C.S.) is a classical but still active topic in complex networks. • Previous studies: communities were assumed to be densely connected inside but sparsely connected to the outside. • A different point of view: we should disregard “whiskers” and focus on the “cores” of the network.
Introduction • Roughly speaking • Whiskers: subsets of vertices that are barely connected to the rest of the network. • Cores: connected subgraphs that are densely connected inside and well connected to the rest of the network, i.e., the “real communities”. • Why? • In real-world societies, communities are also well connected to the rest of the network. • Imagine a close-knit community, the CISE Dept., with only one connection to the outside world. • Formal definitions follow.
Contributions • More concrete definitions of “whiskers” and “cores” in a network. • WHISKER is NP-Complete • Three heuristic algorithms for finding approximate cores. • Simulation results.
Definition • Graph G = (V, E), undirected, with adjacency matrix A = (A_ij). For S ⊆ V, let S^C = V \ S. • Conductance of S where • A suitable cut
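The conductance of a vertex set can be computed directly from the adjacency matrix. The sketch below assumes the standard definition (weight of the edges crossing from S to S^C, divided by the smaller of the two volumes, where the volume of a side is the sum of its weighted degrees); the exact formula on the slide may differ in detail. The function name `conductance` and the toy graph are illustrative only.

```python
from typing import Iterable, List


def conductance(adj: List[List[float]], subset: Iterable[int]) -> float:
    """Cut weight of `subset` divided by the smaller of the two volumes.

    Assumes the standard definition of conductance; `adj` is a symmetric
    adjacency (weight) matrix given as a list of rows.
    """
    n = len(adj)
    inside = set(subset)
    outside = set(range(n)) - inside
    # Total weight of edges crossing from S to its complement.
    cut = sum(adj[i][j] for i in inside for j in outside)
    # Volume of a side = sum of the weighted degrees of its vertices.
    vol_in = sum(adj[i][j] for i in inside for j in range(n))
    vol_out = sum(adj[i][j] for i in outside for j in range(n))
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

# One crossing edge, volume 7 on each side.
print(conductance(A, {0, 1, 2}))
```

A low value, as for the triangle here, is what earlier studies treated as the signature of a good community; the talk argues that such barely attached sets behave more like whiskers.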
Definition (cont’d) • A k-whisker • A maximal k-whisker
Definition (cont’d) • A whisker • A maximal whisker
Definition (cont’d) • A core
Lemmas Proof The only suitable cut has size 26 > |S ∪ T| = 25
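Lemmas (cont’d) Proof
(1a) $e_{xr} + e_{xz} + e_{yr} + e_{yz} \le v_x + v_y$
(1b) $e_{yr} + e_{xy} + e_{zr} + e_{xz} \le v_y + v_z$
(1c) $e_{xr} + e_{yr} + e_{zr} > v_x + v_y + v_z$
Adding (1a) and (1b) and then using (1c) gives
$e_{xr} + 2e_{yr} + e_{zr} + e_{xy} + e_{yz} + 2e_{xz} \le v_x + 2v_y + v_z < e_{xr} + e_{yr} + e_{zr} + v_y$
Cancelling the common terms leaves $e_{yr} + e_{xy} + e_{yz} + 2e_{xz} < v_y$, and since $e_{xz} \ge 0$, $e_{yr} + e_{xy} + e_{yz} < v_y$.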
NP-Completeness • NAE-3-SAT: the problem of determining whether there exists a truth assignment for a 3-CNF Boolean formula such that each clause has at least one true literal and at least one false literal. Fact: NAE-3-SAT is NP-Complete. • WHISKER: given an unweighted undirected graph, determine whether it contains a whisker. WHISKER is NP-Complete (by a reduction from NAE-3-SAT).
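To make the not-all-equal condition concrete, here is a minimal brute-force sketch that checks a small NAE-3-SAT instance. The encoding (a positive integer k for x_k, a negative integer for its negation) and the names `nae_satisfied` and `find_nae_assignment` are assumptions made for illustration. The search is exponential in n, so it only suits toy instances.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Clause = Tuple[int, int, int]  # literal k > 0 means x_k, k < 0 means NOT x_k


def nae_satisfied(clause: Clause, assignment: Sequence[bool]) -> bool:
    """True if the clause has at least one true and at least one false literal."""
    values = [
        assignment[abs(lit) - 1] if lit > 0 else not assignment[abs(lit) - 1]
        for lit in clause
    ]
    return any(values) and not all(values)


def find_nae_assignment(n: int, clauses: List[Clause]) -> Optional[Tuple[bool, ...]]:
    """Try all 2^n assignments; return the first not-all-equal one, else None."""
    for assignment in product([False, True], repeat=n):
        if all(nae_satisfied(c, assignment) for c in clauses):
            return assignment
    return None


# A small instance that has a not-all-equal assignment.
easy = [(1, 2, 3), (-1, 2, 3)]
print(find_nae_assignment(3, easy))

# These four clauses together rule out all eight assignments.
hard = [(1, 2, 3), (1, 2, -3), (1, -2, 3), (1, -2, -3)]
print(find_nae_assignment(3, hard))
```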
WHISKER is NP-Complete • Road map • 1. Construct a special graph G on 2n vertices and show that G admits exactly 2^n whiskers. • 2. Construct a G-like graph for the 3-SAT instance. • 3. Reduce NAE-3-SAT to WHISKER.
NP-Completeness • WHISKER is in NP • Reduction from NAE-3-SAT to WHISKER • Consider the following graph (constructed in polynomial time) • In each row, pick exactly one vertex (i.e., either xi or ¬xi) • The resulting set of n vertices is a whisker • The total number of such whiskers is 2^n • And no more than that
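The count of candidate sets follows from the row-by-row choice: two options in each of n rows give 2^n selections. The sketch below only enumerates these selections. It does not check the whisker property, because the graph from the slide's figure is not reproduced here. The name `literal_selections` and the string labels are illustrative.

```python
from itertools import product
from typing import List, Tuple


def literal_selections(n: int) -> List[Tuple[str, ...]]:
    """All ways to pick exactly one of x_i or ¬x_i from each of the n rows."""
    rows = [(f"x{i}", f"¬x{i}") for i in range(1, n + 1)]
    return list(product(*rows))


selections = literal_selections(3)
print(len(selections))  # two choices per row, three rows
for chosen in selections:
    print(chosen)
```

Each selection corresponds to one truth assignment: picking xi reads as "xi is TRUE", picking ¬xi as "xi is FALSE".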
NP-Complete • 2^n whiskers and no more than that! Why? • Suppose there is a whisker W of 2k+j vertices • Cut size of W • By the definition of a suitable cut size, we have which implies !!!!
NP-Complete • NAE-3-SAT ≤_P WHISKER • Consider an instance of NAE-3-SAT with n variables and c clauses. • Construct G1, G2, …, Gc as follows
NP-Complete • NAE-3-SAT ≤_P WHISKER • Now combine all the Gi’s, adding up their edge weights, to get G’. • Next, update G and update G’, and combine them into G*: the 3-CNF formula has a satisfying assignment ⇔ G* contains a whisker
NP-Complete • Update G ( ) • Update G’ • Amplify all edge weights of G’ by a small amount δ, where cn^2 δ ≪ 1 • All whiskers in the new G are the same as in the old G.
NP-Complete • G* = G + G’ • Goal: if the 3-CNF instance has a satisfying truth assignment, then selecting the true literal from each row of G* gives a whisker of size n, and vice versa. • For any truth assignment of the 3-SAT instance, rearrange the literals into TRUE and FALSE columns. • If there is a satisfying not-all-equal assignment for the 3-SAT instance • Each clause must have at least one TRUE and at least one FALSE literal. • The literals of a clause cannot all be in the same column. • For the ith clause, Gi contains n^2 − 2 edges connecting its two columns • The total cut size is required to satisfy
NP-Complete • If there is NO satisfying not-all-equal assignment for the 3-SAT instance • At least one clause i has all its literals in the same column ⇒ n^2 edges between the two columns of Gi. • For the other (c − 1) clauses, there are at most (n^2 − 2) edges connecting their two columns. Total number of edges: (c − 1)(n^2 − 2) + n^2 = cn^2 − 2c + 2. • Of course, we do not want selecting the true literal in each row to give a whisker, thus Combining the two inequalities, if ε and δ are chosen such that Then: if the 3-CNF instance has a satisfying truth assignment, then selecting the true literal from each row of G* gives a whisker of size n, and vice versa. • Hence, NAE-3-SAT ≤_P WHISKER □
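The edge counts used in the two cases can be checked numerically. The sketch below assumes, as the slides state, that a clause whose literals are split across the two columns contributes n^2 − 2 crossing edges in its Gi, while a clause with all literals in one column contributes n^2. The function names are illustrative, and the weights δ and ε are left out because their exact conditions are not recoverable from the text.

```python
def cross_edges_all_split(n: int, c: int) -> int:
    """Crossing edges when every one of the c clauses is split across columns."""
    return c * (n * n - 2)


def cross_edges_one_violated(n: int, c: int) -> int:
    """Crossing edges when exactly one clause has all literals in one column."""
    return (c - 1) * (n * n - 2) + n * n


for n, c in [(3, 2), (5, 7), (10, 40)]:
    violated = cross_edges_one_violated(n, c)
    # Closed form from the slide: (c-1)(n^2-2) + n^2 = c*n^2 - 2c + 2.
    assert violated == c * n * n - 2 * c + 2
    # A single violated clause adds exactly two crossing edges.
    assert violated - cross_edges_all_split(n, c) == 2
    print(n, c, cross_edges_all_split(n, c), violated)
```

The gap of two edges per violated clause is what the choice of ε and δ has to separate, so that only selections coming from a not-all-equal assignment stay within the whisker threshold.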
Results • On random graphs • Algorithm 2 successfully finds an approximate core • Algorithm 3 fails to find an approximate core • The core size grows linearly with d = np (fixed n) and logarithmically with n (fixed d) • Open question: does G(n,p) display core structure with high probability when p > 1/n?
Results • Textual graph • Vertices and edges: words and their semantic correlations • Data crawled from 10K scientific papers of the KDD conference (1992-2003) • Pointwise mutual information • Total: 685 vertices and 6,432 edges
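Pointwise mutual information scores how much more often two words co-occur than they would if they were independent: PMI(a, b) = log( p(a, b) / (p(a) p(b)) ). The sketch below estimates these probabilities from document-level co-occurrence. That estimation scheme, the toy corpus, and the name `pmi_scores` are assumptions for illustration; the slides do not say how the paper estimated the probabilities or turned scores into edges.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def pmi_scores(documents: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """PMI for every word pair that co-occurs in at least one document."""
    n_docs = len(documents)
    word_count: Counter = Counter()
    pair_count: Counter = Counter()
    for doc in documents:
        words = sorted(set(doc))  # count each word at most once per document
        word_count.update(words)
        pair_count.update(combinations(words, 2))  # pairs in alphabetical order
    scores = {}
    for (a, b), n_ab in pair_count.items():
        p_ab = n_ab / n_docs
        p_a = word_count[a] / n_docs
        p_b = word_count[b] / n_docs
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


# Hypothetical toy corpus: each document is a list of keywords.
docs = [
    ["graph", "cut"],
    ["graph", "cut"],
    ["graph", "mining"],
    ["data", "mining"],
]
for pair, score in sorted(pmi_scores(docs).items()):
    print(pair, round(score, 3))
```

Pairs with positive PMI co-occur more often than chance would predict and are natural candidates for edges in a word graph; pairs with negative PMI would typically be dropped.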
Results • Both Algorithms 2 and 3 successfully find approximate cores. • Higher values of λ correspond to smaller core sizes. • Fig. (b): the best community of the textual graph has a large conductance of 0.3, i.e., the best community has as many internal edges as cut edges. • Algorithm 3 is believed to be more useful.
Comment • Does a “whisker” make sense?
Reference • Schaefer, T. J. The complexity of satisfiability problems. In Proc. 10th Ann. ACM Symp. on Theory of Computing (1978), Association for Computing Machinery, pp. 216-226.