Overlapping Community Search

Overlapping Community Search Wanyun Cui

Problem • What is community search? Given a graph G(V,E) and a query vertex v, find the community that v belongs to in G. This was done in my recent research, <Local Search of Communities in Large Graphs>. • What is overlapping community search (OCS)? Given a graph G(V,E) in which communities may overlap, and a query vertex v, find all the communities that contain v. • In a network with overlapping communities, different communities may share common vertices.

Motivations • Many real graphs have overlapping communities • A large fraction of proteins belong to several protein complexes simultaneously [1] • Online social networks are made of highly overlapping cohesive communities [2] • Applications of overlapping community search • Compute the centrality of a given vertex [3] • Find the communities of a protein, based on its interactions • Previous research focuses on overlapping community detection, but none addresses overlapping community search

Challenges • Computational intractability • NP-complete, as will be proved • How to define the overlapping community search problem well remains a challenge • It is difficult to precisely model the complex semantics of real overlapping communities • Avoid trivial results • For example, when the definition is too restrictive, no meaningful communities may be found • It is difficult to distinguish between two overlapping communities and one large community with two components • Scalability to large real graphs with millions of nodes • Current overlapping community detection can only handle graphs with tens of thousands of nodes • Why can current community search not be extended to solve OCS?

Related Works <A Multi-Resolution Approach to Learning with Overlapping Communities>KDD2010 WORKSHOP

Related Works <Detecting the overlapping and hierarchical community structure in complex networks> New Journal of Physics

Related Works <Uncovering the overlapping community structure of complex networks in nature and society> Nature 2005 • A k-clique community is the union of all k-cliques (complete subgraphs of size k) that can be reached from each other through a series of adjacent k-cliques (where adjacency means sharing k − 1 nodes)

Related Works <Detect overlapping and hierarchical community structure in networks>

Problem Definition • Given an undirected graph G(V,E) and a query vertex v ∈ V, find all the overlapping communities that contain v. • A k-clique-based community is a union of all k-cliques that are reachable from each other through a series of adjacent k-cliques, where: • A k-clique is a complete graph with k vertices • Two k-cliques are adjacent if they share k-1 vertices • A k-clique Ct is reachable from another k-clique Cs if there exists a k-clique path C1=Cs, C2,…,Cj=Ct such that each Ci is adjacent to Ci+1
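The definitions above can be checked mechanically. The following is a minimal Python sketch, not the author's code; the toy graph and the helper names (build_adj, is_clique, are_adjacent, is_clique_path) are assumptions introduced for illustration.

```python
from itertools import combinations

def build_adj(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_clique(adj, vertices):
    """True if every pair of the given vertices is joined by an edge."""
    return all(y in adj[x] for x, y in combinations(vertices, 2))

def are_adjacent(c1, c2):
    """Two k-cliques are adjacent if they share exactly k-1 vertices."""
    return len(c1) == len(c2) and len(c1 & c2) == len(c1) - 1

def is_clique_path(adj, path):
    """True if path is a sequence of cliques in which each Ci is adjacent to Ci+1."""
    return (all(is_clique(adj, c) for c in path)
            and all(are_adjacent(a, b) for a, b in zip(path, path[1:])))

# Hypothetical toy graph: triangles abc, bcd and def (k = 3).
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = build_adj(EDGES)
abc, bcd, dfe = frozenset("abc"), frozenset("bcd"), frozenset("def")
print(is_clique_path(adj, [abc, bcd]))   # abc and bcd share b and c
print(are_adjacent(bcd, dfe))            # bcd and def share only d
```

In this toy graph, vertex d lies in two communities for k = 3: {a,b,c,d} (through the path abc, bcd) and {d,e,f}.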

Why we use k-clique based community? • Meaningful communities: • C1: it cannot be too restrictive (in real cases we set k = 3 or 4, which is less restrictive) • C2: it should be based on the density of links • C3: it should allow overlaps, and strictly distinguish different communities • C4: it can be implemented through local information

Different Community Measures • M1: F(V) = min{ deg_G[V](v) | v ∈ V }, the minimum degree in the induced subgraph G[V] <The community-search problem and how to plan a successful cocktail party> KDD 2010 • M2: F(V) = Σ_{v∈V} deg_G[V](v) / |V|, the average degree in G[V] <Greedy approximation algorithms for finding dense components in a graph> APPROX 2000 • M3: f_G = k_in^G / (k_in^G + k_out^G)^α, where k_in^G and k_out^G are the total internal and external degrees of the nodes, and α is a parameter <Detecting the overlapping and hierarchical community structure in complex networks>
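To make the three measures concrete, here is a small Python sketch that evaluates them on a vertex subset S. It assumes M3 is the fitness function of the cited paper, k_in / (k_in + k_out)^α, with k_in counting every internal edge twice; the toy graph and function names are illustrative only.

```python
def build_adj(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def min_degree(adj, S):
    """M1: minimum degree inside the subgraph induced by S."""
    return min(len(adj[v] & S) for v in S)

def average_degree(adj, S):
    """M2: sum of induced degrees divided by |S|."""
    return sum(len(adj[v] & S) for v in S) / len(S)

def fitness(adj, S, alpha=1.0):
    """M3 (assumed form): k_in / (k_in + k_out) ** alpha."""
    k_in = sum(len(adj[v] & S) for v in S)    # total internal degree
    k_out = sum(len(adj[v] - S) for v in S)   # total external degree
    return k_in / (k_in + k_out) ** alpha

# Hypothetical toy graph: triangles abc, bcd and def.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = build_adj(EDGES)
S = {"a", "b", "c", "d"}
print(min_degree(adj, S), average_degree(adj, S), fitness(adj, S))
```

For S = {a,b,c,d} the induced degrees are 2, 3, 3, 2, so M1 is 2 and M2 is 2.5; the two edges from d to e and f are external, so M3 with α = 1 is 10/12.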

Summaries of community measures

Why we use k-clique based community? • The k-clique community is meaningful in many real networks • Scientist collaboration network • South Florida Free Association norms list • Protein interaction network • See Nature 2005

Hardness of the problem • It is an NP-complete problem • Hardness is shown by reduction from the maximum clique problem

Naïve Approach • Step 1: Initially VC = {v} • Step 2: For each unvisited vertex u ∈ VC: set u as visited; find all k-cliques that contain u; add the vertices of those k-cliques to VC • Step 3: Calculate the adjacency matrix of the resulting set of k-cliques • Step 4: Combine all known k-cliques by union-find • Step 5: Return all the k-clique chains that contain v
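The five steps can be sketched in Python as follows. This is an illustrative reading of the slide, not the author's implementation: k-cliques are found by brute force over neighbour subsets, and the names naive_ocs and k_cliques_containing are hypothetical.

```python
from itertools import combinations

def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_cliques_containing(adj, u, k):
    """All k-cliques that contain u (brute force over (k-1)-subsets of u's neighbours)."""
    found = set()
    for rest in combinations(adj[u], k - 1):
        if all(y in adj[x] for x, y in combinations(rest, 2)):
            found.add(frozenset(rest) | {u})
    return found

def naive_ocs(adj, v, k):
    # Steps 1-2: grow VC from v, collecting every k-clique met on the way.
    visited, cliques, stack = set(), set(), [v]
    while stack:
        u = stack.pop()
        if u in visited:
            continue
        visited.add(u)
        for c in k_cliques_containing(adj, u, k):
            cliques.add(c)
            stack.extend(w for w in c if w not in visited)
    # Steps 3-4: test pairwise adjacency and merge adjacent cliques by union-find.
    cliques = list(cliques)
    parent = list(range(len(cliques)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    # Step 5: keep only the clique chains that contain v.
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), []).append(c)
    return [frozenset().union(*g) for g in groups.values()
            if any(v in c for c in g)]

# Hypothetical toy graph: triangles abc, bcd and def.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = build_adj(EDGES)
print(naive_ocs(adj, "d", 3))
```

Note that step 2 keeps expanding through any shared vertex, so cliques that never join a chain containing v are still enumerated and only discarded in step 5.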

Optimization 1 • In step 2, find k-cliques for each (k-1)-subset of a known k-clique instead of for each vertex • Example • Suppose v=a, k=4 • At first we find a k-clique {a,b,c,d} • If we try to find the k-cliques that contain b, we will find {b,f,g,h} • {b,f,g,h} shares only one vertex with {a,b,c,d}, so according to the definition of adjacency it is meaningless

Optimization 1 • Step 2 • Skc denotes the set of k-cliques found so far; initially Skc = ∅ • Find all the k-cliques that contain v, and add those k-cliques to Skc • For each unvisited kci ∈ Skc (k-clique i) • Find all the k-cliques adjacent to kci and add them to Skc • We do not need step 3 and step 4 any more, because we already know the adjacency relationships among those k-cliques from step 2 • Complexity analysis
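A minimal Python sketch of this clique-by-clique expansion, under the assumption that the k-cliques adjacent to kci are found by dropping one vertex and looking for a common neighbour of the remaining k-1 vertices. The function names are hypothetical.

```python
from itertools import combinations

def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_cliques_containing(adj, u, k):
    """All k-cliques that contain u (brute force)."""
    found = set()
    for rest in combinations(adj[u], k - 1):
        if all(y in adj[x] for x, y in combinations(rest, 2)):
            found.add(frozenset(rest) | {u})
    return found

def adjacent_k_cliques(adj, clique):
    """Yield every k-clique sharing k-1 vertices with the given clique."""
    for u in clique:
        rest = clique - {u}
        # Vertices joined to all of the remaining k-1 vertices.
        common = set.intersection(*(adj[x] for x in rest)) - clique
        for w in common:
            yield rest | {w}

def ocs_by_clique_expansion(adj, v, k):
    skc = set()                       # Skc: k-cliques found so far
    communities = []
    for seed in k_cliques_containing(adj, v, k):
        if seed in skc:
            continue                  # already reached from an earlier seed
        skc.add(seed)
        members, stack = set(seed), [seed]
        while stack:
            kci = stack.pop()
            for nb in adjacent_k_cliques(adj, kci):
                if nb not in skc:
                    skc.add(nb)
                    members |= nb
                    stack.append(nb)
        communities.append(frozenset(members))
    return communities

# Hypothetical toy graph: triangles abc, bcd and def.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = build_adj(EDGES)
print(ocs_by_clique_expansion(adj, "d", 3))
```

Because every clique is reached only through adjacency from a clique containing v, each search started from an unvisited seed yields exactly one community, and no separate union-find pass is needed.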

Property of the naïve solution • Order-independent

Optimization 2 • For each k-clique kci, finding all the k-cliques adjacent to it costs a lot. How can the time be reduced? • DFS vs BFS? • We use DFS order to expand all those k-cliques • When using DFS order, any two k-cliques that are consecutive on the search path are adjacent, that is, they share k-1 vertices

Optimization 2 • A new data structure • Suppose the current k-clique is kci • Make k lists, l1..lk • lj contains all the vertices outside kci that have j edges to the vertices of kci

Optimization 2 • Example • Current k-clique: {a,b,c,d} • Current state • 4 |-> null • 3 |-> e -> null • 2 |-> null • 1 |-> f -> g -> h -> null • Enumerate the vertex u that will be replaced • Delete u and maintain the data structure • The remaining vertices together with any vertex in the list lk-1 form a k-clique
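The list structure can be sketched in Python as below. This is one possible reading of the slide: the lists are kept as sets, the graph is a hypothetical one chosen to reproduce the state shown in the example, and the function names are not from the original.

```python
def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def build_lists(adj, clique):
    """count[w] = number of edges from outside vertex w into the clique;
    lists[j] = set of outside vertices with exactly j such edges."""
    count = {}
    for x in clique:
        for w in adj[x]:
            if w not in clique:
                count[w] = count.get(w, 0) + 1
    lists = {j: set() for j in range(1, len(clique) + 1)}
    for w, c in count.items():
        lists[c].add(w)
    return count, lists

def cliques_by_replacement(adj, clique):
    """Adjacent k-cliques obtained by replacing one vertex u of the clique."""
    k = len(clique)
    count, lists = build_lists(adj, clique)
    found = set()
    for u in clique:
        touched = [w for w in adj[u] if w not in clique]
        for w in touched:                 # delete u: its neighbours lose one edge
            lists[count[w]].discard(w)
            count[w] -= 1
            if count[w] > 0:
                lists[count[w]].add(w)
        rest = clique - {u}
        for w in lists[k - 1]:            # w is joined to all remaining vertices
            found.add(rest | {w})
        for w in touched:                 # put u back and restore the lists
            if count[w] > 0:
                lists[count[w]].discard(w)
            count[w] += 1
            lists[count[w]].add(w)
    return found

# Hypothetical graph matching the slide: clique abcd, e joined to a, b, c,
# and f, g, h joined to b (and to each other).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("e", "a"), ("e", "b"), ("e", "c"),
         ("b", "f"), ("b", "g"), ("b", "h"), ("f", "g"), ("f", "h"), ("g", "h")]
adj = build_adj(EDGES)
abcd = frozenset("abcd")
count, lists = build_lists(adj, abcd)
print(lists)
print(cliques_by_replacement(adj, abcd))
```

Only replacing d works here: after d is deleted, e still has 3 = k-1 edges to {a,b,c}, so {a,b,c,e} is the single adjacent 4-clique.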

Optimization 3 • A brute-force check of whether a k-clique already exists in Skc costs a lot • Use a hash function to determine whether a k-clique exists in Skc • Desired hash function • Order-independent • Another benefit of DFS: it reduces the hashing time from O(k) to O(1) • Hash function example
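The slide's own hash function example is not preserved in this transcript. As an assumed stand-in, the Python sketch below uses the XOR of random 64-bit codes per vertex: it is order-independent, and moving to an adjacent k-clique (one vertex swapped) updates the hash in O(1). The class and method names are hypothetical.

```python
import random

class CliqueHasher:
    """Order-independent hash of a vertex set: XOR of per-vertex random codes."""

    def __init__(self, vertices, seed=0):
        rng = random.Random(seed)
        self.code = {v: rng.getrandbits(64) for v in vertices}

    def hash_clique(self, clique):
        """O(k): XOR the codes of all vertices, in any order."""
        h = 0
        for v in clique:
            h ^= self.code[v]
        return h

    def swap(self, h, removed, added):
        """O(1): hash of the adjacent clique with `removed` replaced by `added`."""
        return h ^ self.code[removed] ^ self.code[added]

hasher = CliqueHasher("abcdefgh")
h_abcd = hasher.hash_clique(["a", "b", "c", "d"])
h_abce = hasher.swap(h_abcd, removed="d", added="e")
print(h_abce == hasher.hash_clique(["e", "c", "b", "a"]))
```

Distinct cliques can collide under any such hash, so a match should still be confirmed by comparing the vertex sets before a clique is treated as already present in Skc.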

Problems and Future Work • What if we change the definition of adjacency? • How many k-cliques are produced before step 3? • Worst case analysis • Real network case • More optimization for the algorithm? • Theoretical research direction? • Bootstrapping

References • [1] A. C. Gavin et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002). • [2] Lei Tang, Xufei Wang, Huan Liu and Lei Wang. A Multi-Resolution Approach to Learning with Overlapping Communities. In Workshop on Social Media Analytics, KDD 2010. • [3] Martin G. Everett and Stephen P. Borgatti. Analyzing clique overlap. Connections 21(1), 49–61 (1998).

Emergence of Community in Complex Networks gdm@fudan

Motivation • Real networks show community structure • What is the underlying mechanism accounting for the emergence of communities? • How can we generate a network with community structure following these principles?

Basic Idea: Distance based • New vertices are randomly located in an n-dimensional space • The probability of a link from v to u is based on the distance between v and u • The probability that v belongs to a given community is also based on the distance between v and the center of that community
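The link-formation part of this idea can be illustrated with a short Python sketch. The slide does not give the probability function, so the exponential decay exp(-d / scale), the uniform placement in the unit hypercube, and all parameter names are assumptions made for illustration; community assignment is not modeled here.

```python
import math
import random

def distance_graph(n, dim=1, scale=0.05, seed=0):
    """Place n vertices uniformly at random in [0, 1]^dim and link each pair
    with probability exp(-distance / scale) (assumed decay function)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(pos[i], pos[j])      # Euclidean distance
            if rng.random() < math.exp(-d / scale):
                edges.add((i, j))
    return pos, edges

pos, edges = distance_graph(50, dim=1)
print(len(pos), len(edges))
```

With a small scale, links form almost only between nearby vertices, which is the mechanism the slide relies on to produce clustered groups.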

Experiment • In a 1-dimensional space, vertices are well clustered and the modularity of the graph is larger than 0.3 • Higher-dimensional spaces yield lower modularity
